- 1. Downloading Players on Court Data
- 2. Interpolate the win probability
- 3. Predict with linear regression model

```
exec(open("Function.py").read())
%matplotlib inline
```

Here we create an instance, `Model`, of the `NBAWinProbability` class. We are only interested in the 2018 season, so we load only that season's data. We then build the dataframe we will be working with by calling `BuildDF`.

```
Model=NBAWinProbability(Seasons=[2018])
Model.BuildDF()
Model.TrainTestSplit(train_fraction=0)
```

```
Build DF Completed!
Train-Test Split Completed!
```

Here we load the `.pkl` model trained on the 2012-17 seasons. Note: we are not training on 2018; throughout this part we treat 2018 as a test prediction. All of the work here assumes the model has never seen this data before.

```
Model.loadModelPersist('2012-17Model.pkl')
```

```
Model.DownloadPlayerOnCourtInfo()
```

```
Dowloading On Court Player information Completed!
```

We evaluate a player's performance based on the model's predicted win probability.

To obtain the players-on-court data we must scrape and interpret tables from the web. We go through every game of the 2018 season and scrape the player on-court information for each game from pages like this one: https://www.basketball-reference.com/boxscores/plus-minus/201711290LAL.html

On a page like this, we go through the `div` elements of each row and read the **pixel** width that represents each stretch a player spent on court. We then convert this pixel information to game time with a linear map and interpolate between points to obtain the best continuous function of game time we can get. We save the result to a .csv file that contains each player's time on court for each game. More specifically, this .csv file stores who is on court every **5 seconds**, for every player and every game. A player's feature is -1 if they play for the home team and 1 if they play for the away team.
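The pixel-to-time conversion described above can be sketched as follows. This is a minimal illustration rather than the code in `Function.py`: the function names, the `(left_px, width_px)` span format, and the 960-pixel chart width are all assumptions made for the example.

```python
def pixels_to_seconds(px, chart_width_px, game_seconds):
    """Linearly map a horizontal pixel offset to elapsed game time in seconds."""
    return px / chart_width_px * game_seconds


def on_court_grid(spans_px, chart_width_px, game_seconds=48 * 60, step=5, side=-1):
    """Sample a player's on-court indicator every `step` seconds.

    spans_px: list of (left_px, width_px) bars, one per stint on court
              (hypothetical format, assumed for this sketch).
    side: -1 for a home player, 1 for an away player; 0 marks off court.
    """
    # Convert each pixel bar into a [start, end) interval in seconds.
    stints = [
        (pixels_to_seconds(left, chart_width_px, game_seconds),
         pixels_to_seconds(left + width, chart_width_px, game_seconds))
        for left, width in spans_px
    ]
    grid = []
    for t in range(0, game_seconds, step):
        playing = any(start <= t < end for start, end in stints)
        grid.append(side if playing else 0)
    return grid


# Hypothetical chart: 960 px wide for a 48-minute game (3 s per pixel).
home_player = on_court_grid([(0, 240), (480, 120)], chart_width_px=960)
print(len(home_player))                      # 576 samples (2880 s / 5 s)
print(sum(v != 0 for v in home_player) * 5)  # 1080 seconds on court
```

For an overtime game, `game_seconds` would need to be set to the longer game length so that the same chart width maps onto the full duration.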

Furthermore, we obtain the data on which players are on the court at all times.

We already have the predicted win probability for each game at each event. However, the time interval between events does not match the time interval of the players-on-court data. Therefore, we interpolate the win probability array for each game to obtain the win probability at every 5-second interval, so that WP lines up with the players-on-court data frame.
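As a rough illustration of this resampling step, the sketch below linearly interpolates event-level win probabilities onto a 5-second grid. The function name `interp_wp` and the sample values are hypothetical, and linear interpolation is an assumption about the method; NumPy's `numpy.interp` performs the same piecewise-linear interpolation if NumPy is available.

```python
from bisect import bisect_right


def interp_wp(event_times, event_wp, step=5):
    """Linearly interpolate win probability onto a regular `step`-second grid.

    event_times must be strictly increasing (seconds since tip-off).
    """
    grid_times = list(range(int(event_times[0]), int(event_times[-1]) + 1, step))
    grid_wp = []
    for t in grid_times:
        # Index of the last event at or before t.
        i = bisect_right(event_times, t) - 1
        if i >= len(event_times) - 1:
            # At or past the final event: hold the last value.
            grid_wp.append(event_wp[-1])
            continue
        t0, t1 = event_times[i], event_times[i + 1]
        w = (t - t0) / (t1 - t0)
        grid_wp.append(event_wp[i] + w * (event_wp[i + 1] - event_wp[i]))
    return grid_times, grid_wp


# Hypothetical events at irregular times (seconds) with their predicted WP.
times = [0, 10, 30]
wp = [0.50, 0.60, 0.40]
grid_t, grid_wp = interp_wp(times, wp)
print(grid_t)  # [0, 5, 10, 15, 20, 25, 30]
```

Because the grid runs to the last event time, a game that goes to overtime simply produces a longer grid.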

The player information is not used in the prediction of the win probability. Instead, we join the two sources to build the ranking of the players. We compute the change in win probability at each time step, check which players are on court, and attribute the change in win probability to those players.

We fit a linear regression model to the win probability output by the logistic regression model. The linear regression model takes as input the one-hot vectors of which players are on the court at any given time. It is important to note that a player's feature is -1 if they are playing for the home team and 1 if they are playing for the away team. If a player is not playing at a given instant, their value is 0. This encoding takes the strength of each team's opponent into account.

The linear regression is learned using this scheme, and the **coefficients of the predictors** indicate how valuable each player is in changing the win probability. We extract the coefficients of the linear regression model and rank them by magnitude. It is also worth mentioning that the code handles overtime correctly.

To achieve this, we call `WP_list()` to generate the win probability prediction at every time step of every game in 2018. We then call `player_ranking()`, which returns a ranked DataFrame containing all players and their performance under the metric described above.
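The signed one-hot encoding can be sketched as below. The helper names, the player labels, and the choice of the per-step change in WP as the regression target are assumptions for illustration; the actual construction lives in `Function.py`.

```python
def encode_lineup(home_on, away_on, player_index):
    """Signed one-hot row: -1 for home players on court, 1 for away, 0 otherwise."""
    row = [0] * len(player_index)
    for name in home_on:
        row[player_index[name]] = -1
    for name in away_on:
        row[player_index[name]] = 1
    return row


def build_regression_data(lineups, wp_grid, player_index):
    """Pair each 5-second step's lineup with the WP change over that step.

    Using the WP change as the target is an assumption of this sketch.
    """
    X, y = [], []
    for k in range(len(wp_grid) - 1):
        home_on, away_on = lineups[k]
        X.append(encode_lineup(home_on, away_on, player_index))
        y.append(wp_grid[k + 1] - wp_grid[k])
    return X, y


players = ["A", "B", "C", "D"]  # hypothetical player labels
player_index = {name: j for j, name in enumerate(players)}

# One (home_on_court, away_on_court) pair per 5-second step.
lineups = [(["A"], ["C"]), (["A"], ["D"]), (["B"], ["D"])]
wp_grid = [0.50, 0.55, 0.52, 0.52]

X, y = build_regression_data(lineups, wp_grid, player_index)
print(X)  # [[-1, 0, 1, 0], [-1, 0, 0, 1], [0, -1, 0, 1]]
```

The rows of `X` and the targets `y` could then be passed to any least-squares fitter, for example scikit-learn's `LinearRegression`, whose fitted coefficients give one value per player.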

```
Model.WP_list()
rank=Model.player_ranking()
```

If we look at the DataFrame, we can see that the most valuable players include LeBron James and Giannis Antetokounmpo. This indicates that when they are on court, their actions contribute substantially to increasing their team's win probability. Note that the criterion used here to measure a player's ability is the change in WP influenced by that player. In other words, it measures a player's ability to turn the course of a game. That might come from successful screens, high-level help defense, or simply boosting teammates' morale. None of these can be captured by existing traditional stats, because they are not recorded and cannot be.

```
rank.head(20)
```

| | coef | time | total | name | Rank |
|---|---|---|---|---|---|
| 0 | 0.002734 | 8502.0 | 23.246689 | Jeff Teague | 1.0 |
| 1 | 0.001728 | 8870.0 | 15.326443 | Otto Porter | 2.0 |
| 2 | 0.002961 | 5149.0 | 15.246939 | Tyus Jones | 3.0 |
| 3 | 0.001942 | 7071.0 | 13.731552 | Joel Embiid | 4.0 |
| 4 | 0.001295 | 10572.0 | 13.687375 | LeBron James | 5.0 |
| 5 | 0.001475 | 9272.0 | 13.673057 | Giannis Antetokounmpo | 6.0 |
| 6 | 0.001330 | 9946.0 | 13.225438 | Bradley Beal | 7.0 |
| 7 | 0.001338 | 9854.0 | 13.185781 | Taj Gibson | 8.0 |
| 8 | 0.001371 | 9416.0 | 12.909097 | James Harden | 9.0 |
| 9 | 0.001432 | 8874.0 | 12.711584 | Al Horford | 10.0 |
| 10 | 0.001430 | 8618.0 | 12.322240 | Robert Covington | 11.0 |
| 11 | 0.001777 | 6823.0 | 12.122346 | Garrett Temple | 12.0 |
| 12 | 0.001292 | 9218.0 | 11.912959 | Brandon Ingram | 13.0 |
| 13 | 0.001570 | 7106.0 | 11.156157 | Kristaps Porzingis | 14.0 |
| 14 | 0.001794 | 6192.0 | 11.106574 | Enes Kanter | 15.0 |
| 15 | 0.001040 | 10376.0 | 10.788628 | Jrue Holiday | 16.0 |
| 16 | 0.001612 | 6685.0 | 10.779440 | Wesley Johnson | 17.0 |
| 17 | 0.001171 | 9123.0 | 10.686375 | LaMarcus Aldridge | 18.0 |
| 18 | 0.001281 | 8340.0 | 10.680582 | Gary Harris | 19.0 |
| 19 | 0.001428 | 7465.0 | 10.657868 | Eric Gordon | 20.0 |
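Judging from the values shown, the `total` column appears to equal `coef` multiplied by `time`, and `Rank` appears to order players by `total`. That reading is inferred from the numbers rather than stated by the code, so the sketch below should be taken as an assumption. It recomputes the ordering for four rows copied from the table, using plain Python instead of a DataFrame.

```python
# Four rows copied from the table above (coef values are rounded as displayed).
rows = [
    {"name": "LeBron James", "coef": 0.001295, "time": 10572.0},
    {"name": "Jeff Teague", "coef": 0.002734, "time": 8502.0},
    {"name": "Tyus Jones", "coef": 0.002961, "time": 5149.0},
    {"name": "Otto Porter", "coef": 0.001728, "time": 8870.0},
]

# Assumed metric: per-step coefficient weighted by time on court.
for r in rows:
    r["total"] = r["coef"] * r["time"]

# Rank in descending order of total, starting from 1.
ranked = sorted(rows, key=lambda r: r["total"], reverse=True)
for rank, r in enumerate(ranked, start=1):
    r["rank"] = rank
    print(rank, r["name"], round(r["total"], 2))
```

The recomputed totals differ from the table in the third decimal place because the displayed coefficients are rounded. Under this reading, a player with a smaller coefficient but more time on court can still rank above one with a larger coefficient.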