Player Performance Evaluation

exec(open("Function.py").read())
%matplotlib inline

Here we create an instance, Model, of the NBAWinProbability class. We are only interested in the 2018 season, so we load only that season's data. We then build the DataFrame we will be working with by calling the BuildDF function. Calling TrainTestSplit with train_fraction=0 keeps all of the 2018 data out of training.

Model=NBAWinProbability(Seasons=[2018])
Model.BuildDF()
Model.TrainTestSplit(train_fraction=0)
Build DF Completed!
Train-Test Split Completed!

Here we load the .pkl file containing the model trained on the 2012-17 seasons. Note that we do not train on 2018: throughout this part, the 2018 season is treated as test data for prediction. All of the work here assumes the model has never seen this data before.

Model.loadModelPersist('2012-17Model.pkl')
Model.DownloadPlayerOnCourtInfo()
Dowloading On Court Player information Completed!

We evaluate each player's performance based on the model's predicted win probability.

1. Downloading Players on Court Data

To obtain the players-on-court data, we must do some web scraping and interpret the resulting tables. We go through all of the games of the 2018 season and scrape the on-court information for each player in each game from pages like this one: https://www.basketball-reference.com/boxscores/plus-minus/201711290LAL.html

From a page like this, we go through the div elements of each player's row and read the pixel widths that indicate how long each stretch on court lasted. We then convert this pixel information to game time with a linear map and interpolate between points to obtain the best continuous function of game time we can get. We save the result to a .csv file that contains each player's time on court for each game. More specifically, this .csv file records, at every 5-second interval, whether each player is on court, for every player and every game. The player features are encoded as -1 if the player is on the home team and 1 if the player is on the away team.
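The pixel-to-time mapping described above can be sketched as follows. This is a minimal illustration, not the notebook's actual scraper: the function names, the (start, width) pixel representation, and the 960-pixel bar width are all hypothetical, and the sketch assumes a 48-minute regulation game (2880 seconds) with no overtime.

```python
def spans_to_seconds(spans_px, total_px, game_seconds=2880.0):
    """Linearly map (start_px, width_px) stints onto (start_s, end_s) in game time.

    Hypothetical helper: assumes the full bar of `total_px` pixels spans
    exactly `game_seconds` of game clock.
    """
    scale = game_seconds / total_px  # seconds represented by one pixel
    return [(start * scale, (start + width) * scale) for start, width in spans_px]


def on_court_grid(stints, game_seconds=2880.0, step=5.0, side=-1):
    """Sample the stints every `step` seconds.

    Returns `side` (-1 for home, 1 for away, following the text) when the
    player is on court at that instant and 0 otherwise.
    """
    n_steps = int(game_seconds // step)
    grid = []
    for i in range(n_steps):
        t = i * step
        on_court = any(start <= t < end for start, end in stints)
        grid.append(side if on_court else 0)
    return grid


# Made-up example: a home player with two stints on a 960-pixel-wide bar.
stints = spans_to_seconds([(0, 240), (480, 120)], total_px=960)
grid = on_court_grid(stints, side=-1)
```

For an overtime game, the same sketch would need a larger game_seconds value; how the original code determines game length is not shown here.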

Furthermore, we obtain the data on which players are on the court at all times.

2. Interpolate the win probability

We already have the predicted win probability for each game at each event. However, the time intervals between events do not match the 5-second intervals of the players-on-court data. We therefore interpolate the win probability array for each game to obtain the win probability at every 5-second interval, so that WP lines up with the players-on-court DataFrame.
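As an illustration of this resampling step, the sketch below places event-level win probabilities on a 5-second grid. It assumes linear interpolation (the text does not say which scheme is used), and the function name, event times, and probabilities are made up for the example.

```python
import numpy as np


def resample_wp(event_times, event_wp, game_seconds=2880, step=5):
    """Interpolate event-level win probability onto a regular time grid.

    Assumption: linear interpolation between consecutive events.
    `event_times` must be in increasing order (seconds since tip-off).
    """
    grid = np.arange(0, game_seconds + step, step)
    return grid, np.interp(grid, event_times, event_wp)


# Made-up events: times in seconds and the predicted WP at each event.
event_times = np.array([0.0, 12.0, 30.0, 47.0, 2880.0])
event_wp = np.array([0.50, 0.55, 0.52, 0.60, 1.00])

grid, wp5 = resample_wp(event_times, event_wp)
```

Each entry of wp5 can then be paired with the row of the on-court table that has the same timestamp.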

3. Predict with linear regression model

The player information is not used in predicting the win probability. Instead, we join the two sources to rank the players. We compute the change in win probability at each time step, check which players are on court, and attribute the change in win probability to those players. To do this, we fit a linear regression model to the win probability output by the logistic regression model. The linear regression model takes as input a vector indicating which players are on the court at a given time. It is important to note that the player features are encoded as -1 if the player is on the home team and 1 if the player is on the away team. If a player is not on court at a given instant, their value is 0. This signed encoding accounts for the strength of a team's opponent. The coefficients of the fitted linear regression indicate how valuable each player is in changing the win probability. We extract the coefficients and rank the players by their magnitude. The code also handles overtime correctly.
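A minimal sketch of this regression, using ordinary least squares from NumPy rather than the notebook's own code, might look like the following. The player labels, the toy design matrix, and the coefficients used to generate the target are all made up. The sketch also assumes the target is oriented so that a positive coefficient means the player helps their own team; the text does not state which team's win probability is used.

```python
import numpy as np

# Hypothetical players: two on the home team, two on the away team.
players = ["Home A", "Home B", "Away C", "Away D"]

# One row per 5-second interval. Encoding follows the text:
# -1 = on court for the home team, 1 = on court for the away team, 0 = off court.
X = np.array([
    [-1,  0, 1, 0],
    [-1,  0, 0, 1],
    [ 0, -1, 1, 0],
    [ 0, -1, 0, 1],
    [-1, -1, 1, 1],
], dtype=float)

# Made-up "true" player effects, used only to generate a consistent toy target.
true_coef = np.array([0.003, 0.001, -0.002, 0.0005])
delta_wp = X @ true_coef  # change in win probability per interval

# Ordinary least squares fit (minimum-norm solution if X is rank deficient).
coef, *_ = np.linalg.lstsq(X, delta_wp, rcond=None)

# Rank players from largest to smallest coefficient.
order = np.argsort(-coef)
ranking = [(players[i], float(coef[i])) for i in order]
```

In this toy matrix every row has as many home players as away players, so the columns sum to zero and the coefficients are only determined up to a common offset. The least-squares routine returns the minimum-norm solution, which shifts every coefficient by the same amount and leaves the ordering unchanged.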

To achieve this, we call the function WP_list() to generate the win probability prediction at every time step of every game in 2018. We then call the player_ranking() function, which returns a ranked DataFrame containing all players and their performance under the metric described above.

Model.WP_list()
rank=Model.player_ranking()

If we look at the DataFrame, we can see that the most valuable players include LeBron James and Giannis Antetokounmpo. This indicates that when they are on court, their actions contribute substantially to increasing their team's win probability. Note that the criterion used here to measure a player's ability is the change in WP associated with that player. In other words, it measures a player's ability to turn the course of a game. That might come from successful screens, high-level help defense, or simply boosting the morale of his teammates. None of these can be captured by existing traditional stats, because they are not, and cannot be, recorded.

rank.head(20)
        coef     time      total                   name  Rank
0   0.002734   8502.0  23.246689            Jeff Teague   1.0
1   0.001728   8870.0  15.326443            Otto Porter   2.0
2   0.002961   5149.0  15.246939             Tyus Jones   3.0
3   0.001942   7071.0  13.731552            Joel Embiid   4.0
4   0.001295  10572.0  13.687375           LeBron James   5.0
5   0.001475   9272.0  13.673057  Giannis Antetokounmpo   6.0
6   0.001330   9946.0  13.225438           Bradley Beal   7.0
7   0.001338   9854.0  13.185781             Taj Gibson   8.0
8   0.001371   9416.0  12.909097           James Harden   9.0
9   0.001432   8874.0  12.711584             Al Horford  10.0
10  0.001430   8618.0  12.322240       Robert Covington  11.0
11  0.001777   6823.0  12.122346         Garrett Temple  12.0
12  0.001292   9218.0  11.912959         Brandon Ingram  13.0
13  0.001570   7106.0  11.156157     Kristaps Porzingis  14.0
14  0.001794   6192.0  11.106574            Enes Kanter  15.0
15  0.001040  10376.0  10.788628           Jrue Holiday  16.0
16  0.001612   6685.0  10.779440         Wesley Johnson  17.0
17  0.001171   9123.0  10.686375      LaMarcus Aldridge  18.0
18  0.001281   8340.0  10.680582            Gary Harris  19.0
19  0.001428   7465.0  10.657868            Eric Gordon  20.0
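
Judging from the numbers in the table, the total column appears to be the product of coef and time, with rows ordered by total. This is an inference from the printed values, not something the notebook states. The short check below recomputes it for the first three rows; small differences from the printed totals are expected because the coefficients shown are rounded.

```python
# First three rows of the table above: (name, coef, time, printed total).
rows = [
    ("Jeff Teague", 0.002734, 8502.0, 23.246689),
    ("Otto Porter", 0.001728, 8870.0, 15.326443),
    ("Tyus Jones", 0.002961, 5149.0, 15.246939),
]

# Assumption being checked: total is approximately coef * time.
recomputed = [(name, coef * time, printed) for name, coef, time, printed in rows]

# Sort by the recomputed total, largest first, to compare with the Rank column.
ranked_names = [name for name, total, _ in sorted(recomputed, key=lambda r: r[1], reverse=True)]

for name, total, printed in recomputed:
    print(f"{name}: recomputed {total:.4f}, printed {printed:.4f}")
```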