Player Performance Evaluation

1. Downloading Players on Court Data
2. Interpolate the win probability
3. Predict with linear regression model

exec(open("Function.py").read())
%matplotlib inline

Here we create an instance, Model, of the NBAWinProbability class. We are only interested in the 2018 season, so we load the data for that season alone. We then build the dataframe we will be working with by calling the BuildDF function.

Model=NBAWinProbability(Seasons=[2018])
Model.BuildDF()
Model.TrainTestSplit(train_fraction=0)

Build DF Completed!
Train-Test Split Completed!

Here we load the model trained on the 2012-17 seasons from its .pkl file. Note: we do not train on 2018; throughout this part we treat 2018 as a test prediction. All of the work here assumes the model has never seen this data before.

Model.loadModelPersist('2012-17Model.pkl')

Model.DownloadPlayerOnCourtInfo()

Dowloading On Court Player information Completed!

We evaluate player performance based on the model's predicted win probability.

1. Downloading Players on Court Data

To obtain the players-on-court data we must scrape the web and interpret the resulting tables. We go through every game of the 2018 season and scrape the player on-court information for each game from pages like this one: https://www.basketball-reference.com/boxscores/plus-minus/201711290LAL.html
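As a small illustration of how such page addresses could be generated for every game, the sketch below builds the URL from a game date and a home-team code. The pattern (date, then `0`, then the team code) is inferred from the single example URL above and is an assumption, and `plus_minus_url` is a hypothetical helper, not part of Function.py.

```python
from datetime import date

BASE_URL = "https://www.basketball-reference.com/boxscores/plus-minus/"


def plus_minus_url(game_date: date, home_team: str) -> str:
    """Build a plus-minus page URL (assumed pattern: YYYYMMDD + '0' + home-team code)."""
    game_id = f"{game_date:%Y%m%d}0{home_team.upper()}"
    return f"{BASE_URL}{game_id}.html"


# Reproduces the example URL quoted in the text.
url = plus_minus_url(date(2017, 11, 29), "LAL")
print(url)
```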


On a page like this, we go through each div in a player's row and read its pixel width, which tells us how long a stretch the player spent on court. We then convert this pixel information to game time with a linear map and interpolate between points to obtain the best continuous function of game time we can get. We save the result to a .csv file that contains each player's time on court for each game. More specifically, the file stores which players are on court at every 5-second interval of every game. The player features are -1 for players on the home team and 1 for players on the away team.
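The pixel-to-time conversion described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in Function.py: it assumes each player's row is a sequence of segments with a pixel width and an on-court flag, and that the full row width corresponds to the full game length. The function names, the 1000 px row width, and the segment values are all hypothetical.

```python
def stints_from_pixels(segments, total_pixels, game_seconds):
    """Map (width_px, on_court) segments of one row to (start_s, end_s) stints."""
    stints, cursor = [], 0.0
    for width_px, on_court in segments:
        # Linear map from pixel offsets to game seconds.
        start = cursor * game_seconds / total_pixels
        end = (cursor + width_px) * game_seconds / total_pixels
        if on_court:
            stints.append((start, end))
        cursor += width_px
    return stints


def sample_on_court(stints, game_seconds, side_sign, step=5):
    """One value per `step`-second tick: side_sign if on court, else 0."""
    samples = []
    for t in range(0, game_seconds, step):
        playing = any(start <= t < end for start, end in stints)
        samples.append(side_sign if playing else 0)
    return samples


# Hypothetical row: 1000 px wide, representing a 2880 s regulation game.
segments = [(250, True), (250, False), (500, True)]
stints = stints_from_pixels(segments, total_pixels=1000, game_seconds=2880)

# Home players are encoded as -1, following the convention in the text.
home_player = sample_on_court(stints, game_seconds=2880, side_sign=-1)
print(stints)
print(len(home_player), home_player[143], home_player[144])
```

An overtime game would use a larger `game_seconds`; how the original code detects overtime is not shown here.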

Furthermore, we obtain the data on which players are on the court at all times.

2. Interpolate the win probability

We already have the predicted win probability (WP) for each game at each event. However, the time interval between events does not match the time interval of the players-on-court data. We therefore interpolate the win probability array for each game to obtain the win probability at every 5-second interval, so that WP lines up with the players-on-court dataframe.
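A minimal sketch of this resampling step, assuming linear interpolation (the text does not say which interpolation scheme the original code uses) and event times given in seconds from tip-off. `resample_win_probability` and the sample values are hypothetical.

```python
import numpy as np


def resample_win_probability(event_times, event_wp, game_seconds, step=5):
    """Linearly interpolate event-level WP onto a regular `step`-second grid."""
    grid = np.arange(0, game_seconds + step, step)
    # np.interp expects event_times to be increasing.
    wp_on_grid = np.interp(grid, event_times, event_wp)
    return grid, wp_on_grid


# Hypothetical events at irregular times within the first 30 seconds.
event_times = np.array([0.0, 10.0, 30.0])
event_wp = np.array([0.50, 0.60, 0.40])

grid, wp = resample_win_probability(event_times, event_wp, game_seconds=30)
print(grid)
print(wp)
```

Each value in `wp` now corresponds to one 5-second tick, so it can be aligned row by row with the on-court samples.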

3. Predict with linear regression model

The player information is not used to predict the win probability. Instead, we join the two sources to rank the players. For each time step we obtain the change in win probability, check which players are on court, and attribute the change to those players. To do this, we fit a linear regression model to the win probability output of the logistic regression model. The linear regression takes as input a vector, with one entry per player, indicating which players are on the court at a given time. Note that the player features are -1 for players on the home team and 1 for players on the away team. A player who is not on court at a given instant has the value 0. Encoding both teams in the same vector lets the model account for the strength of each team's opponent. The coefficients of the fitted regression indicate how valuable each player is in changing the win probability. We extract these coefficients and rank the players based on their magnitude. The code also handles overtime correctly.
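The encoding and fit can be sketched on a toy example. This is an illustration under assumptions, not the original implementation: it uses ordinary least squares via NumPy with no intercept or regularization (the text does not specify either), it takes the per-interval change in win probability as the target, and the four players and their values are hypothetical.

```python
import numpy as np

players = ["Home A", "Home B", "Away C", "Away D"]  # hypothetical names

# One row per time step: -1 = on court for home, 1 = on court for away, 0 = off.
X = np.array([
    [-1,  0, 1, 0],   # Home A vs Away C
    [-1,  0, 0, 1],   # Home A vs Away D
    [ 0, -1, 1, 0],   # Home B vs Away C
    [ 0, -1, 0, 1],   # Home B vs Away D
], dtype=float)

# Synthetic target: change in win probability generated from known coefficients.
true_coef = np.array([0.3, 0.1, 0.2, -0.2])
y = X @ true_coef

# Every row sums to zero, so coefficients are only determined up to a common
# shift; lstsq returns the minimum-norm solution, which preserves the ordering.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

ranking = [players[i] for i in np.argsort(-coef)]
print(np.round(coef, 3))
print(ranking)
```

Because every row contains the same number of home and away players, shifting all coefficients by a constant leaves the predictions unchanged. Only differences between players are meaningful in this sketch.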

To achieve this, we call the function WP_list() to generate the win probability prediction at every time step of every game in 2018. We then generate the player ranking with the player_ranking() function, which returns a ranked DataFrame containing all players and their performance under the metric described above.

Model.WP_list()
rank=Model.player_ranking()

If we look at the DataFrame, we can see that the most valuable players include LeBron James and Giannis Antetokounmpo. This indicates that when they are on court, their actions contribute substantially to increasing their team's win probability. Note that the criterion used here to measure a player's ability is the change in WP associated with that player. In other words, it measures a player's ability to turn the course of a game. That might come from successful screens, high-level covering defense, or simply boosting his teammates' morale. None of these can be captured by existing traditional stats, because they are not, and cannot be, recorded.

rank.head(20)

	coef	time	total	name	Rank
0	0.002734	8502.0	23.246689	Jeff Teague	1.0
1	0.001728	8870.0	15.326443	Otto Porter	2.0
2	0.002961	5149.0	15.246939	Tyus Jones	3.0
3	0.001942	7071.0	13.731552	Joel Embiid	4.0
4	0.001295	10572.0	13.687375	LeBron James	5.0
5	0.001475	9272.0	13.673057	Giannis Antetokounmpo	6.0
6	0.001330	9946.0	13.225438	Bradley Beal	7.0
7	0.001338	9854.0	13.185781	Taj Gibson	8.0
8	0.001371	9416.0	12.909097	James Harden	9.0
9	0.001432	8874.0	12.711584	Al Horford	10.0
10	0.001430	8618.0	12.322240	Robert Covington	11.0
11	0.001777	6823.0	12.122346	Garrett Temple	12.0
12	0.001292	9218.0	11.912959	Brandon Ingram	13.0
13	0.001570	7106.0	11.156157	Kristaps Porzingis	14.0
14	0.001794	6192.0	11.106574	Enes Kanter	15.0
15	0.001040	10376.0	10.788628	Jrue Holiday	16.0
16	0.001612	6685.0	10.779440	Wesley Johnson	17.0
17	0.001171	9123.0	10.686375	LaMarcus Aldridge	18.0
18	0.001281	8340.0	10.680582	Gary Harris	19.0
19	0.001428	7465.0	10.657868	Eric Gordon	20.0
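Reading the table, the total column appears to equal coef multiplied by time, and the rows appear to be ordered by total rather than by coef alone. This is an observation from the printed values, not something confirmed by the source of player_ranking(). The check below uses the first four rows copied from the output above.

```python
# (name, coef, time, total) copied from the first rows of the table above.
rows = [
    ("Jeff Teague", 0.002734, 8502.0, 23.246689),
    ("Otto Porter", 0.001728, 8870.0, 15.326443),
    ("Tyus Jones", 0.002961, 5149.0, 15.246939),
    ("Joel Embiid", 0.001942, 7071.0, 13.731552),
]

for name, coef, time_on_court, total in rows:
    product = coef * time_on_court
    # Small differences are expected because coef is printed to 6 decimals.
    print(f"{name:12s} coef*time = {product:.3f}   total = {total:.3f}")

# Tyus Jones has the largest coef of the four but is ranked third by total.
largest_coef = max(rows, key=lambda r: r[1])[0]
largest_total = max(rows, key=lambda r: r[3])[0]
print(largest_coef, largest_total)
```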
