GitHub - stephematician/statsnbaR: stats.nba.com endpoint interface for R

statsnbaR

Stephen Wade 20/08/2017

R interface to stats.nba.com

This is a simple interface to stats.nba.com.

Before going into any further details of this package, there are some house-keeping tasks:

All the data from the website is Copyright (c) 2017 NBA Media Ventures, LLC. All rights reserved. When using this package you must agree to the Terms of Use of the website. All the terms are important and must be read and agreed to. Pay extra attention to:
* Section 1 - Ownership and User Restrictions;
* Section 9 - NBA Statistics;
* Sections 11-14 and 16-21; and
* read all of it, really!
As this package sends HTTP requests to stats.nba.com, you must read and agree to the terms of their Privacy Policy before using this package.
This code is licensed under the MIT license, and you may use this package strictly under those terms.

The details of the API end-points were manually sourced by the approach given in this blog post. In order to semi-future-proof the package, the queries to these end-points and the data extracted from them are evaluated through a fairly informal abstract data layer (ADL). The ADL is specified in internal data extracted from a YAML which can be viewed on github: http://www.github.com/stephematician/statsnbaR/tree/master/data-raw/ADL.yaml.
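To give a rough feel for what such a layer does, here is a toy sketch in base R. It is not the package's actual ADL format: the mapping adl_player_fields, the helper apply_adl and the raw field names are all made up for illustration. The only point is that raw end-point field names get translated into stable package column names in one place, so an upstream rename needs a single change.

```r
# Hypothetical mapping from raw end-point field names to package column names.
# These names are assumptions for illustration, not taken from ADL.yaml.
adl_player_fields <- c(PERSON_ID  = 'person_id',
                       TEAM_ID    = 'team_id',
                       FIRST_NAME = 'first_name')

# Hypothetical helper: keep only the mapped fields and rename them.
apply_adl <- function(raw, mapping) {
    known <- intersect(names(mapping), names(raw))
    out <- raw[, known, drop=FALSE]
    names(out) <- unname(mapping[known])
    out
}

# A made-up raw record with one field the mapping does not know about.
raw <- data.frame(PERSON_ID=203648, TEAM_ID=1612709919,
                  FIRST_NAME='Thanasis', UNMAPPED_FIELD=NA,
                  stringsAsFactors=FALSE)
players <- apply_adl(raw, adl_player_fields)
print(players)
```

Fields that are missing from the mapping are silently dropped in this sketch, which is one possible design choice; the real package may well behave differently.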

The package is split into player, team and game data. All the functions are fully documented, so the details can be found in each function's help page.

For the sake of sales (hah), here's some example code to give you an idea how easy it is to use the package.

Demonstration

A demonstration of the data, which admittedly was rushed together as part of a Coursera course, can be found at https://stephematician.shinyapps.io/nba_cluster.

Installation

Installation is performed via GitHub using the devtools package.

library(devtools)
install_github('stephematician/statsnbaR')

Player data examples

Let's just have a look at the player data from the 2015-2016 season, and select the D-League players who are active:

library(statsnbaR)
library(dplyr)
dleague_players <- get_players(league='d-league')

head(filter(dleague_players, roster_status==TRUE) %>%
     select(person_id, first_name, last_name, team_id, team_city, team_name))

person_id	first_name	last_name	team_id	team_city	team_name
201861	Antoine	Agudio	1612709893	Canton	Charge
1627378	Mychal	Ammons	1612709903	Idaho	Stampede
203648	Thanasis	Antetokounmpo	1612709919	Westchester	Knicks
203951	Keith	Appling	1612709913	Erie	BayHawks
1626276	Darion	Atkins	1612709919	Westchester	Knicks
1627359	Eric	Atkins	1612709913	Erie	BayHawks

The person_id and team_id are important keys for other data such as shot charts.
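As a minimal sketch of what "key" means here, the snippet below joins a few rows of the player output above to a second table on person_id, using base R's merge() so that it runs without a network connection. The second table (other, with its shots column) is entirely made up for illustration and is not output from statsnbaR.

```r
# Three rows copied from the player output above.
players <- data.frame(
    person_id = c(203648, 203951, 1626276),
    last_name = c('Antetokounmpo', 'Appling', 'Atkins'),
    team_id   = c(1612709919, 1612709913, 1612709919),
    stringsAsFactors = FALSE
)

# Made-up stand-in for another data set keyed by person_id;
# the `shots` column and its values are purely illustrative.
other <- data.frame(person_id = c(203951, 203648),
                    shots     = c(12, 7))

# Inner join on the shared key; players without a match are dropped.
joined <- merge(players, other, by='person_id')
print(joined)
```

The same pattern applies to team_id when the other table is keyed by team.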

Traditional statistics

We might also want to know their traditional stats for their last 10 games in, say, a previous season (just to show you how to do this kind of query):

filter_last10 = filter_per_player(league='d-league',
                                  last_n=10,
                                  season=2014)
ppd_last10_2014 = per_player_data(filter_last10)

ppd_last10_2014 = filter(ppd_last10_2014,
                         person_id %in% dleague_players$person_id) %>%
                  select(-c(person_id, team_id))

player_name	team_abbr	age	games	win	loss	mins	fgm	fga	fg3m	fg3a	ftm	fta	oreb	dreb	reb	ast	tov	stl	blk	blka	pf	pfd	points	plus_minus	dd2	td3
A.J. Davis	SXF	27	9	6	3	102.6350	6	29	1	14	5	10	0	16	16	3	7	2	2	2	5	9	18	-3	0	0
Aaron Craft	SCW	24	10	8	2	363.7950	36	77	9	23	29	38	7	48	55	56	19	19	4	5	27	31	110	94	2	1
Adonis Thomas	GRD	22	10	5	5	404.8900	85	183	25	73	25	28	10	57	67	13	18	10	4	4	13	24	220	-7	1	0
Adrian Thomas	BAK	28	10	5	5	182.2250	18	52	16	46	0	0	1	15	16	7	3	5	0	0	17	3	52	-87	0	0
Akeem Richmond	RGV	24	2	1	1	6.8500	0	4	0	4	0	0	0	0	0	0	0	0	0	0	0	0	0	-8	0	0
Akil Mitchell	RGV	23	10	5	5	217.0017	30	52	1	5	9	29	26	61	87	15	15	6	5	2	22	20	70	-10	1	0

We can see that Aaron Craft was ballin' in the last ten games of the 2014-15 D-League season. Did he ever play in the NBA?

# 2015-16 is the default
nba_players = get_players(league='nba')
filter(nba_players, last_name=='Craft')

Aww, empty data.frame. Oh well!

Advanced statistics

You can also get more advanced statistics by specifying measurement='advanced' in the call to per_player_data.

ppad_last10_2014 = per_player_data(filter_last10,
                                   measurement='advanced')

Again selecting only the D-League players that played this year:

player_name	team_abbr	age	games	win	loss	mins	off_rtg	def_rtg	net_rtg	ast_pct	ast/tov	ast_ratio	oreb_pct	dreb_pct	reb_pct	tov_pct	EFG_pct	ts_pct	usg_pct	pace	PIE
A.J. Davis	SXF	27	9	6	3	11.4	107.0	106.0	1.0	0.041	0.43	6.9	0.000	0.165	0.089	16.1	0.224	0.269	0.181	95.97	0.000
Aaron Craft	SCW	24	10	8	2	36.4	111.6	100.5	11.1	0.196	2.95	33.2	0.021	0.122	0.075	11.3	0.526	0.587	0.126	104.73	0.098
Adonis Thomas	GRD	22	10	5	5	40.5	110.3	109.9	0.4	0.049	0.72	5.7	0.028	0.145	0.090	8.0	0.533	0.563	0.232	98.79	0.114
Adrian Thomas	BAK	28	10	5	5	18.2	94.1	116.4	-22.4	0.058	2.33	11.3	0.005	0.096	0.045	4.8	0.500	0.500	0.126	104.77	0.034
Akeem Richmond	RGV	24	2	1	1	3.4	66.7	115.1	-48.4	0.000	0.00	0.0	0.000	0.000	0.000	0.0	0.000	0.000	0.267	107.35	-0.160
Akil Mitchell	RGV	23	10	5	5	21.7	113.8	115.0	-1.2	0.088	1.00	15.8	0.112	0.289	0.196	15.8	0.587	0.540	0.141	107.48	0.093

Setting `per` game, possession, minute

You might notice that the results for the advanced stats appear to be in per-game format, whereas for the basic stats they were in totals. You can actually get per-game traditional stats, too, by setting per='game' in the filter. This is what the output would look like.

player_name	team_abbr	age	games	win	loss	mins	fgm	fga	fg3m	fg3a	ftm	fta	oreb	dreb	reb	ast	tov	stl	blk	blka	pf	pfd	points	plus_minus	dd2	td3
A.J. Davis	SXF	27	9	6	3	11.4	0.7	3.2	0.1	1.6	0.6	1.1	0.0	1.8	1.8	0.3	0.8	0.2	0.2	0.2	0.6	1.0	2.0	-0.3	0	0
Aaron Craft	SCW	24	10	8	2	36.4	3.6	7.7	0.9	2.3	2.9	3.8	0.7	4.8	5.5	5.6	1.9	1.9	0.4	0.5	2.7	3.1	11.0	9.4	2	1
Adonis Thomas	GRD	22	10	5	5	40.5	8.5	18.3	2.5	7.3	2.5	2.8	1.0	5.7	6.7	1.3	1.8	1.0	0.4	0.4	1.3	2.4	22.0	-0.7	1	0
Adrian Thomas	BAK	28	10	5	5	18.2	1.8	5.2	1.6	4.6	0.0	0.0	0.1	1.5	1.6	0.7	0.3	0.5	0.0	0.0	1.7	0.3	5.2	-8.7	0	0
Akeem Richmond	RGV	24	2	1	1	3.4	0.0	2.0	0.0	2.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	-4.0	0	0
Akil Mitchell	RGV	23	10	5	5	21.7	3.0	5.2	0.1	0.5	0.9	2.9	2.6	6.1	8.7	1.5	1.5	0.6	0.5	0.2	2.2	2.0	7.0	-1.0	1	0

You can see that the minutes value now matches up and looks like a per-game number of minutes.
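If you'd rather keep the totals and do the division yourself, a small sketch in base R looks like this. The numbers are copied from the totals output above; the column names mins_pg and points_pg are my own and not something the package returns.

```r
# Totals copied from the traditional stats output above (last 10 games, 2014-15).
totals <- data.frame(
    player_name = c('A.J. Davis', 'Aaron Craft', 'Akeem Richmond'),
    games  = c(9, 10, 2),
    mins   = c(102.635, 363.795, 6.85),
    points = c(18, 110, 0),
    stringsAsFactors = FALSE
)

# Divide by games played ourselves, then round only for display.
per_game <- transform(totals,
                      mins_pg   = mins / games,
                      points_pg = points / games)
print(round(per_game[, c('mins_pg', 'points_pg')], 1))
```

Rounded to one decimal place, these agree with the mins and points columns of the per='game' output above.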

There are a couple of things worth noticing about the data returned by the queries:

The data returned as JSON appears to be character based, so total values may be more useful for any careful analysis, as they don't suffer from whatever rounding stats.nba.com applies.
Also notice that percentages are not given as percentages, i.e. they are given as the raw 'rates' between 0 and 1.
Wins, losses, etc. are not subject to the usual per='game' rules, i.e. these values are always returned as totals.

In general, it is always worth looking at a few records of the returned data to get an idea what units the values are actually returning for different queries.
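One quick sanity check along these lines is to recompute a rate from the totals and compare it with the returned value. The sketch below does this for effective field goal percentage, assuming the usual definition (fgm + 0.5 * fg3m) / fga is what stats.nba.com uses; the totals are copied from the traditional stats output above.

```r
# Shooting totals copied from the traditional stats output above.
totals <- data.frame(
    player_name = c('Aaron Craft', 'Adonis Thomas', 'Akil Mitchell'),
    fgm  = c(36, 85, 30),
    fga  = c(77, 183, 52),
    fg3m = c(9, 25, 1),
    stringsAsFactors = FALSE
)

# Effective field goal rate under the assumed standard definition.
efg_rate <- with(totals, (fgm + 0.5 * fg3m) / fga)

print(round(efg_rate, 3))        # rates between 0 and 1
print(round(100 * efg_rate, 1))  # the same values as conventional percentages
```

For these three players the recomputed rates match the EFG_pct column of the advanced output to three decimal places, which suggests that column really is a rate and not a percentage.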

Team data examples

Game log examples

These are broken into two sets: the player game log and the team game log.

Player game log

Game log queries are relatively similar to getting the names and details of players via get_players().

gl_2013 = get_game_log(league='nba', season=2013)
head(select(gl_2013, -c(season, team_abbr, team_name, fgm:plus_minus)))

And the output produced looks something like:

person_id	player_name	game_id	game_date	matchup	win	mins	video
2546	Carmelo Anthony	21300640	2014-01-24	NYK vs. CHA	TRUE	39	TRUE
2544	LeBron James	21300893	2014-03-03	MIA vs. CHA	TRUE	41	TRUE
201142	Kevin Durant	21300592	2014-01-17	OKC vs. GSW	TRUE	44	TRUE
201147	Corey Brewer	21301183	2014-04-11	MIN vs. HOU	TRUE	45	TRUE
201142	Kevin Durant	21301024	2014-03-21	OKC @ TOR	TRUE	52	TRUE
203082	Terrence Ross	21300647	2014-01-25	TOR vs. LAC	FALSE	44	TRUE

The game_id is an important key for other data such as shot charts, and play-by-play data.
This is a large set of data, so if you only want game_ids, the team game logs are an easier approach.
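The matchup column in the player game log also encodes the venue and the opponent. Here is a small base R sketch for pulling those out, assuming the convention visible in the output above that 'vs.' marks a home game and '@' an away game. The rows are copied from that output, and the column names home and opponent are my own.

```r
# Three rows copied from the player game log output above.
gl <- data.frame(
    player_name = c('Carmelo Anthony', 'Kevin Durant', 'Kevin Durant'),
    game_id     = c(21300640, 21300592, 21301024),
    matchup     = c('NYK vs. CHA', 'OKC vs. GSW', 'OKC @ TOR'),
    stringsAsFactors = FALSE
)

# '@' marks an away game; anything else here is 'vs.', i.e. a home game.
gl$home <- !grepl('@', gl$matchup, fixed=TRUE)

# The opponent is the last space-separated token of the matchup string.
gl$opponent <- sub('^.* ', '', gl$matchup)

print(gl[, c('player_name', 'game_id', 'home', 'opponent')])
```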

Team game logs

This works the same way as the player game log. As an example only, I've also selected a season_type here, which you may also do for player game logs.

This is a much smaller set of data, so if you only want game_ids this would be the way to retrieve them.

As an example, we just use the playoffs from the 2014-15 season and try to find the finals games as we know that the finals were between Cleveland and Golden State:

playoff_logs = team_game_logs(season=2014,
                              season_type='playoff')
cle_data = filter(playoff_logs, team_abbr=='CLE') %>%
                  select(game_id)
gsw_data = filter(playoff_logs, team_abbr=='GSW') %>%
                  select(game_id)
finals_data = filter(cle_data, game_id %in% gsw_data$game_id)

head(filter(playoff_logs,
            game_id %in% finals_data$game_id) %>%
     select(-c(season,fgm:video)) %>%
     arrange(game_id))

team_id	team_abbr	team_name	game_id	game_date	home	win	mins
1610612744	GSW	Golden State Warriors	41400401	2015-06-04	FALSE	TRUE	265
1610612739	CLE	Cleveland Cavaliers	41400401	2015-06-04	FALSE	FALSE	265
1610612739	CLE	Cleveland Cavaliers	41400402	2015-06-07	FALSE	TRUE	265
1610612744	GSW	Golden State Warriors	41400402	2015-06-07	FALSE	FALSE	265
1610612739	CLE	Cleveland Cavaliers	41400403	2015-06-09	FALSE	TRUE	240
1610612744	GSW	Golden State Warriors	41400403	2015-06-09	FALSE	FALSE	240

This shows how the team logs contain two rows per game, one with the data for each team (I've omitted some of that output here via the select command).
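If one row per game is more convenient, the two rows can be paired up on game_id. The following is a sketch in base R using the six rows shown above, so it runs without querying stats.nba.com; the _winner and _loser suffixes are just names I picked.

```r
# The six finals rows shown above, reduced to the columns needed here.
logs <- data.frame(
    team_abbr = c('GSW', 'CLE', 'CLE', 'GSW', 'CLE', 'GSW'),
    game_id   = c(41400401, 41400401, 41400402, 41400402, 41400403, 41400403),
    win       = c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE),
    stringsAsFactors = FALSE
)

# Every game_id should appear exactly twice, once per team.
stopifnot(all(table(logs$game_id) == 2))

# Pair each winner's row with the loser's row for the same game.
games <- merge(logs[logs$win, c('game_id', 'team_abbr')],
               logs[!logs$win, c('game_id', 'team_abbr')],
               by='game_id', suffixes=c('_winner', '_loser'))
print(games)
```

The result has one row per game_id, with the columns team_abbr_winner and team_abbr_loser.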

Play-by-play examples

Shot chart examples

Notes

Using curl/httr

Timeouts can occur with some versions of curl due to incorrectly configured IPv6. One work-around is to pass a config argument on to httr, e.g.

# config() comes from httr, so qualify it in case httr is not attached
get_players(league='d-league',
            config=httr::config(ipresolve=1))

The correct value for the ipresolve argument can be found via with(curl::curl_symbols, value[name == 'CURL_IPRESOLVE_V4']).

Alternatives

An alternative to this package is the more fully featured nbastatsR package, which is definitely worth a look.

This package was partly inspired by nba_py which, sadly, seems outdated.

Testing platform

This package was built and tested with the following R software:

> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.4 LTS

other attached packages:
[1] statsnbaR_0.1 httr_1.1.0

loaded via a namespace (and not attached):
[1] R6_2.1.2        curl_0.9.6      jsonlite_0.9.19
