The latest release of the baseballr package
includes a function for acquiring player statistics from the NCAA’s website for baseball teams
across the three major divisions (I, II, III).
To look up teams, you can either load the teams for all divisions from the
baseballr-data repository or access them directly from the NCAA website
for a given year and division.
Loading from the baseballr-data repository:
library(baseballr)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
ncaa_teams_df <- load_ncaa_baseball_teams()
From the NCAA website:
try(ncaa_teams(year = most_recent_ncaa_baseball_season(), division = "1"))
#> ── NCAA Baseball Teams data from stats.ncaa.org ───── baseballr 1.5.0 ──
#> ℹ Data updated: 2023-03-21 18:51:10 UTC
#> # A tibble: 305 × 8
#> team_id team_name team_url confe…¹ confe…² divis…³ year seaso…⁴
#> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <chr>
#> 1 140 Cincinnati /team/1… 823 AAC 1 2023 16340
#> 2 196 East Carolina /team/1… 823 AAC 1 2023 16340
#> 3 288 Houston /team/2… 823 AAC 1 2023 16340
#> 4 404 Memphis /team/4… 823 AAC 1 2023 16340
#> 5 651 South Fla. /team/6… 823 AAC 1 2023 16340
#> 6 718 Tulane /team/7… 823 AAC 1 2023 16340
#> 7 128 UCF /team/1… 823 AAC 1 2023 16340
#> 8 782 Wichita St. /team/7… 823 AAC 1 2023 16340
#> 9 67 Boston College /team/6… 821 ACC 1 2023 16340
#> 10 147 Clemson /team/1… 821 ACC 1 2023 16340
#> # … with 295 more rows, and abbreviated variable names ¹conference_id,
#> # ²conference, ³division, ⁴season_id
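Either way, the result is a table of teams that can be narrowed down before looking up statistics. The following is a minimal sketch in base R, assuming a small hand-built data frame (the hypothetical `teams` object) whose rows are copied from the printed output above; it stands in for the real table so the example runs without a network connection.

```r
# Toy stand-in for the teams table; rows are copied from the output shown above.
teams <- data.frame(
  team_id = c("140", "196", "67", "147"),
  team_name = c("Cincinnati", "East Carolina", "Boston College", "Clemson"),
  conference = c("AAC", "AAC", "ACC", "ACC"),
  year = c(2023, 2023, 2023, 2023),
  stringsAsFactors = FALSE
)

# Keep only the ACC teams and the two columns needed for later lookups.
acc_teams <- teams[teams$conference == "ACC", c("team_id", "team_name")]
print(acc_teams)
```

The same subsetting should carry over to the full table returned by either loading approach, since the `conference` and `team_name` columns appear in the output above.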
The function, ncaa_team_player_stats(), requires the user to pass values for three parameters:
team_id: numerical code used by the NCAA for each school
year: a four-digit year
type: whether to pull data for batters or pitchers
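Because all three values are needed, it can help to check them before making a request. The sketch below is one possible approach, not part of baseballr: `check_stats_args()` is a hypothetical helper, and it assumes a single team and that `type` is limited to the two values used in this document ("batting" and "pitching").

```r
# Hypothetical helper (not part of baseballr): validates the three arguments
# for one team and returns them in a list.
check_stats_args <- function(team_id, year, type = c("batting", "pitching")) {
  # match.arg() errors unless `type` is one of the listed choices.
  type <- match.arg(type)
  team_id <- suppressWarnings(as.numeric(team_id))
  year <- suppressWarnings(as.integer(year))
  if (is.na(team_id)) {
    stop("`team_id` must be a numeric NCAA school code")
  }
  if (is.na(year) || year < 1000 || year > 9999) {
    stop("`year` must be a four-digit year")
  }
  list(team_id = team_id, year = year, type = type)
}

# 234 is the Florida St. team_id shown in the output later in this document.
args <- check_stats_args(team_id = 234, year = 2023, type = "pitching")
str(args)
```

The validated list could then be passed on with `do.call(ncaa_team_player_stats, args)`, which requires baseballr and a network connection.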
If you want to pull batting statistics for Florida State for the 2023 season, you would use the following:
team_id <- ncaa_teams_df %>%
  dplyr::filter(.data$team_name == "Florida St.") %>%
  dplyr::select("team_id") %>%
  dplyr::distinct() %>%
  dplyr::pull("team_id")
year <- most_recent_ncaa_baseball_season()
ncaa_team_player_stats(team_id = team_id, year = year, type = "batting")
#> ── NCAA Baseball Team Batting Stats data from stats.ncaa.org ───────────
#> ℹ Data updated: 2023-03-21 18:51:12 UTC
#> # A tibble: 37 × 35
#> year team_…¹ team_id confe…² confe…³ divis…⁴ playe…⁵ playe…⁶ playe…⁷
#> <int> <chr> <dbl> <int> <chr> <dbl> <int> <chr> <chr>
#> 1 2023 Florid… 234 821 ACC 1 2649339 http:/… Tibbs …
#> 2 2023 Florid… 234 821 ACC 1 2649334 http:/… Ferrer…
#> 3 2023 Florid… 234 821 ACC 1 2478605 http:/… Carrio…
#> 4 2023 Florid… 234 821 ACC 1 2468075 http:/… Vincen…
#> 5 2023 Florid… 234 821 ACC 1 2112619 http:/… De Sed…
#> 6 2023 Florid… 234 821 ACC 1 2649307 http:/… Rank, …
#> 7 2023 Florid… 234 821 ACC 1 2797459 http:/… Smith,…
#> 8 2023 Florid… 234 821 ACC 1 2649340 http:/… Bush, …
#> 9 2023 Florid… 234 821 ACC 1 2797428 http:/… Kamaka…
#> 10 2023 Florid… 234 821 ACC 1 2797465 http:/… Willia…
#> # … with 27 more rows, 26 more variables: Yr <chr>, Pos <chr>,
#> # Jersey <chr>, GP <dbl>, GS <dbl>, BA <dbl>, OBPct <dbl>,
#> # SlgPct <dbl>, R <dbl>, AB <dbl>, H <dbl>, `2B` <dbl>, `3B` <dbl>,
#> # TB <dbl>, HR <dbl>, RBI <dbl>, BB <dbl>, HBP <dbl>, SF <dbl>,
#> # SH <dbl>, K <dbl>, DP <dbl>, CS <dbl>, Picked <dbl>, SB <dbl>,
#> # RBI2out <dbl>, and abbreviated variable names ¹team_name,
#> # ²conference_id, ³conference, ⁴division, ⁵player_id, ⁶player_url, …
The same can be done for pitching, just by changing the type parameter:
ncaa_team_player_stats(team_id = team_id, year = year, type = "pitching")
#> ── NCAA Baseball Team Pitching Stats data from stats.ncaa.org ──────────
#> ℹ Data updated: 2023-03-21 18:51:14 UTC
#> # A tibble: 37 × 43
#> year team_…¹ team_id confe…² confe…³ divis…⁴ playe…⁵ playe…⁶ playe…⁷
#> <int> <chr> <dbl> <int> <chr> <dbl> <int> <chr> <chr>
#> 1 2023 Florid… 234 821 ACC 1 2649339 http:/… Tibbs …
#> 2 2023 Florid… 234 821 ACC 1 2649334 http:/… Ferrer…
#> 3 2023 Florid… 234 821 ACC 1 2478605 http:/… Carrio…
#> 4 2023 Florid… 234 821 ACC 1 2468075 http:/… Vincen…
#> 5 2023 Florid… 234 821 ACC 1 2112619 http:/… De Sed…
#> 6 2023 Florid… 234 821 ACC 1 2649307 http:/… Rank, …
#> 7 2023 Florid… 234 821 ACC 1 2797459 http:/… Smith,…
#> 8 2023 Florid… 234 821 ACC 1 2649340 http:/… Bush, …
#> 9 2023 Florid… 234 821 ACC 1 2797428 http:/… Kamaka…
#> 10 2023 Florid… 234 821 ACC 1 2797465 http:/… Willia…
#> # … with 27 more rows, 34 more variables: Yr <chr>, Pos <chr>,
#> # Jersey <chr>, GP <dbl>, App <dbl>, GS <dbl>, ERA <dbl>, IP <dbl>,
#> # H <dbl>, R <dbl>, ER <dbl>, BB <dbl>, SO <dbl>, SHO <dbl>,
#> # BF <dbl>, `P-OAB` <dbl>, `2B-A` <dbl>, `3B-A` <dbl>, Bk <dbl>,
#> # `HR-A` <dbl>, WP <dbl>, HB <dbl>, IBB <dbl>, `Inh Run` <dbl>,
#> # `Inh Run Score` <dbl>, SHA <dbl>, SFA <dbl>, Pitches <dbl>,
#> # GO <dbl>, FO <dbl>, W <dbl>, L <dbl>, SV <dbl>, KL <dbl>, and …
The function depends on the user knowing the team_id used by the NCAA website.
To help with that, I’ve included the ncaa_school_id_lu() function so that users
can find the team_id they need.
Just pass a string to the function and it will return possible matches based on the school’s name:
ncaa_school_id_lu("Vand")
#> ───────────────────────────────────────────────────── baseballr 1.5.0 ──
#> # A tibble: 14 × 8
#> team_id team_name team_url confe…¹ confe…² divis…³ year seaso…⁴
#> <dbl> <chr> <chr> <dbl> <chr> <dbl> <dbl> <dbl>
#> 1 736 Vanderbilt /team/736/1… 911 SEC 1 2023 16340
#> 2 736 Vanderbilt /team/736/1… 911 SEC 1 2022 15860
#> 3 736 Vanderbilt /team/736/1… 911 SEC 1 2021 15580
#> 4 736 Vanderbilt /team/736/1… 911 SEC 1 2020 15204
#> 5 736 Vanderbilt /team/736/1… 911 SEC 1 2019 15204
#> 6 736 Vanderbilt /team/736/1… 911 SEC 1 2018 12973
#> 7 736 Vanderbilt /team/736/1… 911 SEC 1 2017 12560
#> 8 736 Vanderbilt /team/736/1… 911 SEC 1 2016 12360
#> 9 736 Vanderbilt /team/736/1… 911 SEC 1 2015 12080
#> 10 736 Vanderbilt /team/736/1… 911 SEC 1 2014 11620
#> 11 736 Vanderbilt /team/736/1… 911 SEC 1 2013 11320
#> 12 736 Vanderbilt /team/736/1… 911 SEC 1 2012 10942
#> 13 736 Vanderbilt /team/736/1… 911 SEC 1 2011 10561
#> 14 736 Vanderbilt /team/736/1… 911 SEC 1 2010 10240
#> # … with abbreviated variable names ¹conference_id, ²conference,
#> # ³division, ⁴season_id
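To illustrate the idea behind this kind of lookup, here is a minimal base R sketch of partial name matching on a toy table. It is not the package's implementation: `find_school()` is a hypothetical helper, the rows are copied from the outputs above, and case-sensitive substring matching is an assumption made for the example.

```r
# Toy teams table; rows are copied from the outputs shown in this document.
teams <- data.frame(
  team_id = c(736, 147, 718),
  team_name = c("Vanderbilt", "Clemson", "Tulane"),
  conference = c("SEC", "ACC", "AAC"),
  year = c(2023, 2023, 2023),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return rows whose team_name contains `pattern`
# as a literal, case-sensitive substring.
find_school <- function(teams, pattern) {
  teams[grepl(pattern, teams$team_name, fixed = TRUE), , drop = FALSE]
}

matches <- find_school(teams, "Vand")
vandy_id <- unique(matches$team_id)
print(vandy_id)
```

With the real lookup, the resulting team_id can be passed straight to ncaa_team_player_stats() along with a year and a type.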