
The latest release of baseballr includes a function for acquiring player statistics from the NCAA’s website for baseball teams across the three major divisions (I, II, III).

The function, ncaa_scrape, requires values for three parameters:

school_id: numerical code used by the NCAA for each school
year: a four-digit year
type: whether to pull data for batters or pitchers

If you want to pull batting statistics for Vanderbilt for the 2021 season, you would use the following:

library(baseballr)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
ncaa_scrape(736, 2021, "batting") %>%
  select(year:OBPct)
#> ── NCAA Baseball Team Stats data from stats.ncaa.org ───────────────────
#>  Data updated: 2022-09-09 07:41:12 UTC
#> # A tibble: 41 × 12
#>     year school    confe…¹ divis…² Jersey Player Yr    Pos      GP    GS
#>    <int> <chr>     <chr>     <dbl> <chr>  <chr>  <chr> <chr> <dbl> <dbl>
#>  1  2021 Vanderbi… SEC           1 51     Bradf… Fr    OF       67    67
#>  2  2021 Vanderbi… SEC           1 25     Nolan… So    INF      66    66
#>  3  2021 Vanderbi… SEC           1 99     Gonza… So    INF      61    58
#>  4  2021 Vanderbi… SEC           1 9      Young… So    INF      61    61
#>  5  2021 Vanderbi… SEC           1 12     Keega… Jr    UT       60    60
#>  6  2021 Vanderbi… SEC           1 8      Thoma… Jr    OF       59    57
#>  7  2021 Vanderbi… SEC           1 5      Rodri… So    C        58    52
#>  8  2021 Vanderbi… SEC           1 16     Bulge… Fr    UT       50    41
#>  9  2021 Vanderbi… SEC           1 6      Kolwy… Jr    INF      43    39
#> 10  2021 Vanderbi… SEC           1 19     LaNev… So    OF       37    19
#> # … with 31 more rows, 2 more variables: BA <dbl>, OBPct <dbl>, and
#> #   abbreviated variable names ¹​conference, ²​division

The same can be done for pitching, just by changing the type parameter:

ncaa_scrape(736, 2021, "pitching") %>%
  select(year:ERA)
#> ── NCAA Baseball Team Stats data from stats.ncaa.org ───────────────────
#>  Data updated: 2022-09-09 07:41:14 UTC
#> # A tibble: 41 × 12
#>     year school    confe…¹ divis…² Jersey Player Yr    Pos      GP   App
#>    <int> <chr>     <chr>     <dbl> <chr>  <chr>  <chr> <chr> <dbl> <dbl>
#>  1  2021 Vanderbi… SEC           1 51     Bradf… Fr    OF       67    67
#>  2  2021 Vanderbi… SEC           1 25     Nolan… So    INF      66    66
#>  3  2021 Vanderbi… SEC           1 99     Gonza… So    INF      61    61
#>  4  2021 Vanderbi… SEC           1 9      Young… So    INF      61    61
#>  5  2021 Vanderbi… SEC           1 12     Keega… Jr    UT       60    60
#>  6  2021 Vanderbi… SEC           1 8      Thoma… Jr    OF       59    59
#>  7  2021 Vanderbi… SEC           1 5      Rodri… So    C        58    58
#>  8  2021 Vanderbi… SEC           1 16     Bulge… Fr    UT       50    50
#>  9  2021 Vanderbi… SEC           1 6      Kolwy… Jr    INF      43    43
#> 10  2021 Vanderbi… SEC           1 19     LaNev… So    OF       37    37
#> # … with 31 more rows, 2 more variables: GS <dbl>, ERA <dbl>, and
#> #   abbreviated variable names ¹​conference, ²​division

ncaa_scrape depends on the user knowing the school_id used by the NCAA website. To help with that, I’ve included an ncaa_school_id_lu function so that users can find the school_id they need.

Just pass a string to the function and it will return possible matches based on the school’s name:

ncaa_school_id_lu("Vand")
#> # A tibble: 10 × 6
#>    school     conference school_id  year division conference_id
#>    <chr>      <chr>          <dbl> <dbl>    <dbl>         <dbl>
#>  1 Vanderbilt SEC              736  2013        1           911
#>  2 Vanderbilt SEC              736  2014        1           911
#>  3 Vanderbilt SEC              736  2015        1           911
#>  4 Vanderbilt SEC              736  2016        1           911
#>  5 Vanderbilt SEC              736  2017        1           911
#>  6 Vanderbilt SEC              736  2018        1           911
#>  7 Vanderbilt SEC              736  2019        1           911
#>  8 Vanderbilt SEC              736  2020        1           911
#>  9 Vanderbilt SEC              736  2021        1           911
#> 10 Vanderbilt SEC              736  2022        1           911
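
Because the lookup returns one row per school and season, it can be handy to reduce it to a single school_id before calling ncaa_scrape. The following is a minimal sketch of that step, not part of baseballr: pick_school_id is a hypothetical helper, and the small lookup table is built by hand from two rows of the output above so the example runs without network access.

```r
library(dplyr)

# Hypothetical helper (not part of baseballr): pick a single school_id
# from a table shaped like the output of ncaa_school_id_lu().
pick_school_id <- function(lookup, school_name, season) {
  ids <- lookup %>%
    filter(school == school_name, year == season) %>%
    pull(school_id) %>%
    unique()
  # Fail loudly if the name and season do not identify exactly one school
  if (length(ids) != 1) {
    stop("Expected exactly one school_id, found ", length(ids))
  }
  ids
}

# Two rows copied from the lookup output shown above
lookup <- tibble(
  school        = c("Vanderbilt", "Vanderbilt"),
  conference    = c("SEC", "SEC"),
  school_id     = c(736, 736),
  year          = c(2020, 2021),
  division      = c(1, 1),
  conference_id = c(911, 911)
)

vandy_id <- pick_school_id(lookup, "Vanderbilt", 2021)
```

With baseballr loaded, the hand-built table could presumably be replaced by ncaa_school_id_lu("Vand"), and the resulting id passed along as ncaa_scrape(vandy_id, 2021, "batting").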