Commit 42152cb

More receiving yards cases (#117)

* add a receiving yardage case
* docs: Created return tag for internal functions for CRAN
* chore: Update drive and live play columns, fix tests

  Removed deprecated labels for elapsed time columns in cfbd_drives and added new columns to cfbd_live_plays documentation and tests. Updated test expectations to use expect_in instead of expect_setequal for column checks.
* feat: Enhance cfbd_play_stats_player output and docs

  Added new columns to cfbd_play_stats_player output, improved sack player aggregation, handled NULL values, and updated documentation and examples to reflect changes. Also updated cfbd_live_plays documentation to include new columns for average start yard line and deserve to win metrics.
* fix: Substitute timeouts in cfbd_pbp_data when missing

  Changed cfbd_pbp_data to assign 3 timeouts per half for offense and defense when timeout data is missing from the API. Updated documentation and examples to reflect this behavior.
* fix: Specify .groups argument in summarise call

  Added .groups = "drop" to the dplyr::summarise call in add_play_counts to control grouping behavior and prevent potential warnings in future dplyr versions.
* docs: Update cfbd_drives return documentation

  Removed the specific count of variables from the return value description in both R and Rd files to improve maintainability and accuracy as the data frame structure may change.
* fix: use `dplyr::distinct()` over `dplyr::distinct_all()` spacing and function usage in play data functions

  Corrected spacing and replaced superseded `dplyr::distinct_all()` with `dplyr::distinct()`, and standardized assignment spacing for improved code readability and consistency.
* add more cases
* not sure these are necessary but sure
* matching parse logic
* chore: Fix argument order in expect_in for scoreboard tests
* fix: Skip games with insufficient plays in play-by-play data

  Added a check to filter out games with fewer than 20 plays in the play-by-play data processing. This helps avoid issues with EPA/WPA models and improves data validation.
* chore: Bump version to 2.1.0 and update release notes

  Update package version to 2.1.0. Add release notes for bug fixes in `cfbd_pbp_data()` and improvements to `add_yardage()` handling missing yardage values. Update cran-comments to reflect minor release and changes.
* chore: Normalize column names and update tests

  Added normalization for 'seasonType' to 'season_type' in cfbd_stats_game_advanced. Updated tests to check for column inclusion with expect_in instead of expect_setequal, and added team ID columns to betting lines test.

---------

Co-authored-by: saiemgilani <saiem.gilani@gmail.com>
1 parent b3312dc commit 42152cb

9 files changed: 71 additions and 38 deletions

DESCRIPTION

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 Package: cfbfastR
 Title: Access College Football Play by Play Data
-Version: 2.0.0
+Version: 2.1.0
 Authors@R: c(
     person("Saiem", "Gilani", , "saiem.gilani@gmail.com", role = c("cre", "aut")),
     person("Akshay", "Easwaran", , "akeaswaran@me.com", role = "aut"),

NEWS.md

Lines changed: 6 additions & 0 deletions
@@ -1,3 +1,9 @@
+# **cfbfastR v2.1.0**
+
+* Fixes a bug in `cfbd_pbp_data()` where play-by-play data for some games were not as expected.
+* Improves `add_yardage()` where plays with missing yardage values were not being handled correctly.
+
+
 # **cfbfastR v2.0.0**
 ### Breaking Changes to Loading Functions

R/cfbd_pbp_data.R

Lines changed: 10 additions & 3 deletions
@@ -586,8 +586,15 @@ cfbd_pbp_data <- function(year,
       play_df <- purrr::map_dfr(
         g_ids,
         function(x){
-          play_df <- play_df %>%
-            dplyr::filter(.data$game_id == x) %>%
+          # Note: this should be changed to a complete data validation test in the future
+          # filter out games with fewer than 20 plays to avoid issues with EPA/WPA models
+          game_plays <- play_df %>%
+            dplyr::filter(.data$game_id == x)
+          if (nrow(game_plays) < 20) {
+            cli::cli_alert_danger(glue::glue("Skipping game_id {x} with only {nrow(game_plays)} plays"))
+            return(NULL)
+          }
+          game_plays <- game_plays %>%
             penalty_detection() %>%
             add_play_counts() %>%
             clean_pbp_dat() %>%
@@ -599,7 +606,7 @@ cfbd_pbp_data <- function(year,
             # create_wpa_betting() %>%
             create_wpa_naive(wp_model = wp_model)
           p(sprintf("x=%s", as.integer(x)))
-          return(play_df)
+          return(game_plays)
         }, ...)
       # } else{
       #   play_df <- purrr::map_dfr(
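
The guard in this hunk returns `NULL` for a short game and leaves it to the row-binding step to drop that result. A minimal sketch of the same pattern follows, assuming made-up play rows and a hypothetical `process_game()` helper. It swaps in base R (`lapply()` plus `rbind()`) for `purrr::map_dfr()`, whose row binding is expected to discard `NULL` results in the same way, and it omits the EPA/WPA pipeline entirely.

```r
# Hypothetical play-by-play rows: game 1 has 25 plays, game 2 has only 3.
play_df <- data.frame(
  game_id = c(rep(1L, 25), rep(2L, 3)),
  play_no = c(seq_len(25), seq_len(3))
)
min_plays <- 20  # threshold used by the check in the diff

# Hypothetical stand-in for the anonymous function passed to map_dfr()
process_game <- function(x) {
  game_plays <- play_df[play_df$game_id == x, , drop = FALSE]
  if (nrow(game_plays) < min_plays) {
    message("Skipping game_id ", x, " with only ", nrow(game_plays), " plays")
    return(NULL)
  }
  game_plays  # the real code would run the model pipeline here
}

g_ids <- unique(play_df$game_id)
# rbind() ignores NULL entries, so skipped games contribute no rows
result <- do.call(rbind, lapply(g_ids, process_game))
```

Returning `NULL` rather than an empty data frame keeps the skipped game from influencing column types in the combined result.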

R/cfbd_stats.R

Lines changed: 2 additions & 0 deletions
@@ -266,6 +266,8 @@ cfbd_stats_game_advanced <- function(year,
   colnames(df) <- gsub("_Start", "_start", colnames(df))
   colnames(df) <- gsub(".db", "_db", colnames(df))
   colnames(df) <- gsub("Id", "_id", colnames(df))
+  colnames(df) <- gsub("seasonType", "season_type", colnames(df))
+
 
 
   df <- df %>%
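
As a small illustration of the renaming step added here, the sketch below applies two of the `gsub()` calls to a hypothetical vector of camelCase column names. The input names are assumptions for the example, not the actual API response.

```r
# Hypothetical column names as they might arrive from the API
cols <- c("gameId", "seasonType", "week")

cols <- gsub("Id", "_id", cols)                  # gameId -> game_id
cols <- gsub("seasonType", "season_type", cols)  # seasonType -> season_type
cols
```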

R/helper_pbp_add_yardage.R

Lines changed: 38 additions & 0 deletions
@@ -85,12 +85,24 @@ add_yardage <- function(play_df) {
           -1 * as.numeric(stringr::str_extract(
             stringi::stri_extract_first_regex(.data$play_text, "(?<= for a loss of)[^,]+"), "\\d+"
           )),
+        .data$pass == 1 &
+          stringr::str_detect(.data$play_text, regex("pass to", ignore_case = TRUE)) &
+          stringr::str_detect(.data$play_text, regex("for a loss of", ignore_case = TRUE)) ~
+          -1 * as.numeric(stringr::str_extract(
+            stringi::stri_extract_first_regex(.data$play_text, "(?<= for a loss of)[^,]+"), "\\d+"
+          )),
         .data$pass == 1 &
           stringr::str_detect(.data$play_text, regex("pass complete to", ignore_case = TRUE)) &
           stringr::str_detect(.data$play_text, regex(" for \\d+ y\\w*ds?", ignore_case = TRUE)) ~
           as.numeric(stringr::str_extract(
             stringi::stri_extract_first_regex(.data$play_text, "(?<= for)[^,]+"), "\\d+"
           )),
+        .data$pass == 1 &
+          stringr::str_detect(.data$play_text, regex("pass to", ignore_case = TRUE)) &
+          stringr::str_detect(.data$play_text, regex(" for \\d+ y\\w*ds?", ignore_case = TRUE)) ~
+          as.numeric(stringr::str_extract(
+            stringi::stri_extract_first_regex(.data$play_text, "(?<= for)[^,]+"), "\\d+"
+          )),
         .data$pass == 1 &
           stringr::str_detect(.data$play_text, regex("Yd pass", ignore_case = TRUE)) ~
           as.numeric(stringr::str_extract(
@@ -99,6 +111,32 @@ add_yardage <- function(play_df) {
         .data$pass == 1 &
           stringr::str_detect(.data$play_text, regex("pass complete to", ignore_case = TRUE)) ~
           yards_gained, # 2024 has games that don't have yards in the PBP text but do have them in the yards_gained field.
+
+        # 2025 has some plays list "PASSER pass" at the very end of the play_text
+        .data$pass == 1 &
+          stringr::str_detect(.data$play_text, regex("pass \\(\\w", ignore_case = TRUE)) &
+          stringr::str_detect(.data$play_text, regex("^to ", ignore_case = FALSE)) ~ as.numeric(stringr::str_extract(
+            stringi::stri_extract_first_regex(.data$play_text, "(?<= for)[^,]+"), "\\d+"
+          )),
+        .data$pass == 1 &
+          stringr::str_detect(.data$play_text, regex("pass$", ignore_case = TRUE)) &
+          stringr::str_detect(.data$play_text, regex("^to ", ignore_case = FALSE)) ~ as.numeric(stringr::str_extract(
+            stringi::stri_extract_first_regex(.data$play_text, "(?<= for)[^,]+"), "\\d+"
+          )),
+        # 2025 has some plays that have yards in the PBP but no listed passer. the format is the same though
+        .data$pass == 1 &
+          stringr::str_detect(.data$play_text, regex("^to ", ignore_case = FALSE)) ~ as.numeric(stringr::str_extract(
+            stringi::stri_extract_first_regex(.data$play_text, "(?<= for)[^,]+"), "\\d+"
+          )),
+        .data$pass == 1 &
+          stringr::str_detect(.data$play_text, regex("^to ", ignore_case = FALSE)) &
+          stringr::str_detect(.data$play_text, regex("for a loss of", ignore_case = TRUE)) ~
+          -1 * as.numeric(stringr::str_extract(
+            stringi::stri_extract_first_regex(.data$play_text, "(?<= for a loss of)[^,]+"), "\\d+"
+          )),
+        .data$pass == 1 &
+          stringr::str_detect(.data$play_text, regex("^to ", ignore_case = FALSE)) &
+          stringr::str_detect(.data$play_text, regex("for no gain", ignore_case = TRUE)) ~ 0,
         TRUE ~ NA_real_
       )
     )
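
`dplyr::case_when()` assigns each row the value of the first condition that is `TRUE`, so the relative order of a broad condition and its narrower variants decides what a play receives. The sketch below uses made-up play text and a hypothetical `yards_after()` helper (neither is part of the package) to compare two orderings of `^to `-style conditions like the ones in this hunk.

```r
library(dplyr)

# Made-up play text in the "no listed passer" format
play_text <- c(
  "to J. Smith for 12 yards",
  "to J. Smith for a loss of 3 yards",
  "to J. Smith for no gain"
)

# Hypothetical helper: first run of digits after `marker`, or NA if none
yards_after <- function(text, marker) {
  pattern <- paste0("^.*?", marker, "\\D*?(\\d+).*$")
  hit <- grepl(pattern, text, perl = TRUE)
  out <- rep(NA_real_, length(text))
  out[hit] <- as.numeric(sub(pattern, "\\1", text[hit], perl = TRUE))
  out
}

starts_to <- grepl("^to ", play_text)
is_loss   <- grepl("for a loss of", play_text)
no_gain   <- grepl("for no gain", play_text)

# Broad condition first: it claims every row before the narrow ones are tried
broad_first <- case_when(
  starts_to           ~ yards_after(play_text, " for"),
  starts_to & is_loss ~ -1 * yards_after(play_text, " for a loss of"),
  starts_to & no_gain ~ 0,
  TRUE                ~ NA_real_
)

# Narrow conditions first: losses and no-gain plays are resolved before the fallback
narrow_first <- case_when(
  starts_to & is_loss ~ -1 * yards_after(play_text, " for a loss of"),
  starts_to & no_gain ~ 0,
  starts_to           ~ yards_after(play_text, " for"),
  TRUE                ~ NA_real_
)

broad_first   # 12  3 NA
narrow_first  # 12 -3  0
```

In this toy input the broad-first ordering reports the loss as a gain and leaves the no-gain play as `NA`, which suggests placing the more specific conditions ahead of the general one.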

cran-comments.md

Lines changed: 5 additions & 25 deletions
@@ -1,30 +1,10 @@
 ## Release summary
 
-This is a major release that:
-
-* Addresses the noted documentation issues which caused the previous release to be archived by CRAN.
-* Addresses the error from the previous CRAN submission today though does not change official version number.
-* Addresses the missing documentation for the `update_cfb_pbp()` function noted in the previous CRAN response.
-* Addresses minor functionality issues in `cfbd_play_stats_player()` function.
-* Updates the `cfbd_*()` functions to use the new College Football Data API v2.
-* Addresses the most recent CRAN comments from the previous submission.
-
-The following functions were added:
-* `cfbd_metrics_fg_ep()`
-* `cfbd_metrics_wepa_team_season()`
-* `cfbd_metrics_wepa_players_passing()`
-* `cfbd_metrics_wepa_players_rushing()`
-* `cfbd_metrics_wepa_players_kicking()`
-* `cfbd_ratings_fpi()`
-* `cfbd_live_scoreboard()`
-* `cfbd_live_plays()`
-* `cfbd_api_key_info()`
-
-There are minor changes to the existing `cfbd_*()` functions under the hood. See `NEWS.md` for more details.
-
-While I believe I updated all twitter links in the `README.md` to non-redirecting links, they do give status 403
-when you try to access them without authentication. If this behavior is too problematic and against policy, please let me know and I will
-make the changes to the `README.md`.
+This is a minor release that:
+
+* Fixes a bug in `cfbd_pbp_data()` where play-by-play data for some games were not as expected.
+* Improves `add_yardage()` where plays with missing yardage values were not being handled correctly.
+
 
 ## R CMD check results
tests/testthat/test-cfbd_betting_lines.R

Lines changed: 4 additions & 4 deletions
@@ -19,16 +19,16 @@ test_that("CFB Betting Lines", {
   cols <- c(
     "game_id", "season", "season_type", "week",
     "start_date",
-    "home_team", "home_conference", "home_classification", "home_score",
-    "away_team", "away_conference", "away_classification", "away_score",
+    "home_team_id", "home_team", "home_conference", "home_classification", "home_score",
+    "away_team_id", "away_team", "away_conference", "away_classification", "away_score",
     "provider", "spread", "formatted_spread",
     "spread_open", "over_under", "over_under_open",
     "home_moneyline", "away_moneyline"
   )
   expect_equal(nrow(x), 4)
   expect_equal(nrow(y), 4)
-  expect_setequal(colnames(x), cols)
-  expect_setequal(colnames(y), cols)
+  expect_in(cols, colnames(x))
+  expect_in(cols, colnames(y))
   expect_s3_class(x, "data.frame")
   expect_s3_class(y, "data.frame")
 })

tests/testthat/test-cfbd_live_scoreboard.R

Lines changed: 2 additions & 2 deletions
@@ -51,8 +51,8 @@ test_that("CFB Live Scoreboard", {
   x <- cfbd_live_scoreboard(division='fbs', conference = "B12")
 
   y <- cfbd_live_scoreboard(division='fbs')
-  expect_in(colnames(x), cols)
-  expect_in(colnames(y), cols)
+  expect_in(cols, colnames(x))
+  expect_in(cols, colnames(y))
   expect_s3_class(x, "data.frame")
   expect_s3_class(y, "data.frame")
 })

tests/testthat/test-cfbd_stats_game_advanced.R

Lines changed: 3 additions & 3 deletions
@@ -33,9 +33,9 @@ test_that("CFB Stats Game - Advanced", {
   y <- cfbd_stats_game_advanced(2019, team = "LSU")
 
   z <- cfbd_stats_game_advanced(2013, team = "Florida State")
-  expect_setequal(colnames(x), cols)
-  expect_setequal(colnames(y), cols)
-  expect_setequal(colnames(z), cols)
+  expect_in(cols, colnames(x))
+  expect_in(cols, colnames(y))
+  expect_in(cols, colnames(z))
   expect_s3_class(x, "data.frame")
   expect_s3_class(y, "data.frame")
   expect_s3_class(z, "data.frame")
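
The test changes swap an exact set comparison for a subset check, and the argument order determines which side is allowed to carry extras. The base R sketch below mirrors the three variants with hypothetical column names. It assumes testthat's documented behaviour that `expect_in(object, expected)` passes when every element of `object` appears in `expected`, and it uses plain logical expressions in place of the expectations.

```r
# Hypothetical expected columns and a result that gained one extra column
expected_cols <- c("game_id", "season", "week")
returned_cols <- c("game_id", "season", "week", "home_team_id")

# Like expect_setequal(colnames(x), cols): breaks as soon as the API adds a column
strict_ok <- setequal(returned_cols, expected_cols)

# Like expect_in(cols, colnames(x)): every expected column must be present
subset_ok <- all(expected_cols %in% returned_cols)

# Like expect_in(colnames(x), cols): every returned column must be expected
reversed_ok <- all(returned_cols %in% expected_cols)

c(strict = strict_ok, subset = subset_ok, reversed = reversed_ok)
```

With the expected columns as the first argument, the tests tolerate new upstream columns while still failing if a documented column disappears.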
