@wrridgeway wrridgeway commented Sep 3, 2025

This addresses two cases: new year values appear in Athena that don't yet exist in Socrata, or year values are deleted from Athena and the corresponding rows need to be removed from Socrata. Previously, the first case would error out trying to find a row_id column in an empty dataframe, and the second would leave rows undeleted on Socrata if their year value had been removed from Athena.

This hasn't caused us an issue yet since we've almost exclusively been using the script to update the current year (2025) of data for each asset, and when I previously tested the script on all years, not enough time had passed between runs for the year values to differ.

This run of the workflow successfully handled year = 205, which was causing the error here (205 had been added to Athena but wasn't present in Socrata), but ended up failing when Athena got nuked.

Here's a run I initialized after solving the "data is on Athena but not on Socrata" issue. There was still a one-row difference between the Athena and Socrata assets due to a year value that no longer existed in Athena. The run found the row on Socrata and removed it.

@wrridgeway wrridgeway self-assigned this Sep 3, 2025
Comment on lines +246 to +253
query_dict = {
    year: f"{query} WHERE year = {year}"
    for year in years
    if not pd.isna(year)
}

if any(pd.isna(years)):
    query_dict.update({"nan": f"{query} WHERE year IS NULL"})

Making sure we don't exclude data on socrata that has null year values.
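
To illustrate why the null case needs its own query, here is a minimal, self-contained sketch of the same idea. The helper name `build_year_queries` and the base query `SELECT * FROM asset` are hypothetical and only stand in for whatever the script actually uses.

```python
import pandas as pd


def build_year_queries(query, years):
    """Map each year to a filtered query; null years share one IS NULL query.

    Hypothetical helper for illustration, not the script's actual function.
    """
    # Non-null years get a plain equality filter.
    query_dict = {
        year: f"{query} WHERE year = {year}"
        for year in years
        if not pd.isna(year)
    }
    # In SQL, `year = NULL` never matches any row, so null years need IS NULL.
    if any(pd.isna(year) for year in years):
        query_dict["nan"] = f"{query} WHERE year IS NULL"
    return query_dict


# "SELECT * FROM asset" is a placeholder base query.
queries = build_year_queries("SELECT * FROM asset", [2024, 2025, None])
```

Without the separate `IS NULL` entry, rows with a missing year would never be fetched and so could never be compared or removed.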

input_data[":deleted"] = None
input_data.loc[input_data["_merge"] == "right_only", ":deleted"] = True
input_data = input_data.drop(columns=["_merge"])
if len(socrata_rows) > 0:

When we get new years in Athena, there won't be any comparable Socrata data to gather, and this was causing an issue for the permits asset.
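
A rough sketch of how such a guard might look, assuming the comparison is an outer merge on `row_id` with pandas' `indicator=True`. The function name `flag_deleted_rows`, the merge itself, and the sample frames are assumptions for illustration; only the `:deleted` and `_merge` handling is taken from the snippet above.

```python
import pandas as pd


def flag_deleted_rows(athena_rows, socrata_rows):
    """Mark rows that exist on Socrata but not in Athena for deletion.

    Hypothetical helper; the merge strategy is an assumption.
    """
    input_data = athena_rows.copy()
    input_data[":deleted"] = None
    # A brand-new year has no Socrata rows, so there is nothing to compare
    # and no row_id column to select from the empty frame.
    if len(socrata_rows) > 0:
        input_data = athena_rows.merge(
            socrata_rows[["row_id"]], on="row_id", how="outer", indicator=True
        )
        input_data[":deleted"] = None
        # Rows found only on the Socrata side no longer exist in Athena.
        input_data.loc[input_data["_merge"] == "right_only", ":deleted"] = True
        input_data = input_data.drop(columns=["_merge"])
    return input_data


athena = pd.DataFrame({"row_id": ["a", "b"], "year": [2025, 2025]})
socrata = pd.DataFrame({"row_id": ["b", "c"]})
flagged = flag_deleted_rows(athena, socrata)    # row "c" is flagged
new_year = flag_deleted_rows(athena, pd.DataFrame())  # nothing to compare
```

Skipping the merge when Socrata returns nothing avoids indexing `row_id` on an empty dataframe, which is the failure described above.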

@wrridgeway wrridgeway changed the title from "Anticipate different years between Athena and Socrata assets" to "Anticipate different year values between Athena and Socrata assets" Sep 5, 2025
.split(",")
)
# row id won't show up here since it's hidden on the open data portal assets
asset_columns += ["row_id"]

This is no longer needed since we exposed row_id on the open data assets.

@wrridgeway wrridgeway marked this pull request as ready for review September 5, 2025 14:51
@wrridgeway wrridgeway requested a review from a team as a code owner September 5, 2025 14:51
@jeancochrane jeancochrane self-assigned this Sep 11, 2025