charlie-costanzo
Member

@charlie-costanzo charlie-costanzo commented Aug 12, 2025

Description

During our infrastructure refactor, we determined that local warehouse development needs to keep using production external tables as sources of truth, WITH THE EXCEPTION OF RT EXTERNAL TABLES. This PR introduces the environment variable DEV_SOURCE_GOOGLE_CLOUD_PROJECT which, when set locally, lets dbt reference production external tables at runtime, again WITH THE EXCEPTION OF RT EXTERNAL TABLES.

Resolves: #4188

Type of change

  • New feature

How has this been tested?

Screenshot 2025-08-12 at 14 58 10

Post-merge follow-ups

  • Action required
    To use the production external tables in local warehouse development, the following line needs to be added to your environment:

export DEV_SOURCE_GOOGLE_CLOUD_PROJECT='cal-itp-data-infra'

@charlie-costanzo charlie-costanzo self-assigned this Aug 12, 2025
@charlie-costanzo charlie-costanzo added the data-pipeline-ingestion-and-modeling label (Ingesting, parsing and modeling data. Evan Siroky is product owner.) Aug 12, 2025

github-actions bot commented Aug 12, 2025

Terraform plan in iac/cal-itp-data-infra-staging/airflow/us

Plan: 0 to add, 3 to change, 0 to destroy.
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
!~  update in-place

Terraform will perform the following actions:

  # google_storage_bucket_object.calitp-staging-composer-catalog will be updated in-place
!~  resource "google_storage_bucket_object" "calitp-staging-composer-catalog" {
!~      content             = (sensitive value)
!~      crc32c              = "scSVQA==" -> (known after apply)
!~      detect_md5hash      = "LzB7tkV9giBECRASvwDTJg==" -> "different hash"
!~      generation          = 1758060434829105 -> (known after apply)
        id                  = "calitp-staging-composer-data/warehouse/target/catalog.json"
!~      md5hash             = "LzB7tkV9giBECRASvwDTJg==" -> (known after apply)
        name                = "data/warehouse/target/catalog.json"
#        (16 unchanged attributes hidden)
    }

  # google_storage_bucket_object.calitp-staging-composer-dags["dbt_project.yml"] will be updated in-place
!~  resource "google_storage_bucket_object" "calitp-staging-composer-dags" {
!~      crc32c              = "41/WGA==" -> (known after apply)
!~      detect_md5hash      = "tXa6XCOXKda0qRNtC+2eAg==" -> "different hash"
!~      generation          = 1758060092636212 -> (known after apply)
        id                  = "calitp-staging-composer-data/warehouse/dbt_project.yml"
!~      md5hash             = "tXa6XCOXKda0qRNtC+2eAg==" -> (known after apply)
        name                = "data/warehouse/dbt_project.yml"
#        (17 unchanged attributes hidden)
    }

  # google_storage_bucket_object.calitp-staging-composer-manifest will be updated in-place
!~  resource "google_storage_bucket_object" "calitp-staging-composer-manifest" {
!~      content             = (sensitive value)
!~      crc32c              = "LrVEZQ==" -> (known after apply)
!~      detect_md5hash      = "nv12yYcJcck2UWfr3ppWyA==" -> "different hash"
!~      generation          = 1758060435641990 -> (known after apply)
        id                  = "calitp-staging-composer-data/warehouse/target/manifest.json"
!~      md5hash             = "nv12yYcJcck2UWfr3ppWyA==" -> (known after apply)
        name                = "data/warehouse/target/manifest.json"
#        (16 unchanged attributes hidden)
    }

Plan: 0 to add, 3 to change, 0 to destroy.

📝 Plan generated in Plan Terraform for Warehouse and DAG changes #640

github-actions bot commented Aug 12, 2025

Terraform plan in iac/cal-itp-data-infra/airflow/us

Plan: 0 to add, 22 to change, 0 to destroy.
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
!~  update in-place

Terraform will perform the following actions:

  # google_storage_bucket_object.calitp-composer-catalog will be updated in-place
!~  resource "google_storage_bucket_object" "calitp-composer-catalog" {
!~      content             = (sensitive value)
!~      crc32c              = "Tev42g==" -> (known after apply)
!~      detect_md5hash      = "WH9/csQm9d1jFHYJ9tDp6w==" -> "different hash"
!~      generation          = 1757541552203979 -> (known after apply)
        id                  = "calitp-composer-data/warehouse/target/catalog.json"
!~      md5hash             = "WH9/csQm9d1jFHYJ9tDp6w==" -> (known after apply)
        name                = "data/warehouse/target/catalog.json"
#        (16 unchanged attributes hidden)
    }

  # google_storage_bucket_object.calitp-composer-dags["dbt_project.yml"] will be updated in-place
!~  resource "google_storage_bucket_object" "calitp-composer-dags" {
!~      crc32c              = "XPuwJQ==" -> (known after apply)
!~      detect_md5hash      = "ZVQqwNQ/pizS7TrWVGpWrA==" -> "different hash"
!~      generation          = 1755538683311062 -> (known after apply)
        id                  = "calitp-composer-data/warehouse/dbt_project.yml"
!~      md5hash             = "ZVQqwNQ/pizS7TrWVGpWrA==" -> (known after apply)
        name                = "data/warehouse/dbt_project.yml"
#        (17 unchanged attributes hidden)
    }

  # google_storage_bucket_object.calitp-composer-dags["models/_source_gtfs_schedule_history.yml"] will be updated in-place
!~  resource "google_storage_bucket_object" "calitp-composer-dags" {
!~      crc32c              = "BXdZpA==" -> (known after apply)
!~      detect_md5hash      = "HVHCS36vhuXW2Wdk8hBKlg==" -> "different hash"
!~      generation          = 1751416662709951 -> (known after apply)
        id                  = "calitp-composer-data/warehouse/models/_source_gtfs_schedule_history.yml"
!~      md5hash             = "HVHCS36vhuXW2Wdk8hBKlg==" -> (known after apply)
        name                = "data/warehouse/models/_source_gtfs_schedule_history.yml"
#        (17 unchanged attributes hidden)
    }

  # google_storage_bucket_object.calitp-composer-dags["models/staging/amplitude/_amplitude.yml"] will be updated in-place
!~  resource "google_storage_bucket_object" "calitp-composer-dags" {
!~      crc32c              = "5ai1mg==" -> (known after apply)
!~      detect_md5hash      = "CCXiffBEEPZ5HLszdyijvw==" -> "different hash"
!~      generation          = 1751416666866881 -> (known after apply)
        id                  = "calitp-composer-data/warehouse/models/staging/amplitude/_amplitude.yml"
!~      md5hash             = "CCXiffBEEPZ5HLszdyijvw==" -> (known after apply)
        name                = "data/warehouse/models/staging/amplitude/_amplitude.yml"
#        (17 unchanged attributes hidden)
    }

  # google_storage_bucket_object.calitp-composer-dags["models/staging/audit/_src_audit.yml"] will be updated in-place
!~  resource "google_storage_bucket_object" "calitp-composer-dags" {
!~      crc32c              = "yeSEyg==" -> (known after apply)
!~      detect_md5hash      = "LP1avYyKTuGYkgdiP9r8GA==" -> "different hash"
!~      generation          = 1751416661148830 -> (known after apply)
        id                  = "calitp-composer-data/warehouse/models/staging/audit/_src_audit.yml"
!~      md5hash             = "LP1avYyKTuGYkgdiP9r8GA==" -> (known after apply)
        name                = "data/warehouse/models/staging/audit/_src_audit.yml"
#        (17 unchanged attributes hidden)
    }

  # google_storage_bucket_object.calitp-composer-dags["models/staging/gtfs/_src_gtfs_rt_external_tables.yml"] will be updated in-place
!~  resource "google_storage_bucket_object" "calitp-composer-dags" {
!~      crc32c              = "irg5NA==" -> (known after apply)
!~      detect_md5hash      = "VMjLStCSrjbmN/SVzNZsKA==" -> "different hash"
!~      generation          = 1751416666081058 -> (known after apply)
        id                  = "calitp-composer-data/warehouse/models/staging/gtfs/_src_gtfs_rt_external_tables.yml"
!~      md5hash             = "VMjLStCSrjbmN/SVzNZsKA==" -> (known after apply)
        name                = "data/warehouse/models/staging/gtfs/_src_gtfs_rt_external_tables.yml"
#        (17 unchanged attributes hidden)
    }

  # google_storage_bucket_object.calitp-composer-dags["models/staging/gtfs/_src_gtfs_schedule_external_tables.yml"] will be updated in-place
!~  resource "google_storage_bucket_object" "calitp-composer-dags" {
!~      crc32c              = "JRuXXA==" -> (known after apply)
!~      detect_md5hash      = "Caqsk8kIhrzLrYxkwYo53g==" -> "different hash"
!~      generation          = 1751416666931203 -> (known after apply)
        id                  = "calitp-composer-data/warehouse/models/staging/gtfs/_src_gtfs_schedule_external_tables.yml"
!~      md5hash             = "Caqsk8kIhrzLrYxkwYo53g==" -> (known after apply)
        name                = "data/warehouse/models/staging/gtfs/_src_gtfs_schedule_external_tables.yml"
#        (17 unchanged attributes hidden)
    }

  # google_storage_bucket_object.calitp-composer-dags["models/staging/hqta/_hqta.yml"] will be updated in-place
!~  resource "google_storage_bucket_object" "calitp-composer-dags" {
!~      crc32c              = "Uuiodw==" -> (known after apply)
!~      detect_md5hash      = "E8kXOfywGmTU9ixt2SEdjw==" -> "different hash"
!~      generation          = 1751416667415993 -> (known after apply)
        id                  = "calitp-composer-data/warehouse/models/staging/hqta/_hqta.yml"
!~      md5hash             = "E8kXOfywGmTU9ixt2SEdjw==" -> (known after apply)
        name                = "data/warehouse/models/staging/hqta/_hqta.yml"
#        (17 unchanged attributes hidden)
    }

  # google_storage_bucket_object.calitp-composer-dags["models/staging/kuba/_src.yml"] will be updated in-place
!~  resource "google_storage_bucket_object" "calitp-composer-dags" {
!~      crc32c              = "VFjYpg==" -> (known after apply)
!~      detect_md5hash      = "d+eUGgQXAgoXWUQWdd4hHg==" -> "different hash"
!~      generation          = 1755538683298627 -> (known after apply)
        id                  = "calitp-composer-data/warehouse/models/staging/kuba/_src.yml"
!~      md5hash             = "d+eUGgQXAgoXWUQWdd4hHg==" -> (known after apply)
        name                = "data/warehouse/models/staging/kuba/_src.yml"
#        (17 unchanged attributes hidden)
    }

  # google_storage_bucket_object.calitp-composer-dags["models/staging/ntd_annual_reporting/_src.yml"] will be updated in-place
!~  resource "google_storage_bucket_object" "calitp-composer-dags" {
!~      crc32c              = "6xrw7Q==" -> (known after apply)
!~      detect_md5hash      = "u1GUGr+qHvrdM3CByD7P5g==" -> "different hash"
!~      generation          = 1751416664560138 -> (known after apply)
        id                  = "calitp-composer-data/warehouse/models/staging/ntd_annual_reporting/_src.yml"
!~      md5hash             = "u1GUGr+qHvrdM3CByD7P5g==" -> (known after apply)
        name                = "data/warehouse/models/staging/ntd_annual_reporting/_src.yml"
#        (17 unchanged attributes hidden)
    }

  # google_storage_bucket_object.calitp-composer-dags["models/staging/ntd_assets/_src.yml"] will be updated in-place
!~  resource "google_storage_bucket_object" "calitp-composer-dags" {
!~      crc32c              = "xZy+xA==" -> (known after apply)
!~      detect_md5hash      = "D87Dg9H2Ttxydrw5TCb+kw==" -> "different hash"
!~      generation          = 1751416663042995 -> (known after apply)
        id                  = "calitp-composer-data/warehouse/models/staging/ntd_assets/_src.yml"
!~      md5hash             = "D87Dg9H2Ttxydrw5TCb+kw==" -> (known after apply)
        name                = "data/warehouse/models/staging/ntd_assets/_src.yml"
#        (17 unchanged attributes hidden)
    }

  # google_storage_bucket_object.calitp-composer-dags["models/staging/ntd_funding_and_expenses/_src.yml"] will be updated in-place
!~  resource "google_storage_bucket_object" "calitp-composer-dags" {
!~      crc32c              = "Rk9vbA==" -> (known after apply)
!~      detect_md5hash      = "1oG4kAhIV+IUFXaMzCafdg==" -> "different hash"
!~      generation          = 1751416666040352 -> (known after apply)
        id                  = "calitp-composer-data/warehouse/models/staging/ntd_funding_and_expenses/_src.yml"
!~      md5hash             = "1oG4kAhIV+IUFXaMzCafdg==" -> (known after apply)
        name                = "data/warehouse/models/staging/ntd_funding_and_expenses/_src.yml"
#        (17 unchanged attributes hidden)
    }

  # google_storage_bucket_object.calitp-composer-dags["models/staging/ntd_ridership/_src.yml"] will be updated in-place
!~  resource "google_storage_bucket_object" "calitp-composer-dags" {
!~      crc32c              = "vxfhWw==" -> (known after apply)
!~      detect_md5hash      = "tIKyKSkVeHX6Hy5dMmfDkg==" -> "different hash"
!~      generation          = 1751416665043819 -> (known after apply)
        id                  = "calitp-composer-data/warehouse/models/staging/ntd_ridership/_src.yml"
!~      md5hash             = "tIKyKSkVeHX6Hy5dMmfDkg==" -> (known after apply)
        name                = "data/warehouse/models/staging/ntd_ridership/_src.yml"
#        (17 unchanged attributes hidden)
    }

  # google_storage_bucket_object.calitp-composer-dags["models/staging/ntd_safety_and_security/_src.yml"] will be updated in-place
!~  resource "google_storage_bucket_object" "calitp-composer-dags" {
!~      crc32c              = "pFHMCQ==" -> (known after apply)
!~      detect_md5hash      = "VdG5ha4mT+RoWi4UQOT35Q==" -> "different hash"
!~      generation          = 1758049886357399 -> (known after apply)
        id                  = "calitp-composer-data/warehouse/models/staging/ntd_safety_and_security/_src.yml"
!~      md5hash             = "VdG5ha4mT+RoWi4UQOT35Q==" -> (known after apply)
        name                = "data/warehouse/models/staging/ntd_safety_and_security/_src.yml"
#        (17 unchanged attributes hidden)
    }

  # google_storage_bucket_object.calitp-composer-dags["models/staging/ntd_validation/_src_api_externaltable.yml"] will be updated in-place
!~  resource "google_storage_bucket_object" "calitp-composer-dags" {
!~      crc32c              = "aVRpvA==" -> (known after apply)
!~      detect_md5hash      = "V377ulINuAVHuOmUpO696g==" -> "different hash"
!~      generation          = 1751416666842087 -> (known after apply)
        id                  = "calitp-composer-data/warehouse/models/staging/ntd_validation/_src_api_externaltable.yml"
!~      md5hash             = "V377ulINuAVHuOmUpO696g==" -> (known after apply)
        name                = "data/warehouse/models/staging/ntd_validation/_src_api_externaltable.yml"
#        (17 unchanged attributes hidden)
    }

  # google_storage_bucket_object.calitp-composer-dags["models/staging/payments/elavon/_elavon.yml"] will be updated in-place
!~  resource "google_storage_bucket_object" "calitp-composer-dags" {
!~      crc32c              = "K73NkQ==" -> (known after apply)
!~      detect_md5hash      = "F3n1bpsPdvlWZzdDllkL5g==" -> "different hash"
!~      generation          = 1751416668573349 -> (known after apply)
        id                  = "calitp-composer-data/warehouse/models/staging/payments/elavon/_elavon.yml"
!~      md5hash             = "F3n1bpsPdvlWZzdDllkL5g==" -> (known after apply)
        name                = "data/warehouse/models/staging/payments/elavon/_elavon.yml"
#        (17 unchanged attributes hidden)
    }

  # google_storage_bucket_object.calitp-composer-dags["models/staging/payments/littlepay/_littlepay.yml"] will be updated in-place
!~  resource "google_storage_bucket_object" "calitp-composer-dags" {
!~      crc32c              = "z0l1MQ==" -> (known after apply)
!~      detect_md5hash      = "pQ3Or9N/wSYHXt235de9Zw==" -> "different hash"
!~      generation          = 1751416662768513 -> (known after apply)
        id                  = "calitp-composer-data/warehouse/models/staging/payments/littlepay/_littlepay.yml"
!~      md5hash             = "pQ3Or9N/wSYHXt235de9Zw==" -> (known after apply)
        name                = "data/warehouse/models/staging/payments/littlepay/_littlepay.yml"
#        (17 unchanged attributes hidden)
    }

  # google_storage_bucket_object.calitp-composer-dags["models/staging/payments/littlepay_v3/_littlepay_v3.yml"] will be updated in-place
!~  resource "google_storage_bucket_object" "calitp-composer-dags" {
!~      crc32c              = "OLGd3Q==" -> (known after apply)
!~      detect_md5hash      = "CT1Is41WF82GxtvZlgAPfw==" -> "different hash"
!~      generation          = 1751416665701155 -> (known after apply)
        id                  = "calitp-composer-data/warehouse/models/staging/payments/littlepay_v3/_littlepay_v3.yml"
!~      md5hash             = "CT1Is41WF82GxtvZlgAPfw==" -> (known after apply)
        name                = "data/warehouse/models/staging/payments/littlepay_v3/_littlepay_v3.yml"
#        (17 unchanged attributes hidden)
    }

  # google_storage_bucket_object.calitp-composer-dags["models/staging/rt/_src_gtfs_rt_external_tables.yml"] will be updated in-place
!~  resource "google_storage_bucket_object" "calitp-composer-dags" {
!~      crc32c              = "SwYd1A==" -> (known after apply)
!~      detect_md5hash      = "Z8wuwg61jF+m3F4tDyOR9Q==" -> "different hash"
!~      generation          = 1751416664716660 -> (known after apply)
        id                  = "calitp-composer-data/warehouse/models/staging/rt/_src_gtfs_rt_external_tables.yml"
!~      md5hash             = "Z8wuwg61jF+m3F4tDyOR9Q==" -> (known after apply)
        name                = "data/warehouse/models/staging/rt/_src_gtfs_rt_external_tables.yml"
#        (17 unchanged attributes hidden)
    }

  # google_storage_bucket_object.calitp-composer-dags["models/staging/state_geoportal/_src.yml"] will be updated in-place
!~  resource "google_storage_bucket_object" "calitp-composer-dags" {
!~      crc32c              = "wL4kuQ==" -> (known after apply)
!~      detect_md5hash      = "RurA8+fnfohppuRhvXSA1w==" -> "different hash"
!~      generation          = 1751416661849873 -> (known after apply)
        id                  = "calitp-composer-data/warehouse/models/staging/state_geoportal/_src.yml"
!~      md5hash             = "RurA8+fnfohppuRhvXSA1w==" -> (known after apply)
        name                = "data/warehouse/models/staging/state_geoportal/_src.yml"
#        (17 unchanged attributes hidden)
    }

  # google_storage_bucket_object.calitp-composer-dags["models/staging/transit_database/_src_airtable.yml"] will be updated in-place
!~  resource "google_storage_bucket_object" "calitp-composer-dags" {
!~      crc32c              = "XoDIpQ==" -> (known after apply)
!~      detect_md5hash      = "L1F0Z7s3z9BVOE64Fg/zDg==" -> "different hash"
!~      generation          = 1751416665331439 -> (known after apply)
        id                  = "calitp-composer-data/warehouse/models/staging/transit_database/_src_airtable.yml"
!~      md5hash             = "L1F0Z7s3z9BVOE64Fg/zDg==" -> (known after apply)
        name                = "data/warehouse/models/staging/transit_database/_src_airtable.yml"
#        (17 unchanged attributes hidden)
    }

  # google_storage_bucket_object.calitp-composer-manifest will be updated in-place
!~  resource "google_storage_bucket_object" "calitp-composer-manifest" {
!~      content             = (sensitive value)
!~      crc32c              = "0PcBgA==" -> (known after apply)
!~      detect_md5hash      = "TuukowzlGluvU+BnYGmOkA==" -> "different hash"
!~      generation          = 1757541553964048 -> (known after apply)
        id                  = "calitp-composer-data/warehouse/target/manifest.json"
!~      md5hash             = "TuukowzlGluvU+BnYGmOkA==" -> (known after apply)
        name                = "data/warehouse/target/manifest.json"
#        (16 unchanged attributes hidden)
    }

Plan: 0 to add, 22 to change, 0 to destroy.

📝 Plan generated in Plan Terraform for Warehouse and DAG changes #640

@lauriemerrell
Contributor

Not a formal reviewer but since we discussed offline with Erika, I took a look -- just one comment:

@erikamov erikamov force-pushed the dbt-dev-source-variables branch from 7bb0573 to 4ba7cae Compare September 15, 2025 19:40

github-actions bot commented Sep 15, 2025

Terraform plan in iac/cal-itp-data-infra/composer/us

No changes. Your infrastructure matches the configuration.

Terraform has compared your real infrastructure against your configuration
and found no differences, so no changes are needed.

📝 Plan generated in Plan Terraform for Warehouse and DAG changes #640

@erikamov erikamov force-pushed the dbt-dev-source-variables branch 3 times, most recently from 5f26ba9 to aaf0c29 Compare September 16, 2025 22:46
@erikamov
Contributor

Not a formal reviewer but since we discussed offline with Erika, I took a look -- just one comment:

@lauriemerrell
Not sure how this line works, can you explain more? And how do you think it should be?

@erikamov erikamov force-pushed the dbt-dev-source-variables branch from aaf0c29 to 947ec58 Compare September 16, 2025 22:59
@erikamov
Contributor

erikamov commented Sep 16, 2025

Charlie, I did some tests and changes:

✅ I added some comments to the existing variables in dbt_project.yml

✅ Testing locally did not work at first, so I had to invert the order of the variables, and then it worked. We can run with --vars: poetry run dbt compile -s model_name --vars 'EXTERNAL_TABLE_SOURCE: cal-itp-data-infra'

✅ Pushed the change to Airflow Staging and ran dbt_daily. All models ran successfully, without complaints about the new variable.

❌ Tested on Airflow Staging by adding EXTERNAL_TABLE_SOURCE as an env var; it did not change the view to point to prod.
I think it is nice that it does not change there, only on the command line, but that means it won't work in production. The view code will always point to staging, since EXTERNAL_TABLE_SOURCE is set to cal-itp-data-infra-staging in dbt_project.yml. So we still need EXTERNAL_TABLE_SOURCE as an env var to be able to change it.

  • We need to add instructions to the README
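
The override behavior seen in the local test can be sketched in Python. This is an illustration only, not dbt's implementation: resolve_var is a hypothetical helper, and it models just the rule that a value passed with --vars takes precedence over the value declared under vars in dbt_project.yml. The project default is the staging value mentioned in this comment.

```python
# Illustrative model only; resolve_var is a hypothetical helper, not a dbt API.
from typing import Mapping, Optional

# Default described in this comment: dbt_project.yml points at the staging project.
PROJECT_VARS = {"EXTERNAL_TABLE_SOURCE": "cal-itp-data-infra-staging"}


def resolve_var(
    name: str,
    project_vars: Mapping[str, str],
    cli_vars: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the value passed on the command line if present, else the project default."""
    if cli_vars and name in cli_vars:
        return cli_vars[name]
    return project_vars[name]


# No --vars passed: the project default (staging) is used.
default_source = resolve_var("EXTERNAL_TABLE_SOURCE", PROJECT_VARS)

# Equivalent of: --vars 'EXTERNAL_TABLE_SOURCE: cal-itp-data-infra'
overridden_source = resolve_var(
    "EXTERNAL_TABLE_SOURCE",
    PROJECT_VARS,
    {"EXTERNAL_TABLE_SOURCE": "cal-itp-data-infra"},
)

print(default_source)
print(overridden_source)
```

Under this model, an environment variable named EXTERNAL_TABLE_SOURCE has no effect unless the source definition reads it explicitly, which is consistent with the failed Airflow Staging test.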

@lauriemerrell
Contributor

@erikamov sorry, maybe something changed or I linked the wrong line. I meant to link: https://github.com/cal-itp/data-infra/blob/main/warehouse/dbt_project.yml#L31 -- basically, I think we need to document how this new variable interacts with the existing variable that already points at a project. It looks like the comments in dbt_project.yml have been updated appropriately; not sure if that was you or Charlie.

@@ -3,7 +3,7 @@ version: 2
 sources:
   - name: external_gtfs_rt
     description: Hive-partitioned external tables reading GTFS RT data and validation errors from GCS.
-    database: "{{ env_var('GOOGLE_CLOUD_PROJECT', var('GOOGLE_CLOUD_PROJECT')) }}"
+    database: "{{ env_var('GOOGLE_CLOUD_PROJECT', var('EXTERNAL_TABLE_SOURCE')) }}"

The PR description says that GTFS RT data is meant to be excluded from this change. Won't this update mean that it applies to GTFS-RT too? We might remove this change so that GTFS-RT is excluded, or update the PR description to make clear that it affects all sources.
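
The concern can be illustrated with a small Python model of the changed line. This is a sketch under assumptions: env_var below is a stand-in that only mimics the two-argument form of dbt's env_var (return the environment value when it is set, otherwise the default), and rt_source_database is a hypothetical name for the database expression on the external_gtfs_rt source.

```python
# Illustrative stand-ins only; these are not dbt internals.
from typing import Mapping


def env_var(name: str, default: str, environ: Mapping[str, str]) -> str:
    """Mimic env_var(name, default): the environment wins when the variable is set."""
    return environ.get(name, default)


def rt_source_database(environ: Mapping[str, str], dbt_vars: Mapping[str, str]) -> str:
    """Model of: env_var('GOOGLE_CLOUD_PROJECT', var('EXTERNAL_TABLE_SOURCE'))."""
    return env_var("GOOGLE_CLOUD_PROJECT", dbt_vars["EXTERNAL_TABLE_SOURCE"], environ)


dbt_vars = {"EXTERNAL_TABLE_SOURCE": "cal-itp-data-infra"}

# GOOGLE_CLOUD_PROJECT unset: the RT source follows EXTERNAL_TABLE_SOURCE,
# so it is not excluded from the new variable.
rt_without_env = rt_source_database({}, dbt_vars)

# GOOGLE_CLOUD_PROJECT set: the environment value takes over and the var is ignored.
rt_with_env = rt_source_database(
    {"GOOGLE_CLOUD_PROJECT": "cal-itp-data-infra-staging"}, dbt_vars
)

print(rt_without_env)
print(rt_with_env)
```

In this model the RT source is only shielded from EXTERNAL_TABLE_SOURCE when GOOGLE_CLOUD_PROJECT happens to be set in the environment.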

Successfully merging this pull request may close these issues.

dbt: allow for local warehouse development on prod external tables
3 participants