This repository is the final step of a larger data pipeline and incorporates processes from mapdata_fips_aggregation and mapdata_finalization_notebooks. Its output feeds a proof-of-concept mapping application, mapapp_cloudfront_streaming.
The notebooks follow a consistent pattern across different states (MD, TX, VA, IL) to prepare data for interactive map visualizations:
- Data Import & Setup: Uses pandas, numpy, shapefile libraries, and JSON for data manipulation
- Geospatial Point Mapping: Each loan entry gets paired with its shapely.geometry point based on latitude/longitude coordinates
- Geographic Aggregation: Processes data at multiple geographic levels (state, county, block group, and block) for use in choropleth visualizations
- Financial Data Processing: Aggregates loan/lending metrics including:
  - Loan counts and approval amounts
  - Jobs reported and lender information
  - Profit and forgiveness amounts
- Data Standardization: Converts data types, rounds financial values, and formats output
- Export: Outputs processed data to standardized CSV files in state-specific directory structures
- Multi-state coverage: Processes data for Maryland, Texas, Virginia, and Illinois as POC examples (data for all 50 states and the territories is available)
- FIPS-based aggregation: Uses federal geographic coding for consistent geographic referencing
- Geospatial mapping: Pairing each loan with a shapely.geometry point enables both individual loan visualization on maps and assignment of loans to specific geographic regions (counties, block groups, etc.)
- Interactive visualization support: Financial aggregations work in tandem with hierarchical geography in the map application, enabling filtering and choropleth range displays at the selected geographic view level
- Dual data types: Handles both geographic (shapefile) and financial datasets
- Standardized output structure: Exports to organized directory paths like ../data/state_data/geo/state_agg/{state_code}/
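The FIPS-based aggregation and export path described above can be illustrated with a minimal sketch. It is not the notebooks' actual code: it uses only the standard library rather than pandas, and the record fields (`block_geoid`, `approval_amount`, `jobs_reported`), the sample values, and the helper names `aggregate` and `output_dir` are all hypothetical. It relies on the fact that a 15-digit census block GEOID nests the state (first 2 digits), county (first 5), and block group (first 12) codes as prefixes, and it assumes `{state_code}` is filled with a two-letter abbreviation such as `MD`.

```python
from collections import defaultdict
from pathlib import Path

# Prefix lengths of a 15-digit census block GEOID at each geographic level.
LEVEL_WIDTHS = {"state": 2, "county": 5, "block_group": 12, "block": 15}

# Hypothetical loan records; field names and values are illustrative only.
loans = [
    {"block_geoid": "240054011001000", "approval_amount": 20000.456, "jobs_reported": 4},
    {"block_geoid": "240054011001001", "approval_amount": 15000.10, "jobs_reported": 2},
    {"block_geoid": "245100401001000", "approval_amount": 5000.00, "jobs_reported": 1},
]


def aggregate(records, level):
    """Sum loan metrics per FIPS prefix at the requested geographic level."""
    width = LEVEL_WIDTHS[level]
    totals = defaultdict(
        lambda: {"loan_count": 0, "approval_amount": 0.0, "jobs_reported": 0}
    )
    for rec in records:
        row = totals[rec["block_geoid"][:width]]
        row["loan_count"] += 1
        row["approval_amount"] += rec["approval_amount"]
        row["jobs_reported"] += rec["jobs_reported"]
    # Round financial values to cents once, after summing.
    for row in totals.values():
        row["approval_amount"] = round(row["approval_amount"], 2)
    return dict(totals)


def output_dir(state_code, base="../data/state_data/geo/state_agg"):
    """Build the state-specific export directory (assumes a code like 'MD')."""
    return Path(base) / state_code


county_totals = aggregate(loans, "county")
block_group_totals = aggregate(loans, "block_group")
```

Because every level is a prefix of the block GEOID, one function covers state, county, block group, and block aggregation; only the prefix width changes.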
The repository creates the final data layer for an interactive mapping application that allows users to visualize loan data across different geographic scales with dynamic filtering capabilities.
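The point pairing that underlies region assignment can be sketched as follows. This is an illustration under stated assumptions, not the notebooks' implementation: the rectangular regions stand in for real shapefile boundaries, and the loan fields and the `assign_region` helper are hypothetical. It uses `shapely.geometry.Point` and `Polygon.contains`, with coordinates passed in (longitude, latitude) order.

```python
from shapely.geometry import Point, Polygon

# Hypothetical regions keyed by a FIPS-style code. Simple rectangles stand in
# for real boundaries, which would be loaded from shapefiles.
regions = {
    "24005": Polygon([(-77.0, 39.0), (-76.0, 39.0), (-76.0, 40.0), (-77.0, 40.0)]),
    "24510": Polygon([(-76.0, 39.0), (-75.0, 39.0), (-75.0, 40.0), (-76.0, 40.0)]),
}

# Hypothetical loan entries with coordinates.
loans = [
    {"loan_id": "A1", "latitude": 39.4, "longitude": -76.6},
    {"loan_id": "B2", "latitude": 39.3, "longitude": -75.5},
    {"loan_id": "C3", "latitude": 45.0, "longitude": -100.0},
]


def assign_region(loan, regions):
    """Return the code of the first region containing the loan, or None."""
    # shapely points are (x, y), i.e. (longitude, latitude).
    point = Point(loan["longitude"], loan["latitude"])
    for code, polygon in regions.items():
        if polygon.contains(point):
            return code
    return None


assigned = {loan["loan_id"]: assign_region(loan, regions) for loan in loans}
```

A linear scan over regions is fine for a handful of polygons; for block-level data a spatial index or a GeoPandas spatial join would likely be a better fit.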