This repository was archived by the owner on Sep 29, 2025. It is now read-only.
Merged
18 changes: 16 additions & 2 deletions .github/workflows/mkdocs.yml
@@ -36,11 +36,13 @@ jobs:
         with:
           python-version-file: .github/workflows/.python-version
           cache: pip
-          cache-dependency-path: "docs/requirements.txt"
+          cache-dependency-path: |
+            "**/pyproject.toml"
+            "docs/requirements.txt"

       - name: Build MkDocs website
         run: |
-          pip install -r docs/requirements.txt
+          pip install ./pems_data -r docs/requirements.txt
           mkdocs build

       - name: Install Netlify CLI
@@ -87,6 +89,18 @@ jobs:
       - name: Checkout
         uses: actions/checkout@v4

+      - uses: actions/setup-python@v5
+        with:
+          python-version-file: .github/workflows/.python-version
+          cache: pip
+          cache-dependency-path: |
+            "**/pyproject.toml"
+            "docs/requirements.txt"
+
+      - name: Install local packages
+        run: |
+          pip install ./pems_data
+
       - name: Deploy docs
         uses: mhausenblas/mkdocs-deploy-gh-pages@master
         env:
61 changes: 61 additions & 0 deletions docs/development/pems_data/README.md
@@ -0,0 +1,61 @@
# Introduction

The `pems_data` library provides a standardized, efficient interface for accessing Caltrans PeMS data within the project. It handles fetching data from the primary S3 data source and leverages a Redis-based caching layer to optimize performance for repeated requests.

This guide covers the specific setup and usage patterns for this library. For general development environment setup, please see the main [Getting started with development guide](../README.md).

## Prerequisites

Before using the library, ensure your environment is configured correctly.

### AWS credentials

The library's S3 data source requires AWS credentials to be available. The devcontainer is configured to use your host machine's AWS configuration. For details on setting this up via `aws configure sso`, please refer to the [Work with the Cloud infrastructure section in the main development guide](../README.md#work-with-the-cloud-infrastructure).

### Redis connection

A running Redis instance is required for the caching layer to function. The connection is configured with the following environment variables, which you can set in the `.env` file at the root of the project:

```env
# The hostname for the Redis server
REDIS_HOSTNAME=redis

# The port for the Redis server.
REDIS_PORT=6379
```

When running locally in the devcontainer, a `redis` service is started by Compose automatically.
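
As a rough illustration of how these two variables might be consumed, the sketch below reads them and falls back to the example values shown above. `RedisSettings` and `load_redis_settings` are hypothetical names, not part of `pems_data`, and the fallback values are an assumption taken from the `.env` example rather than documented library defaults.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class RedisSettings:
    """Hypothetical container for the Redis connection settings."""

    hostname: str
    port: int


def load_redis_settings(env=None) -> RedisSettings:
    """Read REDIS_HOSTNAME and REDIS_PORT from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    return RedisSettings(
        # Assumed fallbacks, copied from the .env example above.
        hostname=env.get("REDIS_HOSTNAME", "redis"),
        port=int(env.get("REDIS_PORT", "6379")),
    )


# Passing a plain dict makes the behavior easy to check without touching the real environment.
settings = load_redis_settings({"REDIS_HOSTNAME": "localhost"})
print(f"{settings.hostname}:{settings.port}")  # prints "localhost:6379"
```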

## Architecture & key concepts

The library is built around a few core components that work together to provide a simple data access experience.

- [`ServiceFactory`](./reference/service-factory.md): This is the primary entry point for using the library. It is a factory class that instantiates and wires together all the necessary dependencies, such as the data sources and caching clients.

- [**Services**](./reference/services.md): Services offer a high-level API for fetching specific, business-relevant data. For example, the `StationsService` has methods to get all station metadata for a given district or to retrieve 5-minute aggregated data for a specific station.

- [**Caching layer**](./reference/caching-layer.md): To minimize latency and load on the data source, the library uses a caching decorator by default. When a data request is made, this layer first checks the Redis cache for the requested data. If the data is not found (a cache miss), it retrieves the data from the underlying S3 source and stores it in the cache for future requests.

- [**Data sources**](./reference/data-sources.md): The underlying data source reads data directly from Parquet files stored in the Caltrans S3 bucket.

## Basic usage

Using the library involves creating the factory, getting a service, and calling a data-fetching method. The factory handles the underlying complexity of connecting to the data source and cache.

```python
from pems_data import ServiceFactory

# 1. Create the factory. This wires up all dependencies.
factory = ServiceFactory()

# 2. Request a pre-configured service.
stations_service = factory.stations_service()

# 3. Use the service to fetch data as a pandas DataFrame.
# This call will attempt to read from the cache first before
# falling back to the S3 data source.
district_7_metadata = stations_service.get_district_metadata(district_number="7")

print("Successfully fetched metadata for District 7:")
print(district_7_metadata.head())
```
64 changes: 64 additions & 0 deletions docs/development/pems_data/cli.md
@@ -0,0 +1,64 @@
# `pems-cache` CLI

The `pems_data` package includes `pems-cache`, a simple command-line tool for interacting directly with the Redis cache. It's useful for debugging cache issues or manually inspecting and setting values.

## Commands

The CLI supports three main operations:

- `check`
- `get`
- `set`

If you run `pems-cache` with no operation, it defaults to `check`.
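
One way to get this kind of default operation is an optional positional argument. The sketch below assumes an `argparse`-based parser; it is not taken from the `pems-cache` source, and `build_parser` and the `op` argument name are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose operation defaults to "check" when omitted."""
    parser = argparse.ArgumentParser(prog="pems-cache")
    # nargs="?" makes the positional optional, so `default` applies when it is left out.
    parser.add_argument("op", nargs="?", default="check", choices=["check", "get", "set"])
    parser.add_argument("--key", "-k")
    parser.add_argument("--value", "-v")
    return parser


parser = build_parser()
print(parser.parse_args([]).op)  # prints "check"
print(parser.parse_args(["get", "-k", "my:test:key"]).op)  # prints "get"
```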

### `check`

Verifies that a connection to the Redis server can be established and that the cache is responsive.

#### Usage

```shell
pems-cache check
```

#### Example output

```shell
$ pems-cache check
cache is available: True
```

### `get`

Retrieves and displays a value from the cache based on its key. The `--key` (or `-k`) argument is required.

#### Usage

```shell
pems-cache get --key <cache-key>
```

#### Example output

```shell
$ pems-cache get --key "stations:metadata:district:7"
[stations:metadata:district:7]: b'\x01\x00\x00\x00\xff\xff...'
```

### `set`

Sets a string value for a given key in the cache. Both the `--key` (`-k`) and `--value` (`-v`) arguments are required.

#### Usage

```shell
pems-cache set --key <cache-key> --value <cache-value>
```

#### Example output

```shell
$ pems-cache set -k "my:test:key" -v "hello from the cli"
[my:test:key] = 'hello from the cli'
```
7 changes: 7 additions & 0 deletions docs/development/pems_data/reference/caching-layer.md
@@ -0,0 +1,7 @@
# Caching layer

The caching layer wraps a backing [Redis](https://redis.io/docs/latest/) service and provides a simple, focused interface to it.

::: pems_data.cache

::: pems_data.serialization
9 changes: 9 additions & 0 deletions docs/development/pems_data/reference/data-sources.md
@@ -0,0 +1,9 @@
# Data sources

The data source components are responsible for the actual reading of data (the "how"). The design uses an abstract interface, `IDataSource`, to define a standard contract for any data source, making it easy to swap and compose implementations.
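
To make the "swap and compose" idea concrete, here is a minimal sketch of an abstract contract with one concrete source and one wrapper that composes with any other source. All class names and the `read(identifier)` signature are assumptions for illustration; the real `IDataSource` contract may take different arguments and return a DataFrame.

```python
from abc import ABC, abstractmethod
from typing import Dict


class DataSource(ABC):
    """Hypothetical stand-in for an abstract data source contract."""

    @abstractmethod
    def read(self, identifier: str) -> bytes:
        """Read the raw data stored under identifier."""


class InMemorySource(DataSource):
    """Stand-in for a real source such as S3; counts how often it is read."""

    def __init__(self, objects: Dict[str, bytes]):
        self._objects = objects
        self.reads = 0

    def read(self, identifier: str) -> bytes:
        self.reads += 1
        return self._objects[identifier]


class CachingSource(DataSource):
    """Wraps any DataSource and remembers what it has already read."""

    def __init__(self, data_source: DataSource):
        self._data_source = data_source
        self._cache: Dict[str, bytes] = {}

    def read(self, identifier: str) -> bytes:
        if identifier not in self._cache:
            self._cache[identifier] = self._data_source.read(identifier)
        return self._cache[identifier]


inner = InMemorySource({"stations.parquet": b"rows"})
source: DataSource = CachingSource(inner)  # callers only depend on DataSource
source.read("stations.parquet")
source.read("stations.parquet")
print(inner.reads)  # prints "1"
```

Because callers depend only on the abstract contract, the wrapped and unwrapped sources are interchangeable.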

::: pems_data.sources.IDataSource
Member

We could also go to the top module level:

::: pems_data.sources

::: pems_data.sources.s3

::: pems_data.sources.cache

Member Author

Maybe we can come back to this? I agree it would make more sense to link to the module level if e.g. there were more classes / helper functions etc. in those modules.


::: pems_data.sources.s3.S3DataSource

::: pems_data.sources.cache.CachingDataSource
3 changes: 3 additions & 0 deletions docs/development/pems_data/reference/service-factory.md
@@ -0,0 +1,3 @@
# Service factory

::: pems_data.ServiceFactory
5 changes: 5 additions & 0 deletions docs/development/pems_data/reference/services.md
@@ -0,0 +1,5 @@
# Services

The services represent the business logic of "what" data to fetch for specific use cases. Services require an underlying data source to perform the actual reading of data.

::: pems_data.services.stations.StationsService
2 changes: 2 additions & 0 deletions docs/requirements.txt
@@ -4,3 +4,5 @@ mkdocs-awesome-pages-plugin
 mkdocs-macros-plugin
 mkdocs-material==9.6.16
 mkdocs-redirects
+mkdocstrings
+mkdocstrings-python
12 changes: 10 additions & 2 deletions mkdocs.yml
@@ -20,6 +20,14 @@ extra:
 plugins:
   - search
   - awesome-pages
+  - mkdocstrings:
+      handlers:
+        python:
+          options:
+            show_root_heading: true
+            show_root_toc_entry: false
+            show_symbol_type_heading: true
+            show_symbol_type_toc: true
   - redirects:
       redirect_maps:

@@ -36,8 +44,8 @@ markdown_extensions:
   - meta
   - pymdownx.details
   - pymdownx.emoji:
-      emoji_index: !!python/name:materialx.emoji.twemoji
-      emoji_generator: !!python/name:materialx.emoji.to_svg
+      emoji_index: !!python/name:material.extensions.emoji.twemoji
+      emoji_generator: !!python/name:material.extensions.emoji.to_svg
   - pymdownx.inlinehilite
   - pymdownx.smartsymbols
   - pymdownx.superfences:
39 changes: 34 additions & 5 deletions pems_data/src/pems_data/__init__.py
@@ -11,11 +11,40 @@ class ServiceFactory:
     Shared dependencies are created once during initialization.
     """

+    @property
+    def cache(self) -> Cache:
+        """
+        Returns:
+            value (pems_data.cache.Cache): The shared Cache instance managed by this factory.
+        """
+        return self._cache
+
+    @property
+    def s3_source(self) -> S3DataSource:
+        """
+        Returns:
+            value (pems_data.sources.s3.S3DataSource): The shared S3DataSource instance managed by this factory.
+        """
+        return self._s3_source
+
+    @property
+    def caching_s3_source(self) -> CachingDataSource:
+        """
+        Returns:
+            value (pems_data.sources.cache.CachingDataSource): The shared CachingDataSource instance managed by this factory.
+        """
+        return self._caching_s3_source
+
     def __init__(self):
-        self.cache = Cache()
-        self.s3_source = S3DataSource()
-        self.caching_s3_source = CachingDataSource(data_source=self.s3_source, cache=self.cache)
+        """Initializes a new ServiceFactory and shared dependencies."""
+        self._cache = Cache()
+        self._s3_source = S3DataSource()
+        self._caching_s3_source = CachingDataSource(data_source=self._s3_source, cache=self._cache)

     def stations_service(self) -> StationsService:
-        """Creates a fully-configured `StationsService`."""
-        return StationsService(data_source=self.caching_s3_source)
+        """Creates a fully-configured StationsService.
+
+        Returns:
+            value (pems_data.services.stations.StationsService): A StationsService instance configured by the factory.
+        """
+        return StationsService(data_source=self._caching_s3_source)