This repository was archived by the owner on Sep 29, 2025. It is now read-only.

Commit 44e9242

Docs: pems_data architecture and usage (#196)
2 parents 0efc880 + 1cb42c9 commit 44e9242

18 files changed: +444 -81 lines changed
.github/workflows/mkdocs.yml

Lines changed: 16 additions & 2 deletions
@@ -36,11 +36,13 @@ jobs:
         with:
           python-version-file: .github/workflows/.python-version
           cache: pip
-          cache-dependency-path: "docs/requirements.txt"
+          cache-dependency-path: |
+            "**/pyproject.toml"
+            "docs/requirements.txt"
 
       - name: Build MkDocs website
         run: |
-          pip install -r docs/requirements.txt
+          pip install ./pems_data -r docs/requirements.txt
           mkdocs build
 
       - name: Install Netlify CLI
@@ -87,6 +89,18 @@ jobs:
       - name: Checkout
         uses: actions/checkout@v4
 
+      - uses: actions/setup-python@v5
+        with:
+          python-version-file: .github/workflows/.python-version
+          cache: pip
+          cache-dependency-path: |
+            "**/pyproject.toml"
+            "docs/requirements.txt"
+
+      - name: Install local packages
+        run: |
+          pip install ./pems_data
+
       - name: Deploy docs
         uses: mhausenblas/mkdocs-deploy-gh-pages@master
         env:
Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
+# Introduction
+
+The `pems_data` library provides a standardized, efficient interface for accessing Caltrans PeMS data within the project. It handles fetching data from the primary S3 data source and leverages a Redis-based caching layer to optimize performance for repeated requests.
+
+This guide covers the specific setup and usage patterns for this library. For general development environment setup, please see the main [Getting started with development guide](../README.md).
+
+## Prerequisites
+
+Before using the library, ensure your environment is configured correctly.
+
+### AWS credentials
+
+The library's S3 data source requires AWS credentials to be available. The devcontainer is configured to use your host machine's AWS configuration. For details on setting this up via `aws configure sso`, please refer to the [Work with the Cloud infrastructure section in the main development guide](../README.md#work-with-the-cloud-infrastructure).
+
+### Redis connection
+
+A running Redis instance is required for the caching layer to function. The connection is configured with the following environment variables, which you can set in the `.env` file at the root of the project:
+
+```env
+# The hostname for the Redis server
+REDIS_HOSTNAME=redis
+
+# The port for the Redis server.
+REDIS_PORT=6379
+```
+
+When running locally in the devcontainer, a `redis` service is started by Compose automatically.
+
+## Architecture & Key concepts
+
+The library is built around a few core components that work together to provide a simple data access experience.
+
+- [`ServiceFactory`](./reference/service-factory.md): This is the primary entry point for using the library. It is a factory class that instantiates and wires together all the necessary dependencies, such as the data sources and caching clients.
+
+- [**Services**](./reference/services.md): Services offer a high-level API for fetching specific, business-relevant data. For example, the `StationsService` has methods to get all station metadata for a given district or to retrieve 5-minute aggregated data for a specific station.
+
+- [**Caching layer**](./reference/caching-layer.md): To minimize latency and load on the data source, the library uses a caching decorator by default. When a data request is made, this layer first checks the Redis cache for the requested data. If the data is not found (a cache miss), it retrieves the data from the underlying S3 source and stores it in the cache for future requests.
+
+- [**Data sources**](./reference/data-sources.md): The underlying data source reads data directly from Parquet files stored in the Caltrans S3 bucket.
+
+## Basic usage
+
+Using the library involves creating the factory, getting a service, and calling a data-fetching method. The factory handles the underlying complexity of connecting to the data source and cache.
+
+```python
+from pems_data import ServiceFactory
+
+# 1. Create the factory. This wires up all dependencies.
+factory = ServiceFactory()
+
+# 2. Request a pre-configured service.
+stations_service = factory.stations_service()
+
+# 3. Use the service to fetch data as a pandas DataFrame.
+# This call will attempt to read from the cache first before
+# falling back to the S3 data source.
+district_7_metadata = stations_service.get_district_metadata(district_number="7")
+
+print("Successfully fetched metadata for District 7:")
+print(district_7_metadata.head())
+```

docs/development/pems_data/cli.md

Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
+# `pems-cache` CLI
+
+The `pems_data` package includes `pems-cache`, a simple command-line tool for interacting directly with the Redis cache. It's useful for debugging cache issues or manually inspecting and setting values.
+
+## Commands
+
+The CLI supports three main operations:
+
+- `check`
+- `get`
+- `set`
+
+If you run `pems-cache` with no operation, it defaults to `check`.
+
+### `check`
+
+Verifies that a connection to the Redis server can be established and that the cache is responsive.
+
+#### Usage
+
+```shell
+pems-cache check
+```
+
+#### Example output
+
+```shell
+$ pems-cache check
+cache is available: True
+```
+
+### `get`
+
+Retrieves and displays a value from the cache based on its key. The `--key` (or `-k`) argument is required.
+
+#### Usage
+
+```shell
+pems-cache get --key <cache-key>
+```
+
+#### Example output
+
+```shell
+$ pems-cache get --key "stations:metadata:district:7"
+[stations:metadata:district:7]: b'\x01\x00\x00\x00\xff\xff...'
+```
+
+### `set`
+
+Sets a string value for a given key in the cache. Both the `--key` (`-k`) and `--value` (`-v`) arguments are required.
+
+#### Usage
+
+```shell
+pems-cache set --key <cache-key> --value <cache-value>
+```
+
+#### Example output
+
+```shell
+$ pems-cache set -k "my:test:key" -v "hello from the cli"
+[my:test:key] = 'hello from the cli'
+```
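
A command-line interface with this shape (an optional operation that defaults to `check`, plus `-k`/`--key` and `-v`/`--value` options) could be structured with the standard-library `argparse` module. The sketch below is an assumption about one possible structure, not the actual `pems-cache` source; `build_parser`, `run`, and the dict-backed `store` are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="cache-cli-sketch")
    # The operation is optional and falls back to "check" when omitted.
    parser.add_argument("op", nargs="?", choices=["check", "get", "set"], default="check")
    parser.add_argument("-k", "--key")
    parser.add_argument("-v", "--value")
    return parser


def run(argv: list[str], store: dict[str, str]) -> str:
    """Execute one operation against a dict standing in for the cache."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.op == "check":
        # A dict is always reachable; a real check would contact the server.
        return "cache is available: True"
    if args.key is None:
        parser.error(f"--key is required for '{args.op}'")
    if args.op == "get":
        return f"[{args.key}]: {store.get(args.key)!r}"
    if args.value is None:
        parser.error("--value is required for 'set'")
    store[args.key] = args.value
    return f"[{args.key}] = {args.value!r}"


store: dict[str, str] = {}
default_result = run([], store)
set_result = run(["set", "-k", "my:test:key", "-v", "hello"], store)
get_result = run(["get", "--key", "my:test:key"], store)
```

Validating the required options inside `run` rather than with `required=True` keeps the bare `check` invocation free of mandatory flags.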
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+# Caching layer
+
+The caching layer wraps a backing [redis](https://redis.io/docs/latest/) service and provides a simple, focused interface to its usage.
+
+::: pems_data.cache
+
+::: pems_data.serialization
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+# Data sources
+
+The data source components are responsible for the actual reading of data (the "how"). The design uses an abstract interface, `IDataSource`, to define a standard contract for any data source, making it easy to swap and compose implementations.
+
+::: pems_data.sources.IDataSource
+
+::: pems_data.sources.s3.S3DataSource
+
+::: pems_data.sources.cache.CachingDataSource
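
The "swap and compose" idea can be illustrated with an abstract base class and a wrapper that accepts any implementation of it. This is a sketch under stated assumptions: `DataSource`, `InMemorySource`, and `CountingSource` are hypothetical names, and the `read(identifier)` signature is invented for the example rather than taken from `IDataSource`. A read counter is used as the wrapper instead of a cache, purely to show composition.

```python
from abc import ABC, abstractmethod


class DataSource(ABC):
    """Hypothetical contract: every source can read rows for an identifier."""

    @abstractmethod
    def read(self, identifier: str) -> list[dict]:
        ...


class InMemorySource(DataSource):
    """Concrete source backed by a dict of tables."""

    def __init__(self, tables: dict[str, list[dict]]):
        self._tables = tables

    def read(self, identifier: str) -> list[dict]:
        return self._tables[identifier]


class CountingSource(DataSource):
    """Wraps any DataSource and counts reads, leaving results unchanged."""

    def __init__(self, inner: DataSource):
        self._inner = inner
        self.reads = 0

    def read(self, identifier: str) -> list[dict]:
        self.reads += 1
        return self._inner.read(identifier)


base = InMemorySource({"stations": [{"id": 1}, {"id": 2}]})
counted = CountingSource(base)  # composed: same contract, extra behavior
rows = counted.read("stations")
```

Because the wrapper satisfies the same contract as the source it wraps, callers can receive either one without changes.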
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+# Service factory
+
+::: pems_data.ServiceFactory
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+# Services
+
+The services represent the business logic of "what" data to fetch for specific use cases. Services require an underlying data source to perform the actual reading of data.
+
+::: pems_data.services.stations.StationsService
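
Since a service receives its data source rather than creating it, a stub source can stand in during tests. The sketch below assumes hypothetical names throughout (`RowSource`, `StubSource`, `StationLookup`, `stations_in_district`) and uses lists of dicts instead of DataFrames; none of it reflects the actual `StationsService` API.

```python
from typing import Protocol


class RowSource(Protocol):
    """Structural type for anything that can read rows by identifier."""

    def read(self, identifier: str) -> list[dict]: ...


class StubSource:
    """Test double returning fixed rows (not part of pems_data)."""

    def read(self, identifier: str) -> list[dict]:
        return [
            {"station": "A", "district": "7"},
            {"station": "B", "district": "3"},
        ]


class StationLookup:
    """Hypothetical service: decides what to fetch, delegates how to the source."""

    def __init__(self, data_source: RowSource):
        self._source = data_source

    def stations_in_district(self, district: str) -> list[str]:
        rows = self._source.read("stations")
        return [row["station"] for row in rows if row["district"] == district]


service = StationLookup(data_source=StubSource())
district_7 = service.stations_in_district("7")
```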

docs/requirements.txt

Lines changed: 2 additions & 0 deletions
@@ -4,3 +4,5 @@ mkdocs-awesome-pages-plugin
 mkdocs-macros-plugin
 mkdocs-material==9.6.16
 mkdocs-redirects
+mkdocstrings
+mkdocstrings-python

mkdocs.yml

Lines changed: 10 additions & 2 deletions
@@ -20,6 +20,14 @@ extra:
 plugins:
   - search
   - awesome-pages
+  - mkdocstrings:
+      handlers:
+        python:
+          options:
+            show_root_heading: true
+            show_root_toc_entry: false
+            show_symbol_type_heading: true
+            show_symbol_type_toc: true
   - redirects:
       redirect_maps:
 
@@ -36,8 +44,8 @@ markdown_extensions:
   - meta
   - pymdownx.details
   - pymdownx.emoji:
-      emoji_index: !!python/name:materialx.emoji.twemoji
-      emoji_generator: !!python/name:materialx.emoji.to_svg
+      emoji_index: !!python/name:material.extensions.emoji.twemoji
+      emoji_generator: !!python/name:material.extensions.emoji.to_svg
   - pymdownx.inlinehilite
   - pymdownx.smartsymbols
   - pymdownx.superfences:

pems_data/src/pems_data/__init__.py

Lines changed: 34 additions & 5 deletions
@@ -11,11 +11,40 @@ class ServiceFactory:
     Shared dependencies are created once during initialization.
     """
 
+    @property
+    def cache(self) -> Cache:
+        """
+        Returns:
+            value (pems_data.cache.Cache): The shared Cache instance managed by this factory.
+        """
+        return self._cache
+
+    @property
+    def s3_source(self) -> S3DataSource:
+        """
+        Returns:
+            value (pems_data.sources.s3.S3DataSource): The shared S3DataSource instance managed by this factory.
+        """
+        return self._s3_source
+
+    @property
+    def caching_s3_source(self) -> CachingDataSource:
+        """
+        Returns:
+            value (pems_data.sources.cache.CachingDataSource): The shared CachingDataSource instance managed by this factory.
+        """
+        return self._caching_s3_source
+
     def __init__(self):
-        self.cache = Cache()
-        self.s3_source = S3DataSource()
-        self.caching_s3_source = CachingDataSource(data_source=self.s3_source, cache=self.cache)
+        """Initializes a new ServiceFactory and shared dependencies."""
+        self._cache = Cache()
+        self._s3_source = S3DataSource()
+        self._caching_s3_source = CachingDataSource(data_source=self._s3_source, cache=self._cache)
 
     def stations_service(self) -> StationsService:
-        """Creates a fully-configured `StationsService`."""
-        return StationsService(data_source=self.caching_s3_source)
+        """Creates a fully-configured StationsService.
+
+        Returns:
+            value (pems_data.services.stations.StationsService): A StationsService instance configured by the factory.
+        """
+        return StationsService(data_source=self._caching_s3_source)
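
One effect of moving from public attributes to properties without setters is that the shared dependencies become read-only from the outside. A minimal sketch of that Python behavior, using a hypothetical `Registry` class with a dict in place of the real `Cache`:

```python
class Registry:
    """Hypothetical holder of a shared dependency, exposed read-only."""

    def __init__(self):
        self._cache = {}  # created once, stored on a private attribute

    @property
    def cache(self) -> dict:
        return self._cache


registry = Registry()
same_object = registry.cache is registry.cache  # every access returns the shared instance

try:
    registry.cache = {}  # no setter is defined, so assignment raises
    reassigned = True
except AttributeError:
    reassigned = False
```

Callers can still mutate the returned object, so this guards against accidental rebinding rather than providing deep immutability.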
