This repository was archived by the owner on Sep 29, 2025. It is now read-only.
Docs: `pems_data` architecture and usage #196
Merged
11 commits, all by thekaveman:

- 83163fb feat(docs): install and configure mkdocstrings plugin
- e4d5a18 docs(pems_data): intro the pacakge, basic usage
- c736215 docs(pems_data): pems-cache CLI reference
- eb6404b docs(pems_data): reference for ServiceFactory
- 3769a58 docs(pems_data): reference for StationsService
- da32527 docs(pems_data): reference for caching layer
- b18cb9b docs(pems_data): reference for data sources
- c4636df refactor(docs): rename api to reference
- eb583fa docs(pems_data): link to reference sections
- 9592597 fix(ci): install local packages before mkdocs build
- 1cb42c9 fix(docs): replace deprecated materialx.emoji.twemoji config

@@ -0,0 +1,61 @@
# Introduction

The `pems_data` library provides a standardized, efficient interface for accessing Caltrans PeMS data within the project. It handles fetching data from the primary S3 data source and leverages a Redis-based caching layer to optimize performance for repeated requests.

This guide covers the specific setup and usage patterns for this library. For general development environment setup, please see the main [Getting started with development guide](../README.md).

## Prerequisites

Before using the library, ensure your environment is configured correctly.

### AWS credentials

The library's S3 data source requires AWS credentials to be available. The devcontainer is configured to use your host machine's AWS configuration. For details on setting this up via `aws configure sso`, please refer to the [Work with the Cloud infrastructure section in the main development guide](../README.md#work-with-the-cloud-infrastructure).

### Redis connection

A running Redis instance is required for the caching layer to function. The connection is configured with the following environment variables, which you can set in the `.env` file at the root of the project:

```env
# The hostname for the Redis server
REDIS_HOSTNAME=redis

# The port for the Redis server.
REDIS_PORT=6379
```

When running locally in the devcontainer, a `redis` service is started by Compose automatically.

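As a minimal sketch of how these two variables might be consumed from Python, the helper below reads them with the standard library only. The function name `redis_settings` and the fallback defaults are assumptions made for illustration; they are not part of `pems_data`.

```python
import os


def redis_settings(env=None):
    """Return (hostname, port) read from REDIS_HOSTNAME and REDIS_PORT.

    The fallback values are illustrative assumptions, not library defaults.
    """
    env = os.environ if env is None else env
    hostname = env.get("REDIS_HOSTNAME", "redis")
    # Environment values are always strings, so the port is converted to int.
    port = int(env.get("REDIS_PORT", "6379"))
    return hostname, port


if __name__ == "__main__":
    host, port = redis_settings()
    print(f"Redis expected at {host}:{port}")
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to exercise without touching the real environment.
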
## Architecture & Key concepts

The library is built around a few core components that work together to provide a simple data access experience.

- [`ServiceFactory`](./reference/service-factory.md): This is the primary entry point for using the library. It is a factory class that instantiates and wires together all the necessary dependencies, such as the data sources and caching clients.

- [**Services**](./reference/services.md): Services offer a high-level API for fetching specific, business-relevant data. For example, the `StationsService` has methods to get all station metadata for a given district or to retrieve 5-minute aggregated data for a specific station.

- [**Caching layer**](./reference/caching-layer.md): To minimize latency and load on the data source, the library uses a caching decorator by default. When a data request is made, this layer first checks the Redis cache for the requested data. If the data is not found (a cache miss), it retrieves the data from the underlying S3 source and stores it in the cache for future requests.

- [**Data sources**](./reference/data-sources.md): The underlying data source reads data directly from Parquet files stored in the Caltrans S3 bucket.

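The check-then-load flow described for the caching layer can be sketched in a few lines. This is an illustration under stated assumptions, not the library's implementation: a plain dictionary stands in for Redis, a local function stands in for the S3 read, and the name `read_through` is hypothetical.

```python
from typing import Callable, Dict


def read_through(cache: Dict[str, bytes], key: str, load: Callable[[], bytes]) -> bytes:
    """Return the cached value for key, loading and storing it on a miss."""
    if key in cache:
        # Cache hit: the slower underlying source is not touched.
        return cache[key]
    # Cache miss: read from the underlying source, then store the result
    # so that later requests for the same key are served from the cache.
    value = load()
    cache[key] = value
    return value


if __name__ == "__main__":
    cache: Dict[str, bytes] = {}  # stand-in for Redis
    calls = []

    def load_from_source() -> bytes:
        # Stand-in for reading a Parquet file from S3.
        calls.append(1)
        return b"district-7-metadata"

    first = read_through(cache, "stations:metadata:district:7", load_from_source)
    second = read_through(cache, "stations:metadata:district:7", load_from_source)
    print(first == second, len(calls))
```

The second call returns the stored value without invoking the loader again, which is the behavior the caching layer is meant to provide for repeated requests.
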
## Basic usage

Using the library involves creating the factory, getting a service, and calling a data-fetching method. The factory handles the underlying complexity of connecting to the data source and cache.

```python
from pems_data import ServiceFactory

# 1. Create the factory. This wires up all dependencies.
factory = ServiceFactory()

# 2. Request a pre-configured service.
stations_service = factory.stations_service()

# 3. Use the service to fetch data as a pandas DataFrame.
# This call will attempt to read from the cache first before
# falling back to the S3 data source.
district_7_metadata = stations_service.get_district_metadata(district_number="7")

print("Successfully fetched metadata for District 7:")
print(district_7_metadata.head())
```
@@ -0,0 +1,64 @@
# `pems-cache` CLI

The `pems_data` package includes `pems-cache`, a simple command-line tool for interacting directly with the Redis cache. It's useful for debugging cache issues or manually inspecting and setting values.

## Commands

The CLI supports three main operations:

- `check`
- `get`
- `set`

If you run `pems-cache` with no operation, it defaults to `check`.

### `check`

Verifies that a connection to the Redis server can be established and that the cache is responsive.

#### Usage

```shell
pems-cache check
```

#### Example output

```shell
$ pems-cache check
cache is available: True
```

### `get`

Retrieves and displays a value from the cache based on its key. The `--key` (or `-k`) argument is required.

#### Usage

```shell
pems-cache get --key <cache-key>
```

#### Example output

```shell
$ pems-cache get --key "stations:metadata:district:7"
[stations:metadata:district:7]: b'\x01\x00\x00\x00\xff\xff...'
```

### `set`

Sets a string value for a given key in the cache. Both the `--key` (`-k`) and `--value` (`-v`) arguments are required.

#### Usage

```shell
pems-cache set --key <cache-key> --value <cache-value>
```

#### Example output

```shell
$ pems-cache set -k "my:test:key" -v "hello from the cli"
[my:test:key] = 'hello from the cli'
```
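
The argument handling described above (an optional operation that defaults to `check`, with `--key` and `--value` required for `get` and `set`) could be modeled roughly as follows. This sketch assumes `argparse` and uses a dictionary in place of Redis; the real `pems-cache` entry point may be structured differently, and `build_parser` and `run` are hypothetical names.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="pems-cache")
    # The operation is optional and falls back to "check" when omitted.
    parser.add_argument("op", nargs="?", default="check", choices=["check", "get", "set"])
    parser.add_argument("--key", "-k")
    parser.add_argument("--value", "-v")
    return parser


def run(argv, cache):
    """Run one operation against a dict standing in for the Redis cache."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.op == "check":
        # A dict is always reachable; a real check would contact the server.
        return "cache is available: True"
    if args.key is None:
        parser.error(f"{args.op} requires --key")
    if args.op == "get":
        return f"[{args.key}]: {cache.get(args.key)!r}"
    if args.value is None:
        parser.error("set requires --value")
    cache[args.key] = args.value
    return f"[{args.key}] = {args.value!r}"


if __name__ == "__main__":
    import sys

    print(run(sys.argv[1:], {}))
```

Calling `run([], {})` exercises the default `check` path, while a missing `--key` for `get` or `set` ends in the usual `argparse` usage error.
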
@@ -0,0 +1,7 @@
# Caching layer

The caching layer wraps a backing [redis](https://redis.io/docs/latest/) service and provides a simple, focused interface to its usage.

::: pems_data.cache

::: pems_data.serialization
@@ -0,0 +1,9 @@
# Data sources

The data source components are responsible for the actual reading of data (the "how"). The design uses an abstract interface, `IDataSource`, to define a standard contract for any data source, making it easy to swap and compose implementations.

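To illustrate what "swap and compose" can mean in practice, here is a small sketch built on Python's `abc` module. Every name in it (`DataSource`, `InMemoryDataSource`, `CountingDataSource`, `row_count`) and the `read(identifier)` signature are hypothetical stand-ins; the actual `IDataSource` contract is the one documented in the reference below.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class DataSource(ABC):
    """Hypothetical stand-in for an abstract data source contract."""

    @abstractmethod
    def read(self, identifier: str) -> List[dict]:
        """Return the rows stored under identifier."""


class InMemoryDataSource(DataSource):
    """Serves rows from a dictionary; handy as a swap-in during tests."""

    def __init__(self, tables: Dict[str, List[dict]]):
        self._tables = tables

    def read(self, identifier: str) -> List[dict]:
        return list(self._tables.get(identifier, []))


class CountingDataSource(DataSource):
    """Composes with any DataSource and counts how often it is read."""

    def __init__(self, inner: DataSource):
        self._inner = inner
        self.reads = 0

    def read(self, identifier: str) -> List[dict]:
        self.reads += 1
        return self._inner.read(identifier)


def row_count(source: DataSource, identifier: str) -> int:
    # Callers depend only on the contract, so any implementation fits here.
    return len(source.read(identifier))
```

Because `row_count` accepts anything that satisfies the contract, the plain and the wrapped source are interchangeable from the caller's point of view.
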
::: pems_data.sources.IDataSource

::: pems_data.sources.s3.S3DataSource

::: pems_data.sources.cache.CachingDataSource
@@ -0,0 +1,3 @@
# Service factory

::: pems_data.ServiceFactory
@@ -0,0 +1,5 @@
# Services

The services represent the business logic of "what" data to fetch for specific use cases. Services require an underlying data source to perform the actual reading of data.

::: pems_data.services.stations.StationsService
We could also go to the top module level:

Maybe we can come back to this? I agree it would make more sense to link to the module level if e.g. there were more classes / helper functions etc. in those modules.