Skip to content
This repository was archived by the owner on Sep 29, 2025. It is now read-only.

Conversation

thekaveman
Copy link
Member

@thekaveman thekaveman commented Jul 24, 2025

Closes #198

What this PR does

See #196 for docs that describe pems_data architecture and usage

Introduces a caching layer for pems_data that is mostly transparent for users/callers of pems_data functionality.

The main caching interface is defined in the pems_data.cache.Cache class, which is a wrapper around a redis connection.

redis is added as a local Compose service, and the pems-cache CLI is added to help debug and inspect the cache.

pems_data.cache.Cache class

  • Establishes and manages a connection to redis
  • Defines get and set wrappers
  • Helpers to get/set DataFrames from the cache

pems_data.sources.cache.CachingDataSource class

An IDataSource implementation (see #187) that wraps another IDataSource and calls out to the cache first before performing any needed .read() operations on the underlying data source. Data returned from .read() is cached before returning.

pems_data.ServiceFactory class

With the introduction of the caching layer, the services (e.g. StationsService) need additional wiring and setup. This class wraps that setup so callers can get a service that is ready to use. This class should usually serve as the main entrypoint into pems_data for callers.

@thekaveman thekaveman self-assigned this Jul 24, 2025
Copy link

github-actions bot commented Jul 24, 2025

Coverage report

Click to see where and how coverage changed

FileStatementsMissingCoverageCoverage
(new stmts)
Lines missing
  pems_data/src/pems_data
  __init__.py
  cache.py
  serialization.py
  pems_data/src/pems_data/services
  stations.py
  pems_data/src/pems_data/sources
  cache.py
Project Total  

This report was generated by python-coverage-comment-action

@thekaveman thekaveman force-pushed the feat/pems-data branch 2 times, most recently from bf7fa41 to ba09852 Compare July 25, 2025 22:39
@thekaveman thekaveman force-pushed the feat/pems-data-caching branch from ae76c58 to e6efa3f Compare July 26, 2025 01:32
@thekaveman thekaveman force-pushed the feat/pems-data branch 4 times, most recently from 28ccf15 to 8eee59f Compare July 28, 2025 03:36
@thekaveman thekaveman force-pushed the feat/pems-data-caching branch from e6efa3f to b155365 Compare July 28, 2025 03:39
Base automatically changed from feat/pems-data to main July 28, 2025 15:16
@thekaveman thekaveman force-pushed the feat/pems-data-caching branch 3 times, most recently from 2921d39 to 333392d Compare July 28, 2025 15:33
@thekaveman thekaveman marked this pull request as ready for review July 28, 2025 15:51
@thekaveman thekaveman requested a review from a team as a code owner July 28, 2025 15:51
@thekaveman thekaveman force-pushed the feat/pems-data-caching branch from 333392d to 826539e Compare July 28, 2025 22:30
Copy link
Member

@lalver1 lalver1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks and works great! 🎉

I was able to follow the logic and it makes sense. At first I overlooked that CachingDataSource.read() (specifically, self.cache.set_df(cache_key, df, ttl=ttl)) will set a cache key (store a result) if it does not find one when reading the first time. Since I kept running the code with the same redis container and volume, I was confused about when the cache had been set, but after realizing this, it all made sense. The cache keys only get set "on demand" as data is read, we don't cache data that a service has not asked for yet (this makes so much sense, it's essentially the whole purpose of a cache but it took me a bit to realize it 😅).

I only saw one typo below, everything else looks great!

compose.yml Outdated
redis:
image: redis:8
ports:
- "${REDIS_LOCAL_PORT:-6379}:6379"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably change REDIS_LOCAL_PORT to REDIS_PORT to match .env.sample and the environment variable loading in cache.py.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arg, thanks for catching this typo! I must have renamed it at some point and missed this one.

@thekaveman thekaveman force-pushed the feat/pems-data-caching branch from 826539e to ba373c0 Compare July 29, 2025 17:14
uses an underlying redis connection
$ pems-cache -h
usage: pems-cache [-h] [--key KEY] [--value VALUE] [{check,get,set}]

Simple CLI for the cache

positional arguments:
  {check,get,set}       the operation to perform

options:
  -h, --help            show this help message and exit
  --key/-k KEY          the item's key, required for get/set
  --value/-v VALUE      the item's value, required for set
- accept a host and port parameter
- pass additional kwargs to the redis.Redis() instance
- catch redis.ConnectionError and return None
we don't need to connect to the cache backend immediately, delay that
internally via a helper _connect() function

for is_available, get, and set -- ensure the backend is connected
and available before attempting to use
convert DataFrames to and from Arrow IPC bytes
use the DataFrame serialization functions as mutators
so we don't have to implement the get/check/set dance every time
@thekaveman thekaveman force-pushed the feat/pems-data-caching branch from ba373c0 to 1f4d38e Compare July 29, 2025 17:15
@thekaveman thekaveman requested a review from lalver1 July 29, 2025 17:15
Copy link
Member

@lalver1 lalver1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@thekaveman thekaveman merged commit 0efc880 into main Jul 29, 2025
9 checks passed
@thekaveman thekaveman deleted the feat/pems-data-caching branch July 29, 2025 19:23
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement basic caching layer in pems_data code
2 participants