Testing data repo #735
tobeycarman
started this conversation in
Ideas
Problem
The testing data that we need, both for ecological testing and software testing, are too large to reasonably store in the repository. We currently have ~350MB of testing data (see the testing-data/ directory), but it is easy to envision accumulating several GB of test data. Keeping this checked into the repository would make cloning the repo unduly expensive in many situations (both for end users and in CI/CD pipelines). Until Aug 2024 we had been experimenting with Git-LFS for this data, but we reached the bandwidth limit and need to find a different solution.
Notes
Git-LFS and git-annex are lower level than DVC and DataLad.
DVC is geared toward machine learning; DataLad is geared toward researchers storing (and sharing) data.
git-annex uses symlinks, so the symlinks are tracked in version control but the data lives elsewhere (S3, etc.).
DVC uses a content-addressable structure: the .dvc files are tracked, and each .dvc file has a hash in it. The data is then stored in a set of folders based on the hashes. This data can be anywhere: on your local machine or in the cloud.
Browsing the DVC data in the cloud is slightly confusing because the files are stored under hash-derived names in this content-addressable format rather than under their original names.
DataLad is built on Git and git-annex.
DVC is as much about pipelines and DAGs as it is about tracking data. It tracks and versions data, but really in the interest of building pipelines that generate and use the data.
https://stackoverflow.com/questions/10276604/where-in-repository-to-store-test-data
https://softwareengineering.stackexchange.com/questions/264925/where-should-i-store-test-data
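To make the content-addressable idea from the DVC notes above concrete, here is a minimal sketch in Python. It is only an illustration of the general technique, not DVC's actual on-disk format or API: the function names, the `.ptr` pointer-file suffix, the pointer-file fields, and the choice of SHA-256 are all assumptions made for the example. The small pointer file is what would be checked into Git, while the data itself lives in a store keyed by its hash.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def add_to_store(data_file: Path, store: Path) -> Path:
    """Copy data_file into the store under a hash-derived path and
    write a small pointer file next to the original (hypothetical format)."""
    digest = file_digest(data_file)
    # First two hex characters become a folder, the rest the file name.
    target = store / digest[:2] / digest[2:]
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():  # identical content is only stored once
        shutil.copy2(data_file, target)
    pointer = data_file.with_name(data_file.name + ".ptr")
    pointer.write_text(f"sha256: {digest}\npath: {data_file.name}\n")
    return pointer


def checkout(pointer: Path, store: Path) -> Path:
    """Restore the data file named in a pointer file from the store."""
    fields = dict(
        line.split(": ", 1) for line in pointer.read_text().splitlines()
    )
    digest = fields["sha256"]
    source = store / digest[:2] / digest[2:]
    dest = pointer.with_name(fields["path"])
    shutil.copy2(source, dest)
    return dest
```

In this sketch the store could just as well be a synced cloud bucket as a local folder. It also shows why browsing the store directly is confusing: every file there is named by its hash, and only the pointer files record the original names.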