Skip to content

DarkStarStrix/DataVolt

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

90 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

DataVolt: Modular Enterprise Data Engineering Framework

DataVolt Logo

Coverage License Version Python PyPI CI

Overview

DataVolt is an enterprise-grade framework for building and maintaining scalable data engineering pipelines. It provides a comprehensive suite of tools for data ingestion, transformation, and processing, enabling organizations to standardize their data operations and accelerate development cycles.

Modular VoltModule Architecture

At the core of DataVolt is the concept of VoltModules: modular, domain-scoped directories (mini_dirs) that encapsulate a single use case or data engineering workflow. Each VoltModule follows a consistent internal structure and pattern, making it easy to:

  • Reuse, extend, or compose modules for new domains or projects
  • Standardize data engineering practices across teams
  • Rapidly spin up new pipelines by combining or customizing VoltModules

VoltModules can cover a wide range of data engineering needs—from market analysis to tokenization, feature engineering, and beyond. The repository provides a rich set of ready-to-use modules, and you can easily add your own or extend existing ones.

Repository Structure

Note: The structure below is an illustrative example of how DataVolt is organized around VoltModules and shared utilities. Your actual repository may differ. To view your current structure, use a tool like tree or ls in your project root.

DataVolt/
├── modules/                # Collection of VoltModules (domain-specific mini_dirs)
│   ├── market_analysis/    # Example VoltModule: Market Analysis
│   │   ├── __init__.py
│   │   └── ...             # Module-specific logic
│   ├── tokenization/       # Example VoltModule: Tokenization
│   │   ├── __init__.py
│   │   └── ...
│   └── ...                 # Add or extend VoltModules as needed
├── loaders/                # Data Ingestion Layer (shared utilities)
│   ├── __init__.py
│   └── ...
├── preprocess/             # Data Transformation Layer (shared utilities)
│   ├── __init__.py
│   └── ...
├── ext/                    # Extension Layer (logging, custom steps, etc.)
│   ├── logger.py
│   └── ...
└── ...
  • modules/: Houses all VoltModules, each in its own directory, following a common pattern.
  • loaders/, preprocess/, ext/: Provide shared utilities and frameworks for use within VoltModules or standalone.

Key Features

  • VoltModules: Modular, domain-scoped, and reusable mini_dirs for any data engineering use case
  • Rapid Customization: Add, extend, or compose modules to fit evolving requirements
  • Standardization: Consistent patterns and internal structure across all modules
  • Comprehensive Toolkit: Everything needed for data engineering, from ingestion to advanced analytics

Installation

pip install datavolt

Or with uv:

uv install datavolt

Quick Start

Using a VoltModule

from datavolt.modules.market_analysis import MarketAnalysisModule

module = MarketAnalysisModule(config={...})
result = module.run()

Building Your Own VoltModule

  1. Create a new directory under modules/ (e.g., my_use_case/)
  2. Add an __init__.py and implement your logic following the VoltModule pattern
  3. Import and use your module as needed

Example: Data Ingestion and Transformation

from datavolt.loaders.csv_loader import CSVLoader
from datavolt.preprocess.pipeline import PreprocessingPipeline

loader = CSVLoader(file_path="data.csv")
dataset = loader.load()

pipeline = PreprocessingPipeline([...])
processed_dataset = pipeline.run(dataset)

Extending DataVolt

  • Add new VoltModules for new domains or workflows
  • Plug in tools (e.g., new loaders, preprocessors) into existing modules
  • Compose modules to build complex pipelines

Use Cases

  • Market analysis, tokenization, and domain-specific analytics
  • Standardized, reproducible data preprocessing
  • Scalable machine learning and feature engineering pipelines
  • Integration with cloud, SQL, and ML frameworks

Contributing

We welcome contributions! To add a new VoltModule or extend the framework:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/my-module)
  3. Add your module under modules/ and follow the VoltModule pattern
  4. Commit and push your changes
  5. Open a Pull Request

License

DataVolt is distributed under the MIT License. See LICENSE for details.

Support


DataVolt: Empowering Modular Data Engineering Excellence