OATFLAKE

Introduction

OATFLAKE is a no-code interface framework designed as a submodule of the BLOB browser. It enables community-governed intelligence training based on Retrieval-Augmented Generation (RAG). This repository serves as an easy-to-setup beta testing and development space for contributors to the system.

The unique value of OATFLAKE lies in its flexibility and autonomy. The backend can run entirely locally, without requiring external APIs or web access, by leveraging Ollama and local folder files. Additionally, it supports web scraping to gather resources and integrate them into the local vector space using FAISS. Many components are built with LangChain, and OpenRouter is included as an API for extended functionality.

OATFLAKE empowers communities, such as research groups and collectives, to maintain their local intelligence systems and easily swap out models. While currently tailored for extracting methods, definitions, resources, and materials, the system is evolving to support the collection and analysis of any type of text data. The framework is built with modularity in mind, offering small building blocks and adapters to handle diverse file inputs.

Our vision is to make as much of the system customizable through the interface as possible, enabling communities to adapt it to their unique needs without requiring coding expertise.

Features

Knowledge processing and extraction
Goal-based analysis
Slack integration for communication
Vector-based search capabilities
API endpoints for various functionalities
Web-based user interface

Architecture

OATFLAKE follows a modular architecture designed to provide flexibility and extensibility:

Core Components

FastAPI Backend: Powers all API endpoints and server-side operations through organized route handlers
Web Interface: Modern, responsive JavaScript frontend with Tailwind CSS for interacting with the system
RAG Pipeline: End-to-end pipeline for document processing, analysis, and retrieval
Vector Storage: FAISS-based vector storage for efficient similarity searches
Integrations: Connections to external systems (Slack, OpenRouter, Ollama, etc.)

Data Flow

Input Sources → Documents uploaded or URLs provided
Content Processing → Text extraction and chunking using format-specific processors
Analysis → LLM-powered analysis with entity extraction via MainProcessor
Embedding Generation → Vector embedding creation through local or remote models
Storage → Persistence to vector stores and databases with incremental updates
Retrieval → Context-sensitive document retrieval via FAISS similarity search
Generation → LLM-augmented response creation with local or cloud-based models

Key Subsystems

Analysis Engine

Orchestrates document processing through MainProcessor
Implements level-based URL discovery for breadth-first processing
Supports batched resource processing to prevent memory issues
Provides interruptible LLM functionality for long-running tasks
Extracts entities, methods, definitions, and other structured data

LLM Integration

Abstracts model interactions through unified interfaces
Supports local models via Ollama for privacy and cost efficiency
Integrates with OpenRouter for access to powerful cloud models
Provides embedding generation for vector search capabilities
Implements configurable model parameters and context handling

Data Processing

Handles diverse document formats (PDF, Markdown, HTML, etc.)
Implements intelligent chunking strategies optimized for different hardware
Preserves document structure and metadata during processing
Generates efficient vector embeddings for similarity search

Services Layer

Manages scheduled training and knowledge processing
Handles system-wide configuration and settings
Provides background tasks and automated operations
Implements caching and optimization strategies

Frontend System

Built with modern JavaScript and Tailwind CSS
Features modular widget architecture for extensibility
Provides interactive UI components for data visualization
Implements responsive design for all device sizes

API Endpoints

Organized into logical domains (knowledge, goals, analysis)
Implements RESTful patterns for consistent interaction
Provides authentication and permission management
Offers comprehensive system management capabilities

Prerequisites

Python 3.10 or higher
Poetry (for dependency management)
Ollama (optional, for local model hosting)
Slack workspace (for Slack integration)
Supabase account (for data storage)
OpenRouter account (for model access)

Installation

1. Clone the repository

git clone https://github.com/blob/OATFLAKE.git
cd OATFLAKE

2. Run the setup and start script

On Windows:

Double-click the start.bat file in File Explorer, or run it from the command line:

# In Command Prompt:
start.bat

# In PowerShell:
.\start.bat

On Mac/Linux:

# First time only - make the startup script executable
chmod +x start.sh

# Then run the script
./start.sh

This will:

Start the FastAPI server
Set up a ngrok tunnel for external access (if configured)
Open the web interface in your default browser

3. Create a `.env` file

Create a .env file in the root directory with the following content (replace with your actual credentials):

# Server Configuration
LOCAL_HOST=127.0.0.1
LOCAL_PORT=8999
UI_PORT=3000

# Central Server
BASE_DOMAIN=your-base-domain
API_KEY=your-api-key-here

# Ollama Configuration
OLLAMA_HOST=localhost
OLLAMA_PORT=11434

# Slack Configuration (Required)
SLACK_BOT_TOKEN=your-slack-bot-token
SLACK_SIGNING_SECRET=your-slack-signing-secret
SLACK_BOT_USER_ID=your-slack-bot-user-id

# Supabase Configuration
SUPABASE_URL=your-supabase-url
SUPABASE_KEY=your-supabase-key

# OpenRouter Configuration
OPENROUTER_API_KEY=your-openrouter-api-key

Project Structure

api/: API endpoints and routes
- routes/: RESTful endpoints organized by domain (knowledge, goals, analysis)
  - auth.py: Authentication and user management endpoints
  - knowledge.py: Knowledge base and document processing endpoints
  - goals.py: Goal tracking and management endpoints
  - slack.py: Slack integration endpoints
  - ollama.py: Ollama model interaction endpoints
  - openrouter.py: OpenRouter API integration endpoints
- middleware/: Request processing and authentication middleware
- models/: Pydantic data models for API requests and responses
- dependencies/: Reusable API dependencies and injected services
- main.py: Entry point and router registration
scripts/: Core processing and analysis scripts
- analysis/: LLM-powered content analysis modules
  - main_processor.py: Central orchestration for document processing
  - content_extractor.py: Entity extraction from documents
  - llm_analyzer.py: LLM-based document analysis
  - goal_extractor.py: Identification of goals in content
  - level_processor.py: Level-based URL discovery and processing
- data/: Document processing and management
  - document_loader.py: Format-specific document loaders
  - document_processor.py: Document chunking and preprocessing
  - embedding_service.py: Creation of vector embeddings
  - faiss_builder.py: FAISS index creation and management
- llm/: Language model integration
  - ollama_client.py: Local model inference via Ollama
  - open_router_client.py: Cloud model access via OpenRouter
  - prompt_templates.py: Reusable prompt templates
- services/: Background services and scheduled tasks
  - training_scheduler.py: Scheduled knowledge processing
  - settings_manager.py: Application settings management
  - cache_manager.py: Performance optimization through caching
- integrations/: External system connectors
  - slack.py: Slack messaging and event handling
  - supabase_connector.py: Supabase database integration
settings/: Configuration files and environment settings
- config.py: Central configuration management
- default_settings.json: Default application settings
- model_settings.json: Model-specific configuration
static/: Static assets for the web interface
- js/: JavaScript modules and UI components
  - components/: Reusable UI components
  - widgets/: Interactive widget implementations
  - modals/: Modal dialog implementations
- css/: Styling with Tailwind CSS and custom styles
  - main.css: Custom styles beyond Tailwind
- icons/: Icons and visual assets
templates/: HTML templates for web rendering
- components/: Reusable UI components
- pages/: Full page templates
utils/: Utility functions and helpers
- logging/: Logging configuration and utilities
- helpers/: Common utility functions
- security/: Authentication and authorization utilities
data/: Data storage and persistence
- vector_stores/: FAISS and other vector index storage
- processed/: Processed document outputs
tests/: Testing infrastructure and test cases
run.py: Main application entry point for running the server
start.sh/bat: Platform-specific startup scripts
.env: Environment configuration file (to be created by user)

Future Development

In-app token management through the interface (under development)
Additional integration options
Enhanced analysis capabilities

License

This project is licensed under the MIT License (Modified for Non-Commercial Use) - with the following key restrictions:

The software cannot be used for commercial purposes
Proper attribution to the origin of the software stack is required

See the LICENSE file for full details.

Name		Name	Last commit message	Last commit date
Latest commit History 74 Commits
api		api
assets		assets
nginx		nginx
scripts		scripts
static		static
templates		templates
tests		tests
utils		utils
.DS_Store		.DS_Store
.gitignore		.gitignore
Info.plist		Info.plist
LICENSE		LICENSE
NOTICE.txt		NOTICE.txt
README.md		README.md
analyze_csv_fields.py		analyze_csv_fields.py
app.py		app.py
config.json		config.json
debug_knowledge_process_error.py		debug_knowledge_process_error.py
process_level.py		process_level.py
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
run.py		run.py
server.py		server.py
setup.py		setup.py
start.bat		start.bat
start.sh		start.sh
system_endpoints.py		system_endpoints.py
universal_table_example_restructured.json		universal_table_example_restructured.json
version.txt		version.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

OATFLAKE

Introduction

Features

Architecture

Core Components

Data Flow

Key Subsystems

Analysis Engine

LLM Integration

Data Processing

Services Layer

Frontend System

API Endpoints

Prerequisites

Installation

1. Clone the repository

2. Run the setup and start script

On Windows:

On Mac/Linux:

3. Create a `.env` file

Project Structure

Future Development

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 2

Uh oh!

Languages

License

BLOB-browser/OATFLAKE

Folders and files

Latest commit

History

Repository files navigation

OATFLAKE

Introduction

Features

Architecture

Core Components

Data Flow

Key Subsystems

Analysis Engine

LLM Integration

Data Processing

Services Layer

Frontend System

API Endpoints

Prerequisites

Installation

1. Clone the repository

2. Run the setup and start script

On Windows:

On Mac/Linux:

3. Create a .env file

Project Structure

Future Development

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 2

Uh oh!

Languages

3. Create a `.env` file

Packages