|
| 1 | +--- |
| 2 | +layout: default |
| 3 | +title: Flow Classes |
| 4 | +parent: Configuration |
| 5 | +grand_parent: Reference |
| 6 | +nav_order: 1 |
| 7 | +permalink: /reference/configuration/flow-classes |
| 8 | +--- |
| 9 | + |
| 10 | +# Flow Class Configuration |
| 11 | + |
| 12 | +Flow classes define complete dataflow pattern templates in TrustGraph. When instantiated, they create interconnected networks of processors that handle data ingestion, processing, storage, and querying as a unified system. |
| 13 | + |
| 14 | +## Overview |
| 15 | + |
| 16 | +A flow class serves as a blueprint for creating flow instances. Each flow class defines: |
| 17 | +- **Shared services** that are used by all flow instances of the same class |
| 18 | +- **Flow-specific processors** that are unique to each flow instance |
| 19 | +- **Interfaces** that define how external systems interact with the flow |
| 20 | +- **Queue patterns** that route messages between processors |
| 21 | + |
| 22 | +Flow classes are stored in TrustGraph's configuration system with the configuration type `flow-classes` and are managed through dedicated CLI commands. |
| 23 | + |
| 24 | +## Structure |
| 25 | + |
| 26 | +Every flow class definition has four main sections: |
| 27 | + |
| 28 | +### 1. Class Section |
| 29 | + |
| 30 | +Defines shared service processors that are instantiated once per flow class. These processors handle requests from all flow instances of this class. |
| 31 | + |
| 32 | +```json |
| 33 | +{ |
| 34 | + "class": { |
| 35 | + "embeddings:{class}": { |
| 36 | + "request": "non-persistent://tg/request/embeddings:{class}", |
| 37 | + "response": "non-persistent://tg/response/embeddings:{class}" |
| 38 | + }, |
| 39 | + "text-completion:{class}": { |
| 40 | + "request": "non-persistent://tg/request/text-completion:{class}", |
| 41 | + "response": "non-persistent://tg/response/text-completion:{class}" |
| 42 | + } |
| 43 | + } |
| 44 | +} |
| 45 | +``` |
| 46 | + |
| 47 | +**Characteristics:** |
| 48 | +- Shared across all flow instances of the same class |
| 49 | +- Typically expensive or stateless services (LLMs, embedding models) |
| 50 | +- Use `{class}` template variable for queue naming |
| 51 | +- Examples: `embeddings:{class}`, `text-completion:{class}`, `graph-rag:{class}` |
| 52 | + |
| 53 | +### 2. Flow Section |
| 54 | + |
| 55 | +Defines flow-specific processors that are instantiated for each individual flow instance. Each flow gets its own isolated set of these processors. |
| 56 | + |
| 57 | +```json |
| 58 | +{ |
| 59 | + "flow": { |
| 60 | + "chunker:{id}": { |
| 61 | + "input": "persistent://tg/flow/chunk:{id}", |
| 62 | + "output": "persistent://tg/flow/chunk-load:{id}" |
| 63 | + }, |
| 64 | + "pdf-decoder:{id}": { |
| 65 | + "input": "persistent://tg/flow/document-load:{id}", |
| 66 | + "output": "persistent://tg/flow/chunk:{id}" |
| 67 | + } |
| 68 | + } |
| 69 | +} |
| 70 | +``` |
| 71 | + |
| 72 | +**Characteristics:** |
| 73 | +- Unique instance per flow |
| 74 | +- Handle flow-specific data and state |
| 75 | +- Use `{id}` template variable for queue naming |
| 76 | +- Examples: `chunker:{id}`, `pdf-decoder:{id}`, `kg-extract-relationships:{id}` |
| 77 | + |
| 78 | +### 3. Interfaces Section |
| 79 | + |
| 80 | +Defines the entry points and interaction contracts for the flow. These form the API surface for external systems and internal component communication. |
| 81 | + |
| 82 | +Interfaces can take two forms: |
| 83 | + |
| 84 | +**Fire-and-Forget Pattern** (single queue): |
| 85 | +```json |
| 86 | +{ |
| 87 | + "interfaces": { |
| 88 | + "document-load": "persistent://tg/flow/document-load:{id}", |
| 89 | + "triples-store": "persistent://tg/flow/triples-store:{id}" |
| 90 | + } |
| 91 | +} |
| 92 | +``` |
| 93 | + |
| 94 | +**Request/Response Pattern** (object with request/response fields): |
| 95 | +```json |
| 96 | +{ |
| 97 | + "interfaces": { |
| 98 | + "embeddings": { |
| 99 | + "request": "non-persistent://tg/request/embeddings:{class}", |
| 100 | + "response": "non-persistent://tg/response/embeddings:{class}" |
| 101 | + }, |
| 102 | + "text-completion": { |
| 103 | + "request": "non-persistent://tg/request/text-completion:{class}", |
| 104 | + "response": "non-persistent://tg/response/text-completion:{class}" |
| 105 | + } |
| 106 | + } |
| 107 | +} |
| 108 | +``` |
| 109 | + |
| 110 | +**Types of Interfaces:** |
| 111 | +- **Entry Points**: Where external systems inject data (`document-load`, `agent`) |
| 112 | +- **Service Interfaces**: Request/response patterns for services (`embeddings`, `text-completion`) |
| 113 | +- **Data Interfaces**: Fire-and-forget data flow connection points (`triples-store`, `entity-contexts-load`) |
| 114 | + |
| 115 | +### 4. Metadata |
| 116 | + |
| 117 | +Additional information about the flow class: |
| 118 | + |
| 119 | +```json |
| 120 | +{ |
| 121 | + "description": "Standard RAG pipeline with document processing and query capabilities", |
| 122 | + "tags": ["rag", "document-processing", "embeddings", "graph-query"] |
| 123 | +} |
| 124 | +``` |
| 125 | + |
| 126 | +## Template Variables |
| 127 | + |
| 128 | +Flow class definitions use template variables that are replaced when flow instances are created: |
| 129 | + |
| 130 | +### {id} |
| 131 | +- **Purpose**: Creates isolated resources for each flow instance |
| 132 | +- **Usage**: Flow-specific processors and data pathways |
| 133 | +- **Example**: `persistent://tg/flow/chunk-load:{id}` becomes `persistent://tg/flow/chunk-load:customer-A-flow` |
| 134 | + |
| 135 | +### {class} |
| 136 | +- **Purpose**: Creates shared resources across flows of the same class |
| 137 | +- **Usage**: Shared services and expensive processors |
| 138 | +- **Example**: `non-persistent://tg/request/embeddings:{class}` becomes `non-persistent://tg/request/embeddings:standard-rag` |
| 139 | + |
| 140 | +## Queue Patterns |
| 141 | + |
| 142 | +Flow classes use Apache Pulsar for messaging. Queue names follow the Pulsar format: |
| 143 | + |
| 144 | +``` |
| 145 | +<persistence>://<tenant>/<namespace>/<topic> |
| 146 | +``` |
| 147 | + |
| 148 | +### Queue Components |
| 149 | + |
| 150 | +| Component | Description | Examples | |
| 151 | +|-----------|-------------|----------| |
| 152 | +| **persistence** | Pulsar persistence mode | `persistent`, `non-persistent` | |
| 153 | +| **tenant** | Organization identifier | `tg` (TrustGraph) | |
| 154 | +| **namespace** | Messaging pattern | `flow`, `request`, `response` | |
| 155 | +| **topic** | Queue/topic name | `chunk-load:{id}`, `embeddings:{class}` | |
| 156 | + |
| 157 | +### Persistent Queues |
| 158 | + |
| 159 | +Used for fire-and-forget services and durable data flow: |
| 160 | + |
| 161 | +``` |
| 162 | +persistent://tg/flow/<topic>:{id} |
| 163 | +``` |
| 164 | + |
| 165 | +**Characteristics:** |
| 166 | +- Data persists in Pulsar storage across restarts |
| 167 | +- Used for document processing pipelines |
| 168 | +- Ensures data durability and reliability |
| 169 | +- Examples: `persistent://tg/flow/chunk-load:{id}`, `persistent://tg/flow/triples-store:{id}` |
| 170 | + |
| 171 | +### Non-Persistent Queues |
| 172 | + |
| 173 | +Used for request/response messaging patterns: |
| 174 | + |
| 175 | +``` |
| 176 | +non-persistent://tg/request/<topic>:{class} |
| 177 | +non-persistent://tg/response/<topic>:{class} |
| 178 | +``` |
| 179 | + |
| 180 | +**Characteristics:** |
| 181 | +- Ephemeral, not persisted to disk |
| 182 | +- Lower latency, suitable for RPC-style communication |
| 183 | +- Used for shared services like embeddings and LLM calls |
| 184 | +- Examples: `non-persistent://tg/request/embeddings:{class}`, `non-persistent://tg/response/text-completion:{class}` |
| 185 | + |
| 186 | +## Complete Example |
| 187 | + |
| 188 | +Here's a simplified flow class definition for a standard RAG pipeline: |
| 189 | + |
| 190 | +```json |
| 191 | +{ |
| 192 | + "description": "Standard RAG pipeline with document processing and query capabilities", |
| 193 | + "tags": ["rag", "document-processing", "embeddings"], |
| 194 | + |
| 195 | + "class": { |
| 196 | + "embeddings:{class}": { |
| 197 | + "request": "non-persistent://tg/request/embeddings:{class}", |
| 198 | + "response": "non-persistent://tg/response/embeddings:{class}" |
| 199 | + }, |
| 200 | + "text-completion:{class}": { |
| 201 | + "request": "non-persistent://tg/request/text-completion:{class}", |
| 202 | + "response": "non-persistent://tg/response/text-completion:{class}" |
| 203 | + } |
| 204 | + }, |
| 205 | + |
| 206 | + "flow": { |
| 207 | + "pdf-decoder:{id}": { |
| 208 | + "input": "persistent://tg/flow/document-load:{id}", |
| 209 | + "output": "persistent://tg/flow/chunk:{id}" |
| 210 | + }, |
| 211 | + "chunker:{id}": { |
| 212 | + "input": "persistent://tg/flow/chunk:{id}", |
| 213 | + "output": "persistent://tg/flow/chunk-load:{id}" |
| 214 | + }, |
| 215 | + "vectorizer:{id}": { |
| 216 | + "input": "persistent://tg/flow/chunk-load:{id}", |
| 217 | + "output": "persistent://tg/flow/doc-embeds-store:{id}" |
| 218 | + } |
| 219 | + }, |
| 220 | + |
| 221 | + "interfaces": { |
| 222 | + "document-load": "persistent://tg/flow/document-load:{id}", |
| 223 | + "embeddings": { |
| 224 | + "request": "non-persistent://tg/request/embeddings:{class}", |
| 225 | + "response": "non-persistent://tg/response/embeddings:{class}" |
| 226 | + }, |
| 227 | + "text-completion": { |
| 228 | + "request": "non-persistent://tg/request/text-completion:{class}", |
| 229 | + "response": "non-persistent://tg/response/text-completion:{class}" |
| 230 | + } |
| 231 | + } |
| 232 | +} |
| 233 | +``` |
| 234 | + |
| 235 | +## Flow Instantiation |
| 236 | + |
| 237 | +When a flow instance is created from this class: |
| 238 | + |
| 239 | +**Given:** |
| 240 | +- Flow Instance ID: `customer-A-flow` |
| 241 | +- Flow Class: `standard-rag` |
| 242 | + |
| 243 | +**Template Expansions:** |
| 244 | +- `persistent://tg/flow/chunk-load:{id}` → `persistent://tg/flow/chunk-load:customer-A-flow` |
| 245 | +- `non-persistent://tg/request/embeddings:{class}` → `non-persistent://tg/request/embeddings:standard-rag` |
| 246 | + |
| 247 | +**Result:** |
| 248 | +- Isolated document processing pipeline for `customer-A-flow` |
| 249 | +- Shared embedding service for all `standard-rag` flows |
| 250 | +- Complete dataflow from document ingestion through querying |
| 251 | + |
| 252 | +## Dataflow Architecture |
| 253 | + |
| 254 | +Flow classes create unified dataflows where: |
| 255 | + |
| 256 | +1. **Document Processing Pipeline**: Flows from ingestion through transformation to storage |
| 257 | +2. **Query Services**: Integrated processors that query the same data stores and services |
| 258 | +3. **Shared Services**: Centralized processors that all flows can utilize |
| 259 | +4. **Storage Writers**: Persist processed data to appropriate stores |
| 260 | + |
| 261 | +All processors (both `{id}` and `{class}`) work together as a cohesive dataflow graph, not as separate systems. |
| 262 | + |
| 263 | +## Benefits |
| 264 | + |
| 265 | +### Resource Efficiency |
| 266 | +- Expensive services (LLMs, embedding models) are shared across flows |
| 267 | +- Reduces computational costs and resource usage |
| 268 | + |
| 269 | +### Flow Isolation |
| 270 | +- Each flow has its own data processing pipeline |
| 271 | +- Prevents data mixing between different flows |
| 272 | + |
| 273 | +### Scalability |
| 274 | +- Can instantiate multiple flows from the same template |
| 275 | +- Horizontal scaling by adding more flow instances |
| 276 | + |
| 277 | +### Modularity |
| 278 | +- Clear separation between shared and flow-specific components |
| 279 | +- Easy to modify and extend flow capabilities |
| 280 | + |
| 281 | +### Unified Architecture |
| 282 | +- Query and processing are part of the same dataflow |
| 283 | +- Consistent data handling across ingestion and retrieval |
| 284 | + |
| 285 | +## Common Patterns |
| 286 | + |
| 287 | +### Standard RAG Flow |
| 288 | +- Document ingestion → chunking → embedding → storage |
| 289 | +- Query interface for retrieval and generation |
| 290 | + |
| 291 | +### Knowledge Graph Flow |
| 292 | +- Document ingestion → entity extraction → relationship extraction → graph storage |
| 293 | +- Query interface for graph traversal and reasoning |
| 294 | + |
| 295 | +### Object Extraction Flow |
| 296 | +- Document ingestion → structured data extraction → object storage |
| 297 | +- Query interface for structured data retrieval |
| 298 | + |
| 299 | +## Best Practices |
| 300 | + |
| 301 | +### Queue Design |
| 302 | +- Use persistent queues for data that must survive restarts |
| 303 | +- Use non-persistent queues for fast request/response patterns |
| 304 | +- Include template variables in queue names for proper isolation |
| 305 | + |
| 306 | +### Service Sharing |
| 307 | +- Share expensive services (LLMs, embeddings) at the class level |
| 308 | +- Keep data processing isolated at the flow level |
| 309 | + |
| 310 | +### Interface Design |
| 311 | +- Provide clear entry points for external systems |
| 312 | +- Use request/response patterns for synchronous operations |
| 313 | +- Use fire-and-forget patterns for asynchronous data flow |
| 314 | + |
| 315 | +### Template Variables |
| 316 | +- Use `{id}` for flow-specific resources |
| 317 | +- Use `{class}` for shared resources |
| 318 | +- Be consistent with naming conventions |
| 319 | + |
| 320 | +## See Also |
| 321 | + |
| 322 | +- [tg-put-flow-class](../cli/tg-put-flow-class) - Create or update flow classes |
| 323 | +- [tg-get-flow-class](../cli/tg-get-flow-class) - Retrieve flow class definitions |
| 324 | +- [tg-show-flow-classes](../cli/tg-show-flow-classes) - List available flow classes |
| 325 | +- [Flow Processor Reference](../extending/flow-processor) - Building custom processors |
| 326 | +- [Pulsar Configuration](pulsar) - Message queue configuration |
0 commit comments