## docs/introduction.md (16 additions, 16 deletions)
```diff
@@ -10,31 +10,31 @@ CueMeet.ai is a pioneering open source API that enables developers to build inte…
 This documentation will guide you through setting up and using the CueMeet platform effectively.
 
-## Overview
+### **1. System Architecture**
 
-CueMeet is a powerful platform that offers a self-hostable, unified API solution for deploying intelligent meeting bots across platforms like Zoom, Google Meet, Microsoft Teams, and more. This guide will walk you through:
+To fully leverage CueMeet, first go through our **[System Architecture Guide](/cuemeet-documentation/docs/system-architecture)**, where we break down:
+
+- **Core components** of the CueMeet ecosystem
+- **How the API communicates with meeting platforms**
+- **Data flow and processing pipeline**
+- **Security and scalability considerations**
 
-1. **Local Setup** - Getting your development environment ready
-2. **AWS Setup** - Configuring your AWS infrastructure to run the meeting bots on demand
-3. **API Integration** - Using CueMeet's APIs for communication
-4. **Bot Configuration** - Setting up and customizing your bots
+Understanding the architecture will help you make informed deployment decisions.
 
-## Getting Started
+### **2. Getting Started**
 
-We recommend following these steps in order:
+Once you have a clear understanding of the architecture, follow these steps:
 
-1. First, complete the [Local Setup](/cuemeet-documentation/docs/local-setup) to prepare your development environment
-2. Then, follow the [AWS Setup Guide](/cuemeet-documentation/docs/aws-setup) to configure your cloud infrastructure
-3. Learn how to integrate with the [CueMeet API](/cuemeet-documentation/docs/bot/api-info) for seamless communication
-4. Finally, explore our [Bot Configuration Guide](/cuemeet-documentation/docs/meeting-bots) to customize your bots
+1. **[AWS Setup Guide](/cuemeet-documentation/docs/aws-setup)** – Configure your cloud infrastructure to run meeting bots efficiently.
+2. **[Local Setup](/cuemeet-documentation/docs/local-setup)** – Prepare your development environment for customization.
+3. **[CueMeet API Integration](/cuemeet-documentation/docs/bot/api-info)** – Learn how to interact with CueMeet's APIs.
+4. **[Bot Configuration Guide](/cuemeet-documentation/docs/meeting-bots)** – Customize and improve bot functionalities.
 
 Each section contains detailed instructions and best practices to ensure a smooth implementation.
 
+---
+
 ## Need Help?
 
 If you encounter any issues or have questions, please:
 
 - Check out our [GitHub Organization](https://github.com/CueMeet) for updates and contributions
-- Join our [Discord Server](https://discord.gg/55FbAHA9) to connect with the community
-
-Let's get started with the [Local Setup](./local-setup)!
+- Join our [Discord Server](https://discord.gg/GjBS3EeMzp) to connect with the community
```
## docs/local-setup.md (22 additions, 22 deletions)
```diff
@@ -3,7 +3,7 @@ sidebar_position: 3
 displayed_sidebar: pageSidebar
 ---
 
-# Local Setup
+# Control Panel Local Setup
 
 This guide walks you through setting up Cuemeet locally for development and testing purposes.
 
```
```diff
@@ -93,24 +93,24 @@ REDIS_HOST=redis
 REDIS_PORT=6379
 
 
-# AWS
-AWS_ACCESS_KEY=
-AWS_SECRET_KEY=
+# AWS (Must be filled in from AWS setup steps)
+AWS_ACCESS_KEY= # Your AWS Access Key from AWS setup
+AWS_SECRET_KEY= # Your AWS Secret Key from AWS setup
 
 ## S3
-AWS_BUCKET_REGION=
-AWS_MEETING_BOT_BUCKET_NAME=
+AWS_BUCKET_REGION= # Your S3 bucket region
+AWS_MEETING_BOT_BUCKET_NAME= # Your S3 bucket name
 
-## ECS
-AWS_ECS_CLUSTER_NAME=
-AWS_SECURITY_GROUP=
-AWS_VPS_SUBNET=
-ECS_TASK_DEFINITION_GOOGLE=
-ECS_CONTAINER_NAME_GOOGLE=
-ECS_TASK_DEFINITION_ZOOM=
-ECS_CONTAINER_NAME_ZOOM=
-ECS_TASK_DEFINITION_TEAMS=
-ECS_CONTAINER_NAME_TEAMS=
+## ECS (Must match the AWS configuration)
+AWS_ECS_CLUSTER_NAME= # Your AWS ECS Cluster Name
+AWS_SECURITY_GROUP= # Your AWS Security Group ID
+AWS_VPS_SUBNET= # Your AWS Subnet ID
+ECS_TASK_DEFINITION_GOOGLE= # Task Definition for Google Meet bots
+ECS_CONTAINER_NAME_GOOGLE= # Container Name for Google Meet bots
+ECS_TASK_DEFINITION_ZOOM= # Task Definition for Zoom bots
+ECS_CONTAINER_NAME_ZOOM= # Container Name for Zoom bots
+ECS_TASK_DEFINITION_TEAMS= # Task Definition for Microsoft Teams bots
+ECS_CONTAINER_NAME_TEAMS= # Container Name for Microsoft Teams bots
 
 
 # Meeting Bot
```
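Since every AWS and ECS value above has to be copied over from the AWS setup, a missing entry usually only surfaces when the first bot fails to launch. A minimal, hypothetical pre-flight check (not part of the CueMeet codebase) that a deployment script could run against this `.env` might look like this:

```python
import os
import sys

# Hypothetical sanity check: fail fast if any AWS/ECS value from the
# AWS setup guide has not been copied into the .env file.
REQUIRED_VARS = [
    "AWS_ACCESS_KEY", "AWS_SECRET_KEY",
    "AWS_BUCKET_REGION", "AWS_MEETING_BOT_BUCKET_NAME",
    "AWS_ECS_CLUSTER_NAME", "AWS_SECURITY_GROUP", "AWS_VPS_SUBNET",
    "ECS_TASK_DEFINITION_GOOGLE", "ECS_CONTAINER_NAME_GOOGLE",
    "ECS_TASK_DEFINITION_ZOOM", "ECS_CONTAINER_NAME_ZOOM",
    "ECS_TASK_DEFINITION_TEAMS", "ECS_CONTAINER_NAME_TEAMS",
]

missing = [name for name in REQUIRED_VARS if not os.environ.get(name)]
if missing:
    sys.exit(f"Missing environment variables from the AWS setup: {', '.join(missing)}")
print("All AWS/ECS environment variables are set.")
```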
````diff
@@ -120,7 +120,7 @@ MEETING_BOT_RETRY_COUNT=2
 # Worker Backend gRPC URL
 WORKER_BACKEND_GRPC_URL=worker-grpc
 ```
-
+⚠️ Important: The AWS-related environment variables must be obtained from the AWS Setup Guide. Complete the AWS setup first and copy the relevant values into this file.
 </details>
 
 <details>
````
```diff
@@ -151,12 +151,12 @@ REDIS_DB=2
 
 
 # AWS Configuration
-AWS_ACCESS_KEY_ID=
-AWS_SECRET_ACCESS_KEY=
+AWS_ACCESS_KEY_ID= # Your AWS Access Key from AWS setup
+AWS_SECRET_ACCESS_KEY= # Your AWS Secret Access Key from AWS setup
 
 ## AWS S3
-AWS_REGION=
-AWS_STORAGE_BUCKET_NAME=
+AWS_REGION= # Your S3 bucket region
+AWS_STORAGE_BUCKET_NAME= # Your S3 bucket name
 
 _SIGNED_URL_EXPIRY_TIME=60
 
```
```diff
@@ -166,7 +166,7 @@ HIGHLIGHT_ENVIRONMENT_NAME=""
 
 
 ## ASSEMBLY AI
-ASSEMBLY_AI_API_KEY=""
+ASSEMBLY_AI_API_KEY="" # https://www.assemblyai.com API key
```
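The worker backend reads `ASSEMBLY_AI_API_KEY` to transcribe the recorded audio. As a rough illustration of how such a key is typically consumed (assuming the official `assemblyai` Python SDK rather than CueMeet's actual worker code), a transcription call with speaker labels looks roughly like this:

```python
import os

import assemblyai as aai  # assumes the official AssemblyAI Python SDK is installed

# Authenticate with the key configured in the worker's .env file.
aai.settings.api_key = os.environ["ASSEMBLY_AI_API_KEY"]

# Request a transcript with speaker diarization enabled, since the worker
# later reconciles these speaker labels with the meeting captions.
config = aai.TranscriptionConfig(speaker_labels=True)
transcript = aai.Transcriber().transcribe("out/meeting_audio.opus", config=config)

for utterance in transcript.utterances:
    print(f"Speaker {utterance.speaker}: {utterance.text}")
```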
## docs/system-architecture.mdx (112 additions, 9 deletions)
```diff
@@ -3,7 +3,9 @@ sidebar_position: 3
 displayed_sidebar: pageSidebar
 ---
 
-# System Design Diagram
+# System Architecture Overview
+
+## **System Design Diagram:**
 
 Below is a diagram of the system architecture for CueMeet and the Meeting Bots infrastructure and its inner workings.
 
```
```diff
@@ -16,18 +18,119 @@ Below is a diagram of the system architecture for CueMeet and Meeting Bots Infra…
 
 <br /><br />
 
-This architecture diagram illustrates an on-demand meeting bot system deployed on Amazon Web Services (AWS). It integrates several cloud-native and backend technologies to manage the lifecycle of meeting bots, from task orchestration to containerized execution and error-handling mechanisms. Let's break this down in digestible chunks.
+The CueMeet.ai system is designed to automate and manage meeting bots across various platforms using a robust, cloud-native architecture on Amazon Web Services (AWS). This system orchestrates bot execution, data processing, and error handling to provide a seamless meeting automation experience.
+
+## **System Overview:**
+
+**1. Control Backend (NestJS with BullMQ):**
+
+- **Purpose:** Orchestrates the lifecycle of meeting bot jobs and directly manages ECS tasks.
+- **Technology:** Built using NestJS with BullMQ for task queuing and scheduling.
+- **Functionality:**
+  - Exposes APIs for managing meeting bots (no dedicated frontend).
+  - Schedules and manages bot jobs.
+  - Uses the AWS SDK to directly interact with AWS ECS.
+  - Stores meeting bot configuration and status in a PostgreSQL database.
+  - Utilizes BullMQ for task queuing and processing.
+  - Includes cron jobs for:
+    - `syncTaskStatus`: Periodically checks the status of running ECS tasks and updates the database accordingly.
```
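The bullets above say the Control Backend talks to ECS directly through the AWS SDK and that `syncTaskStatus` polls running tasks. The actual service is written in NestJS, so purely as an illustration of the underlying ECS calls (using Python and boto3 instead of the JavaScript SDK, with placeholder values standing in for the environment variables from the local setup), launching a bot task and checking its status could look roughly like this:

```python
import os

import boto3  # illustrative stand-in for the AWS SDK calls the NestJS backend makes

ecs = boto3.client("ecs", region_name="us-east-1")  # placeholder region

# Launch a Google Meet bot as a Fargate task (values come from the .env in the local setup).
response = ecs.run_task(
    cluster=os.environ["AWS_ECS_CLUSTER_NAME"],
    taskDefinition=os.environ["ECS_TASK_DEFINITION_GOOGLE"],
    launchType="FARGATE",
    count=1,
    networkConfiguration={
        "awsvpcConfiguration": {
            "subnets": [os.environ["AWS_VPS_SUBNET"]],
            "securityGroups": [os.environ["AWS_SECURITY_GROUP"]],
            "assignPublicIp": "ENABLED",
        }
    },
    overrides={
        "containerOverrides": [
            {
                "name": os.environ["ECS_CONTAINER_NAME_GOOGLE"],
                # Hypothetical variable telling the bot which meeting to join.
                "environment": [{"name": "MEETING_URL", "value": "https://meet.google.com/abc-defg-hij"}],
            }
        ]
    },
)
task_arn = response["tasks"][0]["taskArn"]

# What a syncTaskStatus-style poll boils down to: ask ECS for the task's last known status.
status = ecs.describe_tasks(cluster=os.environ["AWS_ECS_CLUSTER_NAME"], tasks=[task_arn])
print(status["tasks"][0]["lastStatus"])  # e.g. PROVISIONING, RUNNING, STOPPED
```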
```diff
+- **Purpose:** Provides a serverless compute platform for running Docker containers.
+- **Functionality:**
+  - Dynamically spins up containers based on requests from the Control Backend.
+  - Executes the MeetingBots, which are containerized Python applications.
+  - Scales automatically based on demand.
+- **Amazon ECR (Elastic Container Registry):** Stores the Docker images required for the MeetingBots, ensuring efficient deployment and version control.
+
+**3. MeetingBots (Python with [Selenium](https://selenium-python.readthedocs.io/) and [FFmpeg](https://www.ffmpeg.org/)):**
+
+- **Purpose:** Automates meeting participation and data capture.
+- **Technology:** Built using Python, [Selenium](https://selenium-python.readthedocs.io/) (for browser automation), and [FFmpeg](https://www.ffmpeg.org/) (for audio recording).
+- **Functionality:**
+  - Programmatically joins meetings on various platforms (Google Meet, Zoom, Microsoft Teams).
+  - Captures high-quality audio recordings.
+  - Extracts live captions and transcripts from the respective meeting platform.
+  - Uploads the audio and transcript data in a combined `.tar` file, along with standalone audio files (`.opus`), to Amazon S3 using presigned URLs.
+  - The output files are stored in the `/out` directory inside the container.
+
```
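Per the MeetingBots bullets above, each bot pushes its `/out` artifacts to S3 through presigned URLs rather than holding long-lived credentials. A simplified sketch of that pattern (assuming boto3 and requests, and a hypothetical helper that is not taken from the actual bot code) follows:

```python
import boto3
import requests

def upload_artifact(bucket: str, key: str, local_path: str) -> None:
    """Hypothetical helper: upload one file from /out via a presigned PUT URL."""
    s3 = boto3.client("s3")
    # In CueMeet the presigned URL would typically be issued by a service that
    # holds the AWS credentials; the bot container only needs the URL itself.
    url = s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=3600,  # one hour is an assumption, not a documented value
    )
    with open(local_path, "rb") as fh:
        requests.put(url, data=fh, timeout=120).raise_for_status()

upload_artifact("my-meeting-bot-bucket", "recordings/meeting-123.tar", "/out/meeting-123.tar")
```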
```diff
+**4. Amazon S3 (Object Storage):**
+
+- **Purpose:** Stores the meeting data (audio and transcript files).
+- **Functionality:**
+  - Receives uploaded data from the MeetingBots.
+  - Provides durable and scalable storage.
+  - Triggers SNS notifications on new file uploads.
+
```
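The last bullet, S3 triggering SNS on new uploads, is plain S3 event-notification wiring. As a rough one-time setup sketch (boto3 again, with a placeholder bucket and topic ARN rather than CueMeet's real resources):

```python
import boto3

s3 = boto3.client("s3")

# Placeholder names; the real bucket and topic come from the AWS setup guide.
s3.put_bucket_notification_configuration(
    Bucket="my-meeting-bot-bucket",
    NotificationConfiguration={
        "TopicConfigurations": [
            {
                "TopicArn": "arn:aws:sns:us-east-1:123456789012:meeting-artifacts",
                "Events": ["s3:ObjectCreated:*"],
            }
        ]
    },
)
```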
```diff
+**5. Worker Backend (Django with Celery):**
 
-Core Workflow
+- **Purpose:** Processes the uploaded meeting data, generating enhanced transcripts and performing post-processing tasks.
+- **Technology:** Built using Django and Celery, a distributed task queue.
+- **Functionality:**
+  - Receives notifications from S3 via SNS.
+  - Downloads the `.tar` file from S3.
+  - Extracts audio and transcript files.
+  - Uses [AssemblyAI](https://www.assemblyai.com/) for high-quality audio transcription.
+  - Uses [NLTK](https://www.nltk.org/) and [rapidfuzz](https://github.com/rapidfuzz/RapidFuzz) for speaker reconciliation, combining AssemblyAI's audio transcription with the original meeting captions.
+  - Stores the processed data in a PostgreSQL database.
+  - Exposes a gRPC interface for communication with the Control Backend.
+  - Provides a Celery task to retry failed transcriptions.
+- **Post-Processing (Speaker Reconciliation):**
+  - Uses [AssemblyAI](https://www.assemblyai.com/) for accurate audio transcription.
+  - Aligns speaker names from the original meeting captions with the [AssemblyAI](https://www.assemblyai.com/) transcript.
+  - Employs [NLTK](https://www.nltk.org/) for sentence tokenization and [rapidfuzz](https://github.com/rapidfuzz/RapidFuzz) for fuzzy string matching to reconcile speakers.
+  - Handles timestamp alignment and text similarity comparisons.
+- **Django Celery Tasks:**
+  - `_transcript_generator_worker`:
+    - Downloads the `.tar` file from S3.
+    - Extracts audio and transcript files.
+    - Transcribes the audio using [AssemblyAI](https://www.assemblyai.com/).
+    - Reconciles speaker labels using the meeting metadata and NLTK/rapidfuzz.
+    - Stores the transcription data in the PostgreSQL database.
+    - Handles error logging and retries.
+  - `_transcript_retry_cronjob`:
+    - Periodically checks for failed transcription tasks.
+    - Retries failed tasks, up to a maximum number of retries.
+    - Sends failure notifications if retries are exhausted.
+- **PostgreSQL Database:** Stores processed meeting data, including transcripts and metadata.
+- **Redis Integration:** Celery uses Redis as a message broker and result backend.
+- **gRPC Interface:** Enables communication between the Worker Backend and the Control Backend.
+- **SNS/SQS Integration:** S3 triggers SNS notifications upon file upload, which are then queued in SQS for processing by the Worker Backend. This system includes a Dead Letter Queue (DLQ) for handling failed messages.
 
```
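To make the speaker-reconciliation bullets above concrete, here is a deliberately simplified sketch of the matching idea, assuming the NLTK and rapidfuzz libraries named in the document; the function and data shapes are invented for illustration and are not the worker's actual implementation (it also handles the timestamp alignment mentioned above, which is omitted here):

```python
from nltk.tokenize import sent_tokenize  # requires nltk.download("punkt") once
from rapidfuzz import fuzz

def reconcile_speakers(assemblyai_utterances, platform_captions, min_score=70):
    """Assign caption speaker names to AssemblyAI utterances by fuzzy text match.

    assemblyai_utterances: list of {"speaker": "A", "text": "..."} dicts
    platform_captions:     list of {"speaker": "Alice", "text": "..."} dicts
    """
    reconciled = []
    for utterance in assemblyai_utterances:
        best_name, best_score = None, 0
        # Compare each sentence of the utterance against every caption line.
        for sentence in sent_tokenize(utterance["text"]):
            for caption in platform_captions:
                score = fuzz.token_set_ratio(sentence, caption["text"])
                if score > best_score:
                    best_name, best_score = caption["speaker"], score
        reconciled.append({
            "speaker": best_name if best_score >= min_score else utterance["speaker"],
            "text": utterance["text"],
        })
    return reconciled
```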
```diff
-At the heart of the system lies a Control Backend, built with BullMQ (a Redis-based task queue) and a reactive framework (likely NestJS, judging by the pink logo). This backend schedules and manages bot jobs, using the AWS SDK to dynamically spin up containers in AWS Fargate, a serverless compute engine for containers. These containers run the MeetingBot, likely a Python-based automation tool, utilizing Selenium for browser control.
+**6. Error Monitoring and Observability:**
 
-Execution & Persistence
+- **[highlight.io](https://www.highlight.io/):**
+  - Monitors the system for errors and exceptions, for both the meeting bots and each of the control panel services.
+  - Provides real-time alerts via Slack.
+  - Captures detailed logs and error reports.
+- **CloudWatch:**
+  - Monitors ECS logs and application-level metrics.
+  - Aids in debugging and performance optimization.
 
-When a bot job is dispatched, Fargate pulls the required Docker image from Amazon ECR (Elastic Container Registry). These bots, upon execution, may generate artifacts (e.g., screenshots, logs) that are uploaded to Amazon S3. This decouples the data from the execution lifecycle and ensures durability. Meanwhile, the Worker Backend (built on Django and powered by Celery) interacts with the Control Backend using gRPC. It's likely responsible for long-running jobs, failure recovery, and scheduling. Redis and PostgreSQL back these services with fast in-memory caching and persistent state, respectively.
+### **Workflow Summary:**
 
-Failure Handling & Observability
+1. A meeting bot task is scheduled or triggered via the Control Backend API.
+2. The Control Backend uses the ECS Client Service to launch an ECS task in Fargate.
+3. The MeetingBot joins the meeting, records audio, and extracts captions.
+4. The MeetingBot uploads the data to S3.
+5. S3 triggers an SNS notification.
+6. The Worker Backend, via Celery, processes the uploaded data, transcribes it using [AssemblyAI](https://www.assemblyai.com/), and reconciles speakers using [NLTK](https://www.nltk.org/) and [rapidfuzz](https://github.com/rapidfuzz/RapidFuzz).
+7. The processed data is stored in the PostgreSQL database.
+8. [highlight.io](https://www.highlight.io/) and CloudWatch monitor the system for errors and performance issues.
 
```
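Steps 5 and 6 above, together with the SNS/SQS bullet in the Worker Backend section, describe a fairly standard fan-out: S3 publishes to SNS, SNS delivers into an SQS queue (with a DLQ for poison messages), and the worker drains the queue and hands each object key to a Celery task. A hedged sketch of that consumer loop, with a placeholder queue URL and task wiring that is not taken from the CueMeet repository:

```python
import json
import os

import boto3
from celery import Celery

app = Celery("worker", broker=os.environ.get("CELERY_BROKER_URL", "redis://localhost:6379/2"))

@app.task(bind=True, max_retries=3)
def transcript_generator(self, bucket: str, key: str) -> None:
    """Placeholder for the documented `_transcript_generator_worker` pipeline."""
    # download the .tar from S3, extract, transcribe with AssemblyAI, reconcile speakers...
    ...

def drain_queue(queue_url: str) -> None:
    sqs = boto3.client("sqs")
    while True:
        resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=20)
        for msg in resp.get("Messages", []):
            # SNS wraps the original S3 event inside the "Message" field.
            s3_event = json.loads(json.loads(msg["Body"])["Message"])
            for record in s3_event.get("Records", []):
                transcript_generator.delay(
                    record["s3"]["bucket"]["name"],
                    record["s3"]["object"]["key"],
                )
            sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
            # Messages that repeatedly fail to process are moved to the DLQ by the
            # queue's redrive policy, as described in the section above.
```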
```diff
-In the event of issues like failure to publish events or data corruption, notifications are sent to an SNS Topic (Simple Notification Service). To ensure reliability, the SNS topic is paired with a Dead Letter Queue (DLQ) via SQS (Simple Queue Service), catching unprocessed or failed messages for further inspection or reprocessing.
+### **Key Technologies**
 
-CloudWatch (not shown here but implied in the AWS architecture) is typically used to monitor ECS logs and application-level metrics, aiding in debugging and performance optimization.
```