Commit 7882862

Merge pull request #209 from aws-samples/Nova_speech_2_speech
feat(novasample): speech to speech sample with amazon nova
2 parents 6c5c414 + a1131de commit 7882862

106 files changed: +36587 -0 lines changed

README.md

Lines changed: 1 addition & 0 deletions
@@ -24,6 +24,7 @@ This repo provides samples to demonstrate how to build your own Generative AI so
 |[Stateless MCP Server on AWS Lambda](samples/mcp-stateless-lambda/)| Sample MCP Server running natively on AWS Lambda and API Gateway without any extra bridging components or custom transports and a test MCP client. | API layer | TypeScript |
 |[Stateless MCP Server on ECS](samples/mcp-stateless-ecs/)| Sample stateless MCP Server running natively on ECS Fargate and ALB without any extra bridging components or custom transports and a test MCP client. | API layer | TypeScript |
 |[Stateful MCP Server on ECS](samples/mcp-stateful-ecs/)| Sample stateful MCP Server running natively on ECS Fargate and ALB without any extra bridging components or custom transports and a test MCP client. | API layer | TypeScript |
+|[Speech to speech](samples/speech-to-speech/)| Real-time Speech to Speech solution with Amazon Nova Sonic, featuring a Java WebSocket server and React frontend. | Backend + Frontend | Python for Backend, TypeScript (React) for Frontend |
 
 ## Contributing
 
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
*.js
!jest.config.js
*.d.ts
node_modules

# CDK asset staging directory
.cdk.staging
*cdk.out

samples/content-generation/.gitignore

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
*.js
!jest.config.js
*.d.ts
node_modules

# CDK asset staging directory
.cdk.staging
*cdk.out

samples/document_explorer/.gitignore

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
*.js
!jest.config.js
*.d.ts
node_modules

# CDK asset staging directory
.cdk.staging
*cdk.out

samples/image-description/.gitignore

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
*.js
!jest.config.js
*.d.ts
node_modules

# CDK asset staging directory
.cdk.staging
*cdk.out

samples/speech-to-speech/.gitignore

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
!jest.config.js
*.d.ts
node_modules

# CDK asset staging directory
.cdk.staging
*cdk.out

samples/speech-to-speech/README.md

Lines changed: 246 additions & 0 deletions
@@ -0,0 +1,246 @@
# Nova Sonic Solution

## Table of Contents

- [Overview](#overview)
- [Architecture](#architecture)
- [Project Structure](#project-structure)
- [Prerequisites](#prerequisites)
- [Deployment](#deployment)
- [User creation](#user-creation)
- [Usage](#usage)
- [Load testing](#load-testing)
- [Clean Up](#clean-up)
- [Content Security Legal Disclaimer](#content-security-legal-disclaimer)
- [Operational Metrics Collection](#operational-metrics-collection)

## Overview

A real-time speech-to-speech communication platform powered by Amazon Bedrock's Nova model for advanced language processing and AWS real-time messaging capabilities, featuring a Java WebSocket server and React frontend. Nova enables natural, context-aware speech-to-speech conversations through its state-of-the-art language understanding and generation capabilities.

## Architecture

![Architecture Diagram](docs/images/architecture.png)

The solution consists of three main components:

1. **Frontend Application**
   - React + TypeScript application
   - Real-time WebSocket communication
   - AWS Amplify for authentication
   - Tailwind CSS for styling

2. **Backend Infrastructure**
   - AWS CDK for infrastructure as code
   - Java WebSocket server running on AWS Fargate
   - Amazon Cognito for user authentication
   - CloudFront for content delivery
   - S3 for static website hosting
   - Network Load Balancer for WebSocket traffic

3. **Development Tools**
   - Load testing suite for WebSocket performance testing
   - Automated deployment pipeline

## Project Structure

```
.
├── frontend/          # React + TypeScript frontend application
├── backend/           # AWS CDK infrastructure and Java WebSocket server
│   ├── app/           # Java WebSocket server implementation
│   ├── stack/         # CDK infrastructure code
│   └── load-test/     # WebSocket load testing suite
└── images/            # Architecture diagrams and documentation images
```

## Prerequisites

- [Python](https://www.python.org/downloads/) 3.11 or higher
- [Docker Desktop](https://docs.docker.com/desktop/install/)
- [Gradle](https://gradle.org/install/) 7.x or higher
- [Git](https://git-scm.com/downloads)
- [AWS CDK Toolkit](https://docs.aws.amazon.com/cdk/v2/guide/cli.html)
- [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html), configured with your credentials:
  ```
  aws configure --profile [your-profile]
  AWS Access Key ID [None]: xxxxxx
  AWS Secret Access Key [None]: yyyyyyyyyy
  Default region name [None]: us-east-1
  Default output format [None]: json
  ```
- Node.js: v18.12.1 or higher
- npm 8.x or higher
- Enable model access to Amazon Nova Sonic in the [Bedrock console](https://us-east-1.console.aws.amazon.com/bedrock/home?region=us-east-1#/modelaccess) in the region where you intend to deploy this sample. For an up-to-date list of supported regions for Amazon Nova Sonic, refer to the [documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/models-regions.html).
- Chrome, Safari, or Edge browser environment (Firefox is currently not supported)
- Microphone and speakers
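
The version requirements above can be checked before deploying. The following is a minimal sketch, not part of the sample: the helper names (`parse_version`, `meets_minimum`, `check_tool`) are hypothetical, and it assumes each tool prints its version when run with `--version`.

```python
import re
import shutil
import subprocess

# Minimum versions copied from the prerequisites list above.
REQUIRED = {
    "python3": (3, 11),
    "node": (18, 12, 1),
    "npm": (8,),
    "gradle": (7,),
}


def parse_version(text):
    """Return the first dotted version number found in `text` as a tuple of ints."""
    match = re.search(r"(\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(found, minimum):
    """Compare two version tuples, padding the shorter one with zeros."""
    width = max(len(found), len(minimum))
    padded_found = found + (0,) * (width - len(found))
    padded_minimum = minimum + (0,) * (width - len(minimum))
    return padded_found >= padded_minimum


def check_tool(name, minimum):
    """Run `<name> --version` and report whether it satisfies `minimum`."""
    if shutil.which(name) is None:
        return False  # tool is not on PATH
    result = subprocess.run([name, "--version"], capture_output=True, text=True)
    try:
        return meets_minimum(parse_version(result.stdout + result.stderr), minimum)
    except ValueError:
        return False  # output did not contain a recognizable version
```

Calling `check_tool(name, minimum)` for each entry in `REQUIRED` gives a quick pass/fail summary. Docker, the AWS CLI and the CDK Toolkit have no minimum version listed above, so they are left out of the table.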

## Deployment

1. If not done already, clone this repository:

   ```shell
   $ git clone https://github.com/aws-samples/generative-ai-cdk-constructs-samples.git
   ```

2. Enter the sample directory:

   ```shell
   $ cd samples/speech-to-speech
   ```

3. Build the frontend first:

   ```shell
   $ cd frontend
   ```

   Install dependencies:

   ```shell
   $ npm install
   ```

   Build the web application:

   ```shell
   $ npm run build
   ```

   The build output in the `frontend/dist/` directory is automatically deployed by the backend CDK stack to S3 and served through CloudFront. The environment variables are configured automatically by `custom_resource_construct.py` in the CDK stack, which updates the frontend configuration during deployment.

4. Go to the backend directory:

   ```shell
   $ cd ../backend
   ```

5. Create a virtualenv on macOS and Linux:

   ```shell
   $ python3 -m venv .venv
   ```

   After the virtualenv is created, activate it with the following command:

   ```shell
   $ source .venv/bin/activate
   ```

   If you are on a Windows platform, activate the virtualenv like this:

   ```shell
   $ .venv\Scripts\activate.bat
   ```

6. Once the virtualenv is activated, install the required dependencies:

   ```shell
   $ pip install -r requirements.txt
   ```

7. Run the following to bootstrap your account:

   ```shell
   $ cdk bootstrap
   ```

8. Run the AWS CDK Toolkit to deploy the Backend stack with the runtime resources:

   ```shell
   $ cdk deploy --require-approval=never
   ```

   Any modifications made to the code can be applied to the deployed stack by running the same command again:

   ```shell
   $ cdk deploy --require-approval=never
   ```

The command above deploys one stack in your account. With the default configuration of this sample, the observed deployment time was ~646 seconds (about 10.8 minutes).

Get the CloudFront domain name:

```shell
aws cloudformation describe-stacks \
  --stack-name NovaSonicSolutionBackendStack \
  --query 'Stacks[0].Outputs[?OutputKey==`CloudFrontDistributionDomainName`].OutputValue' \
  --output text
```

The frontend can be accessed at the domain name above (XXXX.cloudfront.net).
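
For scripting, the same lookup can be done on the full JSON response of `describe-stacks` (the command above run without `--query` and with `--output json`). The sketch below is illustrative only: `get_stack_output` is a hypothetical helper, and the sample response contains placeholder values rather than real output.

```python
import json


def get_stack_output(describe_stacks_response, output_key):
    """Return the OutputValue matching `output_key` in a describe-stacks response."""
    stacks = describe_stacks_response.get("Stacks", [])
    if not stacks:
        raise LookupError("response contains no stacks")
    for output in stacks[0].get("Outputs", []):
        if output.get("OutputKey") == output_key:
            return output["OutputValue"]
    raise KeyError(output_key)


# Placeholder response in the shape returned by `aws cloudformation describe-stacks`.
sample_response = json.loads("""
{"Stacks": [{"StackName": "NovaSonicSolutionBackendStack",
             "Outputs": [{"OutputKey": "CloudFrontDistributionDomainName",
                          "OutputValue": "XXXX.cloudfront.net"}]}]}
""")

domain = get_stack_output(sample_response, "CloudFrontDistributionDomainName")
print(f"https://{domain}")
```

The helper raises `KeyError` when the requested output is absent, which makes a misspelled output key fail loudly instead of returning an empty string.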

## User creation

First, locate the Cognito User Pool ID through the AWS CLI:

```shell
$ aws cloudformation describe-stacks --stack-name NovaSonicSolutionBackendStack --query "Stacks[0].Outputs[?contains(OutputKey, 'UserPoolId')].OutputValue"

[
    "<region>_a1aaaA1Aa"
]
```

1. Navigate to the AWS Console.
2. Search for "Cognito" in the AWS Console search bar, click "Cognito" under Services, then click "User Pools" in the left navigation.
   Find and click the User Pool created by the CDK stack, using the ID you retrieved above.
3. In the User Pool dashboard, click "Users" in the left navigation. Click the "Create user" button and create a user with a password.
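
As an alternative to the console steps, a user can also be created from the AWS CLI with `aws cognito-idp admin-create-user` followed by `admin-set-user-password`. The sketch below only builds and prints the two commands; `build_create_user_commands` is a hypothetical helper, the pool ID, username and password are placeholders, and it assumes the user pool accepts a plain username without extra required attributes, which this README does not state.

```python
import shlex


def build_create_user_commands(user_pool_id, username, password):
    """Build AWS CLI argument lists that create a user and set a permanent password."""
    create = [
        "aws", "cognito-idp", "admin-create-user",
        "--user-pool-id", user_pool_id,
        "--username", username,
        "--message-action", "SUPPRESS",  # do not send an invitation message
    ]
    set_password = [
        "aws", "cognito-idp", "admin-set-user-password",
        "--user-pool-id", user_pool_id,
        "--username", username,
        "--password", password,
        "--permanent",  # skip the forced password change on first sign-in
    ]
    return [create, set_password]


# Placeholder values; replace them with the User Pool ID retrieved above.
commands = build_create_user_commands("us-east-1_EXAMPLE", "demo-user", "Example-Passw0rd!")
for command in commands:
    print(shlex.join(command))
```

Printing the commands instead of executing them keeps the sketch free of side effects; each list can be passed to `subprocess.run` once the values have been reviewed.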

## Usage

1. Open your browser and go to the application URL (the CloudFront domain retrieved from the stack outputs).
2. Click "Speech to Speech" in the sidebar navigation menu.
3. Click the "Start Streaming" button. When prompted, allow access to your microphone.
4. Begin speaking. You should see your speech transcribed in real time in the UI.
5. The assistant automatically processes your message and responds through speech.
6. Click "Stop Streaming" when you're done.

![Speech to Speech Interface](docs/images/speechToSpeech_home.png)

> Note: Ensure your microphone is properly connected and working before testing. The browser may require you to grant microphone permissions the first time you use the feature.

## Load testing

The [backend/load-test](backend/load-test/) directory contains [Artillery](https://www.artillery.io/docs) scripts for WebSocket performance testing. Running them requires [installing Artillery](https://www.artillery.io/docs/get-started/get-artillery).

1. Set up load testing:

   ```shell
   $ cd backend/load-test
   $ npm install
   $ ./setup-load-test.sh
   ```

2. Run load tests:

   ```shell
   $ ./run-load-test.sh
   ```

3. Generate the HTML report:

   ```shell
   $ artillery report report.json
   ```
227+
228+
## Clean Up
229+
230+
Do not forget to delete the stack to avoid unexpected charges.
231+
232+
```shell
233+
cdk destroy NovaSonicSolutionBackendStack
234+
```
235+
236+
Delete the associated logs created by the different services in Amazon CloudWatch logs.
237+
238+
Ensure S3 buckets are emptied before deletion.

## Content Security Legal Disclaimer

The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.

## Operational Metrics Collection

This solution collects anonymous operational metrics to help AWS improve the quality and features of the solution. Data collection is subject to the AWS Privacy Policy (https://aws.amazon.com/privacy/). To opt out of this feature, simply remove the tag(s) starting with “uksb-” or “SO” from the description(s) in any CloudFormation templates or CDK TemplateOptions.
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
*.swp
package-lock.json
__pycache__
.pytest_cache
.venv
*.egg-info

# CDK asset staging directory
.cdk.staging
cdk.out

### macOS ###
# General
.DS_Store
.AppleDouble
.LSOverride
# Gradle files
.gradle/
**/.gradle/
*/build/
build
# Java class files
*.class
**/*.class

cdk.context.json
Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
FROM gradle:8.6.0-jdk21 AS build

WORKDIR /app
COPY . .
# Ensure we build a proper executable JAR with manifest
RUN gradle build --no-daemon

FROM eclipse-temurin:21.0.2_13-jre-jammy

# Create a non-root user
RUN groupadd -r appuser && useradd -r -g appuser appuser

WORKDIR /app
COPY --from=build /app/app/build/libs/*.jar app.jar

# Update system packages to fix vulnerabilities
RUN apt-get update && \
    apt-get upgrade -y && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# Set environment variables
ENV AWS_REGION=us-east-1
ENV PORT=8080
ENV LOG_LEVEL=INFO
ENV CORS_ALLOWED_ORIGINS=*
ENV COGNITO_USER_POOL_ID=""
ENV AWS_ACCESS_KEY_ID=""
ENV AWS_SECRET_ACCESS_KEY=""
ENV AWS_SESSION_TOKEN=""
ENV DEPLOYMENT_TYPE="remote"

# Change ownership of the application files to the non-root user
RUN chown -R appuser:appuser /app

# Switch to non-root user
USER appuser

EXPOSE ${PORT}

# Health check
HEALTHCHECK --interval=30s --timeout=5s --start-period=30s --retries=3 CMD curl -f http://localhost:${PORT}/health || exit 1

# Start Java WebSocket server
CMD ["java", "-jar", "app.jar"]
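
The `HEALTHCHECK` above probes `http://localhost:${PORT}/health` with `curl -f`, which fails on HTTP status 400 and above. The same probe can be approximated from a local script when running the container by hand. This is a rough sketch, not part of the commit: `is_healthy` is a hypothetical helper, and unlike plain `curl -f` it follows redirects.

```python
import urllib.error
import urllib.request


def is_healthy(port, path="/health", timeout=5.0):
    """Return True when GET http://localhost:<port><path> succeeds with a 2xx/3xx status."""
    url = f"http://localhost:{port}{path}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts and HTTP error statuses
        # (urllib raises HTTPError, a URLError subclass, for 4xx/5xx).
        return False
```

With the container's port 8080 published locally, `is_healthy(8080)` mirrors the check Docker runs every 30 seconds inside the container.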
