A simple JavaScript demo application that lets you transcribe spoken audio in the browser using Flux.
FLUX is Deepgram's breakthrough conversational AI model that understands turn-taking dynamics: beyond transcribing words, it knows when to listen, when to think, and when to speak. It is well suited to building voice agents and interactive applications. Learn more about Flux in our documentation.
This demo runs in Chrome and Safari only. Firefox is not supported.
- Real-time microphone input with Linear16 PCM audio processing
- Turn-based speech recognition optimized for conversations
- Ultra-low latency with model-integrated end-of-turn detection
- Smart turn detection with configurable confidence thresholds
- WebSocket proxy server with proper authentication
- Live event monitoring with detailed FLUX response logging
- Modern responsive UI with real-time transcript display
- StartOfTurn: User begins speaking (trigger interruption)
- Update: Real-time transcript updates during speech
- Eager EndOfTurn: Medium confidence turn end (start preparing response)
- TurnResumed: Speech continues after Eager EndOfTurn (cancel response)
- EndOfTurn: High confidence turn end (send to LLM)
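The event list above maps naturally onto a small dispatcher in a voice agent. The following is only a sketch: the function name `nextAction`, the action strings, and the message shape (`type` and `event` fields, with the eager event spelled `EagerEndOfTurn` on the wire) are assumptions for illustration, not something this demo is confirmed to contain.

```javascript
// Hypothetical dispatcher: maps a FLUX turn event to the action a voice agent
// might take. The message shape ({ type, event }) is an assumption.
function nextAction(message) {
  if (!message || message.type !== "TurnInfo") {
    return "ignore"; // not a turn event
  }
  switch (message.event) {
    case "StartOfTurn":
      return "interrupt-playback"; // user began speaking
    case "Update":
      return "update-transcript"; // interim transcript changed
    case "EagerEndOfTurn":
      return "prepare-response"; // medium confidence: start drafting a reply
    case "TurnResumed":
      return "cancel-response"; // user kept talking: discard the draft
    case "EndOfTurn":
      return "send-to-llm"; // high confidence: commit the turn
    default:
      return "ignore"; // unknown event names are skipped
  }
}
```

Returning an action name instead of performing the action keeps the mapping easy to test and leaves the side effects (stopping audio playback, calling an LLM) to the caller.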
- Node.js 14.0.0 or higher
- Deepgram API key with FLUX early access
- Modern browser with microphone access
- Clone and install:

      git clone git@github.com:deepgram-devs/deepgram-flux-demo.git
      cd deepgram-flux-demo
      npm install

- Set your Deepgram API key:

      export DEEPGRAM_API_KEY="your_deepgram_api_key_here"

- Start the server:

      npm start

- Open the demo: Navigate to http://localhost:3000

Note: To open the demo in production mode, run NODE_ENV=production npm start and navigate to http://localhost:3000/flux-streaming
- Connect: Click "Connect to FLUX"
- Start microphone: Click "🎤 Start Microphone" and grant browser permissions
- Speak clearly: The app will show real-time transcription and turn events
- Watch the magic: Observe FLUX's turn detection and conversational flow
- Leave empty: Disable for simpler implementation
- Lower values (0.3-0.4): More aggressive early turn detection
- Higher values (0.6-0.9): More conservative early turn detection
- Recommended: Start with 0.6 for balanced performance
- Default: 0.7 (good balance of speed and accuracy)
- Lower values: Faster turn detection, more false positives
- Higher values: More confident detection, slightly higher latency
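As a rough illustration of how these two settings might be turned into connection parameters, the sketch below validates them and builds a query string. It assumes the first group of options above describes an optional eager end-of-turn threshold and the second group the end-of-turn threshold. The function name `buildFluxParams` and the parameter names `eot_threshold` and `eager_eot_threshold` are assumptions here and should be checked against the Flux documentation, as should the accepted value range.

```javascript
// Sketch: turn the two threshold settings into a query string.
// Parameter names and the 0..1 range check are assumptions, not confirmed limits.
function buildFluxParams({ eotThreshold = 0.7, eagerEotThreshold } = {}) {
  const check = (name, value) => {
    if (!Number.isFinite(value) || value < 0 || value > 1) {
      throw new RangeError(`${name} must be a number between 0 and 1`);
    }
  };

  check("eotThreshold", eotThreshold);
  const params = new URLSearchParams({ eot_threshold: String(eotThreshold) });

  // Leaving the eager threshold empty omits it from the query entirely.
  if (eagerEotThreshold !== undefined && eagerEotThreshold !== "") {
    check("eagerEotThreshold", eagerEotThreshold);
    params.set("eager_eot_threshold", String(eagerEotThreshold));
  }
  return params.toString();
}

console.log(buildFluxParams({ eagerEotThreshold: 0.6 }));
// eot_threshold=0.7&eager_eot_threshold=0.6
```

Validating on the client side gives a clear error before any connection is attempted.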
This demo uses a production-ready WebSocket proxy pattern:
Browser ←→ Local Proxy Server ←→ Deepgram FLUX API
Why a proxy?
- Security: API key stays server-side, never exposed to browser
- Compatibility: Works in the browser despite WebSocket authentication limitations (browsers cannot set custom headers on a WebSocket handshake)
- Production-ready: Same pattern used in real voice agent applications
- Message handling: Proper binary/text conversion for FLUX responses
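The message-handling point can be illustrated with a small helper. This is a sketch under assumptions rather than the demo's actual proxy code: it supposes the server-side WebSocket library delivers every upstream frame as a Node.js `Buffer` together with an `isBinary` flag, and `toBrowserPayload` is a hypothetical name.

```javascript
// Sketch: decide how a proxy might forward one upstream frame to the browser.
// Assumes each frame arrives as a Buffer plus an isBinary flag.
function toBrowserPayload(data, isBinary) {
  if (isBinary) {
    return data; // binary frames are forwarded untouched
  }
  // Text frames (JSON responses) are decoded so the browser receives a string
  // it can pass directly to JSON.parse instead of a Blob or ArrayBuffer.
  return data.toString("utf8");
}

const frame = Buffer.from('{"type":"TurnInfo"}');
console.log(typeof toBrowserPayload(frame, false)); // string
```

Without this conversion, a JSON response forwarded as a raw buffer would reach the browser as binary data, and the page would have to decode it before parsing.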
Ports:
- 3000: Web interface and WebSocket proxy (both on same port)
The FLUX API has strict audio format requirements:
- Format: Linear16 PCM (raw 16-bit signed little-endian)
- Sample Rate: 16000 Hz (16kHz)
- Channels: Mono only
- Chunk Size: 1024 samples (64ms) for optimal performance
- Input: Browser microphone with real-time processing
Note: Compressed formats (MP3, AAC, WebM) won't work with the FLUX API.
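Browser audio APIs typically deliver samples as 32-bit floats in the range -1 to 1, so they have to be converted before sending. The sketch below shows one common way to produce 16-bit signed little-endian PCM and checks the chunk arithmetic from the list above. The function name `floatTo16BitPCM` is illustrative and not necessarily what the demo uses.

```javascript
const SAMPLE_RATE = 16000; // Hz, from the requirements above
const CHUNK_SAMPLES = 1024; // samples per chunk

// Convert float samples in [-1, 1] to 16-bit signed little-endian PCM.
function floatTo16BitPCM(samples) {
  const buffer = new ArrayBuffer(samples.length * 2); // 2 bytes per sample
  const view = new DataView(buffer);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp out-of-range input
    // Scale negatives by 32768 and positives by 32767 to use the full int16 range.
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true); // true = little-endian
  }
  return buffer;
}

const chunk = floatTo16BitPCM(new Float32Array(CHUNK_SAMPLES));
console.log(chunk.byteLength); // 2048
console.log((CHUNK_SAMPLES * 1000) / SAMPLE_RATE); // 64
```

Writing through a `DataView` with the little-endian flag set makes the byte order explicit instead of relying on the platform's native endianness.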
This demo is containerized and ready for deployment to any platform that supports Docker.
- Dockerfile: Multi-stage Node.js container setup
- fly.toml: Fly.io configuration (can be adapted for other platforms)
- DEEPGRAM_API_KEY: Your Deepgram API key (required environment variable)
The application supports flexible routing:
- Local development: Both http://localhost:3000 and http://localhost:3000/flux-streaming work
- Production: Can be deployed at any base path (/flux-streaming by default)
- WebSocket connections: Automatically adapt to the host and path structure
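One way the WebSocket address could adapt to the host and path is to derive it from the page's own location. The sketch below is an assumption about how that might look: `buildWebSocketUrl` is a hypothetical helper, and it supposes the proxy accepts WebSocket connections on the same path that serves the page.

```javascript
// Sketch: derive the proxy's WebSocket URL from a location-like object.
// Assumes the proxy listens on the same host and path as the page itself.
function buildWebSocketUrl(loc) {
  // Pages served over HTTPS must use secure WebSockets.
  const scheme = loc.protocol === "https:" ? "wss:" : "ws:";
  return `${scheme}//${loc.host}${loc.pathname}`;
}

console.log(
  buildWebSocketUrl({ protocol: "http:", host: "localhost:3000", pathname: "/flux-streaming" })
);
// ws://localhost:3000/flux-streaming
```

In the browser, the argument would be `window.location`. Taking it as a parameter keeps the function testable outside a page.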
- Check API key: Verify the DEEPGRAM_API_KEY environment variable is set
- FLUX access: Ensure your Deepgram account has FLUX early access enabled
- Port conflicts: Make sure port 3000 is available
- Server logs: Check terminal for detailed connection error messages
- Browser permissions: Ensure microphone access is granted
- Audio levels: Look for "🎵 Audio level" messages in the browser log
- No transcripts: Check if you see "Sending chunk" messages
- HTTPS: Some browsers require HTTPS for microphone access
- Check server logs: Should see "Deepgram response" messages
- WebSocket connection: Verify proxy server shows "Connected to Deepgram FLUX API"
- Audio format: FLUX requires Linear16 PCM (handled automatically by the app)
- Early access: Confirm your account has FLUX API access
When everything is working correctly, you should see:
🖥️ Browser Interface:
- Real-time transcript updates as you speak
- Turn Index, Current Event, and Confidence scores updating
- FLUX Events log showing JSON responses from the API
- Audio level indicators showing microphone input
💻 Server Logs:
- Connection success to Deepgram FLUX API
- Audio chunks being forwarded (2048 bytes each)
- FLUX responses with TurnInfo events
- Message type debugging information
We love to hear from you! If you have questions: