
TEN-framework/ten-turn-detection




Welcome to TEN

TEN is a collection of open-source projects for building real-time, multimodal conversational voice agents, including TEN Framework, TEN VAD, TEN Turn Detection, TEN Agent, TMAN Designer, TEN Portal, and more.


| Community Channel | Purpose |
|---|---|
| Follow on X | Follow TEN Framework on X for updates and announcements |
| Discord TEN Community | Join our Discord community to connect with developers |
| Hugging Face Space | Join our Hugging Face community to explore our spaces and models |
| WeChat | Join our WeChat group for Chinese community discussions |

Important

Star TEN Repositories ⭐️

Get instant notifications for new releases and updates. Your support helps us grow and improve TEN!




Introduction

TEN Turn Detection is a turn detection model designed for natural, dynamic communication between humans and AI agents. It addresses one of the hardest parts of human-AI conversation: detecting natural turn-taking cues and enabling context-aware interruptions. To do this, it draws on semantic understanding of the conversation context and of linguistic patterns, which makes dialogue with AI feel more natural.

TEN Turn Detection classifies the user's text into one of three states:

finished: A finished utterance where the user has expressed a complete thought and expects a response. Example: "Hey there I was wondering can you help me with my order"

wait: An ambiguous utterance where the system cannot confidently determine if more speech will follow. Example: "This conversation needs to end now"

unfinished: A clearly unfinished utterance where the user has momentarily paused but intends to continue speaking. Example: "Hello I have a question about"

These three classification states allow the TEN system to create natural conversation dynamics by intelligently managing turn-taking, reducing awkward interruptions while maintaining conversation flow.
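
As an illustration of how an agent might act on the three states, the sketch below maps each label to an action. The `TurnState` enum, the `next_action` function, the action names, and the 2000 ms fallback timeout are all assumptions made for this example; they are not part of TEN Turn Detection.

```python
from enum import Enum


class TurnState(Enum):
    """The three labels described above."""
    FINISHED = "finished"
    WAIT = "wait"
    UNFINISHED = "unfinished"


def next_action(label: str, silence_ms: int, max_silence_ms: int = 2000) -> str:
    """Map a turn-detection label to a hypothetical agent action.

    silence_ms is how long the user has been silent. max_silence_ms is an
    assumed fallback timeout, not a value taken from the TEN project.
    """
    state = TurnState(label.strip().lower())
    if state is TurnState.FINISHED:
        return "respond"  # complete thought: the agent may answer
    if state is TurnState.WAIT:
        return "stay_silent"  # hold the response for now
    # UNFINISHED: keep listening, but fall back to responding after a long pause
    return "respond" if silence_ms >= max_silence_ms else "keep_listening"
```

A real agent would likely combine such a rule with voice activity detection and its own timeout policy.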

TEN Turn Detection is built on a transformer-based language model (Qwen2.5-7B), which it uses for semantic analysis.

Key Features

  • Context-Aware Turn Management: TEN Turn Detection analyzes linguistic patterns and semantic context to accurately identify turn completion points. This capability enables intelligent interruption handling, allowing the system to determine when interruptions are contextually appropriate while maintaining natural conversation flow across various dialogue scenarios.

  • Multilingual Turn Detection Support: TEN Turn Detection supports both English and Chinese. It is engineered to accurately identify turn-taking cues and completion signals across multilingual conversations.

  • Superior Performance: Compared with multiple open-source solutions, TEN achieves superior performance across all metrics on our publicly available test dataset.

Prepared Dataset

We have open-sourced the TEN-Turn-Detection TestSet, a bilingual (Chinese and English) collection of conversational inputs specifically designed to evaluate turn detection capabilities in AI dialogue systems. The dataset consists of three distinct components:

wait.txt: Contains expressions requesting conversation pauses or termination

unfinished.txt: Features incomplete dialogue inputs with truncated utterances

finished.txt: Provides complete conversational inputs across multiple domains
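
The following sketch shows one way to load the three files into labeled samples. It assumes each file is UTF-8 text with one utterance per line and that all three files sit in one local directory; neither detail is confirmed here, and `load_testset` is a hypothetical helper, not part of the dataset.

```python
from pathlib import Path

# Expected label for every utterance in each file (file names from the list above).
LABEL_FILES = {
    "wait": "wait.txt",
    "unfinished": "unfinished.txt",
    "finished": "finished.txt",
}


def load_testset(root: str) -> list[tuple[str, str]]:
    """Return (utterance, expected_label) pairs from the three test files.

    Assumes one utterance per line. Blank lines and missing files are skipped.
    """
    samples = []
    for label, filename in LABEL_FILES.items():
        path = Path(root) / filename
        if not path.exists():
            continue
        for line in path.read_text(encoding="utf-8").splitlines():
            text = line.strip()
            if text:
                samples.append((text, label))
    return samples
```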

Detection Performance

We conducted comprehensive evaluations comparing several open-source models for turn detection using our test dataset:

| Language | Model | Finished accuracy | Unfinished accuracy | Wait accuracy |
|---|---|---|---|---|
| English | Model A | 59.74% | 86.46% | N/A |
| English | Model B | 71.61% | 96.88% | N/A |
| English | TEN Turn Detection | 90.64% | 98.44% | 91% |
| Chinese | Model B | 74.63% | 88.89% | N/A |
| Chinese | TEN Turn Detection | 98.90% | 92.74% | 92% |

Notes:

  1. Model A does not support Chinese.
  2. Neither Model A nor Model B supports "WAIT" state detection.
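
The evaluation metric is not spelled out here. Assuming each reported accuracy is the fraction of samples in a class that the model labels correctly, a per-class accuracy could be computed as in the sketch below; `per_class_accuracy` is a hypothetical helper written for illustration.

```python
from collections import Counter


def per_class_accuracy(expected: list[str], predicted: list[str]) -> dict[str, float]:
    """Fraction of correctly labeled samples for each expected class."""
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    totals = Counter(expected)
    # Count only the samples whose prediction matches the expected label.
    correct = Counter(e for e, p in zip(expected, predicted) if e == p)
    return {label: correct[label] / totals[label] for label in totals}
```

For example, with two `finished` samples of which one is predicted as `wait`, the `finished` accuracy is 0.5.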

Quick Start

Installation

git clone https://github.com/TEN-framework/ten-turn-detection.git
pip install "transformers>=4.30.0"
pip install "torch>=2.0.0"

Model Weights

The TEN Turn Detection model is available on Hugging Face: https://huggingface.co/TEN-framework/TEN_Turn_Detection

You can download the model in several ways:

  1. Automatic download (recommended): The model weights will be automatically downloaded when you run the inference script for the first time. HuggingFace Transformers will cache the model locally.

  2. Using Git LFS:

    # Install Git LFS if you haven't already
    git lfs install
    
    # Clone the repository with model weights
    git clone https://huggingface.co/TEN-framework/TEN_Turn_Detection

  3. Using the Hugging Face Hub library:

    from huggingface_hub import snapshot_download
    
    snapshot_download(repo_id="TEN-framework/TEN_Turn_Detection")

Inference

The inference script accepts command line arguments for user input:

# Basic usage
python inference.py --input "Your text to analyze"

Example output:

Loading model from TEN-framework/TEN_Turn_Detection...
Running inference on: 'Hello I have a question about'

Results:
Input: 'Hello I have a question about'
Turn Detection Result: 'unfinished'
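
To use the result from another Python program, one option is to run the script as a subprocess and parse its output. The sketch below assumes the script prints to standard output in exactly the format shown above; `parse_turn_state` and `detect_turn` are hypothetical helpers, not part of the repository.

```python
import re
import subprocess
import sys

# Matches the final line of the example output, e.g.
# Turn Detection Result: 'unfinished'
RESULT_PATTERN = re.compile(r"^Turn Detection Result: '([^']*)'\s*$", re.MULTILINE)


def parse_turn_state(output: str) -> str:
    """Extract the predicted state from the script's printed output."""
    match = RESULT_PATTERN.search(output)
    if match is None:
        raise ValueError("no 'Turn Detection Result' line found")
    return match.group(1)


def detect_turn(text: str, script: str = "inference.py") -> str:
    """Run the inference script on `text` and return the predicted state."""
    completed = subprocess.run(
        [sys.executable, script, "--input", text],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the script fails
    )
    return parse_turn_state(completed.stdout)
```

Each call to `detect_turn` starts a new process and reloads the model, so this approach suits quick experiments rather than real-time use.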

Citation

If you use TEN Turn Detection in your research or applications, please cite:

@misc{TEN_Turn_Detection,
author = {TEN Team},
title = {TEN Turn Detection: Turn detection for full-duplex dialogue communication},
year = {2025},
url = {https://github.com/TEN-framework/ten-turn-detection},
}

TEN Ecosystem

Project Preview
🏚️ TEN Framework
TEN is an open-source framework for real-time, multimodal conversational AI.

TEN VAD
TEN VAD is a low-latency, lightweight, high-performance streaming voice activity detector (VAD).

TEN Turn Detection
TEN Turn Detection is for full-duplex dialogue communication.

🎙️ TEN Agent
TEN Agent is a showcase of TEN Framework.

🎨 TMAN Designer beta
TMAN Designer is a low/no-code option for building a voice agent with an easy-to-use workflow UI.

📒 TEN Portal
The official site of TEN Framework, with documentation and a blog.


License

This project is Apache 2.0 licensed.