SwiftUI Speech Diarization Example

Note: This project is currently under development, and this README will be updated periodically.

Update (Sept 28, 2025): Over the summer, a clever person ported the Speech Diarization model to CoreML. It's neatly wrapped and abstracted in the FluidAudio library, whose API should have a lower barrier to entry than this project. However, if you need more advanced or fine-grained control, this project is still useful, as FluidAudio is built atop Sherpa.

Project Overview

This repository aims to refactor and simplify the SwiftUI example provided by k2-fsa/sherpa-onnx, specifically focusing on Speech Diarization.

I wrote a companion article breaking down how and why I built this project.

Additionally, I recently created an algorithm for Active Speaker Detection using this project as a base.

Getting Started

1. Required Frameworks

Before building this project, ensure the required frameworks are in place:

onnxruntime.xcframework is too large to be included in the repository, so you must download it manually.
sherpa-onnx.xcframework must be built from source and added to your project. See Building from Sherpa Onnx.

Without these, building the project will fail.

Note: After setup, test the app using the File Picker to load an audio file. Alternatively, hardcode a file path in ContentView (line 18) for testing.
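
A quick pre-build check can catch a missing framework before Xcode does. The sketch below is not part of this repository: the function name `check_frameworks` is hypothetical, and it assumes both xcframeworks sit at the top level of the project directory, with the Sherpa framework under the lowercase name `sherpa-onnx.xcframework` that the build script produces.

```shell
# Hypothetical helper: verify both xcframeworks are present in a directory.
# Assumption: the frameworks sit at the top level of the given directory.
check_frameworks() {
  project_dir="$1"
  missing=0
  for fw in onnxruntime.xcframework sherpa-onnx.xcframework; do
    if [ ! -d "$project_dir/$fw" ]; then
      echo "missing: $project_dir/$fw" >&2
      missing=1
    fi
  done
  # Returns 0 when both frameworks exist, 1 otherwise.
  return "$missing"
}

# Usage (run from the project root):
# check_frameworks . && echo "frameworks in place"
```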

Download Required Framework

Download the onnxruntime framework:

onnxruntime.xcframework-1.17.1.tar.bz2

Steps:

Extract the archive.
Copy onnxruntime.xcframework into your Xcode project directory.
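
The two steps above can be sketched as a small shell function. This is a minimal sketch, not a script shipped with the project: `install_onnxruntime` is a hypothetical name, both arguments are placeholders for your own paths, and the internal layout of the archive is an assumption (the function searches for the framework rather than relying on a fixed location).

```shell
# Hypothetical helper: extract the archive and copy the framework into a project.
install_onnxruntime() {
  archive="$1"      # e.g. onnxruntime.xcframework-1.17.1.tar.bz2
  project_dir="$2"  # your Xcode project directory
  work_dir="$(mktemp -d)" || return 1
  # -x extract, -j bzip2, -f archive file, -C target directory
  tar -xjf "$archive" -C "$work_dir" || return 1
  # Locate the framework; the archive layout is an assumption.
  fw="$(find "$work_dir" -type d -name onnxruntime.xcframework | head -n 1)"
  if [ -z "$fw" ]; then
    echo "onnxruntime.xcframework not found in archive" >&2
    return 1
  fi
  cp -R "$fw" "$project_dir"/ || return 1
  rm -rf "$work_dir"
}

# Usage:
# install_onnxruntime onnxruntime.xcframework-1.17.1.tar.bz2 /path/to/your/project
```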

Building from Sherpa Onnx

To build sherpa-onnx.xcframework, follow the steps summarized below.

Visit this link for more detailed build instructions.

Summary of Build Steps

Clone the repository

 git clone https://github.com/k2-fsa/sherpa-onnx

Enter the repo directory
```
cd sherpa-onnx
```
Run the iOS build script
```
./build-ios.sh
```
After the script completes, a build-ios folder will be created.
Copy sherpa-onnx.xcframework from build-ios into your Xcode project.

You’ll also find onnxruntime.xcframework in:

ios-onnxruntime/1.17.1/onnxruntime.xcframework

This is the same xcframework as in the previous section.
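
The copy step can be sketched as follows. The function name `copy_sherpa_frameworks` and both arguments are hypothetical, and the sketch assumes the `ios-onnxruntime/1.17.1` path is relative to the `build-ios` folder. It copies `onnxruntime.xcframework` only if the project does not already contain it.

```shell
# Hypothetical helper: copy the built frameworks into an Xcode project directory.
copy_sherpa_frameworks() {
  build_dir="$1"    # e.g. sherpa-onnx/build-ios
  project_dir="$2"  # your Xcode project directory
  cp -R "$build_dir/sherpa-onnx.xcframework" "$project_dir"/ || return 1
  # Assumption: the onnxruntime path is relative to build-ios.
  ort="$build_dir/ios-onnxruntime/1.17.1/onnxruntime.xcframework"
  if [ -d "$ort" ] && [ ! -d "$project_dir/onnxruntime.xcframework" ]; then
    cp -R "$ort" "$project_dir"/ || return 1
  fi
}

# Usage:
# copy_sherpa_frameworks sherpa-onnx/build-ios /path/to/your/project
```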

The Actual App

The app requires you to select an audio or video file via the File Picker. Alternatively, you can change line 18 in ContentView to hardcode a file in your bundle for testing.

It then converts the file to a format that the speech diarization model accepts.

Afterwards, the app runs the model, and the results eventually replace the placeholder text.
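
If you want to prepare test files outside the app, the conversion step can be approximated on the command line. This is only a sketch and not how the app converts audio: it relies on `ffmpeg` being installed, the function name `to_diarization_wav` is hypothetical, and the target format (16 kHz, mono, 16-bit PCM WAV) is an assumption about what the model expects rather than something this README states.

```shell
# Hypothetical helper: convert an audio/video file to a 16 kHz mono WAV.
# Assumption: the diarization model expects 16 kHz mono input.
to_diarization_wav() {
  src="$1"  # input audio or video file
  dst="$2"  # output .wav path
  # -y overwrite output, -vn drop any video stream, -ac 1 mono,
  # -ar 16000 resample to 16 kHz, pcm_s16le 16-bit little-endian PCM
  ffmpeg -loglevel error -y -i "$src" -vn -ac 1 -ar 16000 -c:a pcm_s16le "$dst"
}

# Usage:
# to_diarization_wav interview.mov interview-16k.wav
```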

Screen.Recording.2025-04-11.at.8.55.42.PM.mov

Contributing

Contributions and suggestions are welcome as the project is actively evolving.

Updates and additional documentation will be provided as development progresses.
