The Audio Vessel Classifier is a deep learning project designed to monitor vessel activity in marine environments using passive acoustic recordings. It processes 10-second audio clips recorded underwater and predicts the distance to the nearest vessel. This information supports marine conservation, shipping traffic analysis, and assessing human impact on sensitive ecosystems. More information can be found in the associated publication:
DOI: 10.1109/JSTARS.2025.3593779
To prepare the data for distance classification, audio recordings were segmented into 10-second, non-overlapping windows. Each segment was categorized based on its proximity to the nearest vessel, with the distance categories divided into 1 km bins from 0-1 km up to 10+ km (see the output categories below).
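For illustration, a minimal sketch of this segmentation and binning step is shown below; the helper names, the use of `soundfile`, and the handling of trailing partial windows are assumptions rather than the project's exact preprocessing code.

```python
# Minimal sketch of the segmentation and binning step described above.
# File handling, the use of soundfile, and dropping the trailing partial
# window are assumptions, not taken from the project code.
import numpy as np
import soundfile as sf

SEGMENT_SECONDS = 10

def segment_audio(path):
    """Split a recording into non-overlapping 10-second windows."""
    audio, sr = sf.read(path)
    window = SEGMENT_SECONDS * sr
    n_segments = len(audio) // window          # drop the trailing partial window
    return [audio[i * window:(i + 1) * window] for i in range(n_segments)]

def distance_to_bin(distance_km):
    """Map a vessel distance in km to one of the 1 km bins (10+ km is class 10)."""
    return min(int(distance_km), 10)

# Example: 7.3 km from the nearest vessel -> bin 7 ("7-8 km")
print(distance_to_bin(7.3))
```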
The project leverages the Contrastive Language-Audio Pretraining (CLAP-LAION) model, which is built upon the original CLAP architecture. The pre-trained CLAP-LAION model named Biolingual, partly trained on underwater bioacoustic data, was used for transfer learning following two approaches:
- **Feature Extraction**
  - High-level features were extracted from Log-Mel spectrograms using the pre-trained layers.
  - Extracted features were passed through three custom layers for classification.
  - Computationally efficient, requiring adjustment only to the final layers.
- **Fine-Tuning**
  - Pre-trained weights were used for initialization, but the entire model was retrained.
  - The retrained backbone was followed by a single linear layer for distance classification.
  - Expected to achieve slightly better performance because all layers are retrained, but significantly more computationally demanding.
This trade-off between computational efficiency and performance is important when selecting the approach. Feature extraction allows fast adaptation with limited resources, while fine-tuning can maximize predictive accuracy at higher computational cost.
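As an illustration of the feature-extraction approach, the sketch below pairs a frozen CLAP-LAION (Biolingual) embedding with a small three-layer classification head. The 512-dimensional embedding size, hidden sizes, and activation choices are assumptions for illustration, not necessarily the exact head used in this project.

```python
# Illustrative sketch of the feature-extraction approach: a frozen CLAP-LAION
# backbone produces an embedding, and only a small classification head is
# trained. Embedding size (512) and hidden sizes are assumptions.
import torch
import torch.nn as nn

NUM_CLASSES = 11  # 0-1 km ... 9-10 km, plus 10+ km

class DistanceHead(nn.Module):
    def __init__(self, embedding_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embedding_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 64),
            nn.ReLU(),
            nn.Linear(64, NUM_CLASSES),
        )

    def forward(self, embedding):
        return self.net(embedding)

# Embeddings would come from the frozen CLAP-LAION (Biolingual) audio encoder.
head = DistanceHead()
dummy_embedding = torch.randn(1, 512)
logits = head(dummy_embedding)
predicted_bin = logits.argmax(dim=-1)  # index of the predicted distance bin
```

In the fine-tuning variant, the same backbone would instead be unfrozen and retrained end to end, with a single linear layer on top.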
The model weights can be found here:
The Audio Vessel Classifier can accept two types of input:
- **Raw 10-second audio files**
  - Users can upload a `.wav` file containing 10 seconds of underwater audio.
  - The model will process the audio, extract Log-Mel spectrograms, generate embeddings (if using the feature extraction approach), and predict the distance category.
- **Pre-computed CLAP embeddings**
  - Users who already have embeddings extracted from the CLAP-LAION model can upload these directly.
  - This option skips the feature extraction step and allows faster inference for users with pre-processed data.

Expected input formats:
- Audio file: `.wav`, 10 seconds long, mono channel recommended.
- Embedding: 1D or 2D numpy array or tensor, matching the output dimensions of the pre-trained CLAP-LAION embedding layer.
The model outputs a distance category corresponding to the proximity of the nearest vessel:
- 0-1 km
- 1-2 km
- 2-3 km
- 3-4 km
- 4-5 km
- 5-6 km
- 6-7 km
- 7-8 km
- 8-9 km
- 9-10 km
- 10+ km
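One straightforward way to turn a predicted class index into these labels is shown below; the ordering of classes from nearest to farthest is an assumption.

```python
# Map the predicted class index to the human-readable distance label.
# The index ordering (0 = "0-1 km", ..., 10 = "10+ km") is an assumption.
DISTANCE_LABELS = [f"{i}-{i + 1} km" for i in range(10)] + ["10+ km"]

def label_for(class_index):
    return DISTANCE_LABELS[class_index]

print(label_for(0))   # "0-1 km"
print(label_for(10))  # "10+ km"
```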
To launch it, first install the package and then run DEEPaaS:
git clone https://github.com/ai4os-hub/audio_vessel_classifier
cd audio_vessel_classifier
pip install -e .
deepaas-run --listen-ip 0.0.0.0
After launching, you can choose the model approach:
- Feature Extraction (FE): Faster, less computationally demanding.
- Fine-Tuning (FT): Recommended for higher precision.
You can also select the input type:
- Raw 10-second audio file (`.wav`): The model will process the audio, generate embeddings, and predict the distance category.
- Pre-computed CLAP embedding: Upload embeddings directly for faster inference.
The model outputs:
- Distance category: 0-1 km, 1-2 km, 2-3 km, 3-4 km, 4-5 km, 5-6 km, 6-7 km, 7-8 km, 8-9 km, 9-10 km, 10+ km
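Once DEEPaaS is running, predictions can be requested over HTTP. The example below follows the generic DEEPaaS V2 endpoint layout; the exact endpoint path, form-field name, and default port (5000) are assumptions and should be confirmed in the Swagger UI that deepaas exposes.

```python
# Hedged example of querying the running DEEPaaS API. The endpoint path and
# the "file" form-field name are assumptions based on the generic DEEPaaS V2
# layout; check the Swagger UI for the exact parameters of this model.
import requests

URL = "http://localhost:5000/v2/models/audio_vessel_classifier/predict/"

with open("clip.wav", "rb") as f:
    response = requests.post(URL, files={"file": ("clip.wav", f, "audio/wav")})

print(response.status_code)
print(response.json())  # expected to contain the predicted distance category
```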
The repository is organized as follows:

│
├── Dockerfile <- Describes main steps on integration of DEEPaaS API and
│ audio_vessel_classifier application in one Docker image
│
├── Jenkinsfile <- Describes basic Jenkins CI/CD pipeline (see .sqa/)
│
├── LICENSE <- License file
│
├── README.md <- The top-level README for developers using this project.
│
├── VERSION <- audio_vessel_classifier version file
│
├── .sqa/ <- CI/CD configuration files
│
├── audio_vessel_classifier <- Source code for use in this project.
│ │
│ ├── __init__.py <- Makes audio_vessel_classifier a Python module
│ │
│ ├── api.py <- Main script for the integration with DEEPaaS API
│ │
│ ├── config.py <- Configuration file to define constants used across audio_vessel_classifier
│ │
│ └── misc.py <- Misc functions that were helpful across projects
│
├── data/ <- Folder to store the data
│
├── models/ <- Folder to store models
│
├── tests/ <- Scripts to perform code testing
│
├── metadata.json <- Metadata information propagated to the AI4OS Hub
│
├── pyproject.toml <- a configuration file used by packaging tools, so audio_vessel_classifier
│ can be imported or installed with `pip install -e .`
│
├── requirements.txt <- The requirements file for reproducing the analysis environment, i.e.
│ contains a list of packages needed to make audio_vessel_classifier work
│
├── requirements-test.txt <- The requirements file for running code tests (see tests/ directory)
│
└── tox.ini <- Configuration file for the tox tool used for testing (see .sqa/)