This project contains two audio forgery dataset generators based on the TIMIT speech corpus. They simulate splicing and copy-move forgeries for use in training and evaluating audio forensic systems.
The dataset generation process involves applying transformations to authentic audio files from TIMIT using two distinct methods:
Simulates forgeries by:
- Selecting a random segment from the original audio.
- Inserting that segment at a random new position.
- Reconstructing the audio so that the inserted segment appears naturally within the waveform.
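The three steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `copy_move_forgery`, its parameters, and the choice to insert the segment (lengthening the audio) rather than overwrite existing samples are all assumptions. Samples are held in a plain Python list to keep the sketch dependency-free, and no smoothing is applied at the segment boundaries.

```python
import random


def copy_move_forgery(samples, segment_len, rng=None):
    """Copy a random segment of `samples` and insert it at a random position.

    Hypothetical helper: returns the forged signal plus the source and
    destination indices so the forgery location can be stored as a label.
    """
    rng = rng or random.Random()
    n = len(samples)
    if not 0 < segment_len <= n:
        raise ValueError("segment_len must be between 1 and len(samples)")

    # Step 1: select a random segment from the original audio.
    src = rng.randint(0, n - segment_len)  # randint is inclusive on both ends
    segment = samples[src:src + segment_len]

    # Step 2: choose a random insertion point (0..n, so the end is allowed).
    dst = rng.randint(0, n)

    # Step 3: rebuild the signal around the inserted segment.
    forged = samples[:dst] + segment + samples[dst:]
    return forged, src, dst


if __name__ == "__main__":
    sample_rate = 16000  # TIMIT recordings are sampled at 16 kHz
    audio = [0.0] * (3 * sample_rate)  # stand-in for a loaded waveform
    forged, src, dst = copy_move_forgery(audio, sample_rate // 2, random.Random(0))
    print(len(audio), len(forged), src, dst)
```

Returning `src` and `dst` alongside the forged signal is a design choice for this sketch: it makes it easy to record where the copied region came from and where it was placed.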
Forgery Sample Generation
Original A: ---[Original Audio A]---
Original B: ---[Original Audio B]---
Forgery: ---[Segment from A]---[Segment from B]---[Remaining A]---
Based on the paper:
"Autoencoder for Audio Forgery Detection using Spliced and Copy-Move Audio",
Shaikh et al., 2021
This method simulates forgeries by:
- Extracting 2-second and 1-second segments from each audio file.
- Concatenating them in different combinations to simulate forged samples.
- Producing:
  - 3-second forged audio
  - 2-second forged audio
Forgery Sample Generation
- Forgery:
2s [Segment from A] + 1s [Segment from B] → 3s [Forged Audio]
- Forgery:
1s [Segment from A] + 1s [Segment from B] → 2s [Forged Audio]
- Forgery:
1s [Segment from A] + 1s [Segment from B] + 1s [Segment from A] → 3s [Forged Audio]
- Forgery:
0.5s [Segment from A] + 1s [Segment from B] + 0.5s [Segment from A] → 2s [Forged Audio]
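The four combinations listed above can be sketched as plain concatenation. This is an illustrative sketch only: the helper names `take` and `make_splices`, the dictionary keys, and the segment offsets are assumptions. In particular, it assumes segments are cut from the start of each file and that the second segment from A continues where the first one ended. The source does not specify either.

```python
SAMPLE_RATE = 16000  # TIMIT recordings are sampled at 16 kHz


def take(samples, seconds, offset_seconds=0.0, sample_rate=SAMPLE_RATE):
    """Return `seconds` of audio starting at `offset_seconds` (hypothetical helper)."""
    start = int(offset_seconds * sample_rate)
    length = int(seconds * sample_rate)
    segment = samples[start:start + length]
    if len(segment) < length:
        raise ValueError("audio is too short for the requested segment")
    return segment


def make_splices(a, b, sample_rate=SAMPLE_RATE):
    """Build the four spliced variants from two sample lists `a` and `b`."""
    sr = sample_rate
    return {
        # 2s A + 1s B -> 3s
        "3s_A2_B1": take(a, 2, 0, sr) + take(b, 1, 0, sr),
        # 1s A + 1s B -> 2s
        "2s_A1_B1": take(a, 1, 0, sr) + take(b, 1, 0, sr),
        # 1s A + 1s B + 1s A -> 3s (second A segment starts at 1s, assumed)
        "3s_A1_B1_A1": take(a, 1, 0, sr) + take(b, 1, 0, sr) + take(a, 1, 1, sr),
        # 0.5s A + 1s B + 0.5s A -> 2s (second A segment starts at 0.5s, assumed)
        "2s_A05_B1_A05": take(a, 0.5, 0, sr) + take(b, 1, 0, sr) + take(a, 0.5, 0.5, sr),
    }


if __name__ == "__main__":
    audio_a = [0.0] * (3 * SAMPLE_RATE)  # stand-ins for two loaded waveforms
    audio_b = [1.0] * (3 * SAMPLE_RATE)
    for name, forged in make_splices(audio_a, audio_b).items():
        print(name, len(forged) / SAMPLE_RATE, "seconds")
```

Samples are kept in plain Python lists so the sketch runs without extra dependencies; with array-based audio the `+` concatenation would need to be replaced by the array library's own concatenate function.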
From the original TIMIT audio files, this tool generates three datasets:
- Original audio dataset
- Copy-move forgeries dataset
- Splicing forgeries dataset
- Training deep learning models for audio forgery detection
- Evaluating robustness of audio forensic systems
- Dataset creation for research in speech integrity