🎧 Splicing and Copy-Move Audio Forgery Dataset Generator

This project contains two audio forgery dataset generators built on the TIMIT speech corpus. They simulate splicing and copy-move forgeries for training and evaluating audio forensic systems.

🛠️ Overview

Forged samples are generated from authentic TIMIT audio files using two distinct methods:

🔀 1. RandomPosition Method

Simulates forgeries by:

Selecting a random segment from the original audio.
Inserting that segment at a random new position.
Reconstructing the audio so that the inserted segment appears naturally within the waveform.

📌 Forgery Sample Generation

Original A: ---[Original Audio A]---
Original B: ---[Original Audio B]---
Forgery: ---[Segment from A]---[Segment from B]---[Remaining A]---
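
The random-position idea can be sketched in a few lines. This is a minimal illustration, not the repository's actual code: the function name `insert_random_segment`, its parameters, and the use of plain Python lists of samples are all assumptions made for the example (a real pipeline would more likely work on NumPy arrays). It also assumes the segment is inserted rather than overwritten, so the forged signal is longer than the host.

```python
import random


def insert_random_segment(host, donor, segment_len, rng=None):
    """Insert a random slice of `donor` at a random position in `host`.

    Hypothetical helper: passing the same recording as host and donor
    yields a copy-move forgery; passing two different recordings yields
    a splicing forgery.
    """
    rng = rng or random.Random()
    if not 0 < segment_len <= len(donor):
        raise ValueError("segment_len must be between 1 and len(donor)")

    # Choose where the segment is taken from in the donor signal.
    src = rng.randrange(len(donor) - segment_len + 1)
    segment = donor[src:src + segment_len]

    # Choose where the segment is placed in the host signal
    # (any boundary from 0 to len(host), inclusive).
    cut = rng.randrange(len(host) + 1)

    # Rebuild the signal: host before the cut, the segment, host after it.
    forged = host[:cut] + segment + host[cut:]
    return forged, src, cut
```

Returning `src` and `cut` alongside the forged signal keeps the ground-truth tamper location available, which is often useful when labelling samples for localization tasks.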

🔁 2. Concatenation Method

Based on the paper "Autoencoder for Audio Forgery Detection using Spliced and Copy-Move Audio" (📄 Shaikh et al., 2021).

This method simulates forgeries by:

Extracting 2-second and 1-second segments from each audio file.
Concatenating them in different combinations to simulate forged samples.
Producing:
- 3-second forged audio
- 2-second forged audio

📌 Forgery Sample Generation

Forgery: 2s [Segment from A] + 1s [Segment from B] → 3s [Forged Audio]
Forgery: 1s [Segment from A] + 1s [Segment from B] → 2s [Forged Audio]
Forgery: 1s [Segment from A] + 1s [Segment from B] + 1s [Segment from A] → 3s [Forged Audio]
Forgery: 0.5s [Segment from A] + 1s [Segment from B] + 0.5s [Segment from A] → 2s [Forged Audio]
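
The four combinations above could be assembled roughly as follows. This is a sketch under stated assumptions rather than the project's implementation: `take`, `concatenation_forgeries`, and the dictionary keys are hypothetical names, samples are plain Python lists, and segments are assumed to be cut from the start of each recording, with the second piece of A continuing where the first ended. The 16 kHz default reflects TIMIT's sampling rate.

```python
SAMPLE_RATE = 16_000  # TIMIT recordings are sampled at 16 kHz


def take(samples, seconds, offset_seconds=0.0, sample_rate=SAMPLE_RATE):
    """Return `seconds` of audio starting `offset_seconds` into `samples`."""
    start = int(offset_seconds * sample_rate)
    stop = start + int(seconds * sample_rate)
    if stop > len(samples):
        raise ValueError("recording is too short for the requested segment")
    return samples[start:stop]


def concatenation_forgeries(a, b, sample_rate=SAMPLE_RATE):
    """Build the four forged variants from recordings `a` and `b`.

    Assumption: where each segment is cut from is not specified by the
    README, so this sketch always starts at the beginning of the file.
    """
    sr = sample_rate
    return {
        # 2 s of A followed by 1 s of B -> 3 s
        "2A+1B": take(a, 2, 0, sr) + take(b, 1, 0, sr),
        # 1 s of A followed by 1 s of B -> 2 s
        "1A+1B": take(a, 1, 0, sr) + take(b, 1, 0, sr),
        # 1 s of A, 1 s of B, then the next 1 s of A -> 3 s
        "1A+1B+1A": take(a, 1, 0, sr) + take(b, 1, 0, sr) + take(a, 1, 1, sr),
        # 0.5 s of A, 1 s of B, then the next 0.5 s of A -> 2 s
        "0.5A+1B+0.5A": take(a, 0.5, 0, sr) + take(b, 1, 0, sr) + take(a, 0.5, 0.5, sr),
    }
```

Passing the same recording as both `a` and `b` would give a copy-move style sample, while two different recordings give a spliced one.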

📂 Output

From the original audio files, the tool generates three datasets:

Original audio dataset
Copy-move forgeries dataset
Splicing forgeries dataset
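
One way to lay out the three datasets on disk is sketched below using only the Python standard library. The folder names (`original`, `copy_move`, `splicing`), the function names, and the choice of 16-bit mono WAV output are assumptions for illustration; the repository may organize and encode its output differently.

```python
import array
import wave
from pathlib import Path


def write_wav(path, samples, sample_rate=16_000):
    """Write 16-bit mono PCM samples (ints in [-32768, 32767]) to a WAV file."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(array.array("h", samples).tobytes())


def save_datasets(root, name, original, copy_move, splicing, sample_rate=16_000):
    """Store one sample per dataset under hypothetical per-dataset folders."""
    root = Path(root)
    signals = {"original": original, "copy_move": copy_move, "splicing": splicing}
    paths = {}
    for dataset, samples in signals.items():
        paths[dataset] = root / dataset / f"{name}.wav"
        write_wav(paths[dataset], samples, sample_rate)
    return paths
```

Keeping the same file name across the three folders makes it easy to pair each forged sample with the authentic recording it was derived from.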

📌 Use Cases

Training deep learning models for audio forgery detection
Evaluating robustness of audio forensic systems
Dataset creation for research in speech integrity


JoseRuiz01/SplicingAndCopyMoveDatasetGenerator
