The project was developed in the Google Colab GPU environment. Training the model on roughly 500 sample audio files takes about 6-8 hours on CPU, whereas training on the same set of utterances takes about 10-15 minutes on GPU.
The dataset can be downloaded and extracted from: https://ai.googleblog.com/2017/08/launching-speech-commands-dataset.html
For this project, the audio files were manually sampled from each utterance and collected into a single location, rather than kept in the multiple per-word folders found in the extracted archive.
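As an illustration only (the exact reorganization script is not part of this description), a minimal sketch of gathering the per-word WAV files from the extracted Speech Commands folders into one flat directory might look like this; the paths `speech_commands/` and `all_utterances/` are hypothetical:

```python
import shutil
from pathlib import Path

# Hypothetical paths; adjust to where the archive was extracted.
src_root = Path("speech_commands")   # extracted dataset, one folder per word
dst_dir = Path("all_utterances")     # flat folder used by this project
dst_dir.mkdir(exist_ok=True)

for wav in src_root.glob("*/*.wav"):
    # Prefix each file with its word label so names stay unique.
    shutil.copy(wav, dst_dir / f"{wav.parent.name}_{wav.name}")
```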
The audio data was converted to spectrograms by framing the signal with a Hamming window.
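A minimal sketch of this step using `scipy.signal.stft`; the frame length, hop size, and file name below are assumptions, not the project's exact settings:

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import stft

# Hypothetical input file; assumed frame/hop sizes.
fs, audio = wavfile.read("all_utterances/example.wav")
audio = audio.astype(np.float32)

# Frame the signal with a Hamming window and take the STFT.
f, t, Zxx = stft(audio, fs=fs, window="hamming", nperseg=256, noverlap=128)
spectrogram = np.abs(Zxx)  # magnitude spectrogram fed to the CNN
```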
The generated spectrograms were used to train a CNN that predicts a mask; the target masks were initially generated by applying a thresholding technique to the clean signal, i.e. the signal without the presence of noise.
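The exact thresholding rule is not specified here, but one common choice is a binary mask that keeps time-frequency bins whose magnitude lies within some margin of the clean spectrogram's peak; a hedged sketch, where the -40 dB threshold is an illustrative assumption:

```python
import numpy as np

def make_target_mask(clean_spec, threshold_db=-40.0):
    """Binary mask from the clean magnitude spectrogram: 1 where a bin is
    within `threshold_db` of the peak magnitude, else 0.
    The -40 dB default is an assumption for illustration."""
    mag_db = 20.0 * np.log10(clean_spec + 1e-10)
    return (mag_db > mag_db.max() + threshold_db).astype(np.float32)
```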
The trained model is available in HDF5 (.h5) format at: https://drive.google.com/open?id=1nY3fAWb6SHOy0FvzlaBGUf6C-JTEFrMH
The code required to reload the model and make predictions is provided at the end.
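A minimal sketch of reloading the saved model with Keras; the local file name `model.h5` is an assumption, and the Keras version should match the one used for training:

```python
import numpy as np
from tensorflow.keras.models import load_model

# Load the trained CNN from the downloaded HDF5 file
# ("model.h5" is a hypothetical local file name).
model = load_model("model.h5")
model.summary()  # inspect the expected input shape

# Predict masks for a batch of magnitude spectrograms; the batch must
# match the model's input shape (the zeros batch here is illustrative).
spectrogram_batch = np.zeros((1, *model.input_shape[1:]), dtype=np.float32)
predicted_mask = model.predict(spectrogram_batch)
```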
A noisy speech validation signal, together with the same signal after multiplying by the mask generated from the CNN, is available in the same location as the model.
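To reconstruct a denoised waveform, the predicted mask is applied to the noisy complex STFT and the result is inverted back to the time domain; a sketch using `scipy.signal.istft`, with variable names continuing from the snippets above:

```python
from scipy.signal import istft

# Apply the CNN-predicted mask to the complex noisy STFT, keeping the
# original phase; the mask must be reshaped to (freq, time) if the
# model output carries extra batch/channel dimensions.
masked = Zxx * predicted_mask[0]

# Invert back to a waveform with the same framing parameters as the STFT.
_, denoised = istft(masked, fs=fs, window="hamming", nperseg=256, noverlap=128)
```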