The Speech Signal Processing Toolkit (SPTK) is a suite of speech signal processing tools.
- SPTK consists of over 100 commands for speech signal processing.
- The data format used in SPTK is raw and header-less, i.e., files have no header or other structure.
Thanks to this format, we can check file contents immediately from the command line.
dmp +s data.raw
- Data are passed between commands through standard input/output.
We can therefore chain multiple processes using pipes.
x2x +sd < data.raw | clip | x2x +da | less
- The basic data type is the 8-byte little-endian double.
- The commands do not require interactive user input.
All parameters are set beforehand via command-line options.
impulse -l 4 | sopr -m 10 | x2x +da
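To make the header-less format concrete, here is a minimal Python sketch that reproduces, without calling SPTK, what the pipelines above do to the bytes. It assumes, as stated above, that doubles are 8-byte little-endian, and additionally assumes that `x2x +sd` reads 2-byte little-endian shorts. The helper names (`to_raw`, `impulse`, `sopr_multiply`, `x2x_sd`, `x2x_da`), the `--raw` flag, and the file name `sketch.py` are hypothetical stand-ins, not SPTK APIs, and the text formatting of the real `x2x +da` may differ from `%g`.

```python
# Hypothetical pure-Python stand-ins for a few SPTK commands, used only to
# illustrate the header-less raw format; they do not call SPTK.
# Assumption: doubles are 8-byte little-endian and shorts are 2-byte
# little-endian, matching the default format described in the README.
import struct
import sys


def to_raw(values):
    """Pack numbers as header-less little-endian doubles (8 bytes each)."""
    return struct.pack(f"<{len(values)}d", *values)


def from_raw(data):
    """Unpack header-less little-endian doubles."""
    if len(data) % 8 != 0:
        raise ValueError("double data must be a multiple of 8 bytes")
    return list(struct.unpack(f"<{len(data) // 8}d", data))


def impulse(length):
    """Like `impulse -l LENGTH`: a unit impulse of LENGTH samples."""
    if length < 1:
        raise ValueError("length must be positive")
    return to_raw([1.0] + [0.0] * (length - 1))


def sopr_multiply(data, factor):
    """Like `sopr -m FACTOR`: multiply every sample by FACTOR."""
    return to_raw([x * factor for x in from_raw(data)])


def x2x_sd(data):
    """Like `x2x +sd`: convert 2-byte shorts to 8-byte doubles."""
    if len(data) % 2 != 0:
        raise ValueError("short data must be a multiple of 2 bytes")
    shorts = struct.unpack(f"<{len(data) // 2}h", data)
    return to_raw([float(s) for s in shorts])


def x2x_da(data):
    """Like `x2x +da`: print each double as text, one value per line."""
    return "\n".join(f"{x:g}" for x in from_raw(data))


if __name__ == "__main__":
    # Mirrors: impulse -l 4 | sopr -m 10 | x2x +da
    raw = sopr_multiply(impulse(4), 10)
    if "--raw" in sys.argv[1:]:
        # Emit the raw bytes so a real SPTK command can consume them,
        # e.g. python3 sketch.py --raw | x2x +da
        sys.stdout.buffer.write(raw)
    else:
        print(len(raw), "bytes, no header")  # 4 samples x 8 bytes
        print(x2x_da(raw))
```

If SPTK is installed, piping the `--raw` output into the real `x2x +da` should show the same four values, which is a quick way to check that data produced by another tool matches the layout SPTK expects.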
- Refer to the reference manual.
- Refer to the tutorial slides.
- Our paper is available on the ISCA Archive.
Requirements:
- GCC 4.8.5+ / Clang 3.5.0+ / Visual Studio 2015+
- CMake 3.1+
The latest release can be downloaded with Git and installed as follows:
git clone https://github.com/sp-nitech/SPTK.git
cd SPTK
make

Then the SPTK commands can be used by adding the bin/ directory to the PATH environment variable.
If you would like to use SPTK functions in your own program, link the static library lib/libsptk.a.
On Windows, you may need to add CMake and MSBuild to the PATH environment variable in advance.
Then run make.bat, or open Command Prompt and follow the procedure below:
REM Change the path below to your SPTK directory.
cd /path/to/SPTK
mkdir build
cd build
REM Change the install directory if needed.
cmake .. -DCMAKE_INSTALL_PREFIX=..
MSBuild /p:Configuration=Release INSTALL.vcxproj

Instead of running MSBuild, you can also compile SPTK from the GUI by opening the generated project file.
Then the SPTK functions can be used by linking the static library lib/sptk.lib.
SPTK provides some examples.
Go to an example directory and execute run.sh, e.g.,
cd egs/analysis_synthesis/mgc
./run.sh

The following is a simple example that decreases the volume of input.wav by halving its amplitude (sopr -m 0.5) and writes the result to output.wav.
You may need to install the sox command on your system.
sox -t wav input.wav -c 1 -t s16 -r 16000 - |
x2x +sd | sopr -m 0.5 | x2x +ds -r |
sox -c 1 -t s16 -r 16000 - -t wav output.wav

If you would like to draw figures, please prepare a Python environment:
cd tools; make venv PYTHON_VERSION=3.8; cd ..
. ./tools/venv/bin/activate
impulse -l 32 | gseries impulse.png
deactivate

Changes from SPTK3:
- Input and output types are changed from float to double
- Signal processing classes are written in C++ instead of C
- Drawing commands are implemented in Python
- Some option names are changed
- No memory leaks
- Thread-safe
- New main features:
  - Aperiodicity extraction (ap)
  - Dynamic range compression (drc)
  - Magic number interpolation (magic_intpl)
  - Median filter (medfilt)
  - Mel-filter-bank extraction (fbank)
  - Nonrecursive MLPG (mlpg -R 1)
  - Pitch adaptive spectrum estimation (pitch_spec)
  - Pitch extraction used in WORLD (pitch -a 3 and pitch -a 4)
  - PLP extraction (plp)
  - Sinusoidal generation from pitch (pitch2sin)
  - Subband decomposition (pqmf and ipqmf)
  - WORLD synthesis (world_synth)
- Windows build support
- Obsoleted commands:
  - acep, agcep, and amcep -> amgcep
  - bell
  - c2sp -> mgc2sp
  - cat2 and echo2
  - da
  - ds, us, us16, and uscd -> sox
  - fig
  - gc2gc -> mgc2mgc
  - gcep, mcep, and uels -> mgcep
  - glsadf, lmadf, and mlsadf -> mglsadf
  - ivq and vq -> imsvq and msvq
  - lsp2sp -> mglsp2sp
  - mgc2mgclsp and mgclsp2mgc
  - psgr and xgr
  - raw2wav, wav2raw, wavjoin, and wavsplit -> sox
- Separated commands:
  - c2ir -> c2mpir and mpir2c
  - dtw -> dtw and dtw_merge
  - mglsadf -> mglsadf and imglsadf
  - train -> train and mseq
  - ulaw -> ulaw and iulaw
  - vstat -> vstat and median
- Renamed commands:
  - mgclsp2sp -> mglsp2sp
Authors:
- Keiichi Tokuda - Producer and Designer - Nagoya Institute of Technology
- Keiichiro Oura - Nagoya Institute of Technology
- Takenori Yoshimura - Main Maintainer - Nagoya Institute of Technology
- Takato Fujimoto - Nagoya Institute of Technology
Contributors:
- Akira Tamamori
- Cassia Valentini
- Chiyomi Miyajima
- Fernando Gil Resende Junior
- Gou Hirabayashi
- Heiga Zen
- Junichi Yamagishi
- Kazuhito Koishida
- Keiichi Tokuda
- Keiichiro Oura
- Kenji Chiba
- Masatsune Tamura
- Naohiro Isshiki
- Noboru Miyazaki
- Satoshi Imai
- Shinji Sako
- Tadashi Kitamura
- Takao Kobayashi
- Takashi Masuko
- Takashi Nose
- Takato Fujimoto
- Takayoshi Yoshimura
- Takenori Yoshimura
- Toru Takahashi
- Toshiaki Fukada
- Toshihiko Kato
- Toshio Kanno
- Yoshihiko Nankaku
This software is released under the Apache License 2.0.
@InProceedings{sp-nitech2023sptk,
author = {Takenori Yoshimura and Takato Fujimoto and Keiichiro Oura and Keiichi Tokuda},
title = {{SPTK4}: An open-source software toolkit for speech signal processing},
booktitle = {12th ISCA Speech Synthesis Workshop (SSW 2023)},
pages = {211--217},
year = {2023},
}