This project uses OpenAI's Whisper model to transcribe audio files from a directory and save the results as text files in another directory.
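For orientation, the core of such a pipeline can be sketched as below. This is a minimal illustration, not the repository's actual code: the function names, the `"base"` model size, and the directory defaults are assumptions based on the usage described in this README.

```python
# Hypothetical sketch of the transcription loop; names and model size are
# assumptions, not taken from the repository.
from pathlib import Path

AUDIO_EXTENSIONS = {".ogg", ".wav"}

def find_audio_files(input_dir):
    """Return files in input_dir with a supported audio extension."""
    return sorted(
        p for p in Path(input_dir).iterdir()
        if p.suffix.lower() in AUDIO_EXTENSIONS
    )

def output_path(audio_file, output_dir):
    """Map e.g. voice_input/note.ogg -> text_output/note.txt."""
    return Path(output_dir) / (Path(audio_file).stem + ".txt")

def transcribe_all(input_dir="voice_input", output_dir="text_output", language="ru"):
    import whisper  # requires `pip install openai-whisper` and ffmpeg
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    model = whisper.load_model("base")  # model size is an assumption
    for audio in find_audio_files(input_dir):
        result = model.transcribe(str(audio), language=language)
        output_path(audio, output_dir).write_text(result["text"], encoding="utf-8")
```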
- Python 3.x
- `whisper` library (install via `pip install openai-whisper`)
- `ffmpeg` (required by Whisper; install via your package manager)
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/speech-to-text.git
  cd speech-to-text
  ```
- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- (Optional) Create and activate a virtual environment:

  ```bash
  python3 -m venv venv
  source venv/bin/activate  # On macOS/Linux
  venv\Scripts\activate     # On Windows
  ```
- Default directories and language:

  ```bash
  python3 src/transcribe_audio.py
  ```

  This will transcribe all `.ogg` and `.wav` files from the `voice_input` directory and save the results in the `text_output` directory. The default language is Russian (`ru`).

- Custom directories and language:

  ```bash
  python3 src/transcribe_audio.py --input_dir my_input_folder --output_dir my_output_folder --language en
  ```

  This will transcribe files from `my_input_folder` and save the results in `my_output_folder`. The language is set to English (`en`).
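The command-line flags above could be wired up with `argparse`, for example as follows. This is a plausible sketch only: the flag names and defaults come from the usage shown in this README, but the script's actual argument handling may differ.

```python
import argparse

def parse_args(argv=None):
    # Flags and defaults mirror the documented usage; the real
    # src/transcribe_audio.py may be implemented differently.
    parser = argparse.ArgumentParser(description="Transcribe audio files with Whisper")
    parser.add_argument("--input_dir", default="voice_input",
                        help="Directory containing .ogg/.wav files")
    parser.add_argument("--output_dir", default="text_output",
                        help="Directory for the resulting .txt files")
    parser.add_argument("--language", default="ru",
                        help="Language code passed to Whisper (e.g. ru, en)")
    return parser.parse_args(argv)
```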
- Default directories and language:

  ```bash
  make transcribe
  ```

  This will transcribe all `.ogg` and `.wav` files from the `voice_input` directory and save the results in the `text_output` directory. The default language is Russian (`ru`).

- Custom directories and language:

  ```bash
  make transcribe INPUT_DIR=my_input_folder OUTPUT_DIR=my_output_folder LANGUAGE=en
  ```

  This will transcribe files from `my_input_folder` and save the results in `my_output_folder`. The language is set to English (`en`).
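The `transcribe` target presumably forwards these variables to the Python script. A minimal Makefile sketch, assuming the variable names from the usage above and the script flags from the previous section (the repository's actual Makefile may differ):

```make
INPUT_DIR ?= voice_input
OUTPUT_DIR ?= text_output
LANGUAGE ?= ru

.PHONY: transcribe
transcribe:
	python3 src/transcribe_audio.py --input_dir $(INPUT_DIR) --output_dir $(OUTPUT_DIR) --language $(LANGUAGE)
```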
- Ensure that the `voice_input` directory exists and contains valid audio files.
- The `text_output` directory will be created automatically if it doesn't exist.
- Supported languages include `ru` (Russian), `en` (English), and others. Refer to the Whisper documentation for a full list.