
Commit 3204636

Update README.md
1 parent 45c662b commit 3204636


docs/whisper_transcription/README.md

Lines changed: 18 additions & 89 deletions
@@ -19,91 +19,8 @@ The overall architecture consists of several key stages. First, audio is convert
 
 ---
 
-## Installation
-
-### 1. Create virtual environment
-```bash
-python3 -m venv whisper_env
-source whisper_env/bin/activate
-```
-
-### 2. Install PyTorch (with CUDA 12.1 for H100/A100)
-```bash
-pip install torch==2.2.2+cu121 torchaudio==2.2.2+cu121 -f https://download.pytorch.org/whl/torch_stable.html
-```
-
-### 3. Install requirements
-```bash
-pip install -r requirements.txt
-```
-
-> Make sure to have `ffmpeg` installed on your system:
-```bash
-sudo apt install ffmpeg
-```
-
----
-
 ## Usage
 
-### CLI Transcription & Summarization
-
-```bash
-python faster_code_week1_v28.py \
-  --input /path/to/audio_or_folder \
-  --model medium \
-  --output-dir output/ \
-  --summarized-model mistralai/Mistral-7B-Instruct-v0.1 \
-  --summary \
-  --speaker \
-  --denoise \
-  --prop-decrease 0.7 \
-  --hf-token YOUR_HUGGINGFACE_TOKEN \
-  --streaming \
-  --max-speakers 2 \
-  --ground-truth ground_truth.txt
-```
-
-**Optional flags:**
-
-| Argument | Description |
-|---------------------|-----------------------------------------------------------------------------|
-| `--input` | **Required.** Path to input file or directory of audio/video. |
-| `--model` | Whisper model to use (`base`, `small`, `medium`, `large`, `turbo`). Auto-detects if not specified. |
-| `--output-dir` | Directory to store output files. Defaults to a timestamped folder. |
-| `--summarized-model`| Hugging Face or local LLM for summarization. Default: `Mistral-7B`. |
-| `--denoise` | Enable two-stage denoising (Demucs + noisereduce). |
-| `--prop-decrease` | Float [0.0–1.0]. Controls noise suppression. Default = 0.7 |
-| `--summary` | Enable summarization after transcription. |
-| `--speaker` | Enable speaker diarization using PyAnnote. |
-| `--streaming` | Stream results in real-time chunk-by-chunk. |
-| `--hf-token` | Hugging Face token for gated model access. |
-| `--max-speakers` | Limit the number of identified speakers. Optional. |
-| `--ground-truth` | Path to ground truth `.txt` for WER evaluation. Optional. |
-
----
-
-### Start API Server
-
-```bash
-uvicorn whisper_api_server:app --host 0.0.0.0 --port 8000
-```
-
-### Example API Call
-
-```bash
-curl -X POST http://<YOUR_IP>:8000/transcribe \
-  -F "audio_file=@test.wav" \
-  -F "model=medium" \
-  -F "summary=true" \
-  -F "speaker=true" \
-  -F "denoise=false" \
-  -F "streaming=true" \
-  -F "hf_token=hf_xxx" \
-  -F "max_speakers=2"
-```
-
-
 ### Start Blueprint Deployment
 In the deployment part of Blueprint, add a recipe such as the following:
 ```bash
@@ -142,15 +59,15 @@ https://whisper-transcription-a10-6666.130-162-199-33.nip.io/transcribe
 |----------------|-----------|-----------------------------------------------------------------------------------------------------------------------------------------------------------|
 | `audio_url` | `string` | URL to a Pre-Authenticated Request (PAR) of the audio file stored in OCI Object Storage. |
 | `model` | `string` | Whisper model name to use (`base`, `medium`, `turbo`, etc.). |
-| `summary` | `bool` | Whether to generate a summary at the end. If `true` and no custom model path is provided, `mistralai/Mistral-7B-Instruct-v0.1` will be loaded from Hugging Face. Requires `hf_token`. |
-| `speaker` | `bool` | Whether to enable speaker diarization. Requires `hf_token`. If `false`, all segments will be labeled as "Speaker 1". |
+| `summary` | `bool` | (Optional) Whether to generate a summary at the end. If `true` and no custom model path is provided, `mistralai/Mistral-7B-Instruct-v0.1` will be loaded from Hugging Face. Requires `hf_token`. |
+| `speaker` | `bool` | (Optional) Whether to enable speaker diarization. Requires `hf_token`. If `false`, all segments will be labeled as "Speaker 1". |
 | `max_speakers` | `int` | (Optional) Helps improve diarization accuracy by specifying the expected number of speakers. |
 | `denoise` | `bool` | (Optional) Apply basic denoising to improve quality in noisy recordings. |
 | `streaming` | `bool` | (Optional) Enable real-time log streaming for transcription chunks and progress updates. |
-| `hf_token` | `string` | Hugging Face token, required for loading models like Mistral or enabling speaker diarization. |
-| `prop-decrease` | `Float` | Controls noise suppression. Default = 0.7 |
-| `summarized-model` | `path` | Hugging Face or local LLM path for summarization. Default: Mistral-7B. |
-| `ground-truth` | `path` | Path to ground truth `.txt` file for WER evaluation. |
+| `hf_token` | `string` | (Optional) Hugging Face token, required for loading models like Mistral or enabling speaker diarization. |
+| `prop-decrease` | `float` | (Optional) Controls the level of noise suppression. Range: 0.0–1.0. Default: 0.7. |
+| `summarized-model` | `path` | (Optional) Path or Hugging Face ID of the LLM used for summarization. Default: `mistralai/Mistral-7B-Instruct-v0.1`. |
+| `ground-truth` | `path` | (Optional) Path to a `.txt` file with the expected transcription for WER (Word Error Rate) evaluation. |
 
 
 ---
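As an aside, the `prop-decrease` parameter documented above presumably maps onto the `prop_decrease` argument of the noisereduce library that the CLI flag table pairs with Demucs for two-stage denoising. A minimal sketch of that second stage, assuming the `noisereduce` and `soundfile` packages are installed; the file names are hypothetical:

```python
# Sketch of the noisereduce stage behind the denoise / prop-decrease options.
# Assumes the noisereduce and soundfile packages; file names are hypothetical.
import noisereduce as nr
import soundfile as sf

audio, rate = sf.read("demucs_vocals.wav")  # output of the Demucs separation step
cleaned = nr.reduce_noise(y=audio, sr=rate, prop_decrease=0.7)  # 0.7 is the documented default
sf.write("denoised.wav", cleaned, rate)
```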
@@ -168,7 +85,19 @@ curl -k -N -L -X POST https://<YOUR_DEPLOYMENT>.nip.io/transcribe \
   -F "hf_token=hf_xxxxxxxxxxxxxxx" \
   -F "max_speakers=2"
 ```
+**Example:**
 
+```bash
+curl -k -N -L -X POST https://whisper-transcription-a10-6666.130-162-199-33.nip.io/transcribe \
+  -F "audio_url=https://objectstorage.ap-melbourne-1.oraclecloud.com/p/Kn-d3p3vHBqYGck5hcG24p1BrE63d7MN4jqpQzmYWchBIPZA5bymsVPWwJl-VbPq/n/iduyx1qnmway/b/whisper-transcription/o/test.wav" \
+  -F "model=turbo" \
+  -F "summary=true" \
+  -F "speaker=true" \
+  -F "streaming=true" \
+  -F "denoise=false" \
+  -F "hf_token=hf_xxxxxxxxxxxxxxx" \
+  -F "max_speakers=2"
+```
 ---
 
 #### Real-Time Log Streaming
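Since `streaming=true` and the Real-Time Log Streaming section describe chunk-by-chunk output, here is a minimal client-side sketch of consuming that stream, assuming Python's `requests` package; the deployment URL, PAR URL, and token are placeholders, and the fields mirror the curl example above:

```python
# Sketch of reading the streamed /transcribe response line by line.
# Assumes the requests package; URL, PAR link, and token are placeholders.
import requests

url = "https://<YOUR_DEPLOYMENT>.nip.io/transcribe"
fields = {
    "audio_url": "<PAR_URL_TO_AUDIO_FILE>",
    "model": "turbo",
    "streaming": "true",
    "hf_token": "hf_xxxxxxxxxxxxxxx",
}

# stream=True keeps the connection open so chunks are printed as they arrive,
# mirroring curl's -N flag; verify=False mirrors curl's -k flag.
with requests.post(url, data=fields, stream=True, verify=False, timeout=600) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        if line:
            print(line)  # each line is a transcription chunk or progress update
```

Depending on how the endpoint parses the request, the fields may need to be sent as multipart form data (for example via the `files=` parameter) rather than `data=`.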
