Whisper
By DorsalHub
Audited
dorsalhub/whisper
High-performance speech-to-text transcription using CTranslate2-based Whisper models.
A Dorsal annotation model wrapper for faster-whisper.
Features
- Fast Inference: Uses faster-whisper (CTranslate2) for up to 4x speed improvements over the original OpenAI implementation.
- Automatic VAD: Built-in Voice Activity Detection to filter silence and improve accuracy.
- Schema Compliant: Outputs standardized JSON matching the open/audio-transcription schema.
- Standard Outputs: Outputs can be exported to a variety of formats, including SubRip Text (.srt), Markdown (.md), and others, via Dorsal Adapters.
Compatibility
- Python: 3.11, 3.12, 3.13
- Note: Python 3.14 is not yet supported due to upstream dependencies (onnxruntime and faster-whisper) that are not yet compatible with 3.14.
Quick Start
Run the model directly against an audio or video file (the model is downloaded and installed automatically if not already present):
dorsal run dorsalhub/dorsal-whisper ./audio.wav
Configuration Options
You can pass options to the model using the --opt (or -o) flag.
Example: use a larger Whisper model and translate the output:
dorsal model run dorsalhub/dorsal-whisper ./audio.wav --opt model_size=large-v3 --opt task=translate
Note: You may need to install NVIDIA libraries (cuBLAS/cuDNN) separately if you intend to run on GPU. See the faster-whisper documentation for GPU setup.
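On Linux, for example, the CUDA 12 builds of these libraries are available as pip wheels (the exact packages and versions depend on your CUDA installation; check the faster-whisper documentation):
# cuBLAS and cuDNN wheels for CUDA 12 (Linux)
pip install nvidia-cublas-cu12 nvidia-cudnn-cu12
You may also need to add the wheels' lib directories to LD_LIBRARY_PATH, as described in the faster-whisper GPU setup notes.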
Supported Core Options:
- model_size (default: base)
- beam_size (default: 5)
- vad_filter (default: true)
- batch_size: wraps the model in a BatchedInferencePipeline for much faster processing
- compute_type (default: default; can force int8 or float16)
- **kwargs: any additional arguments supported by faster-whisper's transcribe method (e.g., task="translate", language="fr", word_timestamps=true)
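As an illustration, several of these options can be combined in a single invocation (the values below are arbitrary examples, not tuned recommendations):
# Batched inference on GPU with word-level timestamps (illustrative values)
dorsal model run dorsalhub/dorsal-whisper ./audio.wav \
  --opt model_size=medium \
  --opt batch_size=16 \
  --opt compute_type=float16 \
  --opt word_timestamps=true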
Passing Complex Arguments (JSON)
Dorsal's CLI natively parses JSON strings for advanced configuration. This is particularly useful for tuning faster-whisper's Voice Activity Detection (VAD) to keep the model from hallucinating or looping on background noise.
Pass dictionary arguments by wrapping valid JSON in single quotes:
dorsal model run dorsalhub/dorsal-whisper ./video.mkv \
--opt model_size=large-v2 \
--opt vad_parameters='{"threshold": 0.8, "min_speech_duration_ms": 250}'
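The same mechanism works for list-valued arguments. For instance, faster-whisper's transcribe accepts a list of temperatures to fall back through during decoding; assuming the wrapper forwards it unchanged via **kwargs, it can be passed as JSON too:
dorsal model run dorsalhub/dorsal-whisper ./audio.wav \
  --opt temperature='[0.0, 0.2, 0.4]'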
Output Formats & Exporting
By default, the CLI outputs a table with timestamps, and saves a validated JSON record to the current working directory.
You can also export to other standard formats directly from the CLI:
# Export directly to SubRip Subtitle format (.srt)
dorsal model run dorsalhub/dorsal-whisper ./video.mkv --export=srt
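Other adapter formats mentioned above, such as Markdown, should follow the same pattern (the exact set available depends on the Dorsal Adapters installed in your environment):
# Export to Markdown (.md)
dorsal model run dorsalhub/dorsal-whisper ./video.mkv --export=md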
Output
This model produces a file annotation conforming to the Open Validation Schemas Audio Transcription schema:
- Schema ID: open/audio-transcription (v0.5.0)
- Key Fields:
  - text: The full concatenated transcript.
  - segments: An array of timed segments with start_time, end_time, and score.
  - language: The detected ISO 639-3 language code (e.g., eng).
  - duration: The total duration of the source media in seconds.
  - attributes: Includes language_probability.
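For orientation, a record shaped by these fields might look like the sketch below; the values are illustrative, only the documented fields are shown, and the exact envelope is defined by the schema itself:
{
  "text": "Hello and welcome to the show.",
  "language": "eng",
  "duration": 2.4,
  "segments": [
    {"start_time": 0.0, "end_time": 2.4, "score": 0.91}
  ],
  "attributes": {"language_probability": 0.98}
}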
Development
Running Tests
This repository uses pytest for integration testing.
pip install -e .[test]
pytest
License
This project is licensed under the Apache 2.0 License.
Install
To use this model, you must have Dorsal installed in your environment:
pip install dorsalhub
Once installed, run the command below in your terminal to install the model:
dorsal model install dorsalhub/whisper
- Version: 0.3.0
- Published By: Dorsalhub Models
- Creation Date: 2026-02-26
- Last Modified Date: 2026-02-26
- Source Code: GitHub
Output
- Schema: open/audio-transcription