Whisper

dorsalhub/whisper

High-performance speech-to-text transcription using CTranslate2-based Whisper models.

Apache-2.0

Dorsal Whisper

A Dorsal annotation model wrapper for faster-whisper.

Features

  • Fast Inference: Uses faster-whisper (CTranslate2), which runs up to 4x faster than the original OpenAI implementation.
  • Automatic VAD: Built-in Voice Activity Detection to filter silence and improve accuracy.
  • Schema Compliant: Outputs standardized JSON matching the open/audio-transcription schema.
  • Standard Outputs: Results can be exported to formats such as SubRip Text (.srt) and Markdown (.md), among others, via Dorsal Adapters.

Compatibility

  • Python: 3.11, 3.12, 3.13
  • Note: Python 3.14 is not yet supported due to upstream dependencies (onnxruntime and faster-whisper) that are not yet compatible with 3.14.

Quick Start

Run the model directly against an audio or video file. If the model is not already installed, it is downloaded and installed first:

dorsal model run dorsalhub/dorsal-whisper ./audio.wav

Configuration Options

You can pass options to the model using the --opt (or -o) flag.

Example: use a larger whisper model and translate the output:

dorsal model run dorsalhub/dorsal-whisper ./audio.wav --opt model_size=large-v3 --opt task=translate

Note: You may need to install NVIDIA libraries (cuBLAS/cuDNN) separately if you intend to run on GPU. See the faster-whisper documentation for GPU setup.

Supported Core Options:

  • model_size (default: base)
  • beam_size (default: 5)
  • vad_filter (default: true)
  • batch_size (when set, wraps the model in a BatchedInferencePipeline for faster processing)
  • compute_type (default: default, can force int8 or float16)
  • **kwargs: Any additional arguments supported by faster-whisper's transcribe method (e.g., task="translate", language="fr", word_timestamps=true).
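
For orientation, the sketch below shows how these options would plausibly map onto faster-whisper's own Python API. It is not the wrapper's source code: the function name transcribe_file is hypothetical, and the routing of each option is an assumption based only on the descriptions above. The faster-whisper calls themselves (WhisperModel, BatchedInferencePipeline, transcribe) are real, though BatchedInferencePipeline is only available in newer faster-whisper releases.

```python
def transcribe_file(path, model_size="base", compute_type="default",
                    beam_size=5, vad_filter=True, batch_size=None, **kwargs):
    """Hypothetical helper: apply the documented options with faster-whisper directly."""
    # Imported lazily so the module can be loaded without faster-whisper installed.
    from faster_whisper import BatchedInferencePipeline, WhisperModel

    # model_size and compute_type configure the model itself.
    model = WhisperModel(model_size, compute_type=compute_type)

    if batch_size:
        # Assumption: a batch_size switches to the batched pipeline.
        pipeline = BatchedInferencePipeline(model=model)
        segments, info = pipeline.transcribe(
            path, batch_size=batch_size, beam_size=beam_size,
            vad_filter=vad_filter, **kwargs)
    else:
        # Everything else (task, language, word_timestamps, ...) goes to transcribe().
        segments, info = model.transcribe(
            path, beam_size=beam_size, vad_filter=vad_filter, **kwargs)

    # transcribe() yields segments lazily; list() forces the actual decoding.
    return list(segments), info
```

A call such as transcribe_file("./audio.wav", model_size="large-v3", task="translate") would then correspond to the CLI example above.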

Passing Complex Arguments (JSON)

Dorsal's CLI natively parses JSON strings for advanced configuration. This is particularly useful for tuning faster-whisper's Voice Activity Detection (VAD), which helps prevent the model from hallucinating or looping on background noise.

Pass dictionary arguments by wrapping valid JSON in single quotes:

dorsal model run dorsalhub/dorsal-whisper ./video.mkv \
  --opt model_size=large-v2 \
  --opt vad_parameters='{"threshold": 0.8, "min_speech_duration_ms": 250}'
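
When the command is launched from a script, hand-quoting JSON is easy to get wrong. The sketch below builds the same argument list from a Python dict; build_run_command is a hypothetical helper, not part of Dorsal, and it relies only on the --opt key=value form shown above.

```python
import json
import shlex


def build_run_command(model_id, media_path, options):
    """Return an argv list for `dorsal model run`, one --opt per option."""
    argv = ["dorsal", "model", "run", model_id, media_path]
    for key, value in options.items():
        # Dicts, lists and booleans are JSON-encoded; other values are passed as plain text.
        if isinstance(value, (dict, list, bool)):
            encoded = json.dumps(value)
        else:
            encoded = str(value)
        argv += ["--opt", f"{key}={encoded}"]
    return argv


command = build_run_command(
    "dorsalhub/dorsal-whisper",
    "./video.mkv",
    {
        "model_size": "large-v2",
        "vad_parameters": {"threshold": 0.8, "min_speech_duration_ms": 250},
    },
)

# shlex.join adds the shell quoting needed to paste the command into a terminal.
print(shlex.join(command))
```

The list can be passed straight to subprocess.run(command), which bypasses the shell and therefore needs no quoting at all.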

Output Formats & Exporting

By default, the CLI prints a table with timestamps and saves a validated JSON record to the current working directory.

You can also export to other standard formats directly from the CLI:

# Export directly to SubRip Subtitle format (.srt)
dorsal model run dorsalhub/dorsal-whisper ./video.mkv --export=srt

Output

This model produces a file annotation conforming to the Open Validation Schemas Audio Transcription schema:

  • Schema ID: open/audio-transcription (v0.5.0)
  • Key Fields:
      • text: The full concatenated transcript.
      • segments: An array of timed segments with start_time, end_time, and score.
      • language: The detected ISO 639-3 language code (e.g., eng).
      • duration: The total duration of the source media in seconds.
      • attributes: Includes language_probability.
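
As an illustration of working with these fields, the sketch below filters segments by score and estimates how much of the media contains speech. The sample record is made up, the helper names are hypothetical, and two things are assumptions: that score is a confidence-like value where higher is better, and that the fields sit at the top level of the record (the saved JSON file may nest them differently).

```python
def confident_segments(record, min_score=0.5):
    """Return the segments whose score is at least min_score."""
    return [seg for seg in record["segments"] if seg["score"] >= min_score]


def speech_coverage(record):
    """Return the fraction of the media duration covered by segments."""
    spoken = sum(seg["end_time"] - seg["start_time"] for seg in record["segments"])
    return spoken / record["duration"] if record["duration"] else 0.0


# Made-up record using only the key fields listed above.
record = {
    "text": "Hello there. Thanks for listening.",
    "language": "eng",
    "duration": 10.0,
    "segments": [
        {"start_time": 0.0, "end_time": 2.5, "score": 0.9},
        {"start_time": 4.0, "end_time": 6.5, "score": 0.3},
    ],
    "attributes": {"language_probability": 0.98},
}

kept = confident_segments(record)
coverage = speech_coverage(record)
print(f"{len(kept)} of {len(record['segments'])} segments kept, coverage {coverage:.0%}")
```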

Development

Running Tests

This repository uses pytest for integration testing.

pip install -e ".[test]"
pytest

License

This project is licensed under the Apache 2.0 License.

Install

To use this model, you must have Dorsal installed in your environment:

pip install dorsalhub

Once installed, run the command below in your terminal to install the model:

dorsal model install dorsalhub/whisper

Model Details

  • Version: 0.3.0
  • Published By: Dorsalhub Models (Owner)
  • Creation Date: 2026-02-26
  • Last Modified Date: 2026-02-26
  • Source Code: GitHub
  • Supported Media: audio, video
  • Requirements: dorsalhub>=0.8.2, faster-whisper>=0.10.0, protobuf>=6.33.5
