Silero-VAD Series

Notes

  • Model background: A deep-learning voice activity detection (VAD) model published as open source under the snakers4 GitHub account. It is a widely used open-source VAD solution, designed for 16 kHz audio input.
  • Features: Pre-processes the audio (pre-emphasis, framing, windowing), then uses a neural network to learn voice features and separate speech from non-speech segments. Detection thresholds adapt to the noise environment, so the model performs particularly well in noisy scenes. Successive versions further improve noise robustness.
  • Open source repository: https://github.com/snakers4/silero-vad
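
The framing and thresholding steps described under Features can be illustrated with a minimal sketch. It is not the Silero-VAD implementation: the neural network is left out, and the per-frame speech probabilities are assumed to come from whichever model is loaded. The function names, the 512-sample frame size, and the fixed 0.5 threshold are illustrative assumptions, not values taken from this page.

```python
from typing import List, Sequence, Tuple


def frame_audio(samples: Sequence[float], frame_size: int = 512) -> List[List[float]]:
    """Split audio into fixed-size, non-overlapping frames.

    The last frame is zero-padded to frame_size. The frame size of 512
    samples is an assumption made for this sketch.
    """
    frames = []
    for start in range(0, len(samples), frame_size):
        frame = list(samples[start:start + frame_size])
        frame.extend([0.0] * (frame_size - len(frame)))
        frames.append(frame)
    return frames


def probs_to_segments(
    probs: Sequence[float],
    threshold: float = 0.5,      # assumed fixed threshold, for illustration only
    frame_size: int = 512,       # must match the framing used for the model input
    sample_rate: int = 16000,    # 16 kHz, as stated in the notes above
) -> List[Tuple[float, float]]:
    """Convert per-frame speech probabilities into (start_sec, end_sec) segments."""
    frame_sec = frame_size / sample_rate
    segments = []
    start = None  # index of the first frame of the current speech run
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i  # speech begins
        elif p < threshold and start is not None:
            segments.append((start * frame_sec, i * frame_sec))  # speech ends
            start = None
    if start is not None:
        # Audio ended while still inside a speech run.
        segments.append((start * frame_sec, len(probs) * frame_sec))
    return segments
```

In practice each frame from `frame_audio` would be passed to the VAD model to obtain one probability, and the resulting list would be passed to `probs_to_segments`. A production pipeline would usually also merge short gaps and drop very short segments, which this sketch omits.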

Model List

Model Name         | Description                                                   | Download Link
silero-vad-onnx    | Base version, suitable for general-purpose scenarios          | modelscope
silero-vad-v5-onnx | V5 iterative version, optimised for noisy environments        | modelscope
silero-vad-v6-onnx | V6 latest version, best detection performance in noisy scenes | modelscope