Model Selection Guide
Faced with an ever‑growing list of models, you don't need to memorise every name. This guide teaches you how to read each model's key metrics, then match them to your own needs: language, real‑time requirement, hardware, timestamps. Filter step by step, and only a handful of models will remain.
📌 All models are in ONNX format. The first time you use a model in manyspeech, it is downloaded automatically.
I. ASR (Speech Recognition) Models
1. Six Key Metrics: Understand the Model, Then Write the Command
Every model table contains these columns. Once you understand them, you'll know what to choose and how to write the command.
Why can you ignore punctuation? (for Chinese/English scenarios)
- Regardless of whether the model natively outputs punctuation, manyspeech by default calls a punctuation restoration model (--punc) to add punctuation to the output. You don't need to worry about Punctuation = No.
Microphone → online, file → offline
This is the recommended configuration in most cases. You can also mix them:
A very basic command example (assuming you have already chosen a model called some-model):
⚠️ The asr subcommand must include -i, either -i file or -i mic. It cannot be omitted. If you omit --model, the program uses a built‑in default model, which may not be suitable for your scenario – we recommend always specifying a model.
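The rule above can be sketched as a small Python helper that assembles the argument list and rejects an invalid -i value. This is only an illustration: build_asr_command is a hypothetical name, the executable is assumed to be reachable as manyspeech, and the way an audio file path is passed alongside -i file is not covered in this section, so it is left out.

```python
from typing import List, Optional

# The two -i values named in this guide.
VALID_INPUTS = ("file", "mic")


def build_asr_command(input_kind: str, model: Optional[str] = None) -> List[str]:
    """Assemble an argument list for the asr subcommand (hypothetical helper)."""
    if input_kind not in VALID_INPUTS:
        raise ValueError(f"-i must be one of {VALID_INPUTS}, got {input_kind!r}")
    cmd = ["manyspeech", "asr", "-i", input_kind]
    if model is not None:
        # Without --model the program falls back to its built-in default model.
        cmd += ["--model", model]
    return cmd


print(" ".join(build_asr_command("mic", "some-model")))
# prints: manyspeech asr -i mic --model some-model
```

Passing the model explicitly every time keeps the command reproducible, which is why the helper only omits --model when you deliberately pass None.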
2. More Information Is Encoded in the Model Name
The model name itself is a “mini spec sheet”. Besides the metrics in the table, the name reveals additional details.
When you see distil-whisper-xxx, it is faster and smaller than a whisper-xxx of the same size – ideal for resource‑constrained scenarios.
When you see xxx-cantonese-onnx or xxx-wenetspeech-yue, it has been fine‑tuned for Cantonese and will be more accurate than a general model.
When you see xxx-onnx-opt, it usually performs better than xxx-onnx.
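As a rough illustration of reading a name as a spec sheet, the three naming rules above can be written as a Python function. The function name_hints and the example name are made up for this sketch, and the matching is plain substring checking, which real model names may not always follow.

```python
from typing import List


def name_hints(model_name: str) -> List[str]:
    """Return the hints this guide associates with parts of a model name."""
    name = model_name.lower()
    hints = []
    if name.startswith("distil-whisper"):
        hints.append("distilled: faster and smaller than the same-size whisper")
    if "cantonese" in name or "wenetspeech-yue" in name:
        hints.append("fine-tuned for Cantonese")
    if name.endswith("-onnx-opt"):
        hints.append("optimised: usually performs better than the plain -onnx build")
    return hints


# "demo-cantonese-onnx-opt" is a hypothetical name, not a real model.
for hint in name_hints("demo-cantonese-onnx-opt"):
    print(hint)
```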
3. Four‑Step Selection: Filter According to Your Needs
Step 1: What language(s) do you mainly speak?
Look at the Languages column in the tables and filter for models that include your required language(s).
- Mandarin: Prefer models tagged with zh or Chinese
- Code‑switching Chinese/English: Look for zh-en or Chinese/English
- Cantonese: Look for yue or Cantonese (or names containing cantonese/yue)
- English: Look for en or English, or distil-whisper-*-en etc.
- Japanese/Korean/Thai/Russian etc.: Find models with the corresponding language tags (e.g. ja, ko, th, ru)
- Many languages worldwide: Look for multilingual, multi, or models supporting many languages (e.g. Whisper series supports 99‑106 languages)
A model that is specifically designed for a given language will usually achieve higher accuracy on that language. Multilingual models are convenient but may be slightly less accurate than dedicated ones.
If you see finetune or a specific suffix like -belle or -wenetspeech, it means the model has been fine‑tuned for a vertical domain (medical, conversation, dialect). If your scenario matches, give it priority.
Step 2: Real‑time or offline?
Check the Type column:
Step 3: What is your hardware level?
Look at the Precision suffix and the size encoded in the model name:
int8 quantised versions reduce size by 50‑75%, increase speed by 2‑4x, and typically lose less than 1% accuracy – strongly recommended.
distil-* models are faster and smaller than the original of the same size – also good for resource‑constrained scenarios.
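To make the int8 figures concrete, here is a small worked calculation in Python. It simply applies the rule‑of‑thumb ranges quoted above (50‑75% smaller, 2‑4x faster) to a starting size and run time. The 400 MB and 10 s inputs are invented for illustration, and the results are estimates, not measurements of any specific model.

```python
from typing import Tuple

# Rule-of-thumb ranges quoted in this guide, not benchmark results.
SIZE_REDUCTION = (0.50, 0.75)  # int8 shrinks the model by 50-75%
SPEEDUP = (2.0, 4.0)           # int8 runs 2-4x faster


def estimate_int8_size(full_size_mb: float) -> Tuple[float, float]:
    """Return the (smallest, largest) expected int8 size in MB."""
    low_cut, high_cut = SIZE_REDUCTION
    return full_size_mb * (1 - high_cut), full_size_mb * (1 - low_cut)


def estimate_int8_time(full_time_s: float) -> Tuple[float, float]:
    """Return the (shortest, longest) expected int8 run time in seconds."""
    slow, fast = SPEEDUP
    return full_time_s / fast, full_time_s / slow


print(estimate_int8_size(400.0))  # (100.0, 200.0)
print(estimate_int8_time(10.0))   # (2.5, 5.0)
```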
Step 4: Do you need to generate subtitles?
Without using a VAD model:
- Need SRT/VTT subtitles → You must select a model with Timestamps = Yes (the column marked Yes, or a name containing timestamp)
- No subtitles needed → Ignore the timestamp metric; any model works
When using a VAD model:
- The model's timestamp metric is not required. You can even use only an online model and still generate subtitles with timestamps.
You don't need to worry about punctuation – the program adds it automatically. So the Punctuation column can be completely ignored.
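The four steps can be sketched as one filter function in Python. Everything here is illustrative: the ModelInfo record, its field names and the catalogue entries are invented for the example and do not describe real models or the real table layout. Note how the timestamp requirement is dropped when a VAD model is used, and how punctuation plays no part in the filter.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ModelInfo:
    name: str
    languages: Tuple[str, ...]
    type: str          # "online" or "offline"
    timestamps: bool   # the Timestamps column


def select_models(catalogue: List[ModelInfo], language: str, model_type: str,
                  need_subtitles: bool, use_vad: bool = False,
                  prefer_int8: bool = False) -> List[ModelInfo]:
    """Apply the four selection steps in order."""
    result = []
    for m in catalogue:
        if language not in m.languages:      # Step 1: language
            continue
        if m.type != model_type:             # Step 2: online or offline
            continue
        # Step 4: timestamps only matter for subtitles made without VAD.
        if need_subtitles and not use_vad and not m.timestamps:
            continue
        result.append(m)
    if prefer_int8:                          # Step 3: constrained hardware
        result.sort(key=lambda m: "int8" not in m.name)
    return result


# Hypothetical catalogue; none of these names are real models.
CATALOGUE = [
    ModelInfo("demo-zh-offline", ("zh",), "offline", False),
    ModelInfo("demo-zh-offline-timestamp-int8", ("zh", "en"), "offline", True),
    ModelInfo("demo-zh-online", ("zh",), "online", False),
    ModelInfo("demo-en-offline", ("en",), "offline", True),
]

for m in select_models(CATALOGUE, "zh", "offline", need_subtitles=True):
    print(m.name)
```

The int8 preference is applied as a sort rather than a filter, so full‑precision models stay in the list as a fallback.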
Extra requirement: Hotword customisation
If you want to improve recognition of specific terms (brand names, person names, technical terms), look for models with seaco in their name (SeACo‑Paraformer). They support hotword boosting.
4. Common Command Templates (just fill in the model you selected)
II. VAD (Voice Activity Detection) Models
1. Available Models
2. Selection Advice
- Quiet environment: Use the default alifsmnvad-onnx.
- Noisy environment (fan, traffic, multiple people chatting): Switch to silero-vad-v6-onnx.
3. Common Commands
III. Punctuation Restoration Models
1. Available Model
2. Notes
- By default, ASR automatically enables this model to add punctuation to recognition results. Usually no manual intervention is needed.
- If you want to call it manually or test it, you can use the punc subcommand.
3. Common Commands
IV. AudioSep Audio Separation Models (planned)
This feature is under development. No models are available yet. The following is a preview of selection dimensions.
1. Future Available Models (example)
- Vocals separation models (e.g. ONNX versions of Demucs, Spleeter)
- Accompaniment / instrument separation models
2. Selection Dimensions (planned)
3. Placeholder Command (future implementation)
V. Frequently Asked Questions
Q: In the ASR table, Punctuation says “No”. Will the output have punctuation?
A: Yes. By default the program calls punctuation restoration (--punc), so you don't need to worry about whether the model itself outputs punctuation.
Q: Can KV acceleration and int8 be used together?
A: Yes. Models whose names contain both int8 and kv/selfcrosskv support that combination.
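If you want to check this programmatically, a minimal Python sketch could look like the following. It assumes the markers appear literally in the model name, as described in the answer above. The function name and the example names are hypothetical, and a plain substring test may give false positives on names that happen to contain kv for other reasons.

```python
def supports_int8_with_kv(model_name: str) -> bool:
    """True when the name carries both an int8 marker and a kv/selfcrosskv marker."""
    name = model_name.lower()
    # "selfcrosskv" also contains "kv", so one substring test covers both markers.
    return "int8" in name and "kv" in name


# Hypothetical names, for illustration only.
for candidate in ("demo-int8-selfcrosskv", "demo-int8", "demo-kv"):
    print(candidate, supports_int8_with_kv(candidate))
```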
Q: What is the difference between distil‑whisper and regular whisper?
A: distil‑whisper is a distilled version – smaller, faster, slightly lower accuracy but usually sufficient. It is suitable for resource‑constrained scenarios.
Q: How can I tell whether a model supports my language?
A: Look at the Languages column in the table, or infer from language codes in the model name (zh / yue / en / ja / ko etc.).
Q: What if downloading is too slow?
A: Download manually from ModelScope and place the files in your --base directory (by default, models/ under the program directory).
VI. Summary: Selecting a Model Means Selecting Metrics
Follow this order to filter models in the list, put the resulting model name after --model, and run the command. If the result is not satisfactory, adjust the filters and try another batch of models.
Next Steps
- Model Library - Detailed model documentation

