ASR Models
DolphinAsr Series
Notes
- License: Apache 2.0
- opt: Optimized version; moves the audio feature extraction module out of the model to reduce inference overhead
- Full language and region code mapping:
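With the opt variants, audio features are computed outside the model, so the caller has to produce them before inference. The sketch below shows a generic log-mel filterbank front end in pure Python. Every parameter here is an assumption for illustration: 16 kHz input, 25 ms periodic Hann window, 10 ms hop, 512-point DFT, 80 HTK-style mel bands, and natural-log energies. The front end a given opt model actually expects (filterbank type, normalization, dithering, and so on) is not specified in this document, and it must match what the model was trained with.

```python
import math


def hz_to_mel(hz: float) -> float:
    """HTK-style mel scale."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel: float) -> float:
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels: int, n_fft: int, sample_rate: int) -> list[list[float]]:
    """Triangular filters that map n_fft // 2 + 1 power bins to n_mels bands."""
    top = hz_to_mel(sample_rate / 2)
    # n_mels + 2 equally spaced points on the mel scale give the filter edges
    edges = [mel_to_hz(i * top / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    filters = []
    for m in range(1, n_mels + 1):
        left, center, right = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for f in bin_hz:
            if left <= f <= center:
                row.append((f - left) / (center - left))
            elif center < f <= right:
                row.append((right - f) / (right - center))
            else:
                row.append(0.0)
        filters.append(row)
    return filters


def log_mel_features(samples, sample_rate=16000, win_len=400, hop=160, n_fft=512, n_mels=80):
    """Return one list of n_mels log-mel energies per frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / win_len) for n in range(win_len)]  # periodic Hann
    n_bins = n_fft // 2 + 1
    # DFT basis over the first win_len samples; the rest of the n_fft frame is zero padding
    cos_tab = [[math.cos(2 * math.pi * k * n / n_fft) for n in range(win_len)] for k in range(n_bins)]
    sin_tab = [[math.sin(2 * math.pi * k * n / n_fft) for n in range(win_len)] for k in range(n_bins)]
    fbank = mel_filterbank(n_mels, n_fft, sample_rate)
    features = []
    for start in range(0, len(samples) - win_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(win_len)]
        power = []
        for k in range(n_bins):
            re = sum(x * c for x, c in zip(frame, cos_tab[k]))
            im = sum(x * s for x, s in zip(frame, sin_tab[k]))
            power.append(re * re + im * im)
        # floor each band energy before the log to avoid log(0) on empty or silent bands
        features.append([math.log(max(sum(w * p for w, p in zip(filt, power)), 1e-10)) for filt in fbank])
    return features


if __name__ == "__main__":
    sr = 16000
    tone = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(sr // 10)]  # 100 ms, 1 kHz tone
    feats = log_mel_features(tone, sr)
    print(len(feats), len(feats[0]))  # 8 frames x 80 mel bands
```

A plain DFT is used instead of an FFT only to keep the sketch dependency-free. A real pipeline would use an optimized library front end.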
DolphinAsr-base Models
DolphinAsr-small Models
FireRedAsr Series
FireRedAsr-AED Chinese-English Model (v1)
FireRedAsr2-AED Chinese-English Model (v2)
Fun-ASR Series
Notes
- Model background: An end-to-end speech recognition foundation model released by Tongyi Lab, pre-trained on tens of millions of hours of real speech data, with strong contextual understanding and domain adaptability
- Features: All models are non-streaming and support punctuation and timestamps. They also support low-latency real-time transcription, with recognition accuracy reaching 93% in far-field, high-noise environments
- Version identifier meanings:
- int8: INT8 quantized version; smaller size, faster inference, suitable for edge deployment
- LLM: Large-model-enhanced version; stronger context understanding, suppresses recognition hallucinations
- CTC: Lightweight version based on the classic CTC architecture, for lightweight inference
- MLT: Multilingual general-purpose version, covers 31 languages
- split-adaptor: Version with the feature adaptation module deployed separately
- Language and capability notes:
- Fun-ASR-Nano: Supports Chinese, English, Japanese; 7 dialects (Wu, Cantonese, Min, Hakka, Gan, Xiang, Jin); 26 regional accents (Henan, Shanxi, Hubei, Sichuan, Chongqing, Yunnan, Guizhou, Guangdong, Guangxi, Shaanxi, Hebei, Shandong, Anhui, Tianjin, Ningxia, Liaoning, Gansu, Hunan, Heilongjiang, Jilin, Inner Mongolia, Jiangsu, Zhejiang, Fujian, Jiangxi, Hainan); additionally supports lyrics recognition and rap speech recognition
- Fun-ASR-MLT-Nano: Supports 31 languages total: Chinese, English, Cantonese, Japanese, Korean, Vietnamese, Indonesian, Thai, Malay, Filipino, Arabic, Hindi, Bulgarian, Croatian, Czech, Danish, Dutch, Estonian, Finnish, Greek, Hungarian, Irish, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, Swedish
- Domain advantages: Excellent performance in vertical fields such as education and finance, accurately recognizes domain-specific terminology, effectively suppresses hallucinations and language confusion
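One reason a CTC model is cheap to run is that its output can be decoded greedily: take the most likely token in each frame, merge consecutive repeats, and drop the blank symbol, with no autoregressive decoder loop. The sketch below shows this standard best-path decoding. It assumes per-frame probability lists and a blank id of 0, and the vocabulary is a hypothetical toy. It is not the Fun-ASR CTC implementation.

```python
def collapse_ctc(frame_ids, blank_id=0):
    """Apply the CTC collapse rule: merge consecutive repeats, then drop blanks."""
    tokens = []
    prev = None
    for t in frame_ids:
        if t != prev and t != blank_id:
            tokens.append(t)
        prev = t
    return tokens


def ctc_greedy_decode(frame_probs, blank_id=0):
    """Best-path decoding: argmax token per frame, then collapse."""
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    return collapse_ctc(best, blank_id)


if __name__ == "__main__":
    vocab = {1: "h", 2: "i", 3: "!"}  # hypothetical toy vocabulary; id 0 is the blank
    probs = [
        [0.1, 0.8, 0.05, 0.05],   # h
        [0.2, 0.7, 0.05, 0.05],   # h again, merged with the previous frame
        [0.9, 0.05, 0.03, 0.02],  # blank
        [0.1, 0.1, 0.75, 0.05],   # i
        [0.6, 0.1, 0.1, 0.2],     # blank
        [0.1, 0.1, 0.1, 0.7],     # !
    ]
    print("".join(vocab[t] for t in ctc_greedy_decode(probs)))  # hi!
```

Blanks are what allow a genuinely repeated token to survive the collapse: in the frame sequence 1, 1, 0, 1 the blank separates the two runs, so the result is [1, 1] rather than [1].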
Fun-ASR-Nano Models
Fun-ASR-MLT-Nano Models
FunASR Series
Paraformer Chinese-English Models
Paraformer Cantonese/Chinese/English Multilingual Models
SeACo-Paraformer Hotword Customization Model
SeACo-Paraformer is a next-generation non-autoregressive speech recognition model with hotword customization, proposed by Alibaba Speech Lab. Compared with the earlier CLAS-based hotword customization solution, SeACo-Paraformer decouples the hotword module from the ASR model and performs hotword boosting via posterior probability fusion. This makes the boosting process visible and controllable, and it significantly improves hotword recall.
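To make "posterior probability fusion" concrete, the sketch below mixes, at a single output position, the ASR decoder's distribution with a hotword-bias distribution. The fusion rule (linear interpolation), the `bias_weight` knob, and the word-level toy tokens are all assumptions for illustration. SeACo-Paraformer's actual fusion rule and token units may differ. The sketch shows why this style of boosting is inspectable: both inputs and the fused result are ordinary distributions that can be printed and compared.

```python
def fuse_posteriors(asr_probs, bias_probs, bias_weight=0.5):
    """Linearly interpolate ASR and hotword-bias posteriors for one output position.

    bias_weight is a hypothetical knob: 0.0 ignores hotwords, larger values boost them.
    """
    if not 0.0 <= bias_weight <= 1.0:
        raise ValueError("bias_weight must be in [0, 1]")
    vocab = set(asr_probs) | set(bias_probs)
    return {
        tok: (1.0 - bias_weight) * asr_probs.get(tok, 0.0) + bias_weight * bias_probs.get(tok, 0.0)
        for tok in vocab
    }


def best_token(probs):
    return max(probs, key=probs.get)


if __name__ == "__main__":
    asr = {"sea": 0.6, "see": 0.3, "SeACo": 0.1}  # ASR decoder alone: hotword is unlikely
    bias = {"SeACo": 0.8, "sea": 0.2}             # bias branch: hotword list contains "SeACo"
    print(best_token(fuse_posteriors(asr, bias, 0.0)))  # sea
    print(best_token(fuse_posteriors(asr, bias, 0.5)))  # SeACo
```

Because the bias branch is separate from the ASR model, changing the hotword list or the weight changes only the second input. The ASR model itself is untouched.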
SenseVoice Models
K2TransducerAsr Series
Streaming Models
Non-streaming Models
MedAsr Series
Notes
- Model architecture: A Conformer-based medical-domain speech recognition model released by Google Health
- Application scenarios: Suitable for radiology dictation, doctor-patient dialogue, medical transcription, etc.
- Supported languages: English only (primarily American English)
- Model characteristics: Pre-trained on approximately 5,000 hours of medical speech data, with strong recognition of medical terminology. Performance on non-standard drug names and on structured data such as dates and times may vary, and the model is well suited to fine-tuning for specific business scenarios
moonshine Series
moonshine-tiny Models
moonshine-base Models
WeNet Series
Streaming Models
Non-streaming Models
Whisper Series
Notes
- Models with the -kv suffix have KV Cache inference acceleration enabled
- All models support punctuation and timestamps. They output paragraph-level timestamps by default, and word-level timestamps can be enabled via parameters
- Language coverage:
- Standard multilingual versions (tiny/small/medium/large-v1/large-v2): Support 99 languages (including Chinese, Cantonese, English, Japanese, Korean, Russian, Arabic, Vietnamese, Ukrainian, and other major world languages)
- large-v3 / large-v3-turbo series: Add a dedicated Cantonese (yue) language token, bringing the total to 100 languages, with improved performance across many languages
- Full language list and codes:
- Language code short form:
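The -kv variants rely on KV caching, a standard technique for autoregressive decoders. The keys and values of already-decoded tokens are stored and reused, so each new step projects only the newest token instead of reprocessing the whole prefix. The toy single-head self-attention sketch below uses pure Python with random weights and no Whisper code. It checks that cached and uncached decoding produce the same outputs. It illustrates the general technique under these assumptions and does not describe how the -kv exports are actually built. In encoder-decoder models such as Whisper, cross-attention keys and values can likewise be computed once, because the encoder output stays fixed during decoding.

```python
import math
import random
from dataclasses import dataclass, field


def project(x, w):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def attend(query, keys, values):
    """Scaled dot-product attention of one query over all positions seen so far."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(len(values[0]))]


@dataclass
class KVCache:
    keys: list = field(default_factory=list)
    values: list = field(default_factory=list)


def decode_step(x, wq, wk, wv, cache):
    """With a cache: project only the newest token and reuse earlier keys/values."""
    cache.keys.append(project(x, wk))
    cache.values.append(project(x, wv))
    return attend(project(x, wq), cache.keys, cache.values)


def decode_without_cache(xs, wq, wk, wv):
    """Without a cache: re-project the whole prefix at every step."""
    outputs = []
    for t in range(len(xs)):
        keys = [project(x, wk) for x in xs[: t + 1]]
        values = [project(x, wv) for x in xs[: t + 1]]
        outputs.append(attend(project(xs[t], wq), keys, values))
    return outputs


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    wq, wk, wv = (random_matrix(rng, 4, 4) for _ in range(3))
    tokens = random_matrix(rng, 5, 4)  # 5 decoding steps, 4-dim embeddings
    cache = KVCache()
    cached = [decode_step(x, wq, wk, wv, cache) for x in tokens]
    uncached = decode_without_cache(tokens, wq, wk, wv)
    print(max(abs(a - b) for ca, ua in zip(cached, uncached) for a, b in zip(ca, ua)))
```

Without the cache, step t re-projects all t + 1 prefix tokens. With it, each step adds one key/value pair. This saves compute at the cost of memory that grows with output length.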
whisper-tiny Models
whisper-small Models
whisper-medium Models
whisper-large Models
Distil-Whisper Models
General Notes
- int8: Quantized version; smaller size, faster inference
- kv / selfcrosskv / selfcrosskvstack / opt: Inference-optimized versions
- Some models provide HuggingFace or GitHub sources; see each table
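The int8 models are smaller largely because each weight is stored in 1 byte instead of 4 (float32). The sketch below shows the simplest such scheme, symmetric per-tensor quantization with a single scale factor. This document does not state which quantization scheme the int8 models use, so treat the scheme as an assumption. Real exports may instead use per-channel scales or dynamic activation quantization.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: one float scale, integers in [-127, 127]."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0  # avoid dividing by zero for all-zero tensors
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q, scale):
    """Recover approximate float values; the error per element is at most scale / 2."""
    return [x * scale for x in q]


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.0, 1.27, 0.333]
    q, scale = quantize_int8(weights)
    print(q)  # [50, -127, 0, 127, 33]
    print(dequantize_int8(q, scale))
```

The rounding error (for example, 0.333 comes back as about 0.33) is the accuracy cost traded for the smaller, faster model.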

