Punc Models
CT-Transformer Series
Notes
- Model background: A punctuation model open-sourced by Alibaba DAMO Academy and built on the Controllable Time-delay Transformer (CT-Transformer) architecture. It is designed primarily for post-processing ASR output, predicting and restoring punctuation in the transcribed text.
- Features: The model consists of three parts: Embedding, Encoder, and Predictor. The Embedding fuses word vectors with positional vectors; the Encoder supports various network structures such as Transformer and Conformer; the Predictor predicts a punctuation type for each token. Traditional Transformers suffer from high inference latency and frequent punctuation flickering (already-emitted punctuation changing as more text arrives). CT-Transformer addresses both by making inference latency controllable while maintaining accuracy, which makes it suitable for real-time business scenarios. On general-domain business test datasets it achieves a precision of 53.8%, a recall of 60.0%, and an F1 score of 56.5%. The total number of training samples is approximately 33 million.
- Open source repository: https://github.com/modelscope/FunASR
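The latency and flicker trade-off can be illustrated with a toy streaming loop. This is a minimal sketch under stated assumptions, not FunASR's API or the exact CT-Transformer mechanism: the `predict` callback stands in for the Embedding/Encoder/Predictor stack and returns one label per token, the `_` label and the `lookahead` parameter are hypothetical, and `toy_predict` is a hand-written rule rather than a trained model. Each token's label is fixed once `lookahead` tokens of right context have arrived and is never revised, so the output does not flicker and latency is bounded by `lookahead` tokens.

```python
from typing import Callable, List

NO_PUNC = "_"  # hypothetical label: no punctuation after this token


def attach(token: str, label: str) -> str:
    """Append the predicted punctuation mark (if any) to a token."""
    return token if label == NO_PUNC else token + label


def toy_predict(context: List[str], i: int) -> str:
    """Hypothetical stand-in for a trained model's per-token Predictor.

    Puts "." after the last visible token and "," before "but";
    otherwise predicts no punctuation.
    """
    if i == len(context) - 1:
        return "."
    if context[i + 1] == "but":
        return ","
    return NO_PUNC


def stream_punctuate(
    tokens: List[str],
    predict: Callable[[List[str], int], str],
    lookahead: int,
) -> List[str]:
    """Punctuate a token stream with a fixed look-ahead of `lookahead` tokens.

    Token i is labelled once tokens i..i+lookahead are visible (or the
    stream ends), and the label is never revised, so output does not flicker.
    """
    seen: List[str] = []
    emitted: List[str] = []
    for token in tokens:
        seen.append(token)
        # Finalize every token that now has `lookahead` tokens of right context.
        while len(emitted) + lookahead < len(seen):
            i = len(emitted)
            emitted.append(attach(seen[i], predict(seen, i)))
    # End of stream: flush the remaining tokens with whatever context exists.
    while len(emitted) < len(seen):
        i = len(emitted)
        emitted.append(attach(seen[i], predict(seen, i)))
    return emitted
```

With `lookahead=1`, joining the output of `stream_punctuate(["i", "came", "but", "you", "left"], toy_predict, 1)` with spaces gives `"i came, but you left."`. With `lookahead=0` the toy rule always sees the current token as the last one and punctuates every word. A larger look-ahead buys more context at the cost of latency, which is the kind of trade-off a controllable-delay model is meant to expose.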
Terminology explanations
int8: INT8 quantised version; reduces model size and speeds up inference, with a small loss in accuracy.
mge: Targeted quantisation optimisation for core layers (MatMul, Gather, Embed); further reduces model size and improves loading and inference speed, though accuracy may degrade slightly.
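The size and accuracy trade-off described for the int8 variant comes from storing weights as 8-bit integers plus a scale factor. The sketch below shows generic symmetric per-tensor quantisation. It is an illustrative assumption, not the scheme used to produce these particular exports (which may use per-channel scales, zero points, or dynamic quantisation), and it does not model the layer-targeted mge variant.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor INT8 quantisation: w is approximated by scale * q.

    The scale maps the largest absolute weight to 127, so every q fits in a
    signed byte (1 byte per weight instead of 4 for float32).
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Avoid dividing by zero for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the symmetric INT8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: List[int], scale: float) -> List[float]:
    """Recover approximate float weights; error per weight is at most scale / 2."""
    return [scale * v for v in q]
```

For example, `quantize_int8([0.5, -1.27, 0.003, 1.0])` yields `q = [50, -127, 0, 100]` with a scale of about 0.01. The small weight 0.003 collapses to 0, which is the kind of rounding error behind the slight accuracy loss.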
Model List
FireRedPunc Series
Notes
- Model background: FireRedPunc is an independent punctuation prediction module within the FireRedASR2S integrated speech system. Built on the BERT architecture, it is designed for ASR post‑processing scenarios and supports Chinese‑English bilingual punctuation restoration.
- Features: The model is reported to achieve state-of-the-art (SOTA) performance, with an average F1 score of 78.90%. It performs well across multiple domains on both Chinese and English datasets and suits both offline and real-time transcription tasks.
- Open source repository: https://github.com/FireRedTeam/FireRedASR2S
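The precision, recall, and F1 figures quoted for both model series are computed over punctuation predictions. The exact evaluation protocols are not given here, so the sketch below assumes one common convention: per-token labels aligned to the same token sequence, a hypothetical `_` label for "no punctuation" that never counts as a prediction, and micro-averaging over all punctuation slots.

```python
from typing import Dict, List

NO_PUNC = "_"  # hypothetical label: no punctuation after this token


def punc_prf(ref: List[str], hyp: List[str]) -> Dict[str, float]:
    """Micro-averaged precision/recall/F1 over punctuation slots.

    A mark of the wrong type (e.g. "?" where "." was expected) counts
    against both precision and recall.
    """
    if len(ref) != len(hyp):
        raise ValueError("ref and hyp must be aligned token-by-token")
    # Correct marks: predicted punctuation that matches the reference exactly.
    tp = sum(1 for r, h in zip(ref, hyp) if h != NO_PUNC and h == r)
    n_hyp = sum(1 for h in hyp if h != NO_PUNC)  # marks the model emitted
    n_ref = sum(1 for r in ref if r != NO_PUNC)  # marks in the reference
    precision = tp / n_hyp if n_hyp else 0.0
    recall = tp / n_ref if n_ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

With `ref = ["_", ",", "_", "_", "."]` and `hyp = ["_", ",", "_", ",", "."]`, two of the three predicted marks are correct and both reference marks are found, so precision is 2/3, recall is 1.0, and F1 is 0.8. When scores are averaged per punctuation class or per dataset instead, the averaged F1 generally differs from the harmonic mean of the averaged precision and recall.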

