Punc Models

CT-Transformer Series

Notes

  • Model background: A punctuation model open-sourced by Alibaba DAMO Academy, built on the Controllable Time-delay Transformer (CT-Transformer) architecture. It is designed primarily for post-processing ASR output, predicting and restoring punctuation in the transcribed text.
  • Features: The model has three parts: Embedding, Encoder, and Predictor. Embedding combines word vectors with positional vectors; Encoder can use different network structures, such as Transformer or Conformer; Predictor predicts the punctuation type for each token. Conventional Transformer punctuation models suffer from high inference latency and frequent punctuation flickering (previously shown marks changing as more text arrives). CT-Transformer keeps inference latency controllable while maintaining accuracy, which makes it suitable for real‑time business scenarios. Test results on general-domain business datasets: Precision 53.8%, Recall 60.0%, F1 score 56.5%. Total training samples: approximately 33 million.
  • Open source repository: https://github.com/modelscope/FunASR
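
The per-token prediction performed by the Predictor can be illustrated with a minimal sketch. It assumes a hypothetical label set in which each token receives either a "no punctuation" label or the mark that should follow it; the function name, the label symbols, and the space-joined output are illustrative assumptions, not the label inventory or tokenisation used by the CT-Transformer models.

```python
# Hypothetical label meaning "no punctuation after this token".
NO_PUNC = "_"


def restore_punctuation(tokens, labels):
    """Append each token's predicted mark (if any) and join the result.

    Tokens are joined with spaces, which suits English; Chinese text
    would normally be joined without separators.
    """
    if len(tokens) != len(labels):
        raise ValueError("expected exactly one label per token")
    pieces = []
    for token, label in zip(tokens, labels):
        pieces.append(token if label == NO_PUNC else token + label)
    return " ".join(pieces)


# Illustrative ASR output and an illustrative per-token prediction.
tokens = ["hello", "world", "how", "are", "you"]
labels = ["_", ",", "_", "_", "?"]
print(restore_punctuation(tokens, labels))
```

In a real pipeline the labels would come from the model's output for each token rather than from a hand-written list.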

Terminology explanations

  • int8: INT8-quantised version; reduces model size and speeds up inference at a small cost in accuracy
  • mge: Targeted quantisation of core layers (MatMul, Gather, Embed); further reduces model size and improves loading and inference speed, with a possible slight loss in accuracy
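
To show why INT8 quantisation shrinks a model while costing a little accuracy, here is a generic sketch of symmetric per-tensor quantisation. It is an illustration of the general idea only; the function names and sample weights are assumptions, and the scheme actually used to produce the int8 and mge models listed here is not documented in this section.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantized]


# Illustrative weights; each is stored as one byte instead of four.
weights = [0.5, -1.27, 0.003, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The rounding error per value is bounded by half the scale.
errors = [abs(w - r) for w, r in zip(weights, restored)]
print(q)
print(max(errors) <= scale / 2 + 1e-12)
```

Small values such as 0.003 round to zero here, which is the kind of precision loss behind the "small loss in accuracy" noted above.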

Model List

| Model Name | Vocabulary Size | Description | Download Link |
|---|---|---|---|
| alicttransformerpunc-zh-en-onnx | 272,727 | Standard original version, general Chinese‑English punctuation model | modelscope |
| alicttransformerpunc-zh-en-int8-onnx | 272,727 | Standard version INT8 quantised, smaller size, faster inference | modelscope |
| alicttransformerpunc-zh-en-mge-int8-onnx | 272,727 | Standard version + core‑layer MGE optimisation + INT8 quantisation, further improved loading and inference speed, slightly lower accuracy | modelscope |
| alicttransformerpunc-large-zh-en-onnx | 471,067 | Large‑parameter original version, higher punctuation recognition accuracy | modelscope |
| alicttransformerpunc-large-zh-en-int8-onnx | 471,067 | Large‑parameter version INT8 quantised, balancing accuracy and inference speed | modelscope |
| alicttransformerpunc-large-zh-en-mge-int8-onnx | 471,067 | Large‑parameter version + core‑layer MGE optimisation + INT8 quantisation, best overall runtime efficiency, slightly lower accuracy | modelscope |
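
The model names in this list follow a regular pattern (size, language pair, optimisation, format). The sketch below composes a name from those choices; the helper `ct_punc_model_name` and its parameters are hypothetical conveniences for illustration and are not part of FunASR or ModelScope.

```python
# Vocabulary sizes as given in the model list.
VOCAB_SIZE = {False: 272_727, True: 471_067}

# Optimisation variant -> name fragment.
_SUFFIX = {"original": "", "int8": "-int8", "mge-int8": "-mge-int8"}


def ct_punc_model_name(large=False, variant="original"):
    """Build a CT-Transformer punctuation model name from the list.

    variant is "original", "int8", or "mge-int8".
    """
    if variant not in _SUFFIX:
        raise ValueError(f"unknown variant: {variant!r}")
    size = "-large" if large else ""
    return f"alicttransformerpunc{size}-zh-en{_SUFFIX[variant]}-onnx"


# Smallest and fastest standard model versus the most accurate one.
print(ct_punc_model_name(large=False, variant="mge-int8"))
print(ct_punc_model_name(large=True, variant="original"), VOCAB_SIZE[True])
```

A caller that favours accuracy would pick the large original version, while one that favours load time and throughput would pick an mge-int8 variant.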

FireRedPunc Series

Notes

  • Model background: FireRedPunc is an independent punctuation prediction module within the FireRedASR2S integrated speech system. Built on the BERT architecture, it is designed for ASR post‑processing scenarios and supports Chinese‑English bilingual punctuation restoration.
  • Features: The model is reported to reach state-of-the-art (SOTA) performance, with an average F1 score of 78.90%. It performs well on both Chinese and English datasets across multiple domains and suits both offline and real‑time transcription tasks.
  • Open source repository: https://github.com/FireRedTeam/FireRedASR2S

Model List

| Model Name | Description | Download Link |
|---|---|---|
| FireRedPunc-zh-en-onnx | General Chinese‑English punctuation prediction model, excellent overall accuracy, SOTA‑level performance | modelscope |