CT-Transformer Series

Notes

  • Model background: Punctuation model open-sourced by Alibaba DAMO Academy, built on the Controllable Time-delay Transformer (CT-Transformer) architecture. Designed primarily for post-processing of ASR results to predict and restore punctuation in text.
  • Features: The model has three parts: Embedding, Encoder, and Predictor. Embedding fuses word vectors with positional vectors; the Encoder can use different network structures, such as Transformer or Conformer; the Predictor predicts a punctuation type for each token. Conventional Transformer punctuation models suffer from high inference latency and frequent punctuation flickering (already-emitted marks changing as more text arrives). CT-Transformer makes inference latency controllable while maintaining accuracy, which suits real-time business scenarios. Test results on general-domain business datasets: precision 53.8%, recall 60.0%, F1 score 56.5%. Total training samples: approximately 33 million.
  • Open source repository: https://github.com/modelscope/FunASR
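
The Predictor's per-token output can be pictured as one punctuation label per token, which a post-processing step merges back into the ASR text. The sketch below illustrates only that merging step. The label names and the `restore_punctuation` helper are assumptions made for illustration; they are not the FunASR label set or API.

```python
# Illustrative label set (an assumption; the real model's labels may differ).
PUNCT_BY_LABEL = {
    "none": "",
    "comma": ",",
    "period": ".",
    "question": "?",
}


def restore_punctuation(tokens, labels):
    """Append the punctuation predicted for each token, then join with spaces."""
    if len(tokens) != len(labels):
        raise ValueError("expected exactly one label per token")
    pieces = [token + PUNCT_BY_LABEL[label] for token, label in zip(tokens, labels)]
    return " ".join(pieces)


# Unpunctuated ASR output with hypothetical per-token predictions.
asr_tokens = ["hello", "everyone", "how", "are", "you"]
predicted = ["none", "comma", "none", "none", "question"]
restored = restore_punctuation(asr_tokens, predicted)
print(restored)
```

In a streaming setting, the labels for recent tokens may change as more text arrives, which is the flickering that the controllable time delay is meant to limit.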

Terminology explanations

  • int8: INT8 quantised version, reduces model size and speeds up inference, with a small loss in accuracy
  • mge: Targeted quantisation optimisation for core layers (MatMul, Gather, Embed). Further reduces model size, improves loading and inference speed; accuracy may degrade slightly
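
To make the size and accuracy trade-off of INT8 concrete, here is a minimal sketch of symmetric per-tensor quantisation in plain Python. It is a generic textbook scheme chosen for illustration, not necessarily the one used to produce the int8 or mge models listed here, and the weight values are made up.

```python
from array import array


def quantize_int8(weights):
    """Symmetric per-tensor quantisation: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from the int8 codes."""
    return [c * scale for c in codes]


# Made-up weights standing in for one layer's parameters.
weights = [0.82, -1.27, 0.05, 0.40, -0.63, 1.01]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

float_bytes = array("f", weights).itemsize * len(weights)  # 32-bit floats
int8_bytes = codes.itemsize * len(codes)                   # 8-bit integers
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(f"size: {float_bytes} -> {int8_bytes} bytes")
print(f"max rounding error: {max_error:.4f} (scale = {scale:.4f})")
```

Each weight shrinks from four bytes to one, and the rounding error per weight is bounded by half the scale, which is where the small accuracy loss comes from.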

Model List

| Model Name | Vocabulary Size | Description | Download Link |
|---|---|---|---|
| alicttransformerpunc-zh-en-onnx | 272,727 | Standard original version, general Chinese-English punctuation model | modelscope |
| alicttransformerpunc-zh-en-int8-onnx | 272,727 | Standard version, INT8 quantised: smaller size, faster inference | modelscope |
| alicttransformerpunc-zh-en-mge-int8-onnx | 272,727 | Standard version + core-layer MGE optimisation + INT8 quantisation: further improved loading and inference speed, slightly lower accuracy | modelscope |
| alicttransformerpunc-large-zh-en-onnx | 471,067 | Large-parameter original version, higher punctuation recognition accuracy | modelscope |
| alicttransformerpunc-large-zh-en-int8-onnx | 471,067 | Large-parameter version, INT8 quantised: balances accuracy and inference speed | modelscope |
| alicttransformerpunc-large-zh-en-mge-int8-onnx | 471,067 | Large-parameter version + core-layer MGE optimisation + INT8 quantisation: best overall runtime efficiency, slightly lower accuracy | modelscope |
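
The model names encode the variant: `large` marks the large-parameter version, `int8` marks INT8 quantisation, and `mge` marks the core-layer optimisation. The sketch below picks a name from the list by those three properties. The `describe` and `pick_model` helpers are hypothetical conveniences, not part of FunASR or ModelScope.

```python
# Names copied from the model list above.
MODEL_NAMES = [
    "alicttransformerpunc-zh-en-onnx",
    "alicttransformerpunc-zh-en-int8-onnx",
    "alicttransformerpunc-zh-en-mge-int8-onnx",
    "alicttransformerpunc-large-zh-en-onnx",
    "alicttransformerpunc-large-zh-en-int8-onnx",
    "alicttransformerpunc-large-zh-en-mge-int8-onnx",
]


def describe(name):
    """Read the variant flags from the hyphen-separated parts of a model name."""
    parts = name.split("-")
    return {"large": "large" in parts, "int8": "int8" in parts, "mge": "mge" in parts}


def pick_model(large, int8, mge=False):
    """Return the listed model whose flags match the request exactly."""
    wanted = {"large": large, "int8": int8, "mge": mge}
    for name in MODEL_NAMES:
        if describe(name) == wanted:
            return name
    raise LookupError(f"no listed model matches {wanted}")


# Highest accuracy: large, unquantised. Fastest runtime: large with MGE + INT8.
print(pick_model(large=True, int8=False))
print(pick_model(large=True, int8=True, mge=True))
```

Requesting `mge=True` without `int8=True` raises `LookupError`, since the list offers MGE only in combination with INT8.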