Authors: Shengding Hu, Yuge Tu, Xu Han*, Ganqu Cui, Chaoqun He, Weilin Zhao, Xiang Long, Zhi Zheng, Yewei Fang, Kaihuo Zhang, Yuxiang Huang, Zhenning Dai, Baitao Gong, Chongyi Wang, Yuan Yao, Jie Zhou, Jie Cai, Xinrong Zhang, Zhongwu Zhai, Ning Ding, Chao Jia, Guoyang Zeng, Dahai Li, Zhiyuan Liu*, Maosong Sun
Affiliation: Modelbest Inc., THUNLP
GitHub: OpenBMB/MiniCPM: MiniCPM-2B: An end-side LLM outperforms Llama2-13B. (github.com)
💥 The full paper is now available on arXiv: [2404.06395] MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies (arxiv.org)
MiniCPM is a series of edge-side large language models; the base model, MiniCPM-2B, has 2.4B non-embedding parameters. On comprehensive benchmarks it performs on par with Mistral-7B (with stronger Chinese, mathematics, and coding abilities) and surpasses models such as Llama2-13B, MPT-30B, and Falcon-40B. On MTBench, the benchmark closest to real user experience, MiniCPM-2B also outperforms many representative open-source models, including Llama2-70B-Chat, Vicuna-33B, Mistral-7B-Instruct-v0.1, and Zephyr-7B-alpha.
We will fully open-source the model parameters of MiniCPM-2B for academic research and limited commercial use, along with all training checkpoints and most of the non-proprietary training data, to support research on model mechanisms.
Currently, we have open-sourced the following models:
Overall performance of MiniCPM-2B compared with other models:
| Model | Average | English Average (including code and mathematical reasoning) | Chinese Average |
|---|---|---|---|
| Llama2-7B | 35.40 | 36.21 | 31.77 |
| Qwen-7B | 49.46 | 47.19 | 59.66 |
| Deepseek-7B | 39.96 | 39.15 | 43.64 |
| Mistral-7B | 48.97 | 49.96 | 44.54 |
| Llama2-13B | 41.48 | 42.44 | 37.19 |
| MPT-30B | 38.17 | 39.82 | 30.72 |
| Falcon-40B | 43.62 | 44.21 | 40.93 |
| MiniCPM-2B | 52.33 | 52.60 | 51.10 |
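For readers who want to try the released checkpoints, below is a minimal sketch of loading MiniCPM-2B with Hugging Face `transformers`. The Hub repository name `openbmb/MiniCPM-2B-sft-bf16` is an assumption for illustration; check the OpenBMB/MiniCPM GitHub page for the exact identifiers of the released checkpoints.

```python
# Minimal usage sketch (not the official example): load a MiniCPM-2B checkpoint
# with Hugging Face transformers and generate a short completion.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint name; replace with the identifier listed in the repository.
model_id = "openbmb/MiniCPM-2B-sft-bf16"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # the model ships custom modeling code
).eval()

prompt = "Write a short poem about small language models."
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```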
Limitations: