
Authors: Shengding Hu, Yuge Tu, Xu Han*, Ganqu Cui, Chaoqun He, Weilin Zhao, Xiang Long, Zhi Zheng, Yewei Fang, Kaihuo Zhang, Yuxiang Huang, Zhenning Dai, Baitao Gong, Chongyi Wang, Yuan Yao, Jie Zhou, Jie Cai, Xinrong Zhang, Zhongwu Zhai, Ning Ding, Chao Jia, Guoyang Zeng, Dahai Li, Zhiyuan Liu*, Maosong Sun

Affiliation: Modelbest Inc., THUNLP

GitHub: OpenBMB/MiniCPM (github.com) — MiniCPM-2B: an end-side LLM that outperforms Llama2-13B.

💥 The full paper is now available on arXiv: [2404.06395] MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies (arxiv.org)

I. Introduction

MiniCPM is a series of end-side large language models; the base model, MiniCPM-2B, has 2.4B non-embedding parameters. On comprehensive benchmarks it performs close to Mistral-7B (with stronger Chinese, mathematics, and coding abilities) and surpasses models such as Llama2-13B, MPT-30B, and Falcon-40B. On MT-Bench, the benchmark closest to real user experience, MiniCPM-2B also outperforms many representative open-source models, including Llama2-70B-Chat, Vicuna-33B, Mistral-7B-Instruct-v0.1, and Zephyr-7B-alpha.

We fully open-source the model parameters of MiniCPM-2B for academic research and limited commercial use, together with all intermediate checkpoints and most non-proprietary training data, to support research on model mechanisms.
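For reference, here is a minimal sketch of how the released weights could be loaded with Hugging Face transformers. The repository id, dtype, and generation settings below are assumptions for illustration; consult the OpenBMB/MiniCPM GitHub README for the official model names and recommended usage.

```python
# Minimal sketch: loading a MiniCPM-2B checkpoint with Hugging Face transformers.
# The repo id below is an assumption; see the OpenBMB/MiniCPM README for exact names.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openbmb/MiniCPM-2B-sft-bf16"  # assumed Hugging Face repository id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # bf16 weights; fall back to float32 on CPUs without bf16
    trust_remote_code=True,       # MiniCPM ships custom model code on the Hub
).eval()

prompt = "Introduce the MiniCPM model in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```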

Currently, we open source the following models:

The overall performance of MiniCPM-2B compared with other models:

Model       | Average | English Average (including code and mathematical reasoning) | Chinese Average
Llama2-7B   | 35.40   | 36.21 | 31.77
Qwen-7B     | 49.46   | 47.19 | 59.66
Deepseek-7B | 39.96   | 39.15 | 43.64
Mistral-7B  | 48.97   | 49.96 | 44.54
Llama2-13B  | 41.48   | 42.44 | 37.19
MPT-30B     | 38.17   | 39.82 | 30.72
Falcon-40B  | 43.62   | 44.21 | 40.93
MiniCPM-2B  | 52.33   | 52.60 | 51.10

Limitations: