It's Time to Unpack MiniCPM, the Small Language Model Plagiarized by Stanford Undergraduates, and the Team Behind It
See their alternative way of practicing the "scaling law"
A Stanford undergraduate research team has been accused of plagiarizing the MiniCPM-Llama3-V 2.5 project for their Llama 3-V project, sparking backlash from open-source and AI communities on Hugging Face, GitHub, and X.
The MiniCPM team deserved and received an apology, and the trio of young devs involved in the whole Llama 3-V affair have a bit of a road ahead of them. But no finger-pointing here. Let's get into why MiniCPM-Llama3-V 2.5 was cool enough to be worth copying, and shine a light on the squad behind it.
Who is behind it?
Developers and researchers in open-source communities often mention that MiniCPM-Llama3-V 2.5 is supported by the NLP Lab of Tsinghua University, which is accurate. A more complete statement, though, is that the intellectual property rights of MiniCPM-Llama3-V 2.5 are jointly owned by the Tsinghua NLP Lab and Modelbest Inc., a Beijing-based AI startup.
I had the opportunity to meet almost all of the key researchers involved in the MiniCPM projects. Most of them are concurrently researchers at the Tsinghua NLP Lab and Modelbest. The exception is the CEO of Modelbest, who has a mathematics degree from Peking University, which makes the team's formation a bit more 'diversified' 😉
What distinguishes MiniCPM's research?
Here is the mission of the MiniCPM team: to unveil the potential of Small Language Models (SLMs).
They never say it loudly, but their updates across different model sizes are clearly competing with larger models on efficiency and overall performance. Meanwhile, both MiniCPM-Llama3-V 2.5, an 8B-parameter model, and MiniCPM-V 2.0, a 2.8B-parameter model, clearly demonstrate the ambition to challenge Mistral for the worldwide SLM laurel.
SLM breakthroughs now come out of Paris and Beijing as well as San Francisco. This geographical diversification opens up numerous possibilities and challenges the idea that larger models are always superior. It almost seems too good to be true.
Broadly speaking, MiniCPM is a family of small language models. It exhibits scalability in both model size and data size, and its performance is comparable to that of much larger models.
Their approach turns model training into an experimental science. They use a method called Model Sandbox: running experiments on models with a fixed parameter count to find optimal settings and universal training principles that transfer to larger models. This isn't just about scaling with more GPUs. The results include tools like an improved WSD (Warmup-Stable-Decay) learning-rate scheduler, new tokenizers, and a paper detailing the strategies MiniCPM uses to outperform larger models (https://arxiv.org/pdf/2404.06395).
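To make the scheduler idea concrete, here is a minimal sketch of what a Warmup-Stable-Decay learning-rate schedule looks like. Everything below (the step counts, peak and minimum learning rates, and the linear decay shape) is an illustrative assumption on my part, not the MiniCPM team's actual configuration; see their paper for the real recipe.

```python
# A minimal sketch of a Warmup-Stable-Decay (WSD) learning-rate schedule.
# All hyperparameters below are illustrative, not MiniCPM's actual settings.

def wsd_lr(step: int,
           max_lr: float = 1e-2,
           min_lr: float = 1e-4,
           warmup_steps: int = 2_000,
           stable_steps: int = 80_000,
           decay_steps: int = 8_000) -> float:
    """Return the learning rate for a given training step."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to the peak learning rate.
        return max_lr * step / warmup_steps
    if step < warmup_steps + stable_steps:
        # Stable: hold the peak learning rate constant for most of training.
        return max_lr
    # Decay: drop quickly (linearly here) toward min_lr at the end of training.
    progress = min(1.0, (step - warmup_steps - stable_steps) / decay_steps)
    return max_lr + (min_lr - max_lr) * progress


if __name__ == "__main__":
    for s in (0, 1_000, 50_000, 85_000, 90_000):
        print(s, wsd_lr(s))
```

One appeal of the long stable phase, as the paper describes it, is that you can branch short decay runs off intermediate checkpoints, which makes it cheap to measure how loss scales with the amount of training data.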
What is the performance of their models like?
Here are the details of their models:
MiniCPM-Llama3-V 2.5: This 8B-parameter model surpasses models like GPT-4V-1106, Gemini Pro, and Claude 3. It supports 30+ languages and is optimized for end-side (on-device) deployment; a minimal loading sketch follows this list.
MiniCPM-V 2.0: Designed for efficient deployment, this model outperforms larger ones like Qwen-VL-Chat 9.6B on the OpenCompass benchmark and matches GPT-4V in accuracy.
MiniCPM-DPO, MiniCPM-MoE, and MiniCPM-128K: These variants highlight the versatility and potential of SLMs in various domains.
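If you want to try the flagship model yourself, the sketch below shows roughly how it is loaded through Hugging Face transformers. The model relies on remote code, so the `chat` interface and its arguments here reflect my reading of the model card at the time of writing and may change; treat this as a starting point rather than the official recipe.

```python
# A rough sketch of loading MiniCPM-Llama3-V 2.5 from Hugging Face.
# chat() is defined by the model's remote code; argument names follow the
# model card as I recall it and may differ in newer versions of the repo.
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

model_id = "openbmb/MiniCPM-Llama3-V-2_5"
model = AutoModel.from_pretrained(model_id, trust_remote_code=True,
                                  torch_dtype=torch.float16).to("cuda").eval()
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("example.jpg").convert("RGB")   # any local image
msgs = [{"role": "user", "content": "What is in this image?"}]

answer = model.chat(image=image, msgs=msgs, tokenizer=tokenizer,
                    sampling=True, temperature=0.7)
print(answer)
```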
And most importantly, MiniCPM offers its own recipe for putting the so-called 'scaling law' to work without maximizing model size.
So, MiniCPM's take on the scaling law is pretty cool. It shows that even the little guys can keep up with the big ones if they're trained well. This could mean we're moving away from big, bulky models and toward smaller ones that can still get the job done, but with fewer resources. Sounds like a win-win to us.
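For readers who haven't met the term, the 'scaling law' here usually refers to the empirical relationship between a model's loss, its parameter count N, and its training tokens D. The Chinchilla-style form below is the standard textbook version, given only as background; it is not necessarily the exact formulation the MiniCPM paper fits.

```latex
% Standard Chinchilla-style scaling law (background, not MiniCPM's exact fit).
% L = loss, N = parameters, D = training tokens; E, A, B, \alpha, \beta are fitted constants.
L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```

MiniCPM's argument, as I read it, is that by pushing hard on the data and training-recipe side (data quality, scheduler, hyperparameters) rather than on N, a small model can land at a loss people usually associate with much larger ones.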
Why do their SLMs matter?
No doubt about it, these tools are all set to rock the AI world. They're custom-built for specific tasks and domains, and they don't make your computer break a sweat. So, if you think we're heading toward a world with billions of AI-powered devices (and who isn't, right?), these smaller language models are about to make a big splash.
In the end, I've gotta give a shoutout to those three young developers from Stanford. They've managed to make the MiniCPM team's achievements on SLMs more noticeable. There are probably a whole bunch of hidden gems among open-source models and tools waiting to be unearthed, even if some get 'borrowed' without the credit they deserve.