“We’ve achieved peak data,” former OpenAI chief scientist Ilya Sutskever said onstage at the NeurIPS conference last year. “We have to deal with the data that we have, and there’s only one Internet.”
Sutskever’s comment came amid speculation that progress in large language models (LLMs) was hitting a wall, as scaling was running out of fresh Internet data to train on. The prevailing idea has been that the bigger an AI model, the smarter it is.
The race to build large AI models has been gathering pace ever since OpenAI released its 175-billion-parameter LLM, GPT-3, in 2020. Over the next three years, the company’s LLMs grew further in size, with GPT-4 reportedly built on around 1.7 trillion parameters.
But in 2024, researchers started to look at language models differently, as scaling up training data scoured from the Internet was yielding only marginal gains. The idea of building smaller language models gained ground.
This is evident in announcements made by Big Tech firms, most of which released a compact language model alongside their flagship AI models.
Google DeepMind released its Gemini Ultra, Nano and Flash models, while OpenAI and Meta launched GPT-4o mini and Llama 3 respectively. Amazon-backed Anthropic launched the compact Claude 3 Haiku alongside its larger Claude 3 Opus.
Small Language Models (SLMs) are cheaper and ideal for specific use cases. A company that needs AI for a narrow set of specialised tasks does not require a large model, and training small models takes less time, less compute and smaller training datasets.
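A rough back-of-the-envelope calculation shows why size matters so much for cost. Storing a model’s weights at 16-bit precision takes about two bytes per parameter, so a roughly four-billion-parameter small model fits in a few gigabytes, while a model of GPT-4’s reported scale needs terabytes. The Python sketch below is purely illustrative and counts only raw weight storage, ignoring activations, optimiser state and other overheads.

```python
# Back-of-the-envelope memory needed just to hold model weights at
# 16-bit precision (2 bytes per parameter). Figures are illustrative.
BYTES_PER_PARAM = 2  # fp16/bf16

models = {
    "Small model (~3.8 billion params)": 3.8e9,
    "GPT-3 (175 billion params)": 175e9,
    "GPT-4 (reported ~1.7 trillion params)": 1.7e12,
}

for name, params in models.items():
    gigabytes = params * BYTES_PER_PARAM / 1e9
    print(f"{name}: ~{gigabytes:,.0f} GB of weights")
```

A few gigabytes fits on a single consumer GPU or even a phone-class chip; a few thousand gigabytes demands a data-centre cluster, which is where much of the difference in training and serving cost comes from.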
French start-up Mistral AI, an SLM provider, has pitched its models as being as efficient as LLMs for specialised, focused applications. Microsoft released a family of small language models called Phi; the latest, Phi-3-mini, comprises 3.8 billion parameters.
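For developers, trying out such a model is relatively straightforward. The sketch below shows the general pattern of loading a small instruction-tuned model with the open-source Hugging Face transformers library; it is an illustration rather than official documentation, and the repository name and prompt are assumptions based on Microsoft’s Phi-3-mini release.

```python
# Illustrative: run a small language model locally with Hugging Face
# transformers. The model ID and prompt are assumptions for this example.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"  # ~3.8B-parameter small model

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)  # fits on one consumer GPU or CPU

prompt = "Translate to French: Where is the railway station?"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate a short completion; focused tasks like translation are the
# kind of workload small models are pitched at.
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The entire exchange runs on a single machine, with no data-centre hardware involved.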
Apple Intelligence, the AI system deployed in the latest iPhones and iPads, runs on-device AI models that roughly match the performance of top LLMs.
While LLMs are built with the ambition of achieving Artificial General Intelligence (AGI), small language models are made for specific use cases.
“Small language models are perfect for edge cases,” said Rahul Dandwate, ML engineer at Adobe. “When I am using WhatsApp or any Meta application, which is powered by the Llama 8B model, I am trying to learn a new language because it’s reasonably good at translation and other basic tasks like this.”
“But they wouldn’t do well at most benchmarks that large language models are measured against, like coding or logical problems. There still isn’t a small language model that’s as good at solving more complex problems,” he said.
It is still not fully understood why this bottleneck exists. “But the best way we can understand this is that just as human beings have brains with a massive number of neurons, a smaller animal has a limited number of neurons. This is why human brains have the capacity for far more complex levels of intelligence. This is similar to how small language models and large language models work,” he said.
In a country like India, where the scope for AI adoption is immense but resources are constrained, the compactness of small language models is a natural fit.
Another AI initiative, Visvam from IIIT Hyderabad, is building datasets from the ground up to train small language models that can be used in healthcare, agriculture and education, and to “promote and preserve language and cultural diversity through AI,” according to its website.
As the world of language models develops, it is not enough simply to build frontier models from scratch. Sarvam AI’s co-founder Vivek Raghavan said, “We want to build GenAI that a billion Indians can use.”
Published - January 09, 2025 03:50 pm IST