In April 2024, Meta introduced Llama 3, its next generation of open-source large language models (LLMs). Initially, the Llama 3 8B and Llama 3 70B models set new performance benchmarks for LLMs of their size. However, within just three months, other models have surpassed these early iterations.
Meta has announced that its largest Llama 3 model, with over 400 billion parameters, is still in training. Today, early benchmarks of the upcoming Llama 3.1 8B, 70B, and 405B models were leaked on the LocalLLaMA subreddit. These benchmarks suggest that the Meta Llama 3.1 405B model could outperform OpenAI’s GPT-4o in several key AI metrics. If confirmed, this would be a significant milestone for the open-source AI community, potentially marking the first time an open-source model surpasses a state-of-the-art closed-source LLM.
During the Llama 3 launch, Meta emphasized its commitment to the open AI ecosystem:
“We’re committed to the continued growth and development of an open AI ecosystem for releasing our models responsibly. We have long believed that openness leads to better, safer products, faster innovation, and a healthier overall market. This is good for Meta, and it is good for society.”
Meta Llama 3.1 Benchmarks
The leaked benchmarks indicate that Meta Llama 3.1 405B outperforms GPT-4o in several tests, including GSM8K, Hellaswag, BoolQ, MMLU-humanities, MMLU-other, MMLU-STEM, and Winograd. However, it underperforms GPT-4o in the HumanEval and MMLU-social sciences categories.
These results come from the base models of Llama 3.1, not the instruction-tuned ones. Instruction tuning could further improve performance, and the Instruct versions of the Llama 3.1 models are expected to improve on these metrics significantly.
While OpenAI’s forthcoming GPT-5, with its anticipated advanced reasoning capabilities, may challenge any lead Llama 3.1 gains, the strong performance of Llama 3.1 against GPT-4o underscores the promise of open-source AI. This advancement could democratize access to cutting-edge AI technology, fostering faster innovation across the tech industry.