• author: Matthew Berman

The Battle Between Large Foundational Models and Open Source Models in AI

Artificial intelligence has been the subject of some heated debate lately. One of the main points of contention is the battle between large foundational models (LFMs) and smaller open source models. The question is, which one is better? Over the past few weeks, there have been several new papers that have added to the conversation and challenged existing viewpoints.

We Have No Moat and the False Promise of Imitating Proprietary LLMs

One such document, "We Have No Moat," was a leaked internal memo from Google that highlighted the rapid proliferation and iteration of open source models. It suggested that the gap between these models and proprietary ones like GPT-4 and PaLM 2 was closing quickly thanks to new techniques for training and fine-tuning smaller models. It also challenged the idea that large foundational models were at the forefront of AI progress. However, a research paper out of UC Berkeley pushed back on the memo's assertions, claiming that open source models simply imitate the outputs of LFMs without actually understanding the logic involved.

The research paper, titled "The False Promise of Imitating Proprietary LLMs," argues that these open source models are limited to pattern-matching prompts and responses. They may perform well on tests where there is a clear question-answer pair, but they struggle when the questions vary even slightly. Understanding the reasoning process is key to true intelligence in AI.

Orca: The Promising Direction in Improving Model Capabilities and Skills

A new research paper, "Orca: Progressive Learning from Complex Explanation Traces of GPT-4," was just released by Microsoft Research. It presents a new technique for making smaller open source models far more capable. Orca challenges the idea that open source models can only imitate answers and get thrown off by any variation in the prompt. When the teacher model explains its reasoning step by step, the smaller model can learn how to reason its way to the answer rather than just copy it.

Orca is a 13-billion-parameter model that learns to imitate the reasoning process of LFMs. According to the paper, it outperforms other open source models and, on several benchmarks, matches or beats ChatGPT (GPT-3.5), which is not open source. Because Orca has only 13 billion parameters, it can also run on far more modest hardware than the large proprietary models.

Orca learns from rich signals from GPT-4 (step-by-step thought processes and complex instructions), including explanation traces, guided by teacher assistance from ChatGPT. The training pipeline starts from millions of examples, samples them down to a more manageable set, and uses GPT-4 as the teacher for the more complex examples.

Through Orca, we see that learning from step-by-step explanations, whether generated by humans or more advanced AI models, is a promising direction for the improvement of model capabilities and skills. Large language models, like humans, are able to understand a topic only when they know how it works in detail, rather than being limited to pattern matching questions and answers.

Orca shows competitive performance on professional and academic examinations like the SAT, LSAT, GRE, and GMAT in zero-shot settings without chain-of-thought prompting, although it still trails GPT-4. With this research indicating that learning from step-by-step explanations is a way forward, the gap between open source models and LFMs may continue to narrow.


Artificial intelligence has come a long way, but there is still much debate about the right approach to take in creating models that truly understand complex processes. While LFMs are the source of some of the most impressive models in AI, it's clear that open source models can be powerful as well. Orca offers a promising glimpse into the future, where open source models can learn how to reason and understand complex reasoning. By learning from step-by-step explanations, we may one day have models that are even more powerful than their LFM predecessors.

The Advancements in Open Source NLP Models: A Close Look at Orca 13B

Natural Language Processing (NLP) is an ever-evolving field that has seen significant advancements in recent years. With the introduction of large-scale language models (LLMs), such as GPT-3, the capabilities of NLP models have been expanded to an unprecedented level. However, these models are proprietary, and access to them is limited. As a result, there has been much research focused on developing open-source NLP models to democratize access to these technologies.

The Orca 13B paper, authored by a group of researchers at Microsoft Research, presents an in-depth analysis of their language model. In this article, we will explore the key contributions of the Orca 13B model, as well as its capabilities and limitations compared to other open-source NLP models.

Key Contributions of Orca 13B Model

The Orca 13B paper highlights three significant contributions of their language model:

  1. Explanation Tuning - This involves fine-tuning the model on step-by-step explanations of the reasoning and logic used to arrive at a solution. The paper explains that detailed responses from GPT-4 were collected to augment the query-response pairs, so that each response explains the reasoning process of the teacher as it generates the answer.

  2. Scaling of Tasks and Instructions - Orca 13B draws on a vast dataset of tasks and instructions from the Flan 2022 Collection, which consists of tens of millions of instructions. This addresses the data scaling issue faced by other open-source models, which are trained on much smaller datasets.

  3. Evaluation Techniques - The paper aims to address the limited evaluation of other open-source models by assessing Orca 13B with a broader set of methods: auto-evaluation with GPT-4, academic benchmarks like BIG-Bench Hard and TruthfulQA, and professional and academic exams like the SAT and LSAT. It also includes a safety evaluation using ToxiGen to check responses for toxic language.
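To make the first contribution more concrete, here is a minimal sketch of what a single explanation-tuning training example might look like. The field names, the `### System / ### User / ### Assistant` layout, and the sample text are all assumptions made for illustration; they are not taken from the Orca paper, and the teacher response here is hand-written rather than collected from GPT-4.

```python
from dataclasses import dataclass


@dataclass
class ExplanationExample:
    """One training record: a query plus the teacher's explained answer."""
    system_message: str    # instruction asking the teacher to explain its reasoning
    query: str             # the task or question
    teacher_response: str  # step-by-step answer from the teacher model


def to_training_text(example: ExplanationExample) -> str:
    """Flatten one record into a single string for fine-tuning.

    The section markers are a hypothetical template, not Orca's actual format.
    """
    return (
        f"### System:\n{example.system_message}\n\n"
        f"### User:\n{example.query}\n\n"
        f"### Assistant:\n{example.teacher_response}"
    )


# Hand-written stand-in for a response that would come from the teacher model.
example = ExplanationExample(
    system_message="You are a helpful assistant. Think step by step and justify your answer.",
    query="If a train travels 60 miles in 1.5 hours, what is its average speed?",
    teacher_response=(
        "Average speed is distance divided by time. "
        "60 / 1.5 = 40. The answer is 40 mph."
    ),
)

print(to_training_text(example))
```

The point of the sketch is the contrast with plain imitation data: the student is trained on the teacher's reasoning, not just on a bare final answer.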

Performance Comparison with Other Open-Source Models

The Orca 13B model was evaluated against other open-source NLP models, such as Vicuna, Dolly V2, Alpaca, and WizardLM. The paper presents comparative evaluations carried out on zero-shot tasks, academic exams, complex reasoning tasks, and open-ended conversational chat-like tasks.

Their findings indicate that Orca 13B surpasses these models in most categories. For instance, when evaluated by GPT-4, Orca 13B beats models like ChatGPT, Bard, and LLaMA-based open-source models. In complex zero-shot reasoning tasks on BIG-Bench Hard, Orca 13B performs on par with ChatGPT. Additionally, Orca 13B's performance is much better than that of Vicuna 13B.

Limitations of Open-Source NLP Models

While open-source models have made significant progress in NLP, there are still limitations to their performance and capabilities. For example, the paper highlights that models trained on natural conversations may capture their style but not the reasoning process. Therefore, these models can pattern match but not truly understand the logic behind arriving at a particular solution.

Moreover, these models often struggle to acquire enough data to perform well. As the Orca paper points out, broadly matching ChatGPT purely through imitation would require a tremendous effort to collect enormous imitation datasets, which are not currently available. Therefore, it is still challenging for open-source models to perform at the same level as their proprietary LLM counterparts.


The Orca 13B paper provides valuable insights into advancements in open-source NLP models. The model's key contributions, such as explanation tuning, scaling of tasks and instructions, and evaluation techniques, demonstrate its superiority over other open-source models. However, there are still limitations to open-source NLP models, such as their ability to capture reasoning processes and acquire enough data. Nonetheless, as the research in this area continues, it is safe to say that these models will become increasingly sophisticated, benefiting NLP applications across various fields.

The Power of ChatGPT in Progressive Learning

In a recent study, researchers from Microsoft Research sought to evaluate the effectiveness of ChatGPT as a teaching assistant alongside GPT-4. ChatGPT serves as an intermediate teacher between Orca and GPT-4, and the study found that this intermediate step significantly improves the performance of Orca in tasks that require human-like behavior.

Reasons Behind the Power of ChatGPT

The researchers identified two main reasons why using ChatGPT as an intermediate teacher is a powerful way to train a model like Orca. Firstly, there is a significant capacity gap between Orca and GPT-4, which makes it difficult for Orca to imitate GPT-4 directly. By learning first from ChatGPT and then from GPT-4, the model is able to improve incrementally, which leads to better performance.

Secondly, there is the simple pragmatic reason of cost and time. GPT-3.5 Turbo (the model behind ChatGPT) is much faster and less expensive than GPT-4, so the researchers collected 5 million examples from ChatGPT and only 1 million examples from GPT-4.
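The two-stage idea described above can be sketched as a simple curriculum: train on the larger, cheaper set of examples from the weaker teacher first, then on the smaller set from the stronger teacher. Everything in this sketch is hypothetical, including the function name `progressive_finetune`, the `train_step` callback, and the toy data; a real pipeline would update model weights instead of appending to a list.

```python
from typing import Callable, Iterable, List, Tuple

Example = Tuple[str, str]  # (prompt, teacher response)


def progressive_finetune(
    train_step: Callable[[Example], None],
    stages: Iterable[Tuple[str, List[Example]]],
) -> List[str]:
    """Run each stage to completion, in order; return the teachers in the order used."""
    order: List[str] = []
    for teacher_name, examples in stages:
        for ex in examples:
            train_step(ex)  # stand-in for one fine-tuning update
        order.append(teacher_name)
    return order


# Toy stand-ins for the 5 million ChatGPT and 1 million GPT-4 examples.
chatgpt_examples: List[Example] = [("q1", "a1 from ChatGPT"), ("q2", "a2 from ChatGPT")]
gpt4_examples: List[Example] = [("q3", "a3 from GPT-4")]

seen: List[Example] = []  # records the order in which examples were "trained" on
order = progressive_finetune(
    seen.append,
    [("ChatGPT", chatgpt_examples), ("GPT-4", gpt4_examples)],
)
print(order)
```

The design choice to illustrate is the ordering: the student sees the easier teacher's output before the harder one, rather than a shuffled mix of both.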

Performance Comparison

The researchers compared the performance of Orca, Vicuna, and ChatGPT on various tasks such as LSAT and SAT questions. Orca performed significantly better than Vicuna, and although it lagged behind GPT-4, the study showed that the progressive learning technique, with ChatGPT as an intermediate teacher, improved Orca's performance by a significant margin.

In fact, when trained only on GPT-4 examples, the model achieved a score of 37.18, whereas with ChatGPT as an intermediate step, it achieved a score of 41.7. This may seem like a small difference, but it is a significant improvement in the context of large language models.
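To put the two scores quoted above in perspective, a quick calculation gives the absolute and relative gain. The variable names are made up for this example; only the two numbers come from the article.

```python
# Scores quoted in the article.
gpt4_only = 37.18     # trained only on GPT-4 examples
with_chatgpt = 41.7   # ChatGPT used as an intermediate step

absolute_gain = with_chatgpt - gpt4_only          # difference in points
relative_gain = absolute_gain / gpt4_only * 100   # percent improvement over the baseline

print(f"{absolute_gain:.2f} points, {relative_gain:.1f}% relative")
# prints "4.52 points, 12.2% relative"
```

So the intermediate step is worth roughly four and a half points, or about a 12% relative improvement over training on GPT-4 examples alone.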


One implication of this study is that open-source models continue to improve at a rapid pace thanks to new fine-tuning and training techniques that are being developed regularly. Furthermore, there seems to be a "secret sauce" that sets GPT4 apart from other models, and OpenAI seems to have a significant moat to work with.

It is worth mentioning that Microsoft, whose research arm conducted this study, holds a significant stake in OpenAI. The fact that they are making research gains in open-source models is impressive, and OpenAI's decision to release their own open-source model suggests that these large language models will continue to get better and cheaper over time.

Although Orca's code and dataset are not yet released, we will be reviewing them as soon as they are available. Stay tuned for updates on how to use them and their performance.

If you found this article informative, please consider liking and subscribing for more content. Thanks for reading!
