OpenAI’s Project Strawberry Said to Be Building AI That Reasons and Does ‘Deep Research’

Despite their uncanny language skills, today’s leading AI chatbots still struggle with reasoning. A secretive new project from OpenAI could reportedly be on the verge of changing that.

While today’s large language models can already carry out a host of useful tasks, they’re still a long way from replicating the kind of problem-solving capabilities humans have. In particular, they struggle with problems that require multiple steps to reach a solution.

Imbuing AI with those kinds of skills would greatly increase its utility and has been a major focus for many of the leading research labs. According to recent reports, OpenAI may be close to a breakthrough in this area.

An article in Reuters this week claimed its journalists had been shown an internal document from the company discussing a project code-named Strawberry that is building models capable of planning, navigating the internet autonomously, and carrying out what OpenAI refers to as “deep research.”

A separate story from Bloomberg said the company had demoed research at a recent all-hands meeting that gave its GPT-4 model what were described as human-like reasoning skills. It’s unclear whether the demo was part of Project Strawberry.

According to the Reuters report, Project Strawberry is an extension of the Q* project that was revealed last year, just before OpenAI CEO Sam Altman was ousted by the board. The model in question was supposedly capable of solving grade-school math problems.

That might sound innocuous, but some inside the company believed it signaled a breakthrough in problem-solving capabilities that could accelerate progress towards artificial general intelligence, or AGI. Math has long been an Achilles’ heel for large language models, and capabilities in this area are seen as a good proxy for reasoning skills.

A source told Reuters that OpenAI has internally tested a model that achieved a 90 percent score on a challenging benchmark of AI math skills, though the source couldn’t confirm whether this was related to Project Strawberry. Two other sources reported seeing demos from the Q* project in which models solved math and science questions beyond the reach of today’s leading commercial AIs.

Exactly how OpenAI has achieved these enhanced capabilities is unclear at present. The Reuters report notes that Strawberry involves fine-tuning OpenAI’s existing large language models, which have already been trained on reams of data. The approach, according to the article, is similar to one detailed in a 2022 paper from Stanford researchers called Self-Taught Reasoner, or STaR.

That method builds on a concept known as “chain-of-thought” prompting, in which a large language model is asked to explain the reasoning steps behind its answer to a query. In the STaR paper, the authors showed an AI model a handful of these “chain-of-thought” rationales as examples and then asked it to come up with answers and rationales for a large number of questions.
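
To make the idea concrete, here is a minimal sketch of a few-shot chain-of-thought prompt in Python. The worked example and follow-up question are invented for illustration; the point is simply that the model sees a step-by-step rationale before being asked to produce one of its own.

```python
# A minimal few-shot chain-of-thought prompt: the model is shown a worked
# rationale before being asked to reason through a new question. The
# problems and answer format here are invented for illustration.
prompt = """Q: A baker has 3 trays of 12 muffins each and sells 20 muffins.
How many muffins are left?
A: The baker starts with 3 * 12 = 36 muffins.
After selling 20, 36 - 20 = 16 muffins remain.
The answer is 16.

Q: A library has 5 shelves holding 40 books each and lends out 75 books.
How many books remain on the shelves?
A:"""

# A model primed this way is expected to continue step by step, e.g.
# "The library starts with 5 * 40 = 200 books. After lending 75,
# 200 - 75 = 125 books remain. The answer is 125."
print(prompt)
```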

If it got the question wrong, the researchers would show the model the correct answer and then ask it to come up with a new rationale. The model was then fine-tuned on all of the rationales that led to a correct answer, and the process was repeated. This led to significantly improved performance on multiple datasets, and the researchers note that the approach effectively allowed the model to self-improve by training on reasoning data it had produced itself.
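
In outline, that loop looks something like the sketch below. Note that generate_rationale and fine_tune are hypothetical stand-ins for a language model call and a training step, not real APIs; the structure follows the process described in the STaR paper.

```python
# A schematic sketch of one STaR (Self-Taught Reasoner) iteration.
# generate_rationale and fine_tune are caller-supplied placeholders for a
# model call and a training step; both are hypothetical, not real APIs.

def star_iteration(dataset, few_shot_examples, generate_rationale, fine_tune):
    """One outer loop of STaR over (question, correct_answer) pairs."""
    training_data = []
    for question, correct_answer in dataset:
        # Ask the model for a rationale and answer, primed with a few
        # worked chain-of-thought examples.
        rationale, answer = generate_rationale(few_shot_examples, question)
        if answer != correct_answer:
            # "Rationalization": when the model is wrong, show it the
            # correct answer and ask for a new rationale that reaches it.
            rationale, answer = generate_rationale(
                few_shot_examples, question, hint=correct_answer
            )
        if answer == correct_answer:
            # Keep only rationales that led to the right answer.
            training_data.append((question, rationale, answer))
    # Fine-tune on the self-generated rationales; the whole process is
    # then repeated with the improved model.
    return fine_tune(training_data)
```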

How closely Strawberry mimics this approach is unclear, but if it relies on self-generated data, that could be significant. The holy grail for many AI researchers is “recursive self-improvement,” in which weak AI can enhance its own capabilities to bootstrap itself to higher orders of intelligence.

However, it’s important to take vague leaks from commercial AI research labs with a pinch of salt. These companies are highly motivated to give the appearance of rapid progress behind the scenes.

The fact that Project Strawberry seems to be little more than a rebranding of Q*, which was first reported over six months ago, should give pause. As far as concrete results go, publicly demonstrated progress has been fairly incremental, with the most recent AI releases from OpenAI, Google, and Anthropic providing modest improvements over previous versions.

At the same time, it would be unwise to discount the possibility of a significant breakthrough. Leading AI companies have been pouring billions of dollars into making the next great leap in performance, and reasoning has been an obvious bottleneck on which to focus resources. If OpenAI has genuinely made a significant advance, it probably won’t be long until we find out.

Image Credit: gemenu / Pixabay



* This article was originally published at Singularity Hub
