OpenAI’s latest AI reasoning models, o3 and o4-mini, have made a surprising debut with a major flaw: they tend to hallucinate substantially more than their predecessors, casting doubt on the firm’s claims of AI excellence.
OpenAI launched the two models touting their ability to excel at complex math, coding, and scientific challenges. But they arrive with an embarrassing problem: according to the company’s own testing, they ‘hallucinate substantially more than their predecessors.’
OpenAI is a research organization focused on developing and promoting safe and beneficial artificial intelligence.
Founded in 2015, the company has made significant advancements in natural language processing, computer vision, and decision-making capabilities.
Its mission is to ensure that AI is developed in a way that benefits humanity as a whole.
OpenAI's work includes developing large-scale language models, such as GPT-3, which has shown impressive capabilities in generating human-like text.
The organization also explores the applications of AI in various industries, including healthcare and education.
The Problem of Hallucinations
Hallucinations, or making things up, are a nagging technical issue that has plagued the industry for years. Tech companies have struggled to rein in rampant hallucinations, which have greatly undercut the usefulness of tools like ChatGPT. OpenAI’s two new models, o3 and o4-mini, buck the historical trend of gradual improvement and instead hallucinate more than their predecessors.
In the context of AI, hallucinations are outputs a model states confidently but that are false or fabricated: invented facts, nonexistent citations, broken links, or code that calls functions that don’t exist.
They arise because language models generate statistically plausible text rather than retrieving verified facts.
Hallucinations can surface in any domain a model writes about, and factors such as gaps in training data, ambiguous prompts, and the pressure to always produce an answer all make them more likely.

According to OpenAI’s internal testing, o3 and o4-mini tend to hallucinate more than older models, including o1, o1-mini, and even o3-mini. The firm’s technical report states that ‘more research is needed to understand the cause’ of the rampant hallucinations. Its o3 model scored a hallucination rate of 33 percent on the company’s in-house accuracy benchmark, dubbed PersonQA, roughly double that of its preceding reasoning models.
The o3 model is OpenAI’s flagship reasoning model, while o4-mini is a smaller, faster, and cheaper model in the same family.
Both are designed to ‘think’ through problems step by step before answering, and both can accept images as well as text.
The models are part of OpenAI’s efforts to advance its reasoning-focused model line.
OpenAI reports that they outperform previous models on math, coding, and science benchmarks.
A Lack of Understanding
OpenAI appears to be unaware of why its new models hallucinate more than expected. The firm’s o4-mini scored an abysmal hallucination rate of 48 percent, which OpenAI suggests may stem from it being a smaller model with ‘less world knowledge’ that therefore tends to ‘hallucinate more.’ Nonprofit AI research company Transluce also found that o3 had a strong tendency to hallucinate, especially when generating computer code.
More troubling is how the model defends its fabrications. Transluce found that o3 would justify hallucinated outputs by claiming it had run computations on an external MacBook Pro and copied the results into ChatGPT, something the model is incapable of doing. Users have also reported that o3 hallucinates broken website links that simply don’t work when clicked.
A Call for Improvement
OpenAI is well aware of these shortcomings and acknowledges that addressing hallucinations across all its models is an ongoing area of research. The company’s spokesperson, Niko Felix, stated that they are continually working to improve the accuracy and reliability of their models. However, it remains to be seen whether OpenAI can overcome this technical issue and deliver on its promises of AI excellence.
- futurism.com | OpenAI’s Hot New AI Has an Embarrassing Problem