OpenAI o1 achieves top marks on the Mensa Norway IQ Test

By Miguel Ángel Liébanas

OpenAI's newly released o1 model is already making waves in the AI landscape, posting strong results on reasoning benchmarks. On the latest Mensa Norway IQ test, o1 earned an IQ score of 133, the highest result recorded for an AI model on this test to date.


Breaking down o1’s performance


The Mensa Norway test, known for its rigorous assessment of cognitive skills, particularly targets logical reasoning and pattern recognition. OpenAI's o1 delivered:

  • Raw Score: 29 out of 35 correct answers (83%)

  • IQ Score: 133 (averaged over recent testing sessions)

  • Rank: First among AI models, ahead of leading alternatives such as GPT-4 and Claude-3 Opus
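
To put these figures in context, here is a minimal Python sketch that recomputes the percentage from the raw score and estimates where an IQ of 133 sits in the general population. It assumes the common IQ scale with a mean of 100 and a standard deviation of 15; the article does not state the scale, and the function names are illustrative only. It does not reproduce Mensa Norway's own raw-score-to-IQ conversion, which is not given here.

```python
from statistics import NormalDist

# Figures reported in the article
RAW_CORRECT = 29
TOTAL_ITEMS = 35
REPORTED_IQ = 133

# Assumption: IQ scores follow a normal distribution with mean 100 and SD 15.
IQ_SCALE = NormalDist(mu=100, sigma=15)


def percent_correct(correct: int, total: int) -> float:
    """Share of test items answered correctly, as a percentage."""
    return 100 * correct / total


def iq_percentile(iq: float) -> float:
    """Percentage of the population expected to score at or below `iq`."""
    return 100 * IQ_SCALE.cdf(iq)


if __name__ == "__main__":
    print(f"Raw score: {percent_correct(RAW_CORRECT, TOTAL_ITEMS):.0f}% correct")
    print(f"IQ {REPORTED_IQ}: about the {iq_percentile(REPORTED_IQ):.1f}th percentile")
```

Under that assumption, a score of 133 is 2.2 standard deviations above the mean, which is why it lands well inside the top 2% of the population.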


The o1 edge: "Thinking Before Answering"


One key factor in o1’s dominance is its innovative chain-of-thought reasoning, which allows it to "think" step-by-step before generating answers. This deliberate computation process improves accuracy, particularly in complex reasoning tasks involving mathematics and logic. By spending additional time on reasoning, o1 has achieved near-PhD-level performance in benchmark science and mathematical exams.


o1 vs. Competitors


When compared to other top AI models:

  • o1 Pro: Placed second with an IQ of 121.

  • Claude-3 Opus: Scored an IQ of 99.

  • GPT-4: Delivered a respectable IQ of 86, but trailed significantly behind o1.
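
The size of these gaps is easier to judge when expressed in standard deviations. The sketch below ranks the scores listed above, together with o1's 133 from earlier in the article, and reports each model's distance from the leader. The standard deviation of 15 is an assumption about the IQ scale, and `gaps_to_leader` is a hypothetical helper written for this illustration.

```python
# IQ scores as reported in this article
reported_iq = {
    "o1": 133,
    "o1 Pro": 121,
    "Claude-3 Opus": 99,
    "GPT-4": 86,
}

SD = 15  # Assumption: one standard deviation on the IQ scale is 15 points.


def gaps_to_leader(scores: dict[str, int], sd: float = SD):
    """Return (model, iq, points_behind, sd_behind) rows, best score first."""
    top = max(scores.values())
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(name, iq, top - iq, (top - iq) / sd) for name, iq in ranked]


if __name__ == "__main__":
    for name, iq, points, sds in gaps_to_leader(reported_iq):
        print(f"{name:<14} IQ {iq:>3}  {points:>2} points behind ({sds:.2f} SD)")
```

On these numbers, GPT-4 trails o1 by 47 points, a little over three standard deviations on the assumed scale.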


Why This Matters


The results highlight a paradigm shift in AI development: reasoning power now rivals scale and training data as a critical component of advanced AI performance. OpenAI's o1 is not just faster, it is smarter, excelling at problems that demand thoughtful analysis.


With OpenAI's ongoing focus on AI safety and precision, o1’s results on the Mensa test underscore its potential as a powerful tool for scientific, technical, and problem-solving applications.
