If the Nobel Prize were an exam, what would be the score of a laureate?

Earlier, in “Truth, What Is It?”, I hypothesized that science is the final arbiter between true and false. Religion, another source of truth, is now being disrupted by technology.

The question that opens this post came to me after reading “Evaluating Large Language Models in Scientific Discovery”, published recently by no fewer than 58 authors representing 30 institutions. The authors set out to evaluate whether ChatGPT, Claude, DeepSeek, and Grok could solve problems the way scientists do.

To do so, the authors submitted research problems in biology, chemistry, materials science, and physics to the AI models. The problems were designed by scientific experts. The models were asked to formulate problems and hypotheses, propose experiments, interpret data, and reach conclusions. More interestingly, the authors devised scores to evaluate how well the AI models performed as scientists.

LLMs are good at answering static questions. They perform well on information they saw during pre-training, scoring high in general science, mathematics, and coding, and they are good at retrieving existing public information. When it comes to contextualization and the production of new knowledge, however, these models have significant limitations. On this exam for scientists, graded from 0 to 10, the LLMs scored below 2.

The publication was very insightful for me. Not being an LLM, I could place it in context and arrive at three hypotheses.

First, most successful scientists do not rely solely on rationality to revise their findings. They decide what to do next in pursuing their hypotheses by weighing many factors. Serendipity involves significant elements of intuition and unconscious, context-driven decision-making, not merely rationality.

Second, and most obviously: if LLMs score so low in “scientific” capabilities, what score would top-notch scientists get?

Third, and most impactful for leadership: how long will the traditional scientific method remain the ultimate arbiter of true and false? When will technology disrupt the scientific method as we know it today? What does it mean for law and order when a non-human defines right and wrong? And what does it mean for progress and stability when this happens?
