Popular AIs head-to-head: OpenAI beats DeepSeek on sentence-level reasoning

Jelina Preethi via Getty Images

COMMENTARY | Large language model AIs can ingest long documents and answer questions about them, but a key question is how well they ‘understand’ individual sentences in the documents.

ChatGPT and other AI chatbots based on large language models are known to occasionally make things up, including scientific and legal citations. It turns out that measuring how accurate an AI model’s citations are is a good way of assessing the model’s reasoning abilities.

An AI model “reasons” by breaking down a query into steps and working through them in order. Think of how you learned to solve math word problems in school.

Ideally, to generate citations an AI model would understand the key concepts in a document, generate a ranked list of relevant papers to cite, and provide convincing reasoning for how each suggested paper supports the corresponding text. It would highlight specific connections between the text and the cited research, clarifying why each source matters.

The question is, can today’s models be trusted to make these connections and provide clear reasoning that justifies their source choices? The answer goes beyond citation accuracy to address how useful and accurate large language models are for any information retrieval purpose.

I’m a computer scientist. My colleagues (researchers from the AI Institute at the University of South Carolina, Ohio State University and University of Maryland, Baltimore County) and I have developed the Reasons benchmark to test how well large language models can automatically generate research citations and provide understandable reasoning.

We used the benchmark to compare the performance of two popular AI reasoning models, DeepSeek’s R1 and OpenAI’s o1. Though DeepSeek made headlines with its stunning efficiency and cost-effectiveness, the Chinese upstart has a way to go to match OpenAI’s reasoning performance.

Sentence Specific

The accuracy of citations has a lot to do with whether the AI model is reasoning about information at the sentence level rather than paragraph or document level. Paragraph-level and document-level citations can be thought of as throwing a large chunk of information into a large language model and asking it to provide many citations.

In this process, the large language model overgeneralizes and misinterprets individual sentences. The user ends up with citations that explain the whole paragraph or document, not the relatively fine-grained information in the sentence.

Further, reasoning suffers when you ask the large language model to read through an entire document. These models rely mostly on memorized patterns, and they are typically better at picking up those patterns at the beginning and end of a long text than in the middle. This makes it difficult for them to fully understand all the important information throughout a long document.

Large language models get confused because paragraphs and documents hold a lot of information, which affects citation generation and the reasoning process. Consequently, reasoning from large language models over paragraphs and documents becomes more like summarizing or paraphrasing.

The Reasons benchmark addresses this weakness by examining large language models’ citation generation and reasoning at the level of individual sentences.

Testing Citations and Reasoning

Following the release of DeepSeek R1 in January 2025, we wanted to examine its accuracy in generating citations and its quality of reasoning and compare it with OpenAI’s o1 model. We created a paragraph that had sentences from different sources, gave the models individual sentences from this paragraph, and asked for citations and reasoning.
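As a rough illustration of that sentence-level setup, the Python sketch below splits a paragraph into sentences and builds one request per sentence. The splitting rule, the prompt wording and the function names are assumptions made for illustration; they are not taken from the Reasons benchmark itself.

```python
import re

# Hypothetical prompt wording, not the benchmark's actual prompt.
PROMPT_TEMPLATE = (
    "Sentence: {sentence}\n"
    "Name the research paper that best supports this sentence "
    "and explain why it supports it."
)


def split_sentences(paragraph):
    """Split on whitespace that follows '.', '!' or '?' (a naive rule)."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [part for part in parts if part]


def build_prompts(paragraph):
    """Return one citation request per sentence instead of one per paragraph."""
    return [PROMPT_TEMPLATE.format(sentence=s) for s in split_sentences(paragraph)]


if __name__ == "__main__":
    text = "Neurons fire in patterns. Databases index records. Can a model reason?"
    for prompt in build_prompts(text):
        print(prompt)
        print("---")
```

A real pipeline would likely need a sturdier sentence splitter, since this simple rule breaks on abbreviations such as “et al.”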

To start our test, we developed a small test bed of about 4,100 research articles around four key topics that are related to human brains and computer science: neurons and cognition, human-computer interaction, databases, and artificial intelligence. We evaluated the models using two measures: F-1 score, which measures how accurate the provided citation is, and hallucination rate, which measures how often the model’s reasoning produces an inaccurate or misleading response.
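For readers who want to see how such measures are typically computed, here is a minimal Python sketch. It uses the standard definition of F-1 as the harmonic mean of precision and recall, and it treats hallucination rate as the share of responses flagged as inaccurate or misleading. The function names and sample data are hypothetical, and the benchmark’s own scoring may differ in its details.

```python
def f1_score(predicted, gold):
    """Harmonic mean of precision and recall over sets of cited papers."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)  # share of cited papers that are correct
    recall = true_positives / len(gold)          # share of correct papers that were cited
    return 2 * precision * recall / (precision + recall)


def hallucination_rate(flags):
    """Share of responses flagged (True) as inaccurate or misleading."""
    flags = list(flags)
    if not flags:
        return 0.0
    return sum(flags) / len(flags)


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    print(f1_score(["paper_a", "paper_b"], ["paper_b", "paper_c"]))
    print(hallucination_rate([True, False, False, True]))
```

In this toy example, one of two cited papers is correct and one of two correct papers is found, so precision and recall are both 0.5 and the F-1 score is 0.5.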

Our testing revealed significant performance differences between OpenAI o1 and DeepSeek R1 across different scientific domains. OpenAI’s o1 did well connecting information between different subjects, such as understanding how research on neurons and cognition connects to human-computer interaction and then to concepts in artificial intelligence, while remaining accurate. Its performance metrics consistently outpaced DeepSeek R1’s across all evaluation categories, especially in reducing hallucinations and successfully completing assigned tasks.

OpenAI o1 was better at combining ideas semantically, whereas R1 focused on making sure it generated a response for every attribution task, which in turn increased hallucination during reasoning. OpenAI o1 had a hallucination rate of approximately 35% compared with DeepSeek R1’s rate of nearly 85% in the attribution-based reasoning task.

In terms of accuracy and linguistic competence, OpenAI o1 scored about 0.65 on the F-1 measure, which combines precision and recall into a single number between 0 and 1; loosely speaking, it was right about 65% of the time. It also scored about 0.70 on the BLEU test, which measures how closely a model’s wording overlaps with reference text and serves as a rough gauge of how naturally it writes. These are pretty good scores.
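To give a sense of what an overlap-based score like BLEU rewards, the sketch below computes clipped unigram precision, one of the building blocks of BLEU. It is a simplification: full BLEU also uses longer word sequences and a penalty for overly short output. The function name and sample sentences are hypothetical.

```python
from collections import Counter


def clipped_unigram_precision(candidate, reference):
    """Fraction of candidate words that also appear in the reference.

    Each word is credited at most as many times as it occurs in the
    reference, so repeating a matching word does not inflate the score.
    """
    candidate_words = candidate.lower().split()
    if not candidate_words:
        return 0.0
    reference_counts = Counter(reference.lower().split())
    candidate_counts = Counter(candidate_words)
    overlap = sum(
        min(count, reference_counts[word])
        for word, count in candidate_counts.items()
    )
    return overlap / len(candidate_words)


if __name__ == "__main__":
    print(clipped_unigram_precision("the the the cat", "the cat sat"))
```

Here only one “the” and one “cat” are credited, so two of the four candidate words count and the score is 0.5.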

DeepSeek R1 scored lower, with about 0.35 on the F-1 test, meaning it was right about 35% of the time. Its BLEU score was also lower, at about 0.2, which means its writing wasn’t as natural-sounding as OpenAI o1’s. Taken together, the scores show that o1 was better both at finding accurate citations and at presenting its reasoning in clear, natural language.

OpenAI Holds the Advantage

On other benchmarks, DeepSeek R1 performs on par with OpenAI o1 on math, coding and scientific reasoning tasks. But the substantial difference on our benchmark suggests that o1 provides more reliable information, while R1 struggles with factual consistency.

Though our testing included other models, the performance gap between o1 and R1 stands out. On this benchmark, OpenAI’s model maintains a significant advantage in reasoning and knowledge integration.

These results suggest that OpenAI still has a leg up when it comes to source attribution and reasoning, possibly due to the nature and volume of the data it was trained on. The company recently announced its deep research tool, which can create reports with citations, ask follow-up questions and provide reasoning for the generated response.

The jury is still out on the tool’s value for researchers, but the caveat remains for everyone: Double-check all citations an AI gives you.

The Conversation

Manas Gaur is an assistant professor of computer science and electrical engineering at University of Maryland, Baltimore County
