Alpaca was the least reliable generative AI program in 2023, with an accuracy score of 20 percent or lower in nearly all categories. ChatGPT (made by OpenAI) and Claude and Claude 2 (both made by Anthropic) were the most reliable generative AI programs overall. Davinci002 was the most reliable model in the general category, but it suffered considerably from hallucinations when summarizing topics.
HaluEval hallucination classification accuracy benchmark of generative artificial intelligence (AI) models globally in 2023
The survey was administered to census-targeted samples of over 1,000 people in each of 21 countries, for a total of 23,882 surveys conducted in 12 languages.
Source breaks down the functions as follows: "SO = Service operations", "M&S = Marketing and sales", and "R&D = Research and development".