Vectara Unveils Open-Source Hallucination Evaluation Model To Detect and Quantify Hallucinations in Top Large Language Models

Groundbreaking Model and Leaderboard Provide New Transparency into Risks Associated with GenAI Chatbots from OpenAI, Anthropic, and Others, Enabling Safer Enterprise Adoption and Objective Government Oversight

GlobeNewswire, November 06, 2023

SANTA CLARA, Calif., Nov. 06, 2023 (GLOBE NEWSWIRE) -- Large Language Model (LLM) builder Vectara, the trusted Generative AI (GenAI) platform, released its open-source Hallucination Evaluation Model. This is a first-of-its-kind initiative to offer a commercially available and open-source model that addresses the accuracy and level of hallucination in LLMs, paired with a publicly available and regularly updated leaderboard, while inviting other model builders like OpenAI, Cohere, Google, and Anthropic to participate in defining an open and free industry standard in support of self-governance and responsible AI.

By launching its Hallucination Evaluation Model, Vectara is increasing transparency and objectively quantifying hallucination risks in leading GenAI tools, a critical step toward removing barriers to enterprise adoption, stemming dangers like misinformation, and enacting effective regulation. The model is designed to quantify how much an LLM strays from facts while synthesizing a summary related to previously provided reference materials.

“In order to realize the true promise of Generative AI, we first have to tackle the challenge of hallucinations,” said Matei Zaharia, CTO and Co-Founder of Databricks. “The launch of the Hallucination Evaluation Model to the Hugging Face community encourages industry co-innovation and accountability through a powerful measurement tool accessible for all LLM builders.”

The Hallucination Evaluation Model launch includes releasing Vectara’s measurement code base as an open-source model on Hugging Face as well as a publicly accessible Leaderboard available from Vectara. The Leaderboard serves as a quality metric for LLM factual accuracy, similar to how credit ratings or FICO scores function for financial risk, giving businesses and developers insight into the realities of different GenAI tools before implementing them.

“For organizations to effectively implement Generative AI solutions including chatbots, they need a clear view of the risks and potential downsides,” said Simon Hughes, AI researcher and ML engineer at Vectara. “For the first time, Vectara’s Hallucination Evaluation Model allows anyone to measure hallucinations produced by different LLMs. As a part of Vectara’s commitment to industry transparency, we’re releasing this model as open source, with a publicly accessible Leaderboard, so that anyone can contribute to this important conversation.”

Key Features of Vectara’s Hallucination Evaluation Model:

Objective Measurement: This model provides much-needed visibility into LLMs' ability to synthesize data without introducing hallucinations. Many LLM vendors make claims about their capabilities to mitigate the impact of hallucinations, but until now, there have been no objectively verifiable methods for detecting and quantifying instances of irrelevant or incorrect data in model outputs. For the model, Vectara built a machine-learning model, tuned for real-world performance and using the latest advancements in hallucination research, to evaluate LLM summarizations without requiring objective scoring or influence.

Transparency Through Open Source: The Hallucination Evaluation Model is available for developers and industry stakeholders to integrate into their own pipelines through an Apache 2.0 License on Hugging Face. Developers can also use the open-source evaluation model to verify the accuracy of Vectara’s platform.

Dynamic Leaderboard: Vectara’s AI researchers and ML engineers (in collaboration with the open source community) will maintain and continually update the Leaderboard, showcasing the hallucination impact of different LLMs and offering a clear comparative perspective as new models emerge. The Leaderboard lists the accuracy and hallucination rates for each model tested in response to the same set of prompts.

The Leaderboard shows that OpenAI’s models have the strongest performance, followed by the Llama 2 models, Cohere, and Anthropic. Google’s PaLM models scored lower on the Leaderboard.
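
The accuracy and hallucination rates that the Leaderboard reports can be illustrated with a small sketch. It assumes, hypothetically, that each summary receives a factual-consistency score between 0 and 1 and that a score below 0.5 counts as a hallucination; the release does not state the scoring scale or the cutoff, and the model names and scores below are invented purely for illustration. This is not Vectara's code.

```python
from dataclasses import dataclass

# Hypothetical cutoff: scores below this are counted as hallucinated.
# The press release does not state the threshold the Leaderboard uses.
CONSISTENCY_THRESHOLD = 0.5


@dataclass
class LeaderboardRow:
    model: str
    accuracy: float            # share of summaries judged consistent
    hallucination_rate: float  # share of summaries judged hallucinated


def score_model(model: str, scores: list[float]) -> LeaderboardRow:
    """Turn per-summary consistency scores (0.0 to 1.0) into one row."""
    if not scores:
        raise ValueError("need at least one scored summary")
    consistent = sum(1 for s in scores if s >= CONSISTENCY_THRESHOLD)
    accuracy = consistent / len(scores)
    return LeaderboardRow(model, accuracy, 1.0 - accuracy)


def build_leaderboard(results: dict[str, list[float]]) -> list[LeaderboardRow]:
    """Rank models from lowest to highest hallucination rate."""
    rows = [score_model(name, scores) for name, scores in results.items()]
    return sorted(rows, key=lambda row: row.hallucination_rate)


# Made-up scores for two placeholder models on the same four prompts.
example_results = {
    "model-a": [0.9, 0.8, 0.3, 0.95],
    "model-b": [0.4, 0.7, 0.2, 0.6],
}
leaderboard = build_leaderboard(example_results)
for row in leaderboard:
    print(f"{row.model}: accuracy={row.accuracy:.0%}, "
          f"hallucination rate={row.hallucination_rate:.0%}")
```

With these made-up inputs, model-a ranks first with a 25% hallucination rate, against 50% for model-b. Because every model is scored on the same prompts, the rates are directly comparable.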

“Hallucination is one of the most serious issues to consider when deploying production LLMs. Having an open source benchmark model that can evaluate factual accuracy in a quantifiable way will allow developers to directly address the problems,” said Waleed Kadous, Chief Scientist at Anyscale. “Vectara’s new model sets the industry standard for measuring the extent to which LLMs hallucinate, and we’re excited to work with them as a launch partner.”

Vectara has led industry efforts to address hallucinations as a critical barrier to the safe, effective, and accurate use of GenAI. The model doesn’t solve hallucinations directly but rather enables more informed adoption and better decision-making by measuring the frequency and severity of this phenomenon. Greater transparency into the quality of LLM-produced summarizations allows LLM users to evaluate GenAI solutions according to the risk profile of the intended use case.

GenAI adoption in highly regulated industries like legal, healthcare, finance, energy, and government will hinge upon vendors' ability to provide solutions with low to nearly zero risk of factual inaccuracies. Hallucinations have already been raised by stakeholders in these sectors as a serious issue. Until now, however, there has been no way to objectively compare the performance of available models outside of academic benchmarks, which don’t always translate to real-world settings.

Hallucinations also factor heavily in ongoing dialogue about GenAI regulation. Effective government oversight requires measurement tools universally recognized as transparent and objective. Vectara’s open-source model serves as an industry standard, providing the missing link to legislation that virtually all industry leaders agree is needed. With concerns around misinformation and other AI risks rising ahead of the U.S. presidential election and other geopolitical events, the Hallucination Evaluation Model and Leaderboard provide a tangible step toward data-driven and accessible oversight mechanisms.

About Vectara
Vectara is an end-to-end platform that empowers product builders to embed powerful Generative AI features into their applications with extraordinary results. Built on a solid hybrid-search core, Vectara delivers the shortest path to an answer or action through a safe, secure, and trusted entry point. Vectara is built for product managers and developers with an easily leveraged API that gives full access to the platform's powerful features. Vectara’s Retrieval Augmented (Grounded) Generation allows businesses to quickly, safely, and affordably integrate best-in-class conversational AI and question-answering into their application with zero-shot precision. Vectara never trains its models on customer data, allowing businesses to embed generative AI capabilities without the risk of data or privacy violations. To learn more about Vectara, visit www.vectara.com.

Media Contact
Carly Bourne
carly@bulleitgroup.com
423-443-0449



This business broadcast service is brought to you by GlobeNewswire through syndication. We have not reviewed or endorsed the content. For any corrections and clarifications, please send it to GlobeNewswire Contact Us Page. If you still require further assistance, please contact our support team at businessbroadcast@asiaone.com.
