Anti-hallucination Benchmark Confirms CustomGPT.ai...

Jul 18, 2024, 14:16 ET

Nearly 1,000 questions across diverse datasets used to measure answer reliability and response time

BOSTON, July 18, 2024 /PRNewswire/ -- Just months after demonstrating answer quality superiority over OpenAI, Google, Amazon, Cohere, and others, CustomGPT.ai again excelled in a RAG benchmark analysis comparing its generative AI platform with OpenAI’s Assistants API v2. In testing involving 945 questions across nine diverse datasets, CustomGPT.ai outperformed OpenAI, achieving a 10 percent lower hallucination rate, a 13 percent higher accuracy rate, and a 34 percent faster average response time.

“In today’s AI race, companies must adopt an ‘anti-hallucination first’ focus,” said CustomGPT.ai founder and CEO Alden Do Rosario. “We founded our company on this premise, and we’re thrilled new research further validates our technology, especially for the 6,000-plus customers we now serve.”

As organizations bring AI into their operations, they take on responsibility for the information it generates. “To reduce risk, entities should adequately vet foundational AI technology and use solutions that are proven,” added Do Rosario.

As skeptics in both B2C and B2B markets question AI’s reliability, precision, and performance, Do Rosario believes these findings will resonate especially in industries where accuracy is paramount, such as law, finance, healthcare, and education.

Hallucinations, instances where AI generates information not grounded in reality or in the provided context, can contribute to misinformed decision-making, compliance issues, safety risks, and erosion of trust in AI. The research highlights differences in context handling and reliability: CustomGPT.ai generated more comprehensive and accurate responses, whereas OpenAI’s answers often lacked detail or missed the mark entirely.

Validated by Tonic.ai, a pioneer in data mimicking and de-identification, the research supports the use of Retrieval-Augmented Generation (RAG) to help mitigate AI hallucinations and support delivery of more precise and reliable information.

The benchmark went well beyond recent research: it used 945 questions rather than 55 and tested against nine datasets, spanning topics from public health to literature, rather than a single dataset. It also applied an ‘answer consistency binary’ metric, under which any deviation from the expected answer counted as a failed response.
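For illustration only, an all-or-nothing metric of this kind can be sketched in Python. The sketch assumes each answer has already been judged consistent or not by a separate grader; the GradedAnswer type and the binary_score and pass_rate helpers are hypothetical names and are not taken from the benchmark or from any vendor's tooling.

```python
from dataclasses import dataclass


@dataclass
class GradedAnswer:
    """One benchmark answer plus a grader's verdict (hypothetical structure)."""
    question_id: str
    dataset: str
    consistent: bool  # True only if the grader found no deviation at all


def binary_score(answer: GradedAnswer) -> int:
    """All-or-nothing scoring: 1 for a fully consistent answer, otherwise 0."""
    return 1 if answer.consistent else 0


def pass_rate(answers: list[GradedAnswer]) -> float:
    """Fraction of answers that passed the binary check."""
    if not answers:
        raise ValueError("no graded answers supplied")
    return sum(binary_score(a) for a in answers) / len(answers)
```

Because the score is binary, a partially correct answer earns no credit, which is what makes this style of scoring stricter than a graded scale.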

Do Rosario said this research significantly ups the ante for statistical significance, data diversity, and scoring rigor.

“Gone are the days of organizations needing to settle for chatbots that generate inaccurate responses, especially from short-sighted, underperforming, or overpriced AI vendors,” he stated. “The future is wide open for gen AI to responsibly deliver comprehensive and contextually accurate information in order to truly help organizations advance decision-making capabilities, improve operational efficiency, and increase revenues.”

About CustomGPT.ai

CustomGPT.ai offers a novel, business-grade, privacy-first, no-code/low-code generative AI platform. The technology makes it quick, easy, and affordable for anyone — regardless of technical expertise — to ingest their own content and data, to build custom bots and other GPT agents, and to deploy these solutions with confidence. CustomGPT.ai leverages advanced large language models (including OpenAI’s GPT-4) to offer the industry’s best accuracy and anti-hallucination protection. Nearly 6,000 entities rely on CustomGPT.ai to deliver SOC2-compliant solutions that improve operational efficiency, enhance customer engagement, and increase sales – including Adobe, the Massachusetts Institute of Technology, the Dominican Republic’s GPTLegal, and the UK’s DivorceOnline. REST APIs and SDKs are available for developers, ISVs, digital agencies, and resellers. Visit https://customgpt.ai or contact hello@customgpt.ai.

Contact:
Beth Strohbusch
beth@customgpt.ai
(414) 213-8818

SOURCE CustomGPT.ai