3 Ways to Stop LLM Hallucinations

How retrieval-augmented generation, reasoning and iterative querying help large language models reply accurately to prompts.

Sep 15th, 2023 10:15am by Alan Ho

Large language models have become extremely powerful; they can help answer some of our hardest questions. But they can also lead us astray: They tend to hallucinate, which means that they give answers that seem right but aren’t.

LLMs hallucinate when they encounter queries that aren’t covered by their training data set, or when that data set contains erroneous information (this can happen when LLMs are trained on internet data, which, as we all know, can’t always be trusted). LLMs also have no memory of their own beyond what is in the prompt. Finally, “fine-tuning,” which retrains a model on new data, is often regarded as a way to reduce hallucinations, but it has its drawbacks.

Here, we’ll look at three methods to stop LLMs from hallucinating: retrieval-augmented generation (RAG), reasoning and iterative querying.

Retrieval-Augmented Generation

With RAG, a query is first converted into a semantic vector (a list of numbers that represents its meaning) and sent to the knowledge base, which, in this case, is a vector database.

The model then retrieves similar documents from the database using vector search, looking for documents whose vectors are close to the vector of the query.

Once the relevant documents have been retrieved, the query, along with these documents, is used by the LLM to summarize a response for the user. This way, the model doesn’t have to rely solely on its internal knowledge but can access whatever data you provide it at the right time. In a sense, it provides the LLM with “long-term memory” that it doesn’t possess on its own. The model can provide more accurate and contextually appropriate responses by including proprietary data stored in the vector database.
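The retrieval flow described above can be sketched in a few lines of Python. This is a minimal illustration, not a production setup: the `embed` function is a toy bag-of-words stand-in for a real embedding model, the `DOCS` list stands in for a vector database, and all names here are assumptions made for the example.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    """Return the k documents whose vectors are closest to the query vector."""
    query_vector = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(query_vector, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, documents, k=2):
    """Combine the query with the retrieved documents into one prompt for the LLM."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Hypothetical documents standing in for proprietary data in a vector database.
DOCS = [
    "Visiting hours are 9am to 5pm on weekdays",
    "The cafeteria serves lunch at noon",
    "Parking is free for visitors on weekends",
]

prompt = build_prompt("When are visiting hours?", DOCS, k=1)
print(prompt)
```

In a real application, the prompt would then be sent to the LLM, and `embed` and `retrieve` would be replaced by an embedding model and the vector database’s own similarity search.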

An alternate RAG approach incorporates fact-checking. The LLM is first prompted for an answer, which is then checked against data in the vector database: Passages relevant to the query are retrieved from the database, and the LLM is prompted with them to discern whether its answer is supported by fact.
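One way to picture the fact-checking variant is a second prompt that asks the model to judge a draft answer against retrieved evidence. The sketch below assumes a hypothetical `llm` callable that takes a prompt string and returns text; `fake_llm` is only a stub so the example runs without a model, and the SUPPORTED/UNSUPPORTED protocol is an assumption for illustration.

```python
def fact_check(answer, evidence, llm):
    """Ask the model whether the draft answer is supported by the evidence."""
    evidence_block = "\n".join(f"- {passage}" for passage in evidence)
    prompt = (
        f"Evidence:\n{evidence_block}"
        f"\n\nClaim: {answer}\n"
        "Reply with exactly SUPPORTED or UNSUPPORTED."
    )
    verdict = llm(prompt).strip().upper()
    return verdict == "SUPPORTED"


def fake_llm(prompt):
    """Stub model: calls a claim supported only if it appears verbatim in the evidence."""
    evidence, rest = prompt.split("\n\nClaim: ")
    claim = rest.split("\n")[0]
    return "SUPPORTED" if claim in evidence else "UNSUPPORTED"


# Hypothetical passage, as if retrieved from the vector database.
evidence = ["Visiting hours end at 5pm on weekdays"]

print(fact_check("Visiting hours end at 5pm", evidence, fake_llm))
print(fact_check("Visiting hours end at 9pm", evidence, fake_llm))
```

An answer that fails the check could be discarded, regenerated with the evidence included in the prompt, or flagged for review.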

Reasoning

LLMs are good at a lot of things. They can predict the next word in a sentence, thanks to advances in “transformers,” which transform how machines understand human language by paying varying degrees of attention to different parts of the input data. LLMs are also good at boiling down a lot of information into a concise answer, and finding and extracting something you’re looking for from a large amount of text. Surprisingly, LLMs can also plan: They can gather data and plan a trip for you.

And maybe even more surprisingly, LLMs can use reasoning to produce an answer, in an almost human-like fashion. Because people can reason, they don’t need tons of data to make a prediction or decision. Reasoning also helps LLMs to avoid hallucinations. An example of this is “chain-of-thought prompting.”

This method helps models to break multistep problems into intermediate steps. With chain-of-thought prompting, LLMs can solve complex reasoning problems that standard prompt methods can’t (for an in-depth look, check out the blog post Language Models Perform Reasoning via Chain of Thought from Google).

If you give an LLM a complicated math problem, it might get it wrong. But if you provide the LLM with the problem as well as the method of solving it, it can produce an accurate answer and share the reasoning behind it. A vector database is a key part of this method: It stores worked examples, retrieves the ones whose questions are similar to the one being asked, and populates the prompt with them.

Even better, once you have the question and answer, you can store it in the vector database to further improve the accuracy and usefulness of your generative AI applications.
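To make this concrete, here is a small sketch of building a chain-of-thought prompt from stored worked examples and then saving a new question and answer for later reuse. `ExampleStore` is a hypothetical in-memory stand-in for a vector database, word overlap stands in for vector similarity, and the example questions are illustrative only.

```python
import re


def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a, b):
    """Jaccard overlap of word sets; a toy stand-in for vector similarity."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


class ExampleStore:
    """Hypothetical in-memory stand-in for a vector database of worked examples."""

    def __init__(self):
        self.examples = []  # list of (question, worked_solution) pairs

    def add(self, question, worked_solution):
        self.examples.append((question, worked_solution))

    def nearest(self, question, k=1):
        ranked = sorted(
            self.examples, key=lambda ex: similarity(question, ex[0]), reverse=True
        )
        return ranked[:k]


def chain_of_thought_prompt(question, store, k=1):
    """Prefix the question with the k most similar worked examples."""
    parts = [f"Q: {q}\nA: {solution}" for q, solution in store.nearest(question, k)]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


store = ExampleStore()
store.add(
    "Roger has 5 tennis balls and buys 2 cans of 3 balls. How many balls does he have?",
    "He starts with 5. 2 cans of 3 is 6. 5 + 6 = 11. The answer is 11.",
)
store.add(
    "What is the capital of France?",
    "France's capital city is Paris. The answer is Paris.",
)

new_question = "Ana has 4 apples and buys 3 bags of 2 apples. How many apples does she have?"
cot_prompt = chain_of_thought_prompt(new_question, store)
print(cot_prompt)

# Once an answer has been checked, store it so later prompts can reuse it.
store.add(new_question, "She starts with 4. 3 bags of 2 is 6. 4 + 6 = 10. The answer is 10.")
```

Because the retrieved example shows its intermediate steps, the model is nudged to reason through the new problem the same way rather than jump straight to a number.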

There are a host of other reasoning advancements you can learn about, including tree of thought, least to most, self-consistency and instruction tuning.

Iterative Querying

The third method to help reduce LLM hallucinations is iterative querying. In this case, an AI agent mediates calls that move back and forth between an LLM and a vector database. This can happen multiple times in order to arrive at the best answer. An example of this is forward-looking active retrieval augmented generation, also known as FLARE.

You take a question and query your knowledge base for similar questions, which returns a series of related questions. You then query the vector database with all of the questions, summarize the answer, and check whether the answer looks good and reasonable. If it doesn’t, you repeat the steps until it does.
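The loop described above can be sketched as a small agent function. This is a simplified illustration of iterative querying in general, not an implementation of FLARE itself. The four callables (`expand`, `search`, `summarize`, `is_good`) are hypothetical hooks; in practice they would wrap LLM and vector database calls, and here they are stubbed with dictionaries so the example runs on its own.

```python
def iterative_query(question, expand, search, summarize, is_good, max_rounds=3):
    """Repeat expand -> search -> summarize until the answer passes the check."""
    questions = [question]
    answer = None
    for round_number in range(1, max_rounds + 1):
        # Ask for related questions and keep the new ones.
        for related in expand(questions):
            if related not in questions:
                questions.append(related)
        # Query the knowledge base with every question gathered so far.
        passages = []
        for q in questions:
            for passage in search(q):
                if passage not in passages:
                    passages.append(passage)
        answer = summarize(question, passages)
        if is_good(question, answer):
            return answer, round_number
    return answer, max_rounds  # best effort after the final round


# Stub data standing in for an LLM and a vector database (assumptions).
RELATED = {
    "what is rag": "how does rag reduce hallucinations",
    "how does rag reduce hallucinations": "what grounds a rag answer",
}
KB = {
    "what is rag": ["RAG retrieves documents before generating."],
    "how does rag reduce hallucinations": [
        "Grounding answers in retrieved text reduces hallucinations."
    ],
    "what grounds a rag answer": ["A vector database supplies the grounding passages."],
}


def expand(questions):
    nxt = RELATED.get(questions[-1])
    return [nxt] if nxt else []


def search(q):
    return KB.get(q, [])


def summarize(question, passages):
    return " ".join(passages)


def is_good(question, answer):
    return "vector database" in answer


answer, rounds = iterative_query("what is rag", expand, search, summarize, is_good)
print(rounds, answer)
```

The `max_rounds` cap is a design choice worth keeping in any real version: without it, a check that never passes would keep the agent calling the model and the database indefinitely.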

Other advanced iterative querying methods include AutoGPT, Microsoft Jarvis and Solo Performance Prompting.

There are many tools that can help you with agent orchestration. LangChain is a great example that helps you orchestrate calls between an LLM and a vector database. It automates the majority of management tasks and interactions with LLMs and provides support for memory, vector-based similarity search, advanced prompt-templating abstraction and a wealth of other features. It also supports advanced prompting techniques like chain-of-thought and FLARE.

Another such tool is CassIO, which was developed by DataStax as an abstraction on top of our Astra DB vector database, with the idea of making data and memory first-class citizens in generative AI. CassIO is a Python library that makes it seamless to integrate Cassandra with generative artificial intelligence and other machine learning workloads. It abstracts the process of accessing the database, including its vector search capabilities, and offers a set of ready-to-use tools that minimize the need for additional code.

Putting It All Together: SkyPoint AI

SkyPoint AI is a SaaS provider specializing in data, analytics and AI services for the senior care and living industry. The company leverages generative AI to enable natural and intuitive interactions between seniors, caregivers and software systems. By simplifying complex applications and streamlining the user experience, SkyPoint AI empowers seniors and caregivers to access information and insights effortlessly, which helps enhance care.

The company pulls from a wide variety of data that is both structured and unstructured to provide AI-generated answers to prompts like “How many residents are currently on Medicare?” said SkyPoint chief executive Tisson Mathew. This helps care providers make informed decisions quickly, based on accurate data, he said.

Getting to that point, however, was a process, Mathew said. His team started by taking a standard LLM and fine-tuning it with SkyPoint data. “It came up with disastrous results — random words, even,” he said. Understanding and creating prompts was something SkyPoint could handle, but it needed an AI technology stack to handle generating accurate answers at scale.

SkyPoint ended up building a system that ingested structured data from operators and providers, such as electronic health record and payroll data. This data is stored in a columnar database, and RAG is used to query it. Unstructured data, such as policies and procedures and state regulations, is stored in a vector database: DataStax Astra DB.

Mathew posed a question as an example: What if a resident becomes abusive? Astra DB provides an answer, assembled from state regulations, the user’s context, and a variety of different documents and vector embeddings, in natural language that’s easy for a senior-care facility worker to understand.

“These are specific answers that have to be right,” Mathew said. “This is information an organization relies on to make informed decisions for their community and for their business.”

Conclusion

SkyPoint AI illustrates the importance of mitigating the risk of AI hallucinations; without methods and tools to ensure accurate answers, the consequences could be dire.

With RAG, reasoning and iterative querying approaches such as FLARE, generative AI — particularly when fueled by proprietary data — is becoming an increasingly powerful tool to help enterprises serve their customers efficiently and effectively.

Alan Ho is a product manager and entrepreneur with a passion for advancing artificial intelligence and computing. He is currently working on adapting DataStax technology for generative AI. Alan founded a startup company that provides application performance management for mobile...