SAI Notes #08: LLM-based Chatbots to query your Private Knowledge Base.
And how you can build one easily.
👋 I am Aurimas. I write the SwirlAI Newsletter with the goal of presenting complicated Data-related concepts in a simple and easy-to-digest way. My mission is to help You UpSkill and keep You updated on the latest news in Data Engineering, MLOps, Machine Learning and the overall Data space.
This week in the weekly SAI Notes:
Why you can’t use commercial LLM APIs out of the box to query your Internal Knowledge Base.
Different ways you could try to implement the use case.
An easy way to build an LLM-based Chatbot to query your Internal Knowledge Base using Context Retrieval outside of the LLM.
New use cases for Large Language Models are being discovered daily. One of the popular ones I keep hearing about is document summarisation and knowledge extraction.
Today we look into what an architecture of a chatbot to query your internal/private Knowledge Base that leverages LLMs could look like and how a Vector Database facilitates its success.
The article assumes that you are already familiar with what an LLM is. What matters here is the simplest definition of it - you provide a context/query in the form of a Prompt to the LLM and it returns an answer that you can use for downstream work.
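If you have not called an LLM API directly before, here is a minimal sketch of that prompt-in, answer-out loop. It assumes the openai Python package (v1+) and an OPENAI_API_KEY environment variable; the model name and the prompt are just placeholders.

```python
# Minimal sketch: send a prompt to a commercial LLM and read back the answer.
# Assumes the openai Python package (v1+) and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # placeholder - any chat-capable model works here
    messages=[
        {"role": "user", "content": "Summarise our refund policy in two sentences."},
    ],
)

# The returned text is what you would pass on to downstream work.
print(response.choices[0].message.content)
```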
Let’s return to our use case, a Chatbot that would be able to answer questions only using knowledge that is in your internal systems - think Notion, Confluence, Product Documentation, PDF documents etc.
Implementation approaches one could consider.
Why can't we just use a commercial LLM API, like ChatGPT powered by e.g. GPT-4, to answer the question directly?
What are the problems with this approach?
Commercial LLMs have been trained on a large corpus of data available on the internet. Most of that data is irrelevant to your specific question, so the model has far more noise to draw on than you might like.
The data contained in your internal systems of interest might not have been used when training the LLM - it might be too recent and not available when the model was trained. Or it could be private and not available publicly on the internet. At the time of writing, only data up until September 2021 is included in the training of OpenAI's GPT-3.5 and GPT-4 models.
Currently available LLMs have been shown to produce hallucinations - confidently inventing data that is not true.
The next approach one could think of:
Formulate a Question/Query in the form of a meta-prompt.
Pass the entire Knowledge Base from internal documents together with the Question via a prompt.
Via the same meta-prompt, instruct the LLM to only use the previously provided Knowledge Base to answer the question (a rough sketch of this approach is shown below).
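A minimal sketch of what that looks like in code, assuming the openai Python package. The knowledge_base/ folder of plain-text exports, the load_knowledge_base() helper and the example question are hypothetical - they stand in for however you export your Notion, Confluence or PDF content.

```python
# Sketch of the "pass the whole Knowledge Base in the prompt" approach.
# Assumes the openai Python package and a knowledge_base/ folder of .txt exports.
from pathlib import Path

from openai import OpenAI

client = OpenAI()

# Hypothetical helper: concatenate plain-text exports of your internal docs.
def load_knowledge_base(folder: str = "knowledge_base/") -> str:
    return "\n\n".join(p.read_text() for p in sorted(Path(folder).glob("*.txt")))

question = "How do I rotate the API keys for the billing service?"  # placeholder

# The meta-prompt wraps the Knowledge Base and instructs the LLM to use only it.
meta_prompt = (
    "Answer the question using ONLY the knowledge base below. "
    "If the answer is not there, say you don't know.\n\n"
    f"Knowledge base:\n{load_knowledge_base()}\n\n"
    f"Question: {question}"
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": meta_prompt}],
)
print(response.choices[0].message.content)
```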
What are the problems with this approach?
There is a Token Limit on how much information you can pass to the LLM as a prompt; it varies between commercially available LLMs.
At the time of writing the input context length varies from ~4,000 to ~33,000 tokens for different types of LLMs commercially available from OpenAI.
1,000 tokens map roughly to 750 words, so the maximum prompt length is ~25,000 words. That is a lot and could fit large amounts of information, but it will most likely not be enough for larger Knowledge Bases (you can check how your own corpus measures up with the sketch below).
There are workarounds for pushing more context to the LLM, like compressing the context or splitting it into several inputs.
Even if you were able to pass the entire knowledge base in the prompt, commercial LLM providers charge you for the number of tokens you provide as a prompt. Additionally, there is currently no way to cache the knowledge base on the LLM side so that you could pass the relevant data corpus only once. Consequently, you would need to pass it each time you ask a question, which would inflate the price of API use tremendously.
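To see whether your own Knowledge Base would even fit into a single prompt, here is a quick sketch assuming the tiktoken library. The 32,000-token limit and the knowledge_base/ folder of plain-text exports are illustrative assumptions.

```python
# Count how many tokens your exported Knowledge Base would consume as a prompt.
# Assumes the tiktoken library and a knowledge_base/ folder of .txt exports.
from pathlib import Path

import tiktoken

encoding = tiktoken.encoding_for_model("gpt-4")
corpus = "\n\n".join(p.read_text() for p in Path("knowledge_base/").glob("*.txt"))

token_count = len(encoding.encode(corpus))
context_limit = 32_000  # roughly the largest commercial input size at the time of writing

print(f"Knowledge base size: {token_count} tokens (~{int(token_count * 0.75)} words)")
print(f"Fits into a single prompt: {token_count < context_limit}")
```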
One more approach could be:
Use an OSS model.
Fine-tune it on the corpus of your internal data (a rough sketch of this step follows the list).
Use the fine-tuned version for your chatbot.
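To make the fine-tuning step concrete, here is a minimal sketch assuming the Hugging Face transformers and datasets libraries. The choice of model, the internal_docs/ folder of plain-text exports and the hyperparameters are placeholders, not recommendations.

```python
# Minimal sketch: fine-tune a small open-source causal LM on internal documents.
# Assumes the transformers and datasets libraries and plain-text doc exports.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "EleutherAI/gpt-neo-125m"  # small OSS model, chosen only for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Assumes your internal docs were exported as plain-text files.
dataset = load_dataset("text", data_files={"train": "internal_docs/*.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-internal-model", num_train_epochs=1),
    train_dataset=tokenized["train"],
    # mlm=False gives standard next-token (causal LM) training labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Even this stripped-down version hides real work: exporting and cleaning the documents, choosing the base model, and evaluating the result all need dedicated effort, which is exactly the point of the problems listed next.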
What are the problems with this approach?
OSS models are inherently less accurate than commercial solutions.
You will need specialized talent and a large amount of time to solve the fine-tuning challenge internally.
Fine-tuning does not solve the hallucination challenge of LLMs.
Hosting the LLM in house will be costly and require dedicated talent as well.
The input size limit is only partially solved - passing the whole Knowledge corpus each time you ask a question would still be too expensive.
So what is an easy-to-implement alternative?