This is largely the accepted way of doing it for now, but it's far from perfect. In my experience your system prompt has to be very precise to stop the model hallucinating. The other problem is chunking the data itself - if you're passing several large documents and the chunks are of fixed length, the model can and often will infer bad context purely from the chunk cutoffs. Small, semi-structured individual documents seem to perform far better than chunking.
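A minimal sketch of the cutoff problem described above - fixed-length chunking can slice mid-sentence, while splitting on paragraph boundaries keeps each semi-structured unit intact. The function names, the example document, and the chunk size are all my own illustration, not from the article:

```python
def fixed_length_chunks(text, size):
    """Split text into chunks of exactly `size` characters, ignoring structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def paragraph_chunks(text):
    """Split on blank lines so each chunk is one self-contained unit."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]

# Hypothetical document with two unrelated policies.
doc = (
    "Refunds are issued within 14 days of purchase.\n\n"
    "Shipping outside the EU takes 5-10 business days."
)

# A 40-character cutoff lands mid-sentence, so a retriever can end up
# pairing the tail of the refund policy with the shipping policy.
print(fixed_length_chunks(doc, 40))
print(paragraph_chunks(doc))
```

With the fixed-length splitter, the first chunk ends at "...days of pur", which is exactly the kind of boundary that invites a bad inference; the paragraph splitter returns each policy whole.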
Hi Aurimas, have there been any updates to the process you described above since last June? Or is this still the widely accepted process/method of building this out? Curious to hear your thoughts.
Perhaps a newb question: for step 9 of the last solution, is the LLM being described a commercial public API? I'm assuming it doesn't mean an independently trained one. Thanks
Can you do an end-to-end project that covers:
RAG assistance in LLM-based chatbots for context and reducing mistakes
Fine-tuning an open-source LLM model
Mixed-media vector embedding
Re-ranking, things like that...
It will be one big-ass project for sure.