Reliable, fully local RAG agents with LLaMA3
With the release of LLaMA3, there is growing interest in reliable local agents (for example, ones that can run on your laptop). Here we show how to build a reliable local agent from scratch using LangGraph and LLaMA3-8b. We combine ideas from three advanced RAG papers (Adaptive RAG, Corrective RAG, and Self-RAG) into a single control flow. It runs fully locally, using a local vector store from @nomic_ai & @trychroma, web search via @tavilyai, and LLaMA3-8b served through @ollama.
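For reference, the stack described above reduces to a few local components. A minimal sketch, assuming the langchain_community / langchain_nomic integration packages; the model tags and collection name are illustrative, not necessarily the repo's exact code:

```python
# Minimal local stack: LLaMA3-8b via Ollama, Chroma + Nomic embeddings,
# Tavily web search. Names here are assumptions, not the repo's exact code.
from langchain_community.chat_models import ChatOllama
from langchain_community.vectorstores import Chroma
from langchain_nomic.embeddings import NomicEmbeddings
from langchain_community.tools.tavily_search import TavilySearchResults

# JSON mode keeps router/grader outputs machine-parseable
llm = ChatOllama(model="llama3", format="json", temperature=0)

# Fully local vector store: Chroma with on-device Nomic embeddings
embeddings = NomicEmbeddings(model="nomic-embed-text-v1.5", inference_mode="local")
vectorstore = Chroma(collection_name="rag-chroma", embedding_function=embeddings)
retriever = vectorstore.as_retriever()

# Web-search fallback (Tavily needs TAVILY_API_KEY in the environment)
web_search_tool = TavilySearchResults(k=3)
```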
This article explains in detail how to build a reliable local agent using LangGraph and LLaMA3-8b, and why that matters. What makes the approach compelling is a control flow that incorporates recent research results such as self-correction and adaptive routing. A link to the code on GitHub is also provided, so interested readers can try it hands-on. A must-read for anyone following the cutting edge of AI.
This is insanely good! I had a similar idea, but as an AI newbie I never implemented it this well.
Don't waste our time with tutorials that don't serve any purpose! Go build an actual end-to-end conversational bot with your LangGraph that takes input, expands it into more queries, sends them to different nodes, asks follow-up questions with the LLM, and generates a result with a grader.
🚀🚀
Can't see the code well, can you make it bigger please?
There are no files in the GitHub repo you linked 😁
Code works flawlessly on a Mac M2, but fails at the vector store indexing step on a Windows PC (i9 processor, 64GB RAM, NVIDIA 4090) with the error: "12:21:14.611 [error] Disposing session as kernel process died ExitCode: 3221225477, Reason: Failed to load llamamodel-mainline-cuda-avxonly.dll: LoadLibraryExW failed with error 0x7e
Failed to load llamamodel-mainline-cuda.dll: LoadLibraryExW failed with error 0x7e"
That's really awesome and very useful! I literally implemented a similar flow today, using another LangGraph use case, but the fallback workflow at the end makes much more sense for increasing answer quality. Thanks, and brilliantly communicated.
That's okay, but can you make the model use a specific role?
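If "role" means a fixed persona, one way is a system message in the prompt template. A minimal sketch, assuming the ChatOllama model from the video's stack; the role wording is made up for illustration:

```python
# Sketch: pinning the local model to a role via a system message.
from langchain_community.chat_models import ChatOllama
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOllama(model="llama3", temperature=0)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a concise documentation assistant. Answer only from the provided context."),
    ("human", "Context: {context}\n\nQuestion: {question}"),
])

chain = prompt | llm
print(chain.invoke({"context": "...", "question": "What does the router do?"}).content)
```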
Yes, very useful. Especially running 'reliably' on my local machine (in this case MS Windows with an NVIDIA GPU)!
Thank you, yet again!
How to integrate a knowledge graph to increase accuracy?
Great video. Advanced concepts but simple to understand.
Excellent video! Thank you. Would you know how to handle the case where the agent goes into an infinite loop, e.g. it gets stuck at the hallucination check? I can only think of keeping track of a threshold for the number of checks, and am wondering if there's a more elegant way to do that in LangChain.
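A retry counter carried in the graph state is the usual pattern. A minimal sketch, assuming a state shape loosely like the video's; the retries field and the passes_check helper are additions for illustration, not part of the tutorial:

```python
# Sketch: bounding the hallucination-check loop with a counter in state.
from typing import TypedDict

class GraphState(TypedDict):
    question: str
    generation: str
    retries: int  # added field, not in the original tutorial

def generate(state: GraphState) -> GraphState:
    # ... call the LLM here, then bump the counter so the loop is bounded
    return {**state, "retries": state.get("retries", 0) + 1}

def decide_after_hallucination_check(state: GraphState) -> str:
    if state["retries"] >= 3:      # give up after N attempts
        return "fallback"          # e.g. route to web search instead
    # passes_check is a hypothetical stand-in for the grader call
    return "useful" if passes_check(state) else "regenerate"
```

Separately, LangGraph applies a recursion_limit (default 25) to each invocation and raises GraphRecursionError beyond it, so even without a counter the loop cannot spin forever; the counter just lets you fail gracefully.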
succinct!
Thanks for the amazing video Lance! Very clear explanation, this is really helpful to my work too.
I really like the graphic for the workflow, what tools did you use for that?
I have a question: say I build an agentic RAG application where multiple LLM calls work together (router, grader, generator, hallucination checker, etc.). Is every single LLM call an agent, or is the whole application the agent? (I saw somewhere that agents break a task into multiple tasks.)
Also, is writing the chat prompt template for each LLM call in the application considered prompt engineering?
Amazing video. Now imagine if you could implement self-supervised learning so it checks itself and makes sure the output is delivered without errors and without misuse of resources…
Incredible. Great stuff brotha. Thank you.
Link to the code? Thanks for the video.
Nice, this is great content. I am gonna run it with phi-3. One question:
Can I use a ReAct agent and provide multiple control flows as tools?
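A compiled LangGraph graph is itself a runnable, so each control flow can be wrapped as a tool for a ReAct-style agent. A minimal sketch; `app` stands in for the compiled graph, and the state keys loosely mirror the video's:

```python
# Sketch: exposing a compiled control flow as an agent tool.
from langchain_core.tools import tool

@tool
def local_rag_flow(question: str) -> str:
    """Answer a question using the local RAG control flow."""
    result = app.invoke({"question": question})  # app = compiled LangGraph graph
    return result["generation"]

# Wrap each flow this way and pass the resulting list as the agent's tools.
```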
How important is the chunk size, and what is the best way to set it up?
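Chunk size mainly trades retrieval precision against context per hit, so it is worth validating against your own queries. A minimal sketch of a typical starting point; the numbers are illustrative defaults, not tuned values from the video:

```python
# Sketch: chunking documents before indexing. `docs` is assumed to be a
# list of already-loaded Documents.
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,    # smaller chunks = more precise hits, less context each
    chunk_overlap=50,  # overlap preserves continuity across boundaries
)
chunks = splitter.split_documents(docs)
```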
A great challenge would be to accurately ascertain whether the model is capable of answering the question/topic itself or whether external tooling such as web browsing is required. I haven't been able to do this yet with llama3; I guess I haven't managed to find the correct routing prompt (a stage after the initial routing).
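One approach is to force the router's hand with a JSON-constrained prompt. A paraphrased sketch in the spirit of the Adaptive-RAG routing stage described above, not the exact prompt from the video:

```python
# Sketch: a JSON-constrained routing prompt for llama3.
from langchain_community.chat_models import ChatOllama
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import PromptTemplate

llm = ChatOllama(model="llama3", format="json", temperature=0)

router_prompt = PromptTemplate(
    template="""You are an expert at routing a user question to a vectorstore
or web search. Use the vectorstore for topics covered by the indexed documents;
otherwise use web search. Return a JSON object with a single key "datasource"
set to "vectorstore" or "web_search". No preamble or explanation.

Question: {question}""",
    input_variables=["question"],
)

router = router_prompt | llm | JsonOutputParser()
print(router.invoke({"question": "Who won the most recent Super Bowl?"}))
# expected: {'datasource': 'web_search'}
```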