Agentic AI: On Evaluations
This article goes through common metrics for multi-turn chatbots, RAG, and agentic applications. It also reviews frameworks like DeepEval, RAGAS, and OpenAI’s Evals library, so you know when to pick what.
I'm Ida, a self-taught developer and technical blogger (within AI) based in Stockholm. Professional experience in product management and marketing. Educated as an economist. Generalist who loves to work on several parts of a business. Always happy to be creative and work on challenging problems.
If you have ever worked with LLMs, you know they are stateless. That is, they have no short-term memory. To make sure the LLM can pull up the right facts, understand previous conversations, and connect information, we need to build some fairly complex systems.
An AI knowledge agent can dig through internal documentation, such as websites, PDFs, and random docs, and answer employees in Slack (or Teams/Discord) within a few seconds. I will go through what it takes to build one using LlamaIndex and Modal.
Did you know there are dozens of open-source agentic frameworks out there? I have briefly tested some of the more popular ones to get a feel for how they work and how easy they are to get started with.