I just like to write about agents. I promised one or two friends to do an introductory post and share articles and videos that I find useful. This post is divided into seven parts:
What is an AI agent?
History of agents
Examples of agents
Problems of agents
Building an agent
What’s next for agents
What to follow to stay updated on agents.
Each of these could be a topic for a separate article, so take this as a starting point; for each section I have included some resources that I have read that might help you learn more.
Enjoy!
1. What is an AI agent?
Don’t worry if you don’t know the official and rigorous definition that everyone agrees on… Because it doesn’t exist.
Agents only started to emerge more widely around April 2023, so it is no surprise that the definitions are still quite empirical and there are several ways to characterize them.
To my understanding, an AI agent is an autonomous assistant that works on different tasks (often discussing them in a loop with a human user) and is powered by a large language model (LLM).
Agents are more powerful than simple assistants like the customer-support chatbots that just answer the questions they are given.
Agents can have longer memory, load data and documents you provide them, and plan, but most importantly, they use tools. A “tool” (also called an “action” or “function”, depending on the setting) can be running code, searching the internet, connecting to your calendar, using your e-mail, or analyzing your data.
When I knew absolutely nothing about agents, I insisted on seeing what they looked like “from the inside”. Technically, AI agents are just a piece of software. For example, below is a screenshot of the first (and very simple) AI agent I made: an agent that helped me write and publish code on GitHub.
From the outside, agents often have a nice UI so the end user can interact with them like a normal person rather than like a developer. Below is an example of an agent called “Cognosys” that can summarize daily news or draft emails for you.
To summarize, an AI agent is a piece of software that uses an LLM as its “brain” to complete a user’s tasks. It also often has long-term memory and uses various tools, like internet search, file management, or running data analysis tasks. It usually works in the loop with the human user, informing them about the steps it is going to take, or asking for feedback and the next steps.
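The loop described above can be sketched in a few lines of Python. This is a toy illustration, not a real implementation: the LLM is replaced by a hard-coded stub function, and the only tool is a fake search; a real agent would call a model API at that step.

```python
# A minimal sketch of the agent loop: the "LLM" decides the next step,
# a tool executes it, and the result is fed back in until the task is done.

def fake_llm(task: str, observations: list) -> dict:
    """Stand-in for an LLM: returns the next step as a (tool, input) pair."""
    if not observations:
        return {"tool": "search", "input": task}
    return {"tool": "finish", "input": f"Summary of: {observations[-1]}"}

TOOLS = {
    "search": lambda query: f"search results for '{query}'",
}

def run_agent(task: str, max_steps: int = 5) -> str:
    observations = []
    for _ in range(max_steps):
        decision = fake_llm(task, observations)
        if decision["tool"] == "finish":
            return decision["input"]            # final answer for the user
        result = TOOLS[decision["tool"]](decision["input"])
        observations.append(result)             # feed the result back into the loop
    return "gave up"

print(run_agent("latest AI agent news"))
```

The key point is that the loop, not the model, is what makes this an “agent”: the model only proposes steps, and the surrounding code executes them and feeds results back.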
Here is some of my favorite reading explaining the basics of agents.
Reading
LLM Powered Autonomous Agents by Lilian Weng. This post defines an agent as being able to plan, having short and long-term memory, and using tools.
What Are AI Agents — And Who Profits From Them? by Evan Armstrong. This explains what an agentic workflow looks like, how good agentic companies are now, and what they are focusing on.
The Complete Beginners Guide To Autonomous Agents by Matt Schlicht. This is even less technical than Lilian’s post. “It is remarkable that simply wrapping an LLM inside a loop gets you an autonomous agent that can reason, plan, think, remember, learn — all on its own.”
Well WTF Are AI Agents? by Travis Fisher. A simple explanation of agents and why you should care.
How will AI change our jobs? by Brian Hyeon-woo Cheong. I recommend this for more practical examples of AI agents (and generative AI apps in general), and whether they will replace us.
AI is about to completely change how you use computers by Bill Gates. Bill Gates emphasizes that agents are not just bots… And they are coming!
2. History of agents
It’s fun to call something “history” when it was last year, but here we go.
The first popular AI agent that made the technical community say “Wow, we have something new” was AutoGPT. It still exists as an open-source project on GitHub; it calls itself experimental and is general-purpose, meaning it aims to accomplish any task.
As a side note, the benefit of open-source products is that everyone can see how they really work, freely suggest improvements, or build something on top of them. I recently talked to a developer who is building a successful agent company, and it all started with him playing with AutoGPT.
AutoGPT caught developers’ attention and the hype around agents started. The beginnings were crazy, and everyone was building their own agent.
After the first hype, the community went through a cool-down, and developers realized that agents cannot replace humans (at least not yet) and that they are not very reliable or even that autonomous.
These old posts from 2023 give a sneak peek at how people already saw that AI agents (among other things) have problems. We will look at the problems (and solutions) later in this post.
There has been a lot of research on agents. One early and popular example is the Voyager paper, which introduced the ideas of a curriculum, an iterative prompting mechanism, and a skill library for agents. I am including it because it was built in Minecraft and demonstrates how agents use the “tools” I mentioned in part 1: here, the tools are Minecraft skills that the agent discovers.
The first agents had another big drawback for the general public: They often lacked a nice UI, such as a desktop or mobile app, so you had to run them via a terminal.
Below is an illustration of how you set up AutoGPT. If I didn’t code, I probably wouldn’t understand it at all. Many other agents look like this and lack a good user experience; in my opinion, building complex products around agents is still the most underrated part.
Reading
The Anatomy of Autonomy: Why Agents are the Next AI Killer App after ChatGPT by Swyx. I recommend the part about AutoGPT which helps you understand how the early agents looked.
AI Agents in the Wild by me and the E2B team. One of the early articles covering the agent landscape and how it looked in 2023.
… And some interviews with founders that show the early struggles (struggles that were later overcome; these are very popular products a year later!)
About deployment, evaluation, and testing of agents with Sully Omar, the CEO of Cognosys AI.
David Zhang from Aomni gives his view on agents' reliability, debugging and orchestration.
3. Examples of agents
Explaining something with examples often helps the most. A few of the most-used agents today are:
Perplexity for answering your questions, with internet search built in
Cognosys for automating your daily tasks
Flint for education
GitHub Copilot for coding
Gumloop for automating your business workflows.
Here is the whole map of different agents, sorted by use cases.
You can see that common use cases are coding, data analysis, or productivity (e.g. helping you with emails, communication, and generating or correcting texts).
There are many agents in the “general purpose” category, but that is mostly because at the beginning, many people started building something generic. We are not there yet with building agents that can do anything, so many developers choose to build specialized agents that are great at particular tasks, or build deterministic software where agents are just a small part of a bigger workflow.
Note that often, the product or app leveraging AI agents is not called “agent”, but agents are there as an underlying feature.
Reading
Awesome AI Agents. A database of 200+ agents sorted by use cases.
My selection of daily-life AI tools. I tested and rated over 30 AI agents and tools. Many new ones have appeared since then, but you can still check it to see what’s out there.
4. Problems of agents
As you can guess, many of the challenges AI agents face stem from the properties of the underlying LLM.
We are experiencing an interesting shift from “traditional” software to LLM-powered software where AI agents serve as an intermediary between a human user and a computer performing tasks. Some of the agents are even able to make changes in your files, access your apps, or control your operating system. Below is an example of a popular AI agent called Open Interpreter that has capabilities exactly like this, and explicitly mentions the risks.
The potential of agents is great, but there are many problems that prevent agents from faster adoption by the general public and enterprises:
Unreliability and unpredictability. You are never 100% sure what the agent will decide to do; hypothetically, it could take a very risky step, like deleting something important from your computer, if you grant it access to your system.
Hallucinations of the LLM. These happen especially if the agent doesn’t have a code-interpreter-style engine to ground its reasoning. If agents cannot run code (e.g., to perform more sophisticated analysis or do calculations), they often provide an answer that is made up.
Security and data privacy concerns
Cost of running the underlying LLM
The entry barrier for nontechnical people (e.g., missing UI)
Observability and monitoring
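To make the hallucination point concrete, here is a toy sketch of how a code interpreter helps. The model is stubbed out as a hard-coded function, and the question and generated code are illustrative: instead of asking the model to state a number (which it may hallucinate), the agent asks it for code, runs that code, and reports the computed result.

```python
# Toy illustration of the code-interpreter mitigation: run model-generated
# code to get a computed answer instead of trusting a guessed number.

def llm_generate_code(question: str) -> str:
    """Stand-in for an LLM that answers with code rather than a number."""
    # e.g., for "sum of squares of 1..100" a model might produce:
    return "result = sum(i * i for i in range(1, 101))"

def answer_with_interpreter(question: str) -> int:
    code = llm_generate_code(question)
    namespace = {}
    exec(code, namespace)   # in practice, run this in a sandbox, never bare exec()
    return namespace["result"]

print(answer_with_interpreter("sum of squares of 1..100"))  # 338350
```

A plain LLM asked this question might output a plausible-looking but wrong number; the interpreter version is right whenever the generated code is right, which is a much easier bar to hit.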
The AI agents market is in its early stages. The necessary tooling and the whole ecosystem still need to improve; we need more iterations and more educational content … We need just a few more months or years.
Reading
The State of AI Agents. A blog post mentioning the hesitation of enterprises to adopt agents and the need for developing agent-specific solutions.
Why AI Agents Don't Work (yet) - with Kanjun Qiu of Imbue. Why Kanjun believes we are still in the “bare metal phase” of agent development.
Limitations of Running AI Agents Locally. Developers are equipping their agents with the ability to execute LLM-generated code. Doing this locally on your computer, instead of remotely in the cloud, can pose some risks.
Amdahl's Argument For AI by Max Rumpf. I recommend this article about the problem of humans in the loop with AI agents and how it limits speed.
Mitigate GPT-4 Hallucinations using Code Interpreter by Aditya Advani. About the problem that all LLMs have: Hallucinations.
GoEx: Mitigating the Risk of LLM-Generated Output. This blog post about a new paper introducing an LLM runtime starts by summarizing the downsides of LLM-powered apps.
5. Building an agent
What do you need to build AI agents?
I won’t go into detail about building agents, as this blog post is introductory, but these are some important parts of an agent’s anatomy that you might need when building one.
The arms and legs represent examples of tools (which I mentioned at the beginning).
Frameworks
One way of building agents is to start with an agentic framework (like a template with some things already prepared). Frameworks give you some functionality out of the box, but don’t allow much freedom to set up your agent exactly how you like.
The most popular framework for building agents is LangChain. Its benefit is that it is easy to switch between LLMs, but its docs introduce a lot of new concepts and terms, so for me personally it was difficult at the beginning.
There are also multi-agent frameworks where you can create agents with different roles (like Scientist, Manager, or Product Designer) working together. The two most popular are AutoGen and CrewAI. The drawback is that each has its own concepts, which you might get used to and then struggle to switch away from.
My favorite way is building agents without any framework, just with code. You will usually need just an API key for the LLM you decide to use for your agent.
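To illustrate what “just code” means, here is a minimal sketch with the actual model call stubbed out. The message-history list of role/content pairs is the common chat-API shape; the function names here are illustrative, not any specific provider’s SDK — with a real provider you would swap in an HTTP call using your API key.

```python
# A frameworkless chat skeleton: the only state you really manage is the
# message history that you send to the model's chat endpoint on each turn.

def call_llm(messages: list) -> str:
    """Stub: echoes the latest user message. Replace with a real API call."""
    return f"You said: {messages[-1]['content']}"

def chat_turn(history: list, user_input: str) -> str:
    history.append({"role": "user", "content": user_input})
    reply = call_llm(history)                               # model sees full history
    history.append({"role": "assistant", "content": reply}) # remember the reply too
    return reply

history = [{"role": "system", "content": "You are a helpful agent."}]
print(chat_turn(history, "Hello!"))   # -> "You said: Hello!"
print(len(history))                   # -> 3 (system + user + assistant)
```

Everything a framework adds (tools, memory, retries) is layered on top of this loop, which is why starting from plain code can be the clearest way to learn.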
However, if you have a non-technical background, there are also no-code frameworks where you can set up your own agent and use it.
LLMs
These are some popular LLMs that I have personally played with as a beginner. There are more of them, and they differ in performance, speed, and cost.
Databases
Databases give the agent long-term memory, beyond just a short context window.
I don’t personally have enough experience with connecting databases, but the popular ones for agents are Pinecone, Activeloop, Weaviate, or MindsDB.
A popular solution for giving agents memory is called RAG (Retrieval-Augmented Generation), and you will hear about it a lot if you start building agents.
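As a toy illustration of the retrieval idea (not a real RAG pipeline): real systems store embedding vectors in a database like Pinecone or Weaviate, but this self-contained sketch scores documents by simple word overlap and prepends the best match to the prompt. The documents here are made up for the example.

```python
# Toy RAG: retrieve the most relevant stored document for a question and
# put it into the prompt, so the model answers from data instead of memory.

DOCUMENTS = [
    "Our refund policy allows returns within 30 days.",
    "The office is open Monday to Friday, 9am to 5pm.",
]

def tokenize(text: str) -> set:
    """Lowercase and strip basic punctuation; a real system would embed instead."""
    return set(text.lower().replace(".", "").replace(",", "").replace("?", "").split())

def retrieve(question: str) -> str:
    q_words = tokenize(question)
    # Pick the document sharing the most words with the question.
    return max(DOCUMENTS, key=lambda d: len(q_words & tokenize(d)))

def build_prompt(question: str) -> str:
    return f"Context: {retrieve(question)}\nQuestion: {question}"

print(build_prompt("How do refund returns work?"))
```

The structure is the same in production: retrieve, stuff the result into the context, then generate; only the retrieval step (embeddings plus a vector database) gets more sophisticated.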
Code interpreter
You can give your AI agent a code interpreter by adding E2B sandboxes, where the agent can securely run its code in an isolated cloud environment. There are alternatives, for example Docker, but they don’t address problems like security in the same way.
Reading
Practices for Governing Agentic AI Systems by Yonadav Shavit, Sandhini Agarwal, Miles Brundage. This paper describes good practices and considerations for building with agents.
Microsoft’s AutoGen — A guide to code-executing agents. A simple guide to installing AutoGen and doing data analysis with it.
Better RAG 1: Basics by Hrishi Olickel. This is a series on how to build memory systems for AI agents.
How to add code interpreter to Llama 3. A simple guide to making your AI agent more powerful with code interpreter capabilities.
Guide to AI Developer that Makes Pull Requests for You. This is about how I built my first agent. The agent isn’t that good, but it was also my first coding project in general, so I tried to make it beginner-friendly.
6. What’s next for agents
Some strong trends today are: building very powerful AI software-engineer agents, which started with Devin by Cognition Labs; exploring generative UI (apps that use AI to change their interface based on your exact needs); and giving agents code interpreters for better performance. Agents with code interpreters can run code (not only generate it for you as text) and provide better output, like a chart plotted from your data, or a whole app that the agent also tested.
I think we are still close to the experimental phase but moving towards agents being adopted by bigger companies and becoming “legit”.
If we discuss the future of agents, it’s fun to think about the future of software as a whole. This diagram of future software has been discussed a lot. It has LLMs at the center, and LLMs are exactly what powers AI agents.
Reading
Future of Autonomous Agents by Yohei Nakajima. This is a nice summary of where agents are going.
LLM-powered code interpreters. A quite simple intro to what it means for agents to have code interpreters.
Open-Source Alternatives to Devin. Yes, there are controversies around how well Devin actually works. I don’t care, but I do care about open-source “devins” that everyone can try.
How Do AI Software Engineers Really Compare To Humans? by Harry Tormey.
Four Reasons Your Agent Needs Code Interpreter by me and Vasek Mlejnsky. About how code interpreters improve reasoning, reduce hallucinations, and more.
AI Agents Need Brains. This article presents the results of agents with or without code interpreters and why it makes agents much better.
7. What to follow to stay updated on agents
I am not a fan of following news just for the sake of staying updated. I think the best way to get into AI is to start building something or find a way to smartly incorporate AI into whatever you are already doing in your life.
However, if I had to choose a source for daily AI news, I would suggest building a quality X (Twitter) feed over alternatives like subscribing to newsletters. On X, you can see more “behind the scenes” thinking, and ask questions. These are some of my favorites.
X (Twitter) profiles
AI YouTube channels
I am more of a reading/writing than a listening/talking type, but one channel I recommend for very practical AI knowledge is the YouTube channel of James Murdza. He teaches people to code basically anything, and some of his videos are on generative AI apps and AI assistants. If you have an idea for an AI agent you want to build, you can apply there (even without any knowledge of programming).
AI newsletters and podcasts
To be honest, I don’t read newsletters and I am not a fan of them, because they often repeat what I have already seen on Twitter, or contain so many pieces of high-level information that I would need to study more to get something out of them.
I think the Latent Space newsletter and podcast are of really great quality though.
Thank you for reading!
I would be grateful if you shared any good learning resources on AI agents, or any thoughts you have on this post.
You can also follow me on X (Twitter) for more content.
Thank you :)