AI agents need brains

Why most AI-powered assistants cannot really think

Mar 14, 2024

When you ask ChatGPT what the meaning of life is, it can start an interesting discussion. Similarly, AI assistants handle tasks like these well:

Please write me an essay on the topic of sustainable energy transition.
Please act as my therapist.
Please write me a recipe for cooking pasta.

What do these tasks have in common? The AI assistant answers with words and sentences it has already “heard” somewhere. (Technically, this means it has been shown a lot of data and, based on that, it mathematically predicts which words are most likely to make sense in the situation.)

Yes, the agent or assistant can be given access to the internet to find fresh data for you, but that doesn’t change the fact that it only parrots what it has seen somewhere.

But what if you want your AI agent to help with tasks like these:

Please simulate 100 games of poker. Tell me the outcomes.
Please write the Guess the Number game and write a test for it, make sure all tests pass using your Python environment.
Please analyze NVIDIA stock and visualize its development until 2030.
Please play a Blackjack game with me.

Tasks like these require some reasoning, and they show where an AI agent without a “brain” fails. Stringing plausible words together is not enough; the agent has to reason more and approach the task smartly.

AI agents equipped with code interpreters

What exactly is that “brain” in the context of AI agents and assistants? It is a code interpreter powering the agent.

A code interpreter is a type of program that reads and executes instructions written in a programming language.

Since the boom of AI agents in 2023, developers have been building code interpreters with an intermediary in the form of an AI agent or assistant, which translates human prompts written in natural language into code instructions for computers.
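To make the idea concrete, here is a minimal sketch of that pipeline: a prompt goes in, code comes out, and an interpreter runs it. It assumes a few things that are not from any real agent: `translate_prompt_to_code` is a hypothetical stand-in for the language model (it simply returns a fixed snippet so the sketch runs offline), and running the code in a separate Python process via `subprocess` is just one simple choice. A real agent would also need sandboxing and more careful error handling.

```python
import subprocess
import sys


def translate_prompt_to_code(prompt: str) -> str:
    """Hypothetical stand-in for the language model.

    A real agent would send the prompt to an LLM and receive code back;
    here we return a fixed snippet so the sketch runs without any model.
    """
    return "print(sum(range(1, 101)))"


def run_code(code: str, timeout: float = 10.0) -> str:
    """Execute Python code in a separate interpreter process and return its output."""
    completed = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if completed.returncode != 0:
        # An agent could feed this error back to the model so it can retry
        # (not implemented in this sketch).
        return "Error:\n" + completed.stderr
    return completed.stdout


def agent(prompt: str) -> str:
    """Translate the prompt into code, run it, and return the actual result."""
    code = translate_prompt_to_code(prompt)
    return run_code(code)


print(agent("What is the sum of the numbers from 1 to 100?"))
```

The key difference from a text-only assistant is the `run_code` step: the answer comes from actually executing code, not from predicting what the answer probably looks like.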

Read more about how code interpreters work inside of AI agents in my other article.

How to test whether an AI agent has a “brain”

I will now compare three very popular AI agents:

Perplexity. This AI-powered chatbot has garnered about 10 million monthly users and is equipped with internet search, which gives it access to real-time data.
ChatGPT with the Data Analysis plugin. Everyone knows ChatGPT, but it is even better with the data analysis feature.
Open Interpreter. This 100% open-source agent, which has reached over 40,000 GitHub stars, runs from a terminal, with a desktop app coming soon.

The four tasks I am going to give them are fun ways to reveal what underlying processes each agent uses and whether it can “think” (i.e., whether it is powered by a code interpreter). Let’s see how the agents deal with them and what is going on inside the “mind” of each of them.

1. Please simulate 100 games of poker. Tell me the outcomes.

In this task, you can observe how Perplexity immediately starts talking about how it would do it. A lot of talking, but no doing, right?

Perplexity:

Even though it understands the task and answers to the best of its knowledge, Perplexity couldn’t actually run any simulations. It would have to do everything in text format, which would be incredibly tedious and long.

ChatGPT:

Tasks like this require some level of automation, which often means using code. ChatGPT Data Analysis first outlines the steps to complete the task…

… But then it proceeds to run the simulation with code. Note that it needs the ability to execute the code in order to give you the result.
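To give a sense of what such code involves, here is a minimal Python sketch of a poker simulation. It is not the code ChatGPT or Open Interpreter generated; it assumes a simplified game (two players, each dealt five cards from a shuffled deck, straight to showdown, no betting or drawing), and the function names `evaluate` and `simulate` are illustrative.

```python
import random
from collections import Counter

RANKS = range(2, 15)  # 2..14, where 11=J, 12=Q, 13=K, 14=A
SUITS = "shdc"
CATEGORIES = ["high card", "pair", "two pair", "three of a kind", "straight",
              "flush", "full house", "four of a kind", "straight flush"]


def evaluate(hand):
    """Score a 5-card hand as (category, tie-break ranks); higher tuples win."""
    ranks = sorted((r for r, _ in hand), reverse=True)
    counts = Counter(ranks)
    # Order ranks by how often they appear, then by rank, for tie-breaking.
    grouped = sorted(counts.items(), key=lambda rc: (rc[1], rc[0]), reverse=True)
    shape = [c for _, c in grouped]
    ordered = [r for r, _ in grouped]
    is_flush = len({s for _, s in hand}) == 1
    unique = sorted(set(ranks), reverse=True)
    is_straight = len(unique) == 5 and unique[0] - unique[4] == 4
    if unique == [14, 5, 4, 3, 2]:  # A-2-3-4-5 "wheel": the ace plays low
        is_straight, unique = True, [5, 4, 3, 2, 1]
    if is_straight and is_flush:
        return (8, unique)
    if shape == [4, 1]:
        return (7, ordered)
    if shape == [3, 2]:
        return (6, ordered)
    if is_flush:
        return (5, ranks)
    if is_straight:
        return (4, unique)
    if shape == [3, 1, 1]:
        return (3, ordered)
    if shape == [2, 2, 1]:
        return (2, ordered)
    if shape == [2, 1, 1, 1]:
        return (1, ordered)
    return (0, ranks)


def simulate(num_games=100, num_players=2, seed=None):
    """Deal num_games showdowns and count wins per player (and ties)."""
    rng = random.Random(seed)
    deck = [(r, s) for r in RANKS for s in SUITS]
    outcomes = Counter()
    for _ in range(num_games):
        rng.shuffle(deck)
        hands = [deck[i * 5:(i + 1) * 5] for i in range(num_players)]
        scores = [evaluate(h) for h in hands]
        best = max(scores)
        winners = [i for i, s in enumerate(scores) if s == best]
        if len(winners) > 1:
            outcomes["tie"] += 1
        else:
            outcomes[f"player {winners[0] + 1} ({CATEGORIES[best[0]]})"] += 1
    return outcomes


for outcome, count in simulate(num_games=100, seed=1).most_common():
    print(f"{outcome}: {count}")
```

The number of games is an explicit parameter, so asking for 100 games means `simulate(num_games=100)`; an agent that quietly runs fewer games is not answering the question that was asked.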

Open Interpreter:

I am testing the third agent, Open Interpreter, in my terminal, so my apologies that it doesn’t have a nice web UI and I cannot give you a link to the result. Anyway, it also manages to complete the task with code, and outlines the steps similarly to ChatGPT…

…But it runs much more code than ChatGPT, so the result can be more precise. (ChatGPT made a mistake and simulated only 10 games, even though it used code too.)

The point is that both ChatGPT and Open Interpreter can use a “brain” (an underlying code interpreter) to perform the task, while Perplexity currently lacks this capability, so it relies only on internet search and talking.