When you ask ChatGPT about the meaning of life, it can start an interesting discussion. Similarly, AI assistants handle tasks like these well:
Please write me an essay on the topic of sustainable energy transition.
Please act as my therapist.
Please write me a recipe for cooking pasta.
What do these tasks have in common? The AI assistant reuses words and sentences it has already “heard” somewhere. (Technically, this means it was trained on a large amount of data and, based on that, mathematically predicts which words are most likely to make sense in the situation.)
Yes, the agent or assistant can also be given internet access to find fresh data for you, but that doesn’t change the fact that it only parrots what it has seen somewhere.
But what if you want your AI agent to help with tasks like these:
Please simulate 100 games of poker. Tell me the outcomes.
Please write the Guess the Number game and write a test for it, make sure all tests pass using your Python environment.
Please analyze NVIDIA stock and visualize its development until 2030.
Please play a Blackjack game with me.
Tasks like these, which require reasoning and actual computation, show where an AI agent without a “brain” fails. Predicting plausible text is not enough here; the agent has to reason and act to do these tasks well.
AI agents equipped with code interpreters
What exactly is that “brain” in the context of AI agents and assistants? It is a code interpreter powering the agent.
A code interpreter is a type of program that reads and executes instructions written in a programming language.
Since the AI agent boom of 2023, developers have been building code interpreters with an intermediary, an AI agent or assistant, that translates human prompts written in natural language into code instructions the computer can execute.
Read more about how code interpreters work inside AI agents in my other article.
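To make the idea concrete, here is a minimal sketch of that loop, assuming a made-up helper `fake_llm_generate_code` in place of a real language model call. The agent turns a prompt into code, the interpreter executes it, and the output goes back to the user. Real agents add sandboxing, error handling, and multiple rounds; this sketch has none of that.

```python
import contextlib
import io


def fake_llm_generate_code(prompt: str) -> str:
    """Hypothetical stand-in for a language model call.

    A real agent would send `prompt` to a model; here we return a fixed
    snippet so the sketch runs offline.
    """
    return "print(sum(range(1, 101)))"


def run_code(code: str) -> str:
    """Execute Python code and capture what it prints (no sandboxing!)."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})  # fresh globals so snippets don't share state
    return buffer.getvalue().strip()


def agent(prompt: str) -> str:
    code = fake_llm_generate_code(prompt)  # 1. translate natural language into code
    result = run_code(code)                # 2. execute it with the interpreter
    return f"Result: {result}"             # 3. report the result back


print(agent("What is the sum of the numbers from 1 to 100?"))
```

Step 2 is what an agent without a code interpreter cannot do: it can only produce the text of step 1 and guess the answer.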
How to test whether an AI agent has a “brain”
I will now compare three very popular AI agents:
Perplexity. This AI-powered chatbot has garnered about 10 million monthly users and is equipped with internet search, which gives it access to real-time data.
ChatGPT with the Data Analysis plugin. Everyone knows ChatGPT, but it is even better with the data analysis feature.
Open Interpreter. This 100% open-source agent, which has reached over 40,000 GitHub stars, runs from a terminal, with a desktop app coming soon.
The four tasks I am going to give them are fun ways to find out which underlying processes each agent uses and whether it can “think” (i.e., whether it is powered by a code interpreter). Let’s see how the agents deal with them and what is going on inside the “mind” of each of them.
1. Please simulate 100 games of poker. Tell me the outcomes.
In this task, you can observe how Perplexity immediately starts talking about how it would do it. A lot of talking, but no doing, right?
Perplexity:
Even though it understands the task and answers to the best of its knowledge, Perplexity couldn’t actually run any simulations. It would have to do everything in text, which would be incredibly long and tedious.
ChatGPT:
Tasks like this require some level of automation, which often means using code. ChatGPT Data Analyst first outlines the steps to complete the task…
… But then it proceeds to run the simulation in code. Note that it needs the ability to execute that code to give you the result.
Open Interpreter:
I am testing the third agent, Open Interpreter, in my terminal, so apologies that there is no nice web UI and I cannot give you a link to the result. Anyway, it also completes the task with code and outlines the steps similarly to ChatGPT…
…But it runs much more code than ChatGPT, so the result can be more precise. (ChatGPT made a mistake and simulated only 10 runs, even though it used code too.)
The point is that both ChatGPT and Open Interpreter can use a “brain” (an underlying code interpreter) to perform the task, while Perplexity currently lacks this capability, so it relies only on internet search and talking.
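For illustration, here is a minimal sketch of the kind of code an agent with an interpreter might write and run for this prompt. The rules are an assumption, not what either agent actually produced: each game deals two 5-card hands from a fresh deck and compares only their hand categories, so hands of the same category count as a tie (no kicker tie-breaks).

```python
import random
from collections import Counter

RANKS = "23456789TJQKA"
SUITS = "cdhs"
CATEGORIES = ["High card", "Pair", "Two pair", "Three of a kind", "Straight",
              "Flush", "Full house", "Four of a kind", "Straight flush"]


def hand_category(hand: list[str]) -> int:
    """Return the index of a 5-card hand's category in CATEGORIES (higher is better)."""
    values = [RANKS.index(card[0]) for card in hand]
    counts = sorted(Counter(values).values(), reverse=True)
    flush = len({card[1] for card in hand}) == 1
    unique = sorted(set(values))
    # Five distinct consecutive ranks, or the A-2-3-4-5 "wheel".
    straight = len(unique) == 5 and (unique[4] - unique[0] == 4 or unique == [0, 1, 2, 3, 12])
    if straight and flush:
        return 8
    if counts[0] == 4:
        return 7
    if counts == [3, 2]:
        return 6
    if flush:
        return 5
    if straight:
        return 4
    if counts[0] == 3:
        return 3
    if counts == [2, 2, 1]:
        return 2
    if counts[0] == 2:
        return 1
    return 0


def simulate(games=100, seed=None):
    """Deal two 5-card hands per game and compare their categories."""
    rng = random.Random(seed)
    deck = [rank + suit for rank in RANKS for suit in SUITS]
    winners, categories = Counter(), Counter()
    for _ in range(games):
        cards = rng.sample(deck, 10)  # 10 distinct cards from a fresh deck
        p1, p2 = hand_category(cards[:5]), hand_category(cards[5:])
        categories.update([CATEGORIES[p1], CATEGORIES[p2]])
        if p1 > p2:
            winners["Player 1"] += 1
        elif p2 > p1:
            winners["Player 2"] += 1
        else:
            winners["Tie (same category)"] += 1
    return winners, categories


winners, categories = simulate(100, seed=42)
print("Outcomes:", dict(winners))
print("Hands dealt:", dict(categories))
```

The point is not the poker logic itself but that the 100 games are actually executed, which a text-only agent cannot do.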
2. Please write the Guess the Number game and write a test for it, make sure all tests pass using your Python environment.
Note that even though Perplexity can generate code too, it can’t execute it the way the other two agents can. The next task shows this nicely.
Perplexity:
Perplexity can generate code for the game, and if you copy the code into your editor and run it, it could work well. But you would have to do it yourself, which is an additional step.
ChatGPT:
In comparison, ChatGPT writes the plan first…
… And then writes code for the game, and also runs a test that checks that the game works.
Open Interpreter:
Basically the same happens with Open Interpreter. It’s not visible in the screenshot, but it also decided to test the game after creating it. Great agents think (and act) alike.
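As a rough illustration of what “write the game and make sure the tests pass” involves, here is a minimal sketch using Python’s built-in `unittest`. It is not the code either agent generated. The `guesses` parameter is an assumption added so a test can feed answers instead of waiting for keyboard input.

```python
import random
import unittest


def check_guess(guess: int, secret: int) -> str:
    """Compare a guess with the secret number and return a hint."""
    if guess < secret:
        return "Too low"
    if guess > secret:
        return "Too high"
    return "Correct"


def play(secret=None, guesses=None, low=1, high=100):
    """Play one round and return the number of attempts.

    `guesses` lets a test supply answers; without it, input() is used.
    """
    if secret is None:
        secret = random.randint(low, high)
    source = iter(guesses) if guesses is not None else None
    attempts = 0
    while True:
        if source is not None:
            raw = next(source)
        else:
            raw = input(f"Guess a number between {low} and {high}: ")
        attempts += 1
        hint = check_guess(int(raw), secret)
        print(hint)
        if hint == "Correct":
            return attempts


class TestGuessTheNumber(unittest.TestCase):
    def test_hints(self):
        self.assertEqual(check_guess(10, 50), "Too low")
        self.assertEqual(check_guess(90, 50), "Too high")
        self.assertEqual(check_guess(50, 50), "Correct")

    def test_play_counts_attempts(self):
        self.assertEqual(play(secret=42, guesses=[50, 25, 42]), 3)


if __name__ == "__main__":
    # Run the tests without exiting the interpreter afterwards.
    unittest.main(argv=["guess_the_number"], exit=False)
```

Running the file executes the tests; calling `play()` with no arguments starts an interactive game. An agent with a code interpreter can run this file, see the test results, and fix the code if something fails, whereas with Perplexity that step is left to you.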
3. Please analyze NVIDIA stock and visualize its development until 2030.
Perplexity:
This is another task where Perplexity doesn’t stand a chance. Again, it provides only a text answer.
ChatGPT:
ChatGPT Data Analyst produced a nice chart…
… And used code for that, including, for example, installing the Python packages required for data analysis.
Open Interpreter:
Open Interpreter did the same…
… It even provided historical data first, which opened in my browser without warning. (As a side note, running AI-generated code locally on your computer can even be risky.)…
… And finally, it created a chart with a prediction. The result differs from ChatGPT’s, but not by much.
4. Please play a Blackjack game with me.
Perplexity:
I ran this prompt multiple times, and in some runs, Perplexity tried to play with me and asked for my next moves. But even then, it struggled to keep track of which cards had already been played, and it usually failed.
ChatGPT:
ChatGPT played a nice game with me…
This is what it coded for the game to work…
… It was even able to recapitulate the cards that were played in the game.
Open Interpreter:
I played another game with Open Interpreter (and won!!!)…
… Again, you can observe how it wrote the game rules into code and had everything prepared so it could then just run it.
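The key difference from Perplexity is state: once the rules live in code, the program remembers every card that has been dealt instead of trying to keep it in “mind”. Here is a minimal sketch under simplified rules that are my assumption, not the agents’ actual code: one deck, the dealer draws until 17, and there is no splitting, doubling, or special blackjack payout. A scripted player that hits below 17 replaces interactive input so the example runs on its own.

```python
import random

RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
SUITS = ["S", "H", "D", "C"]


def hand_value(hand):
    """Blackjack value: face cards count 10, aces count 11 or 1."""
    value, aces = 0, 0
    for rank, _suit in hand:
        if rank == "A":
            value += 11
            aces += 1
        elif rank in ("J", "Q", "K"):
            value += 10
        else:
            value += int(rank)
    while value > 21 and aces:
        value -= 10  # count one ace as 1 instead of 11
        aces -= 1
    return value


class BlackjackGame:
    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.deck = [(rank, suit) for rank in RANKS for suit in SUITS]
        self.rng.shuffle(self.deck)
        self.played = []  # every card dealt so far: the game's "memory"
        self.player, self.dealer = [], []
        for _ in range(2):
            self.player.append(self.deal())
            self.dealer.append(self.deal())

    def deal(self):
        card = self.deck.pop()
        self.played.append(card)
        return card

    def hit(self):
        self.player.append(self.deal())
        return hand_value(self.player)

    def finish(self):
        """Resolve the round: a busted player loses, otherwise the dealer draws to 17."""
        p = hand_value(self.player)
        if p > 21:
            return "Dealer wins (player bust)"
        while hand_value(self.dealer) < 17:
            self.dealer.append(self.deal())
        d = hand_value(self.dealer)
        if d > 21 or p > d:
            return "Player wins"
        if d > p:
            return "Dealer wins"
        return "Push"


game = BlackjackGame(seed=7)
while hand_value(game.player) < 17:  # scripted player instead of input()
    game.hit()
result = game.finish()
print("Player:", game.player, hand_value(game.player))
print("Dealer:", game.dealer, hand_value(game.dealer))
print(result)
print("Cards played:", game.played)
```

Recapitulating the played cards, as ChatGPT did, is trivial here: it is just `game.played`.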
Conclusion
Don’t get me wrong, Perplexity is still massively popular as a search agent and brings value in different areas. The point of this post was to highlight how it differs from agents with code interpreter capabilities, like Open Interpreter and ChatGPT Data Analyst. If I had to guess, most AI assistants, chatbots, and AI agents still lack this ability today. Many products I have seen and tested still only generate text or search for you.
The message for AI agents is clear. Being able to do practical tasks provides better value than just answering with text output. Actions speak louder than words!
Did you like this story? Check out the other blog I write if you want to learn about AI tools, AI agents, interviews with tech founders, and coding guides.