OpenAI's Operator agent helped me move, but I had to help it, too

OpenAI gave me a week to test Operator, its new AI agent that can independently perform tasks for you on the internet.

Operator is the closest thing I've seen to the tech industry's vision of AI agents: systems that can automate the boring parts of life so we can focus on the things we actually enjoy. However, based on my experience with OpenAI's agent, truly “autonomous” AI systems are still out of reach.

To power Operator, OpenAI trained a new model that builds on o1's reasoning capabilities.

The model seems to work well for basic tasks. I watched Operator click buttons, navigate menus on websites, and fill out forms. The AI occasionally succeeded in taking actions on its own, and it was much faster than the web-browsing agents I've seen from Anthropic and Google.

But during my tests, I found myself helping OpenAI's agent more than I wanted to. It felt like I was coaching Operator through every problem, when what I wanted was to take some tasks off my plate.

During the test period, I often had to answer questions, grant permissions, fill in personal information, and help the agent when it got stuck.

In car terms, Operator is like driving with cruise control: you can occasionally take your feet off the pedals and let the car drive itself, but it is far from fully autonomous driving.

In fact, OpenAI says Operator's frequent pauses are by design.

The AI models powering Operator are like those powering AI chatbots such as OpenAI's ChatGPT: they can't work independently for long stretches, and they are prone to hallucinations. That's why OpenAI doesn't want to give the system too much decision-making power or too much sensitive user information. This may be a sensible safety choice on OpenAI's part, but it reduces Operator's usefulness.

In other words, OpenAI's first agent is an impressive proof of concept and interface for an AI front end to any website. However, to create truly autonomous AI systems, tech companies will need to build more reliable AI models that don't need so much steering.

A little “hands-on”

My Operator test coincided with the week I moved into a new apartment, so I had OpenAI's agent help with the logistics of relocating.

I asked Operator to help me buy a new parking permit. OpenAI's agent replied, “Sure,” and then opened a browser window on my PC screen.

Operator then searched for San Francisco parking permits in the browser and took me to the right city website, even the right page.

While Operator works, you can still use the rest of your computer, which can't be said for Google's Project Mariner. That's because OpenAI's agent isn't actually running on your computer; it's running somewhere in the cloud.

The Operator interface (Credit: Maxwell Zeff/OpenAI)

I had to grant Operator permission to start many different steps of the parking-permit process. It also stopped to ask me to fill in forms with details such as my name, phone number, and email address. Sometimes Operator also got lost, forcing me to take control of the browser and steer the agent back on track.

In another test, I asked Operator to book a table at a Greek restaurant. To its credit, Operator found a reasonably priced option in my area. But I had to answer more than six questions throughout the process.

Some of the steps in booking a reservation with Operator (Credit: Maxwell Zeff/OpenAI)

If you have to step in six or more times to book a table through an AI agent, at what point is it easier to just do it yourself? That's a question I often asked myself while testing Operator.

Agents-as-a-platform

In some of my tests, I ran into websites that blocked Operator for one reason or another. For example, I tried to book an electrician through TaskRabbit, but OpenAI's agent told me it had encountered an error and asked whether it could use an alternative service. Expedia, Reddit, and YouTube also blocked the AI agent from accessing their platforms.

However, other services are embracing Operator with open arms. Instacart, Uber, and eBay worked with OpenAI on Operator's launch, allowing the agent to navigate their websites on behalf of humans.

These businesses are preparing for a future in which AI agents drive user interactions.

“Customers are using Instacart through various entry points,” Daniel Danker, Instacart's chief product officer, said in an interview with TechCrunch. “We think Operator may be another entry point.”

Letting OpenAI's agent use Instacart on a person's behalf might seem to distance Instacart from its customers. However, Danker said Instacart wants to meet customers wherever they are.

“We do strongly believe, similar to OpenAI, that AI agents will have a significant impact on how consumers interact with digital properties,” eBay's chief AI officer, Nitzan Mekel-Bobrov, said in an interview with TechCrunch.

Mekel-Bobrov said that even as AI agents become more and more popular, he expects users will always come to eBay's website, noting that “the online destination is not going anywhere.”

Trust

After a few hallucinations, I had trouble trusting Operator with anything involving hundreds of dollars.

For example, I asked the agent to find a parking garage near my new apartment. It eventually suggested two garages that it said were only a few minutes' walk away.

Operator's hallucinated parking-garage distances (Credit: Maxwell Zeff/OpenAI)

Besides being outside my price range, the garages were actually far from my apartment: one was a 20-minute walk away, and the other was a 30-minute walk. It turned out Operator had used the wrong address.

This is why OpenAI doesn't give its agent your credit card number, passwords, or access to your email. If OpenAI hadn't let me intervene here, Operator could have wasted hundreds of dollars on parking spots I didn't need.

Hallucinations like these are a key obstacle to truly autonomous agents, which are supposed to take tedious tasks off your plate. If agents are prone to basic errors, especially errors with real-world consequences, no one will trust them.

With Operator, OpenAI seems to have built some impressive tools that let AI systems browse the web. However, unless the underlying AI models can reliably do what users ask of them, these tools won't amount to much. Until then, humans will be stuck assisting agents, not the other way around. That defeats the purpose.