AI Agents for Coding

August 9 2025 - João Porto

A brief report about testing AI agents for coding

The spark

Yesterday I watched OpenAI’s video promoting GPT-5 and highlighting how it works with Cursor. I was very skeptical at first, but interested. So I ran a small experiment: I signed up for Cursor’s trial plan, installed its IDE in my personal environment, and entered the exact same inputs I saw in the video (a wireframe and a prompt asking for an email app for devs). As I expected, the result did not match the video; in fact, the first result was not acceptable at all. But after some additional prompts asking it to fix the issues, it produced a good prototype of an email app using Vite + Tailwind. The result was not as refined as the one in the video and took considerable time (maybe it would have been faster in background mode), but it was pretty good for me. From that moment, my excitement about AI agents for coding increased.

Testing

After that, I went back to another personal side experiment: using Prolog to solve a problem I had found. I was wondering how to set it up again in Emacs and was using my own blog post on the topic as a reference. While reading it, I noticed some issues in my blog’s UI that I already knew about but had ignored because I had other priorities. But since my excitement about AI agents for coding had increased, I thought it would be good to use them to solve those issues.

I had already used Claude Code (CC) to fix some UI issues in the past, so I naturally turned to it again. The main issue was related to heading behavior on scroll events and, after some prompts, CC did solve it. But when I tried to go further and fix color and typography issues, the results were still not satisfactory after several prompts.

Then I tried Cursor and it was incredibly good at solving these issues. I did not need to write detailed prompts about colors; with a few statements it understood what I wanted, adjusted the header buttons and background colors for dark mode, and improved the reading contrast in light mode. Adjusting typography was a bit harder and needed more prompts, but I reached a satisfactory point there too. On the other hand, maybe I would have achieved the same with CC, since the prompts I gave Cursor were refined versions of the ones I had given CC.

I tested Cursor using its IDE, web, and CLI versions, exclusively with GPT-5. Using its IDE is like using VS Code + Copilot extension (Code+Pilot) and its CLI is like CC, so they are good, but nothing in what I tested was a great surprise. In fact, I repeated the email app test in Code+Pilot (with GPT-5), using the same input I had given Cursor, and the UI was considerably better and did not require additional prompts; a single prompt was enough to produce a much better result.

However, Cursor’s web version was a valuable find, because I was able to prompt and deploy from my mobile device (its integration with GitHub is very good). I had already achieved that before (in March) using CC, but I needed to install Termux on Android and, within it, install Node.js, install CC, create a new Anthropic API key to set up CC, and create a GitHub SSH key to clone from and push to the remote repo. That almost amounted to replicating my desktop environment on Android, so Cursor web is very handy. Excited about it, I also tested Google Jules, which is great too, and finally I remembered that I had already tested Codex recently (in June), but for some reason I did not keep using it at the time. Now it is clear to me that all three of these are very useful and I will keep using them.

Conclusion

Solving minor issues by vibe coding from my mobile while lying in my hammock was great; I enjoyed it and I will extend this experience to my other projects. Furthermore, I will try to see to what extent AI agents can solve my programming issues and then delegate as many as possible to them, because my interest now is in the things they cannot solve (or almost: there are a couple of things they can do that I still enjoy doing for leisure).

References

OpenAI. “Coding with GPT-5.” Www.youtube.com, 7 Aug. 2025, www.youtube.com/watch?v=PQUcIbSEBCM. Accessed 8 Aug. 2025.

Cursor. “Cursor on Web and Mobile.” Www.youtube.com, 30 June 2025, www.youtube.com/watch?v=hHTKtrSO6os&t=4s. Accessed 8 Aug. 2025.

Anysphere, Inc. “Cursor.” Cursor.com, cursor.com/home. Accessed 8 Aug. 2025.

Anthropic PBC. “Claude Code: Deep Coding at Terminal Velocity \ Anthropic.” Anthropic.com, www.anthropic.com/claude-code. Accessed 9 Aug. 2025.

Microsoft. “Visual Studio Code.” Visualstudio.com, Microsoft, code.visualstudio.com. Accessed 9 Aug. 2025.

GitHub. “GitHub Copilot · Your AI Pair Programmer.” GitHub, github.com/features/copilot. Accessed 9 Aug. 2025.

OpenAI. “Introducing Codex.” Openai.com, 3 June 2025, openai.com/index/introducing-codex. Accessed 9 Aug. 2025.