Dom.Vin
AI Design Journal

GPT-4.1 Prompting Guide from OpenAI:

We expect that getting the most out of this model will require some prompt migration. GPT-4.1 is trained to follow instructions more closely and more literally than its predecessors, which tended to more liberally infer intent from user and system prompts.

The One-Shot Paradigm with Agents by Helena Zhang:

Over the past two months, we have studied the designs behind powerful AI agents like Cursor and Claude Code. These tools have created new ways for AI to interact with codebases.

It’s extraordinary how tools like Cursor have embedded themselves so centrally into so many devs’ workflows without many of us having a clear understanding of how they actually work.

Chatbot Arena:

Large Language Models (LLMs) have unlocked new capabilities and applications; however, evaluating the alignment with human preferences still poses significant challenges. To address this issue, we introduce Chatbot Arena, an open platform for evaluating LLMs based on human preferences.

Check out the Prompt to Leaderboard feature. Very cool.

The Bitter Lesson: Rethinking How We Build AI Systems:

Recently, I was tending to my small garden when it hit me - a perfect analogy for this principle. My plants don’t need detailed instructions to grow. Given the basics (water, sunlight, and nutrients), they figure out the rest on their own. This is exactly how effective AI systems work.

When we over-engineer AI solutions, we’re essentially trying to micromanage that plant, telling it exactly how to grow each leaf. Not only is this inefficient, but it often leads to brittle systems that can’t adapt to new situations.

This beautifully articulates something I have been wrestling with for a while. What’s the best optimisation strategy for refining agentic experiences? To what extent can throwing compute at a problem solve for complexity?

Ankit is really onto something here. Wonderful framing.

UnFuckIt.AI:

Vibe-code it today, UnFuckIt.AI tomorrow.

Regretting your AI-driven junior dev hires? We'll get your code back on track. Whether you're drowning in unsalvageable commits or your project is on fire, a senior dev might rescue you.

Clever idea for pitching an AI-era agency. Vibe-coding is becoming a sort of cultural battle line.

Must have at least 10 pre-AI coding years

Dead internet theory. Dead programmer theory.

AI Slop Is a Brute Force Attack on the Algorithms That Control Reality by Jason Koebler at 404 Media:

The best way to think of the slop and spam that generative AI enables is as a brute force attack on the algorithms that control the internet and which govern how a large segment of the public interprets the nature of reality. It is not just that people making AI slop are spamming the internet, it’s that the intended “audience” of AI slop is social media and search algorithms, not human beings.

There’s a devaluation here, of something. The algorithms will surely respond. The Turing test becoming the Turing defence.

UK Prime Minister: AI should replace some work of civil servants:

Officials will be told to abide by a mantra that says: “No person’s substantive time should be spent on a task where digital or AI can do it better, quicker and to the same high quality and standard.”

This is a big shift in strategy for the British Government, and a sign of how quickly this ecosystem has matured. I worked for a few UK government departments in the past, and I think agentic AI, consciously applied, has huge potential to make government more efficient, effective and responsive.

It’ll be interesting to see how these changes are actually applied at an operational level. I am not someone who generally believes that AI is threatening to replace ‘jobs’; I believe it will replace discrete tasks and processes within jobs.

Think of the hundreds of small, discrete threads of function that weave themselves together into a job role; some of those threads are replaceable now, some in a year, some in ten, some never. How do we start to replace individual threads in isolation? That’s the hard part. And the fun part.

Mermaid AI:

Mermaid Chart is a diagramming tool that allows you to create diagrams using text. It is built by the team behind the award-winning open-source project, Mermaid JS.

‘Diagrams’ is underselling the power of Mermaid. They have native text representations of everything from Quadrant charts to Kanban boards.

When we think about designing complex agentic flows, time and time again we run into the same issue: it’s really hard for LLMs to stay on track over long inferential distances. Imagine a simple representation of a task that you might want to give to an agent:

Contact everybody on the project and get consensus on point x.

To solve this, the LLM starts sending emails and aggregating information, and the simple representation above evolves into a highly complex one, even for a simple task.

After a few emails back and forth to the project members, the LLM would be sitting on a web of context: requests for information, dependencies between facts, different perspectives between departments, sub-tasks, red herrings, meta-conversations, links, citations, decisions, documents on remote servers, perhaps multiple versions of each.

Maintaining sufficient context to actually process these in the correct order is really hard.

This is actually a depth problem rather than a breadth problem. The issue isn’t how long the LLM can process sequential items; it can do that forever. The question is how the LLM can move up and down the ladder of abstraction to understand the current state of its world, to avoid getting lost down dead ends, and to understand the bigger picture: how it has been decomposed, and how it can be recomposed.

I think Mermaid might have an opportunity to help here: it’s a concise way of storing complex information as a set of incredibly flexible text-based representations.
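To make that a little more concrete, here is a minimal sketch of what keeping an agent's task decomposition as Mermaid text could look like. It is only an illustration under my own assumptions: the `Task` class, its `status` field and the `to_mermaid` helper are hypothetical names I made up, not part of Mermaid or any agent framework. The only thing borrowed from Mermaid is its flowchart text syntax.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Task:
    """One node in a hypothetical task tree that an agent maintains."""
    id: str
    label: str
    status: str = "pending"  # assumed convention: "pending" or "done"
    children: list[Task] = field(default_factory=list)


def to_mermaid(root: Task) -> str:
    """Render the task tree as Mermaid flowchart text (top-down)."""
    lines = ["flowchart TD"]

    def visit(task: Task) -> None:
        # Declare the node with a quoted label, then link it to each sub-task.
        lines.append(f'    {task.id}["{task.label} ({task.status})"]')
        for child in task.children:
            lines.append(f"    {task.id} --> {child.id}")
            visit(child)

    visit(root)
    return "\n".join(lines)


# The example task from above, decomposed into two illustrative sub-tasks.
root = Task("goal", "Get consensus on point x", children=[
    Task("eng", "Email engineering", status="done"),
    Task("legal", "Email legal"),
])
diagram = to_mermaid(root)
print(diagram)
```

The idea is that the agent could re-read a compact snapshot like this at each step to see where it sits in the wider task, rather than trawling back through every email. Whether that actually helps over long runs is something I'd want to test, not assume.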

Why AI will never replace human code review by Greg Foster, CTO at AI Code Review people Diamond:

A truly effective code review is about more than just scanning for bugs or missteps, it’s about exchanging ideas, shaping architectural decisions, and building a shared understanding of the system. That’s the stuff that an LLM, for all its fancy generative abilities, simply cannot replicate.

Never is a strong word, but this archive image of an IBM memo is a gem!

Accountability remains a human moat. My friend Tavish writes beautifully about implementing an AI code reviewer, if you’re interested in a technical deep dive into these types of systems.

CopilotKit looks awesome:

Effortlessly enhance your app with powerful AI-driven capabilities.

This is the first tool I’ve found that defines the Transport Layer as a first-class citizen of agentic design.

When building an AI stack today, the standard process is to first pick a backend AI framework, then pick a frontend component library, and then crudely stitch the two together. It’s an artificial distinction. CopilotKit is a true end-to-end solution that’s attempting to fuse these two worlds.