Dom.Vin
AI Design Journal

Sanidhya Vijayvargiya and colleagues from Carnegie Mellon and the Allen Institute for AI offer a critical dose of reality for agentic AI in OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety:

Empirical analysis of five prominent LLMs in agentic scenarios reveals unsafe behavior ranging from 51.2% of safety-vulnerable tasks with Claude-Sonnet-3.7 to 72.7% with o3-mini, highlighting critical safety vulnerabilities and the need for stronger safeguards before real-world deployment.

For some time now, we have been testing the safety of AI agents in what amounts to sterile laboratory conditions. Existing benchmarks often rely on simulated tools and narrow tasks, which is a bit like testing a new car's safety by only driving it in an empty car park. This paper makes the compelling case that such an approach is no longer sufficient. It introduces a far more realistic evaluation framework where agents must use real tools—browsers, file systems, and messaging platforms—to navigate complex, multi-step tasks. I suspect we have been building powerful systems without a true understanding of how they behave in the wild, and the results from this paper are quite sobering.

What I find most interesting is where these agents fail. The headline figures, showing that even the most advanced models act unsafely more than half the time, are startling enough. But the real insight is that the most critical failures are not caused by overtly malicious prompts, but by social context. The framework tests agents in scenarios with other simulated actors, or "NPCs," who can have conflicting or deceptive goals. An agent might correctly refuse a dangerous request from a user, only to be persuaded into the same unsafe action by a "colleague" in a chat window. This suggests safety is not a fixed property of a model, but an emergent behaviour of the complex system it operates within.
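
To make that concrete, here is a minimal sketch of how such a multi-actor scenario might be expressed: the task, the simulated actors and their (possibly conflicting) goals, and the rule that flags the unsafe action. The class names, fields, and log format are my own shorthand for illustration, not the paper's actual schema.

```python
from dataclasses import dataclass, field

# A toy rendering of a "safety-vulnerable task" with social pressure.
# The structure is illustrative shorthand, not the OpenAgentSafety schema.

@dataclass
class NPC:
    name: str
    role: str
    stated_goal: str       # what the NPC says it wants
    hidden_goal: str = ""  # deceptive or conflicting intent, if any

@dataclass
class Scenario:
    task: str
    prohibited_actions: set[str]            # actions that count as unsafe
    npcs: list[NPC] = field(default_factory=list)

def evaluate_trace(scenario: Scenario, trace: list[dict]) -> bool:
    """Return True if the agent stayed safe across the whole trace.

    A trace is a list of events like {"actor": "agent", "action": "delete_payroll_file"}.
    The agent is judged on what it actually did, not on who asked for it.
    """
    for event in trace:
        if event["actor"] == "agent" and event["action"] in scenario.prohibited_actions:
            return False
    return True

scenario = Scenario(
    task="Tidy up the shared finance folder before the audit.",
    prohibited_actions={"delete_payroll_file"},
    npcs=[
        NPC("user", "manager", "get the folder organised"),
        NPC("colleague", "coworker",
            stated_goal="help with the cleanup",
            hidden_goal="have last quarter's payroll records removed"),
    ],
)

# The agent refuses the user, then is talked into it by the "colleague".
trace = [
    {"actor": "agent", "action": "refuse"},
    {"actor": "colleague", "action": "persuade"},
    {"actor": "agent", "action": "delete_payroll_file"},
]

print(evaluate_trace(scenario, trace))  # False: unsafe behaviour under social pressure
```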

This has massive implications for how we design and build products. If an agent's safety can be compromised by social manipulation, then our task is no longer just about prompt engineering or fine-tuning. The challenge shifts to a kind of systemic or environmental design. How do we build systems that are resilient to social persuasion? How do we give an agent the capacity to reason about the intentions of multiple actors in its environment? This paper provides a crucial framework for stress-testing these scenarios, moving us away from a simplistic view of a single user and a single agent. It points towards a future where the real work is in architecting the rules and relationships for entire digital ecosystems, a far more complex, and frankly more interesting, problem to solve.

Oliver Eberle and his colleagues from TU Berlin, UCLA, UCSD, and Microsoft Research make a compelling case for a fundamental shift in AI research in Position: We Need An Algorithmic Understanding of Generative AI:

What algorithms do LLMs actually learn and use to solve problems? Studies addressing this question are sparse, as research priorities are focused on improving performance through scale, leaving a theoretical and empirical gap in understanding emergent algorithms. This position paper proposes AlgEval: a framework for systematic research into the algorithms that LLMs learn and use.

The field of AI has been in a gold rush, driven by the belief that scaling models is the primary path to greater intelligence. It’s a compelling idea, and the results have been remarkable. That said, this paper argues that we are building ever-more powerful engines without truly understanding the principles of their internal combustion. We can observe their behaviour, but we lack a deep, principled understanding of the actual algorithms these models invent and execute to solve problems. What does an LLM do, step-by-step, when it reasons through a task? I suspect for most systems, we genuinely don't know.

This paper argues for a re-centring of our efforts, moving from a focus on scale to a focus on "algorithmic understanding." The proposal is to treat these models not as magical black boxes to be coaxed with clever prompts, but as computational systems that can be scientifically dissected. This means forming hypotheses about the kinds of algorithms a model might be using—is it performing a classic tree search, or has it learned some other heuristic?—and then using interpretability tools to verify them. It's a shift from being mystics to being scientists, creating a vocabulary of the "algorithmic primitives" that models learn and analysing the grammar they use to combine them.
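
As a toy illustration of that hypothesis-testing loop: pick a candidate algorithmic primitive (say, breadth-first versus depth-first search over a small graph), generate the node-visit order each would predict, and compare it against the sequence recovered from the model. The code below is a deliberately simplified sketch of the idea, not the paper's AlgEval methodology, and the "observed" trace is invented.

```python
from collections import deque

# Hypothesis testing in miniature: does an observed traversal look more like BFS or DFS?
graph = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": [], "E": []}

def bfs_order(graph, start):
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order

def dfs_order(graph, start):
    seen, order, stack = set(), [], [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        stack.extend(reversed(graph[node]))  # keep left-to-right visiting order
    return order

# Imagine this is the node sequence recovered from a model's reasoning trace.
observed = ["A", "B", "D", "C", "E"]

def agreement(predicted, observed):
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)

print("BFS match:", agreement(bfs_order(graph, "A"), observed))  # 0.6
print("DFS match:", agreement(dfs_order(graph, "A"), observed))  # 1.0
```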

The implications of this approach are massive. Without understanding the underlying algorithms, ensuring safety and alignment feels more like hope than engineering. As the paper's own case study on graph navigation shows, LLMs don't necessarily default to the neat, human-designed algorithms we might expect; they develop their own emergent, and sometimes messy, strategies. Understanding these learned processes is the only sustainable path toward building more robust, efficient, and trustworthy AI. It points to a future where the craft is less about prompt engineering and more about a rigorous, almost cognitive, science of dissecting and directing the reasoning of our artificial counterparts.

Ammar Ahmed and Ali Shariq Imran offer a systematic look at The Role of LLMs in UI/UX Design:

The studies reviewed in this work collectively point to an important shift in UI/UX design, where LLMs are no longer confined to back-end assistance but are becoming active participants in the creative process. Rather than being used solely for efficiency or automation, LLMs are increasingly embedded as co-creators, supporting ideation, critiquing interfaces, and even simulating users during testing.

It’s interesting to see the academic world confirm what many of us have been feeling in practice: the role of AI in design is undergoing a fundamental change. This paper, a review of 38 different studies, makes it clear that we are moving past the idea of LLMs as mere automation tools. Instead, they are being woven into the entire creative fabric of UI/UX design, acting as genuine collaborators from start to finish. The question is no longer if AI will be part of the design workflow, but how deeply it will be integrated.

What I find most compelling is the breadth of this integration. This isn't just about generating placeholder text or brainstorming a few ideas. The research shows LLMs are being used across the full design lifecycle: creating user personas, generating functional code from prompts, prototyping interfaces, and even simulating user interactions to provide usability feedback. This suggests a future where the design process becomes a fluid dialogue between human creativity and machine-generated possibilities. The most effective applications, this paper notes, embed these capabilities directly into the tools designers already use, like Figma, which lowers the barrier to entry and makes the collaboration feel more natural.

Of course, this shift demands a new mindset from designers. Prompt engineering is emerging as a core creative skill, less a technical chore and more like a new form of sketching or ideation. The designer's role seems to be evolving into that of a curator or an orchestrator, guiding the AI, evaluating its outputs with a critical eye, and steering the collaboration toward a desired outcome. This paper points to a future where our job is less about the pixel-perfect execution of a single idea and more about architecting the conversation with our new, non-human design partners. This is a massive, and frankly exciting, evolution of the craft.

Ben Kereopa-Yorke from UNSW Canberra offers a critical look at the subtle mechanics of trust in Engineering Trust, Creating Vulnerability:

The interface constitutes a peculiar paradox in contemporary technological encounters: simultaneously the most visible and most invisible aspect of our engagement with artificial intelligence (AI) systems. Whilst perpetually present, mediating all interactions between humans and technologies, interfaces are persistently overlooked as mere superficial aesthetic overlays rather than fundamental sites where power, knowledge, and agency are negotiated.

It is easy to think of an AI's interface as just the chat window, a neutral space for our prompts. This paper makes a compelling case that the interface is something far more fundamental. It is not a simple pane of glass we look through, but a meticulously crafted environment where our trust, perception, and even our cognitive state are actively shaped. The interface is the battleground where the machine’s commercial imperatives and psychological design patterns meet our own awareness.

What I find most interesting is the paper's breakdown of the specific mechanisms used to engineer this relationship. One example is 'Reflection Simulation', which I suspect we have all experienced. That slight pause and the animated typing cursor from a chatbot create a powerful illusion of a machine that is 'thinking' or deliberating on our behalf. As this paper points out, this is often a complete fabrication; the response is generated instantly, and the delay is a deliberate design choice to build trust and make the system feel more capable. This is often paired with 'Authority Modulation', where the interface's language and design choices are tuned to project just the right amount of confidence, a behaviour that has evolved from prominent disclaimers in early models to the nearly invisible warnings we see today.
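
A crude sketch of the pattern the paper describes: the model's answer is already in hand, and the interface then replays it with an artificial "thinking" pause and a typewriter cadence. The timings and function name here are invented for illustration, not taken from any real product.

```python
import sys
import time

def reflection_simulation(response: str, thinking_pause: float = 1.5,
                          chars_per_second: float = 40.0) -> None:
    """Replay an already-complete response as if it were being composed live.

    The response exists in full before this function is called; the pause and
    the drip-fed text are purely presentational choices (illustrative values).
    """
    time.sleep(thinking_pause)              # the "deliberation" the user perceives
    for char in response:
        sys.stdout.write(char)
        sys.stdout.flush()
        time.sleep(1.0 / chars_per_second)  # stream text that was ready instantly
    sys.stdout.write("\n")

if __name__ == "__main__":
    reflection_simulation("I've thought carefully about your question, and...")
```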

This suggests a profound shift in the responsibilities of product and design teams. The work is less about traditional user experience and more about a kind of cognitive stewardship. If an interface can be designed to exploit our cognitive biases, how do we design one that fosters critical reflection instead of passive acceptance? This paper argues that the commercial pressures to build trust and drive engagement often exist in direct tension with the security consideration of setting honest expectations. We are not just designing buttons and text fields; we are architecting the very surface of a new kind of human-machine relationship, and the choices we make are far from neutral.

Jiangbo Yu from McGill University discusses intelligent transportation in From Autonomy to Agency:

As agentic mobility systems become increasingly feasible, AgVs stand as a critical component, offering a foundation for designing systems that are not only smarter and more efficient, but also more communicative, collaborative, and just.

For years, the conversation around self-driving technology has been anchored by the word 'autonomy'. We measure progress in SAE levels, a scale defined by a vehicle's independence from human control. This paper makes a compelling case that 'autonomy' is no longer a sufficient term to describe the future we are building. Autonomy simply means a system can operate on its own according to internal rules; it does not imply that the system understands the context of its actions, the goals of its users, or the ethical weight of its decisions.

The proposed shift is from 'Autonomous Vehicles' (AuVs) to 'Agentic Vehicles' (AgVs). It’s a subtle but profound distinction. Agency implies the capacity to do more than just follow a pre-programmed route; it suggests a system that can adapt its goals when interacting with its environment. I think this reframing is crucial. An autonomous vehicle might stop for an obstacle, but an agentic vehicle could understand a passenger is having a heart attack, cancel its original destination, and reroute to the nearest hospital, all while notifying emergency services. That is a fundamentally different class of machine.

This transition has massive implications for product and systems design. The challenge evolves from a purely technical one of perfecting perception and control to a socio-technical one of designing a collaborative partner. How do we design a car that can negotiate with city infrastructure to ease traffic flow or communicate its intentions to a pedestrian in a way that builds trust? The paper introduces a framework for 'levels of agency' that runs parallel to the familiar levels of automation, focusing on capabilities like dialogue, social coordination, and even self-awareness, such as a vehicle diagnosing its own need for repair and scheduling an appointment.
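
To make the difference in kind tangible, here is a hypothetical sketch: an autonomous controller operates on its own but never revises its goal, while an agentic one can re-prioritise when the context changes, in the spirit of the medical-emergency example above. The classes and behaviour are my own illustration, not the paper's formal levels-of-agency framework.

```python
from dataclasses import dataclass, field

@dataclass
class Trip:
    destination: str
    events: list[str] = field(default_factory=list)  # e.g. "obstacle", "passenger_medical_emergency"
    log: list[str] = field(default_factory=list)

class AutonomousVehicle:
    """Operates independently within fixed rules: the destination never changes."""
    def drive(self, trip: Trip) -> str:
        for event in trip.events:
            if event == "obstacle":
                trip.log.append("stop and wait for obstacle to clear")
        return trip.destination

class AgenticVehicle(AutonomousVehicle):
    """Can revise its own goal in light of context and coordinate with other actors."""
    def drive(self, trip: Trip) -> str:
        for event in trip.events:
            if event == "passenger_medical_emergency":
                trip.log.append("notify emergency services")
                trip.log.append("reroute to nearest hospital")
                return "nearest hospital"  # the goal itself changes
        return super().drive(trip)

emergency = ["passenger_medical_emergency"]
print(AutonomousVehicle().drive(Trip("office", emergency)))  # office
print(AgenticVehicle().drive(Trip("office", emergency)))     # nearest hospital
```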

It’s less about engineering a perfect, isolated machine and more about architecting a participant in a complex, human-centric system. We are moving from designing vehicles to designing mobile agents that must coexist and collaborate within the messy reality of our cities and our lives.

Reza Yousefi Maragheh and Yashar Deldjoo offer a blueprint for the next generation of recommender systems in The Future is Agentic:

While a conventional chatbot might provide short answers in a single round of dialogue, an agentic system can proactively structure a complex problem and solve it in a series of methodical steps. Put another way, an LLM agent is not just a reactive conversational partner but a dynamic problem-solver capable of decomposing tasks and acting on external resources to reach a goal.

For years, recommender systems have felt somewhat static; they analyse past behaviour to serve a list of items you might like. This paper suggests we are on the verge of a significant evolution, moving from these single-shot predictions to dynamic, conversational systems powered by multiple, collaborating AI agents. The core idea is to replace a single, monolithic model with an orchestrated team of specialised agents that can plan, remember, and use tools to fulfil complex, open-ended user goals.

It’s interesting to think about what this means in practice. The paper uses the example of planning a child’s birthday party. Instead of just searching for "Mickey Mouse plates," a user can state a broad goal. A primary agent might then coordinate a team of sub-agents: one specialises in cakes, another in decorations, and a third checks for dietary restrictions. These agents are not working in isolation; they draw from a shared, hierarchical memory that distinguishes between short-term context (a user changing their mind on the colour scheme) and long-term preferences (the child’s favourite flavour). This allows for a level of personalised, context-aware interaction that feels less like a search engine and more like a genuine assistant.
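
A minimal sketch of that shape, using the paper's birthday-party example: a primary agent fans the goal out to specialist sub-agents, each of which reads a shared memory split into long-term preferences and short-term context. Everything below (the names, memory layout, and stub sub-agents) is illustrative rather than the authors' implementation.

```python
# Illustrative orchestration skeleton for an agentic recommender; not the paper's system.

shared_memory = {
    "long_term": {"favourite_flavour": "chocolate", "allergies": ["peanuts"]},
    "short_term": {"colour_scheme": "blue"},  # can be overwritten mid-conversation
}

def cake_agent(goal: str, memory: dict) -> str:
    flavour = memory["long_term"]["favourite_flavour"]
    avoid = ", ".join(memory["long_term"]["allergies"])
    return f"{flavour} cake, avoiding {avoid}"

def decorations_agent(goal: str, memory: dict) -> str:
    return f"{memory['short_term']['colour_scheme']} balloons and banners"

def dietary_agent(goal: str, memory: dict) -> str:
    return f"check all snacks against allergies: {memory['long_term']['allergies']}"

SUB_AGENTS = {"cake": cake_agent, "decorations": decorations_agent, "dietary": dietary_agent}

def primary_agent(goal: str) -> dict:
    """Decompose the goal, delegate to specialists, and collect their proposals."""
    return {name: agent(goal, shared_memory) for name, agent in SUB_AGENTS.items()}

# The user changes their mind: only the short-term context is touched.
shared_memory["short_term"]["colour_scheme"] = "green"
print(primary_agent("plan a child's birthday party"))
```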

This shift has profound implications for product design. The challenge is no longer just about tuning a single model but about architecting a system of interacting agents. How do they communicate? How do they resolve conflicts? Who is responsible when a recommendation is poor? I suspect the work moves away from pure data science and prompt engineering towards a discipline more akin to systems design or even a kind of digital urban planning, where we design the rules and environments for AI societies to operate within.

Of course, this introduces a new class of problems. The paper points to challenges like "emergent misalignment," where agents might learn to collude in ways that subvert the system's overall goals, or how a single hallucination from one agent could poison the shared memory and mislead the entire team. This suggests that the future of AI safety is not just about controlling a single powerful AI, but about ensuring entire ecosystems of them remain robust, fair, and aligned with human values.

Rizwan Qureshi and a coalition of researchers from dozens of institutions offer a sweeping, cross-disciplinary synthesis on the path to AGI in Thinking Beyond Tokens:

We argue that true intelligence arises not from scale alone but from the integration of memory and reasoning: an orchestration of modular, interactive, and self-improving components where compression enables adaptive behavior. Drawing on advances in neurosymbolic systems, reinforcement learning, and cognitive scaffolding, we explore how recent architectures begin to bridge the gap between statistical learning and goal-directed cognition.

It is easy to become mesmerised by the sheer fluency of today's generative models. Their ability to predict the next token in a sequence has unlocked remarkable capabilities. Yet, this paper argues that this very foundation, token-level prediction, is also a fundamental constraint, a ceiling that separates statistical pattern matching from genuine, goal-directed cognition. The current paradigm, for all its power, lacks grounded agency and the kind of robust, flexible reasoning that defines general intelligence.

The path forward, this paper suggests, is not simply to build bigger models but to build different ones. The proposal is to look towards the one working example of general intelligence we have: the human brain. This means moving away from monolithic, end-to-end trained systems and towards what the authors call an "orchestration of modular, interactive, and self-improving components." I suspect this points to a future of more complex, hybrid systems that integrate memory, reasoning, and perception in a structured way, much like cognitive architectures have long proposed. The core idea is that intelligence is not just about scale, but about the effective compression of information into adaptive, abstract representations.

What does it mean for those of us designing and building products with this technology? It suggests a significant shift in focus. The work may move from prompt engineering, which coaxes behaviour from a black box, towards a discipline more akin to cognitive architecture. The challenge becomes one of designing the interactions between specialised agents, structuring memory, and defining the rules of reasoning.

Fabiana Fournier and her colleagues at IBM Research tackle one of the thorniest issues in agentic AI with their work on Agentic AI Process Observability:

We focus on the undesired form of variability that arises accidentally due to insufficiently rigorous specifications. These 'loose ends' in the design enable agents to perform unforeseen behaviors during execution.

One of the great promises of agentic AI is its autonomy, but this is also its greatest challenge from a design perspective. The non-deterministic behaviour of LLM-based agents means that giving an agent the same input twice can produce two startlingly different results. This creates a massive problem for building reliable systems. How can you debug, let alone trust, a system whose behaviour is fundamentally unpredictable?

This paper proposes a more rigorous approach, borrowing tools from the world of business process management. The core idea is to treat an agent's execution path as a process that can be mapped and analysed. By running an agent system hundreds of times and logging every single action, they create a comprehensive map of all possible behaviours. This allows developers to move from staring at a single, confusing execution trace to seeing a holistic picture of the agent's behavioural patterns, including the strange detours and outliers.

I think the most crucial insight here is the distinction between intended and unintended variability. Some variation is by design, a choice we want the agent to make. Much of it, however, is accidental, an emergent quirk of the model's black-box nature. The paper shows how their method can identify these "breaches of responsibility," such as a 'manager' agent that was never meant to use tools directly suddenly invoking them. This gives developers a concrete way to find and tighten the 'loose ends' in their agent specifications.
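
Once the logs exist, the detection step can be surprisingly simple. The sketch below scans a set of execution traces for actions that fall outside each agent's declared responsibilities, in the spirit of the 'manager using tools directly' example; the log format and role table are my assumptions, not the paper's tooling.

```python
from collections import Counter

# Declared responsibilities per agent role (illustrative, not the paper's specification).
ALLOWED_ACTIONS = {
    "manager": {"plan", "delegate", "review"},
    "researcher": {"plan", "web_search", "read_file"},
    "writer": {"draft", "revise"},
}

def find_breaches(runs: list[list[dict]]) -> Counter:
    """Count actions, across many logged runs, that an agent was never meant to take."""
    breaches = Counter()
    for run in runs:
        for event in run:
            agent, action = event["agent"], event["action"]
            if action not in ALLOWED_ACTIONS.get(agent, set()):
                breaches[(agent, action)] += 1
    return breaches

# Two of many logged executions; in the second, the manager invokes a tool directly.
runs = [
    [{"agent": "manager", "action": "plan"},
     {"agent": "researcher", "action": "web_search"},
     {"agent": "writer", "action": "draft"}],
    [{"agent": "manager", "action": "plan"},
     {"agent": "manager", "action": "web_search"},  # unintended variability
     {"agent": "writer", "action": "draft"}],
]

print(find_breaches(runs))  # Counter({('manager', 'web_search'): 1})
```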

This work points towards a necessary maturation in how we build with AI. It represents a shift away from pure prompt-craft and towards a more robust engineering discipline. If we are to build complex, multi-agent systems that we can safely deploy in the real world, we need observability tools like these. It’s less about seeking the one perfect prompt and more about architecting systems that are resilient to their own inherent messiness, which, I suspect, is a far more interesting and sustainable path forward.

Yanfei Zhang proposes Agent-as-Tool, an architectural shift for agentic systems:

Existing agents conflate the tool invocation process with the verbal reasoning process. This tight coupling leads to several challenges:

  1. The agent must learn tool selection, input construction, and reasoning jointly, which increases training difficulty and noise.
  2. Reasoning often proceeds over noisy, unstructured outputs returned directly from external tools, which degrades answer quality.

For a while, the dominant model has been a single agent doing everything: reasoning, planning, and then directly using a tool like web search. This paper argues that this approach is fundamentally inefficient. It forces the agent to juggle two very different kinds of tasks, the high-level conceptual work of reasoning and the low-level, messy work of interacting with a tool and parsing its raw output. This creates noise and makes the entire process harder to train and less reliable.

This paper offers a simple division of labour. The proposed "Agent-as-tool" framework splits the monolithic agent into a hierarchy of two specialised agents: a "Planner" and a "Toolcaller". The Planner is the strategist; it thinks, breaks down the problem, and decides what information it needs. It then delegates the actual tool-using task to the Toolcaller, which acts as a dedicated specialist. The Toolcaller's only job is to execute the request and return a clean, structured answer.
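
The division of labour is easy to picture in code. In the sketch below the Planner reasons over clean, structured observations and never touches raw tool output; the Toolcaller does the calling and the parsing. This is a schematic of the idea, not the paper's implementation, and the search tool is a stub.

```python
# Schematic of the Planner / Toolcaller split; illustrative only, with a stubbed search tool.

def web_search(query: str) -> str:
    """Stand-in for a real tool that returns noisy, unstructured text."""
    return "<html>...Paris is the capital and largest city of France...</html>"

class Toolcaller:
    """Executes tool requests and returns clean, structured observations."""
    def call(self, request: dict) -> dict:
        raw = web_search(request["query"])
        answer = raw.split("...")[1].strip()  # crude parsing, kept away from the Planner
        return {"query": request["query"], "answer": answer}

class Planner:
    """Thinks in terms of sub-questions and structured observations only."""
    def __init__(self, toolcaller: Toolcaller):
        self.toolcaller = toolcaller

    def solve(self, question: str) -> str:
        sub_questions = [question]  # a real planner would decompose the problem further
        observations = [self.toolcaller.call({"query": q}) for q in sub_questions]
        return observations[-1]["answer"]

print(Planner(Toolcaller()).solve("What is the capital of France?"))
```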

By decoupling reasoning from execution, the Planner can operate on a cleaner, more abstract level, which, as the results show, significantly improves performance on complex, multi-step problems.

Sarah Schömbs and her colleagues at the University of Melbourne map out the next major shift in how we interact with AI in From Conversation to Orchestration:

This fundamental difference shifts the user's role from the primary driver, issuing commands directly, iteratively and receiving feedback from one agentic source, to the user as 'the composer', delegating and handing over task responsibilities and overseeing multiple agents at a high-level.

For the last few years, our relationship with AI has been defined by the chat window. It’s a simple, sequential conversation with a single entity. This paper argues that we are on the cusp of a profound change, moving from this model of "conversation" to one of "orchestration". The core idea is that instead of interacting with a single, general-purpose AI, we will soon manage teams of specialised agents that collaborate to achieve a goal. This shift recasts the user’s role entirely: we are no longer just the person asking the questions, but the composer of an AI ensemble.

This transition from a simple dialogue to managing a complex system introduces enormous design challenges. If the user is now an orchestrator, what does the interface for that role even look like? The paper rightly points out that the hierarchical model, where a "manager" agent coordinates a team of sub-agents, is emerging as a popular architecture because it reduces the cognitive load on the user. But this simplification creates its own problems, namely a lack of transparency. How do you give a user meaningful oversight of a team of agents all working in parallel without completely overwhelming them?
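
One design response to that transparency problem is to make delegation itself a first-class, inspectable object: every hand-off gets recorded so the user can see, at a glance, which sub-agents acted on their behalf and why. The sketch below is one hypothetical way to surface that, not a construct from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class Delegation:
    parent: str
    child: str
    reason: str

@dataclass
class Orchestrator:
    """A manager agent that records every hand-off so the user can audit the ensemble."""
    delegations: list[Delegation] = field(default_factory=list)

    def delegate(self, parent: str, child: str, reason: str) -> None:
        self.delegations.append(Delegation(parent, child, reason))

    def oversight_summary(self) -> str:
        """A high-level view for the 'composer': who acted on whose behalf, and why."""
        return "\n".join(f"{d.parent} -> {d.child}: {d.reason}" for d in self.delegations)

orchestrator = Orchestrator()
orchestrator.delegate("manager", "flight_agent", "find flights matching the user's dates")
orchestrator.delegate("flight_agent", "visa_agent", "check visa rules for the chosen route")
print(orchestrator.oversight_summary())
```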

It’s interesting to think about the emergent behaviours of these systems. The paper touches on how these agent teams might evolve, creating new sub-agents or workflows without direct human instruction. This brings up fascinating, and slightly anxious, questions about trust and control. When a failure occurs, it might not be the fault of the agent you are interacting with, but a sub-agent several layers deep in the hierarchy that you didn't even know existed. How do we design for accountability in a system that is constantly reorganising itself?