Dom.Vin
AI Design Journal

Nishant A. Parikh at Capitol Technology University proposes a co-evolutionary model for agentic AI in product management:

This study explores agentic AI's transformative role in product management, proposing a conceptual co-evolutionary framework to guide its integration across the product lifecycle. Agentic AI, characterized by autonomy, goal-driven behavior, and multi-agent collaboration, redefines product managers (PMs) as orchestrators of socio-technical ecosystems.

The days of treating AI as a fancy automation tool are over. The timing couldn't be more critical—while McKinsey projects generative AI could add $4.4 trillion to the global economy, a staggering 80% of AI projects still fail to deliver expected outcomes.

The paper's central insight is brilliantly simple: successful AI integration isn't about humans managing AI systems—it's about humans and AI systems evolving together. Traditional product management frameworks, designed for human-centric workflows, break down when faced with AI that can "generate product concepts, experiment autonomously, personalize features at scale, and adapt functionality in near real time."

"Rather than being displaced, product managers are emerging as orchestrators of complex, adaptive ecosystems that integrate human judgment with machine autonomy."

This shift represents more than a role evolution—it's a complete paradigm change. Product managers can no longer be gatekeepers of linear processes. Instead, they become conductors of "socio-technical ecosystems" where autonomous AI agents collaborate alongside human teams across discovery, development, testing, and launch phases.

The research, spanning 70+ sources and case studies from leading tech firms, identifies three critical competencies for this new reality:

AI Orchestration: Understanding how to direct and coordinate multiple AI agents working toward product goals, rather than simply prompting individual tools.

Ethical Oversight: Ensuring AI systems align with human values and business objectives as they gain increasing autonomy in decision-making.

Systems Governance: Managing the complex interactions between human judgment and machine autonomy across the entire product lifecycle.

What makes this framework particularly compelling is its emphasis on "mutual adaptation." Both humans and AI systems must evolve their capabilities in tandem. AI learns from human feedback and strategic direction, while humans develop new skills in AI literacy and systems thinking.

"The proposed co-evolutionary model emphasizes the mutual adaptation between humans and AI, where both systems evolve in tandem to achieve strategic alignment and organizational learning."

The practical implications are immediate. Product teams can't simply bolt AI features onto existing workflows and expect transformation. They need to redesign their entire approach around collaborative intelligence—systems where human creativity and judgment enhance AI capabilities, while AI autonomy and scale amplify human strategic thinking.

This isn't just academic theorizing. Companies like Airbnb, Duolingo, and Intuit are already demonstrating early versions of this co-evolutionary approach, with AI systems that don't just automate tasks but actively participate in product strategy and execution.

The 80% AI project failure rate exists largely because organizations try to force AI into human-designed processes. Parikh's co-evolutionary model suggests the opposite approach: redesign processes around the unique strengths of human-AI collaboration.

The research calls for urgent real-world validation of these frameworks across different industries and product types. But the core insight is already clear: the future belongs to product managers who can orchestrate intelligence—both human and artificial—rather than simply manage traditional development cycles.

Mohammad Azarijafari, Luisa Mich, and Michele Missikoff propose a radical rethink of business process design:

We propose a method in which BPs are not defined by fixed workflows but rather by business goals, information objects, and autonomous agents responsible for achieving them. This represents a shift from a task-based model to an agent-based goal-driven model, in which workflows emerge from agent interactions rather than being predesigned.

Business processes today are like sheet music for an orchestra that never changes tempo. Every note prescribed, every pause planned, every flourish predetermined. Perfect for predictable performances, terrible for jazz.

The industrial world built itself on this predictability. Workflows as assembly lines of the mind. But markets don't move like assembly lines anymore.

This paper flips the script entirely. Instead of designing the how, you design the what. Instead of choreographing every step, you set the destination and let autonomous agents figure out the dance.

Think goal-driven rather than task-driven. You want to "Acquire Order" in your pizza business? An agent takes responsibility for making that happen. It might take the order via phone, app, or carrier pigeon. The method emerges from the situation.

The magic happens when agents collaborate. One agent can't fulfill a goal alone? It recruits others. The system becomes antifragile - it doesn't just survive unexpected situations, it improvises through them.
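A minimal sketch of the idea (hypothetical names, nothing from the paper): agents declare which goals they'll take responsibility for, and the "workflow" is just whatever delegation chain emerges when one agent recruits others for its subgoals.

```python
# Goal-driven, agent-based sketch (illustrative only; not the authors' implementation).
from dataclasses import dataclass
from typing import Callable


@dataclass
class Agent:
    name: str
    can_achieve: set[str]                     # goals this agent takes responsibility for
    plan: Callable[[str], list[str]]          # subgoals it needs other agents for


def fulfil(goal: str, agents: list[Agent], trace: list[str] | None = None) -> list[str]:
    """Delegate a goal to whichever agent claims it; recruit others for its subgoals."""
    trace = trace if trace is not None else []
    agent = next((a for a in agents if goal in a.can_achieve), None)
    if agent is None:
        raise RuntimeError(f"No agent takes responsibility for goal: {goal}")
    trace.append(f"{agent.name} handles '{goal}'")
    for subgoal in agent.plan(goal):          # collaboration: one agent recruits others
        fulfil(subgoal, agents, trace)
    return trace


agents = [
    Agent("OrderAgent", {"Acquire Order"}, lambda g: ["Check Stock", "Schedule Delivery"]),
    Agent("StockAgent", {"Check Stock"}, lambda g: []),
    Agent("LogisticsAgent", {"Schedule Delivery"}, lambda g: []),
]

print(fulfil("Acquire Order", agents))
```

Note that nothing above prescribes the order of steps in advance; the trace is whatever falls out of which agents exist and what they delegate.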

This isn't just workflow automation with fancier language. It's a fundamental architectural shift from brittle choreography to adaptive orchestration.

The autonomy of agentic systems raises crucial questions about safety, ethics, accountability, and control.

That's the uncomfortable question: if workflows emerge from agent interactions rather than human design, who's really in control?

We're not just automating tasks anymore - we're delegating entire decision-making processes to systems that might actually understand what they're doing.

How do you maintain accountability in a system where the process itself is an emergent property? And perhaps more fundamentally: how do you design governance that can evolve alongside the intelligence it's meant to constrain?

Keyan Ding at Zhejiang University wants to solve the scientific software mess:

While Large Language Models show promise in tool automation, they struggle to seamlessly integrate and orchestrate multiple tools for complex scientific workflows. Here, we present SciToolAgent, an LLM-powered agent that automates hundreds of scientific tools across biology, chemistry, and materials science. At its core, SciToolAgent leverages a scientific tool knowledge graph that enables intelligent tool selection and execution through graph-based retrieval-augmented generation.

Scientific software is a collection of brilliant, disconnected islands. We have incredibly powerful and specialised computational tools for everything from materials science to protein folding, but each exists in its own silo, demanding deep expertise to operate. This fragmentation is a massive barrier, effectively preventing researchers from easily combining these tools to solve complex, multi-step problems. It’s like having a workshop full of advanced machinery, but with no common language or system for making the machines work together.

Instead of building an agent that's expert in everything, build an expert orchestrator. The innovation isn't the agent - it's the knowledge graph that maps the entire scientific software ecosystem.

What each tool does, what data it needs, what it produces, how they connect. A rich map instead of just a list.
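Something like this, as a toy sketch (the tools and schema here are invented, not SciToolAgent's actual graph): nodes describe what each tool needs and produces, and a workflow is found by walking the graph until the target output is reachable.

```python
# Hypothetical tool knowledge graph: each node records what a tool does, what it consumes,
# and what it produces; edges exist wherever one tool's output can feed another's input.
TOOLS = {
    "sequence_fetcher":    {"does": "fetch a protein sequence", "needs": {"uniprot_id"},          "makes": {"sequence"}},
    "structure_predictor": {"does": "predict a 3D structure",   "needs": {"sequence"},            "makes": {"structure"}},
    "docking_tool":        {"does": "dock a ligand",            "needs": {"structure", "ligand"}, "makes": {"binding_score"}},
}


def plan(available: set[str], target: str) -> list[str]:
    """Walk the graph: keep adding tools whose inputs are already satisfied
    until the target output becomes available."""
    chain: list[str] = []
    while target not in available:
        runnable = [name for name, tool in TOOLS.items()
                    if name not in chain and tool["needs"] <= available]
        if not runnable:
            raise RuntimeError(f"No tool chain reaches {target!r}")
        chain.append(runnable[0])
        available = available | TOOLS[runnable[0]]["makes"]
    return chain


print(plan({"uniprot_id", "ligand"}, "binding_score"))
# ['sequence_fetcher', 'structure_predictor', 'docking_tool']
```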

Xuetian Chen and colleagues at Shanghai AI Lab built a reality check for computer-using agents:

Experiments show that even state-of-the-art agents struggle with higher-level tasks involving perception, reasoning, and coordination—highlighting the need for a deeper understanding of current strengths and limitations to drive future progress in computer-using agents research and deployment.

Current agent benchmarks are like judging a chef by how fast they can chop onions without ever asking them to cook a meal. Narrow skills in sanitized environments that completely miss the messy reality of how humans use computers.

There's a massive gap between lab performance and real-world utility.

OS-MAP evaluates agents on two dimensions: depth (how autonomous they are) and breadth (how well skills transfer across domains). From simple command execution to proactive assistance, from work to study to entertainment.

Instead of one-dimensional leaderboards, you get a two-dimensional capability map.
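Roughly this shape, as a toy illustration (the levels, domains, and numbers below are invented, not OS-MAP's actual taxonomy or results):

```python
# A two-dimensional capability map instead of a single leaderboard score.
autonomy = ["execute a command", "multi-step task", "cross-app coordination", "proactive assistance"]
domains = ["work", "study", "entertainment"]

# success_rate[level][domain]: invented numbers, purely to show the shape of the report
success_rate = {
    "execute a command":      {"work": 0.70, "study": 0.66, "entertainment": 0.73},
    "multi-step task":        {"work": 0.30, "study": 0.27, "entertainment": 0.33},
    "cross-app coordination": {"work": 0.11, "study": 0.08, "entertainment": 0.13},
    "proactive assistance":   {"work": 0.03, "study": 0.02, "entertainment": 0.04},
}

# The question shifts from "what's the agent's score?" to "where on the map is it reliable?"
for level in autonomy:
    row = "  ".join(f"{d}: {success_rate[level][d]:.0%}" for d in domains)
    print(f"{level:<24} {row}")
```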

The results are sobering: even the most advanced agents achieve only an 11.4% success rate. That's nowhere close to human performance. The challenge isn't building more powerful models. It's understanding where agents actually live on this capability map and designing systems that know their own limitations.

How do you design for an agent that's reliable at simple tasks but fails completely at complex orchestration? Maybe the future isn't one agent conquering everything, but systems that understand their boundaries and know when to hand off to humans.

Timothy T. Yu and team built a chatbot that can run complex operations software for you:

Our system provides operations planners with an intuitive, natural language chat interface, allowing them to make queries, perform counterfactual reasoning, receive recommendations, and execute scenario analysis (what-if and why-not analyses) on operational plans.

Enterprise operations software is incredibly powerful but completely inaccessible. It drives logistics and supply chains for huge companies, but you need deep expertise and expensive consultants to use it.

The value is locked behind complex interfaces and expert-only language. Most organizations that could benefit can't afford to get in.

This paper makes the case for a fundamental shift in how we interact with these systems. Instead of forcing the human to learn the language of the machine through an intricate graphical user interface, it proposes an AI that acts as a translator, understanding the natural language of the user. It’s a move away from designing the perfect dashboard and towards designing the perfect conversation. The agent, SMARTAPS, doesn’t attempt to be an operations research expert itself; it intelligently selects and uses a catalogue of specialised tools that have been built by human experts.
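A rough sketch of what that routing might look like (the tool names and selection logic here are hypothetical, not the SMARTAPS implementation): the agent's job is to pick the right expert-built tool and hand it the planner's question, not to do the operations research itself.

```python
# Conversation as interface: an LLM router picks from a catalogue of expert-built tools.
# Tool names and select_tool() are illustrative placeholders.
TOOL_CATALOGUE = {
    "what_if": "Re-run the plan with a modified constraint and report the delta.",
    "why_not": "Explain why a proposed alternative plan was not chosen.",
    "recommend": "Suggest changes that improve the current objective.",
}


def select_tool(user_query: str) -> str:
    """Stand-in for an LLM router: map a natural-language query to a catalogued tool."""
    q = user_query.lower()
    if "what if" in q or "what happens" in q:
        return "what_if"
    if "why not" in q or "why didn't" in q:
        return "why_not"
    return "recommend"


query = "What happens if we close the Denver warehouse for a week?"
tool = select_tool(query)
print(f"{tool}: {TOOL_CATALOGUE[tool]}")
# what_if: Re-run the plan with a modified constraint and report the delta.
```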

This changes product design completely. No more arranging controls on screens - you're architecting dialogue instead.

How do you navigate the ambiguity of human language and map it to precise technical functions? That's the design challenge we're moving toward.

Yusen Peng and Shuhua Mao think AI hallucinations might actually be features, not bugs:

As generative systems evolve, the question may no longer be how to suppress all hallucinations, but rather, how to recognize and refine the meaningful ones. In doing so, we open a path not just to generation, but to genuine creative evolution.

We've been treating AI hallucinations as bugs to squash. Errors, inconsistencies, unexpected outputs - all seen as failures to be eliminated through better alignment.

But what if we're throwing away the source of genuine creativity? What if those "errors" are actually where the interesting stuff happens?

This paper proposes a radical shift in perspective. It introduces a framework that, instead of suppressing unexpected outputs, actively seeks them out and treats them as raw creative material. The idea is to systematically generate deviant results, amplify their most promising aspects, and then refine them through a structured pipeline that includes human feedback. It’s a move away from designing for flawless execution and towards designing for productive imperfection.
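As a rough skeleton (placeholder functions, not the paper's actual method), the loop might look like this: generate deliberately deviant outputs, keep the most promising ones, and let a human decide which deviations are worth refining.

```python
# "Productive imperfection" pipeline sketch: generate, amplify, refine with human feedback.
import random


def generate_variants(prompt: str, n: int = 8, temperature: float = 1.3) -> list[str]:
    """Placeholder for high-temperature sampling that deliberately invites deviation."""
    return [f"{prompt} :: variant-{i} (t={temperature})" for i in range(n)]


def novelty_score(text: str) -> float:
    """Placeholder: how far the output strays from the expected answer."""
    return random.random()


def human_keeps(text: str) -> bool:
    """Placeholder for the human-in-the-loop call: interesting mistake or nonsense?"""
    return random.random() > 0.5


def creative_pipeline(prompt: str) -> list[str]:
    variants = generate_variants(prompt)
    # Amplify: keep only the most deviant candidates...
    promising = sorted(variants, key=novelty_score, reverse=True)[:3]
    # ...then refine: a human decides which deviations are meaningful.
    return [v for v in promising if human_keeps(v)]


print(creative_pipeline("design a chair"))
```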

How do you identify which hallucinations contain creative potential and which are just nonsense? How do you build tools that can spot the difference between interesting mistakes and useless ones?

Reza Vatankhah Barenji and Sina Khoshgoftar from Nottingham Trent think AI should fix problems automatically, not just detect them:

As Agentic AI continues to evolve, it holds the promise of redefining how complex systems are monitored, understood, and controlled—shifting the role of human operators from reactive problem-solvers to strategic supervisors in an ecosystem of intelligent, autonomous agents.

We've been managing complex systems reactively. Build dashboards, set up alerts, wait for humans to fix things when they break. AI helps with diagnosis, but humans still make the final call on intervention.

The authors argue this doesn't scale. Systems are getting too complex for human operators to manage every problem manually.

This paper describes a move away from simply detecting anomalies to building agentic systems that can independently reason about, plan, and execute a response. It’s a move from passive observation to active participation.
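In skeleton form, the loop might look something like this (every function is a hypothetical placeholder, not the authors' system): detect, reason about a response, act within boundaries, and escalate to the human supervisor when a step falls outside them.

```python
# Sketch of the shift from alerting a human to a detect, reason, plan, act loop.
# The point is the shape of the loop and the explicit boundary where a human steps back in.

def detect(metrics: dict) -> str | None:
    """Return an anomaly label, or None if everything looks healthy."""
    return "latency_spike" if metrics["p99_ms"] > 500 else None


def plan_response(anomaly: str) -> list[str]:
    """Reason about the anomaly and propose an ordered set of remediation steps."""
    return {"latency_spike": ["scale_out_service", "flush_cache"]}.get(anomaly, [])


def within_boundaries(action: str) -> bool:
    """Guardrail: actions outside the agent's mandate are escalated, not executed."""
    return action in {"scale_out_service", "flush_cache"}


def supervise(metrics: dict) -> list[str]:
    log = []
    anomaly = detect(metrics)
    if anomaly is None:
        return log
    for action in plan_response(anomaly):
        if within_boundaries(action):
            log.append(f"executed: {action}")             # autonomous intervention
        else:
            log.append(f"escalated to human: {action}")   # strategic supervisor steps in
    return log


print(supervise({"p99_ms": 742}))
# ['executed: scale_out_service', 'executed: flush_cache']
```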

This changes the design challenge completely. Instead of building dashboards for humans, you're designing the autonomous agent itself. Its goals, boundaries, decision-making processes.

How do you build a system you can trust to act independently in high-stakes environments? Humans become "strategic supervisors" rather than reactive problem-solvers.

We're shifting from designing interfaces for people to designing the constitution of our autonomous digital colleagues. No pressure.

Hao Li, Haoxiang Zhang, and Ahmed E. Hassan from Queen's University studied what happens when AI becomes your coding teammate:

Although agents frequently outperform humans in speed, our analysis shows their pull requests are accepted less frequently, revealing a stark gap between benchmark performance and real-world trust and utility. Moreover, while agents can massively accelerate code submission—one developer submitted as many Agentic-PRs in three days as they submitted without Agentic help in the previous three years—these contributions tend to be structurally simpler.

They analyzed 456,000 pull requests from AI agents like Devin and GitHub Copilot. Real-world data on our new AI teammates.

The results are fascinating and slightly concerning. AI agents can generate massive volumes of code at incredible speed, but their PRs get accepted far less often than human contributions.

We've solved for speed but not for trust. AI can pump out code faster than ever, but much of it gets rejected. There's a fundamental tension here - quantity vs quality, speed vs acceptance.

The challenge isn't just making AI smarter. It's designing the entire human-computer collaboration around this new reality.

The bottleneck has shifted from writing code to reviewing it. One person can now generate the output of an entire team.

How do we manage that firehose of productivity? How do you review code when it's coming at you faster than you can possibly evaluate? That's the problem we need to solve next.

Xiaoyu Zhan and colleagues at Nanjing University figured out how to build AI characters from modular components:

We posit that linguistic style can be conceptually and functionally decoupled from cognitive tendencies in dialogues, which are largely shaped by personality and memory.

Creating believable AI characters has been a monolithic nightmare. Either spend ages crafting the perfect prompt or fine-tune an entire model, treating identity as one big indivisible blob.

This paper proposes something much more elegant: modular identity architecture.

Instead of holistic personas, they built composable identity. Three independent components: personality, memory, and linguistic style. Mix and match.

It's like building characters from Lego blocks instead of carving them from stone. Remarkably elegant design pattern.

The process is clever: first determine what to say based on personality and memory. Then enrich with facts for consistency. Finally, apply the linguistic style - how to say it.

Style becomes a swappable layer, not part of core reasoning. Like having the same thought expressed by different people.
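Here's a toy sketch of that decoupling (component names and logic are invented, not the paper's implementation): content is decided from personality and memory, and style is applied at the end as an interchangeable layer.

```python
# Composable identity sketch: personality + memory decide content, style decides delivery.
from dataclasses import dataclass


@dataclass
class Character:
    personality: str            # shapes cognitive tendencies (what the character would say)
    memory: dict[str, str]      # facts the character can draw on
    style: str                  # purely how things get said


def decide_content(c: Character) -> str:
    # Step 1: what to say, driven by personality and memory only.
    hometown = c.memory.get("hometown", "somewhere far away")
    stance = "gladly" if c.personality == "warm" else "grudgingly"
    return f"{stance} admits to growing up in {hometown}"


def apply_style(content: str, style: str) -> str:
    # Step 2: how to say it, a swappable surface layer over the same thought.
    if style == "formal":
        return f"The character {content}."
    return f"Yeah... {content}, I suppose."


mara = Character(personality="warm", memory={"hometown": "Lisbon"}, style="formal")
thought = decide_content(mara)

print(apply_style(thought, mara.style))    # The character gladly admits to growing up in Lisbon.
print(apply_style(thought, "casual"))      # same thought, expressed by a different person
```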

This points toward libraries of interchangeable personalities, memories, and styles. Instead of crafting individual characters, you architect systems that can assemble countless characters on demand.

How do you design for identity that reconfigures in real-time? We're moving from writing characters to engineering the components of their souls.

Sounds profound and slightly unsettling. What happens when identity becomes just another API?

Sizhou Chen and team built AI actors that can improvise theater performances:

Given a simple topic, the framework generates a narrative blueprint, guiding the subsequent improvisational performance. During the online performance, each actor is given an autonomous mind. This means that actors can make independent decisions based on their own background, goals, and emotional state. In addition to conversations with other actors, their decisions can also change the state of scene props through actions such as opening a letter or picking up a weapon.

We've mostly thought of AI as a generation tool. Give it a prompt, get back text or code or images.

But this is different. These AI agents aren't just writing scripts - they're performing them. Autonomous actors improvising in real-time, making independent decisions based on their goals and emotional states.

It's a shift from deterministic creation to probabilistic performance. Instead of "generate this," it's "act this out."

The clever bit is separating "offline planning" from "online performance." First, AI agents collaborate to create a narrative blueprint - characters, goals, setting, plot points. Then different AI actors improvise their way through the story in real-time.
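A toy sketch of the split (all names hypothetical, not the framework's API): the blueprint fixes characters, goals, and props offline; online, each actor decides its own action per turn, and those actions mutate the scene state.

```python
# Offline planning vs online performance, in miniature.
import random

# Offline planning: the narrative blueprint.
blueprint = {
    "props": {"letter": "sealed"},
    "actors": [
        {"name": "Mara", "goal": "learn the secret", "emotion": "curious"},
        {"name": "Jonas", "goal": "hide the secret", "emotion": "anxious"},
    ],
}


def choose_action(actor: dict, props: dict) -> str:
    """Each actor decides independently from its own goal, emotion, and the scene state."""
    if actor["goal"] == "learn the secret" and props["letter"] == "sealed":
        return "open the letter"
    if actor["emotion"] == "anxious":
        return random.choice(["pace the room", "reach for the letter"])
    return "wait"


# Online performance: improvised turn by turn, with actions changing prop state.
props = dict(blueprint["props"])
for turn in range(2):
    for actor in blueprint["actors"]:
        action = choose_action(actor, props)
        if action == "open the letter":
            props["letter"] = "opened"
        print(f"turn {turn}: {actor['name']} -> {action}")

print("final prop state:", props)
```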

You're not writing a perfect script anymore. You're designing conditions for compelling stories to emerge. Less writer, more director and world-builder.

But how do you design for emergent narrative? If the agents are truly autonomous, the story could go anywhere. You can set up the stage and motivations, but you can't control what happens.

The designer becomes an architect of systems that guide rather than script outcomes. Building the entire theater and the troupe, not just the characters.