How to Make AI Agents Work: Less Magic, More Harness Engineering

Mae Capozzi · March 25, 2026

AI agents work better when given appropriate context and guardrails.

AI agents aren't magic, but they're powerful when given appropriate context and guardrails. Over the past year, I've read numerous takes implying that AI agents will replace software engineers. I remain skeptical: I'm finding myself working more, not less, as LLMs and agents become more effective. I can't tell the future, but it doesn't look like software engineers will be replaced. More likely, our role will keep shifting toward context engineering and harness engineering: enabling AI agents to act more effectively and autonomously by curating the right information and designing the right guardrails.

Where AI agents can reduce toil

The sweet spot for AI agents is work that's tedious for humans but follows clear patterns. I've had the most success with code refactoring and migrations. I've written about this before, in How I Use Claude Code to Reduce Toil as a Platform Engineer and Tooling migrations don't have to take weeks anymore.

An individual's taste matters: agents work best when you already know what good looks like.

Poor use cases for agents

It's tempting to expand your use of agents beyond pure execution, but be careful with anything requiring deep institutional knowledge. Human taste remains critical and is a true differentiator. Don't outsource your brain!

Use care in the following scenarios:

  • Complex business logic decisions: Agents don't understand why a particular validation exists or how a feature interacts with three other systems built by different teams over the past two years. They can't read between the lines of Slack conversations or understand the political context behind technical decisions.

  • Cross-system integration planning: Agents will give you technically correct answers that completely ignore the landmines. They don't know that Team A is planning a major refactor next quarter, or that the API you're thinking of using has reliability issues that don't show up in the documentation.

  • Performance optimization without profiling data: Agents will suggest optimizations based on general principles, but they're essentially guessing. Without real data about where your bottlenecks actually are, their suggestions are often irrelevant or counterproductive.

  • Annotating traces for evals: If you want to evaluate the output of an LLM prompt, you need to deeply understand your own product and its potential failure modes.

  • Architecture decisions with long-term consequences: Agents don't know that you're gearing up to increase headcount, or that your company is trying to operate as lean as possible. Agents optimize for the information in front of them right now, not for the system you'll have next year. They can't weigh the tradeoffs between technical debt and shipping speed in the context of your team's growth plans or product roadmap.

Still, with the right context and guardrails, you can ship high-quality code at higher velocity.

Give agents the right information, just in time

Agents, like humans, perform better when they have the information they need to make informed decisions. Anthropic calls this "context engineering": curating and maintaining the optimal set of information during inference while managing token consumption. We need to deliberately curate relevant information.

Think about debugging a customer-reported issue. If you just tell an agent "fix the issue," it's going to struggle. If you provide a ticket with reproduction steps, it'll do better. But if you feed it observability data showing exactly how the failure manifests across multiple users, along with relevant code paths and recent changes, it becomes dramatically more effective.
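That progression can be sketched as a small context-assembly step that runs before the agent is invoked. This is only an illustration: the field names and data sources (`ticket`, `traces`, `recent_commits`) are hypothetical, standing in for whatever your ticketing and observability tooling actually exposes.

```python
# Sketch: assembling richer debugging context for an agent.
# All field names and data sources here are hypothetical.
def build_debug_context(ticket: dict, traces: list[dict], recent_commits: list[str]) -> str:
    """Combine ticket, observability, and change data into one prompt-ready block."""
    lines = [
        f"Issue: {ticket['title']}",
        f"Repro steps: {ticket['repro_steps']}",
        "Observed failures across users:",
    ]
    for t in traces:
        lines.append(f"- user={t['user_id']} error={t['error']} path={t['code_path']}")
    lines.append("Recent changes to the relevant code paths:")
    lines.extend(f"- {c}" for c in recent_commits)
    return "\n".join(lines)
```

The point isn't the format; it's that the agent receives reproduction steps, failure evidence, and recent changes together rather than a bare "fix the issue."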

I've seen teams make the opposite mistake with AGENTS.md files. Those are configuration files that provide context about your codebase and conventions. The initial recommendation was to write lengthy files with philosophical guidance and detailed tool documentation. This backfires because agents get overwhelmed and start ignoring parts of the context.

Progressive disclosure works better. Instead of dumping everything upfront, design your context so agents can discover information just-in-time. Think of it like good API design: surface what's immediately relevant and provide clear paths to dig deeper when needed.
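As a rough sketch (the file paths and doc names are hypothetical), a lean AGENTS.md that favors progressive disclosure points to deeper documentation instead of inlining it:

```markdown
# AGENTS.md

## Conventions
- TypeScript, strict mode. Run `npm run lint` before committing.

## Dig deeper (read only when relevant)
- Testing patterns: docs/testing.md
- Migration status and banned dependencies: docs/migrations.md
- Service boundaries: docs/architecture.md
```

The top-level file stays short enough that the agent actually reads it, and the links give it a clear path to more detail when the task calls for it.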

Building systems for repeatable wins

OpenAI talks about "harness engineering": providing agents with the tools, abstractions, and internal structure they need to make progress toward high-level goals.

Even exceptional human engineers need to understand your codebase conventions and validation processes to be effective. Agents are the same way. They need guardrails and validation mechanisms built into their environment.

I've set up custom linters as backstops to prevent agents from using dependencies we're actively migrating away from. It's like having a safety net that catches mistakes before they become problems.
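A minimal version of that backstop can be sketched as a script run in CI. The banned module names and replacement hints below are hypothetical, and a real setup would more likely use your linter's built-in restricted-import rules; this just shows the shape of the check.

```python
# Sketch of a custom lint backstop: fail CI if any file imports a
# dependency we're migrating away from. Module names are hypothetical.
import pathlib
import re
import sys

BANNED = {
    "moment": "use date-fns instead",      # hypothetical migration
    "request": "use fetch instead",        # hypothetical migration
}

# Matches `from '...'` and `require('...')` style import specifiers.
IMPORT_RE = re.compile(r"""(?:from|require\()\s*['"]([^'"]+)['"]""")

def find_violations(root: str) -> list[str]:
    violations = []
    for path in pathlib.Path(root).rglob("*.ts"):
        for lineno, line in enumerate(path.read_text().splitlines(), 1):
            match = IMPORT_RE.search(line)
            if match and match.group(1) in BANNED:
                hint = BANNED[match.group(1)]
                violations.append(f"{path}:{lineno}: banned import '{match.group(1)}' ({hint})")
    return violations

# Example: sys.exit(1) in CI if find_violations("src/") is non-empty.
```

Because the check runs on every commit, the agent gets immediate, mechanical feedback instead of relying on a human reviewer to remember the migration.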

Another example is building dependency review workflows that automatically check security, compatibility, and team preferences. Instead of manually reviewing every dependency update, agents can handle the mechanical verification while flagging complex decisions for human review.
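One way to sketch that triage split (the policy rules and field names are hypothetical, standing in for whatever your audit tooling reports):

```python
# Sketch of automated dependency triage: agents handle the mechanical
# checks, humans review anything risky. Policy rules are hypothetical.
from dataclasses import dataclass

@dataclass
class Update:
    name: str
    old_version: str
    new_version: str
    has_advisory: bool  # e.g., flagged by an audit tool

def triage(update: Update) -> str:
    old_major = int(update.old_version.split(".")[0])
    new_major = int(update.new_version.split(".")[0])
    if update.has_advisory:
        return "block"            # known vulnerability: stop the update
    if new_major > old_major:
        return "human-review"     # breaking-change risk: flag for a person
    return "auto-merge"           # patch/minor bump: safe to automate
```

The mechanical cases resolve themselves; only the major bumps and advisories reach a human.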

Finally, instrument your code and send telemetry to a tool like Honeycomb. Give your agents access to the MCP server so that they can understand the impact of their changes. Don't let your application become a black box you don't understand.

Remember, you're not replacing human judgment. You're amplifying it by removing the tedious work that keeps you from focusing on the problems that actually require your expertise.
