Onboarding AI into your codebase
13 minute read · 2752 words

You build a project. It’s clean, well-structured, cohesive. You know every corner of it. Then life happens: the project moves to a different team for day-to-day maintenance. They bring their own style, their own habits. Things start to drift. Not dramatically, just… gradually. The codebase gets a little messier with every handover.
Now drop an AI agent into that codebase. Same story, amplified. It doesn’t know the history, the constraints, the “we do it this way because of that incident three years ago”. Every change it makes risks introducing side effects it can’t even reason about.
You’ve probably experienced this already: you ask AI to do something, it produces code that looks plausible but breaks your style, misses your constraints, creates subtle bugs. You fix it, try again, hit the same wall. And you start to wonder if this AI thing is all hype.
It’s not. But your codebase isn’t ready for it yet.
Why AI struggles on real codebases
You read the success stories. Someone vibe-coded an entire app in a weekend. Another person built a prototype in an afternoon. Brilliant. But those are almost always greenfield projects, simple structures, no legacy baggage.
Your actual codebase? It has layers of history. It’s been through at least two or three “generations” of developers. Think of it like a biathlon relay at the Winter Olympics (they’re on right now, so bear with the metaphor 🥇): each team member skis their leg flat out, then hands off to the next. But unlike a real relay, nobody passes a map of the course.
- The original author who designed it with full context in their head.
- The maintenance team who inherited it, brought their own style, and started drifting.
- The AI agent you’re now asking to make changes.
Each handover loses knowledge. The AI is essentially the third team, except it has even less context than team two did. It doesn’t have the benefit of corridor conversations, onboarding sessions, or chatting with that one senior dev who remembers why the payment service has that weird retry loop.
And there’s a compounding problem: context windows. Even with models that advertise a million tokens, your codebase probably won’t fit. And even if it does, stuffing everything into context leads to compression, drift, and mistakes. The model ends up skimming rather than understanding.
There’s a bit of irony here. Tools like GitHub Copilot, despite being the poster child for AI-assisted development, can actually give AI a bad reputation on complex codebases: with a limited context window, you hit the ceiling fast, the model starts making nonsensical suggestions, and you walk away thinking AI doesn’t work. It does work, but the setup matters enormously.

Treat it like a knowledge transfer
Here’s the mental shift that changed everything for me: onboarding AI is the same as onboarding a new team.
You wouldn’t throw a new team at a codebase and say “figure it out”. You’d do a knowledge transfer. You’d walk them through the architecture, explain the design decisions, point out the gotchas, show them where the bodies are buried.
AI needs the same thing. The only difference is the format: instead of meetings and whiteboard sessions, you write it down in markdown.
If you’re also figuring out which model to use for different tasks, I wrote about choosing the right GitHub Copilot model for the job.
Step 1: Document your codebase
This is where most of the upfront work goes, and it’s the single biggest lever you can pull.
You need to capture the stuff that lives in people’s heads: what the project does, what the features are supposed to achieve, what the constraints are, why certain design decisions were made. The kind of knowledge that normally gets lost in the second handover.
Where to start:
- `AGENTS.md` (or equivalent) as your entry point. This is becoming the standard that most tools understand. It’s your “read this first” file for AI agents.
- A `docs/` or `specs/` folder for the detail. One file per feature, one file per architectural decision, one file for constraints.
- An index structure that links things together. This is more important than you’d think.
You probably already have some of this. Maybe it’s in Confluence, maybe it’s scattered across wikis, maybe it’s in someone’s head. The job is to get it into markdown, into the repository, next to the code. You can leverage AI to do the conversion: “take this Confluence page and turn it into a clean markdown spec.” You can even dictate your thoughts stream-of-consciousness style and ask AI to structure them into proper documentation.
The key insight about index files: keep your documentation in small, focused files that link to each other. Don’t create one massive 2000-line document.
Why? Because of how current tools work. When an AI agent needs to understand something, it typically reads files linearly. If your documentation is one giant file, the agent reads the first hundred lines looking for what it needs, doesn’t find it, keeps reading, and before you know it, half your context window is full of documentation that isn’t relevant to the task at hand.
With small linked files, the agent reads the index, finds the pointer to what it needs, reads just that file. Much less context wasted, much better results.
Now, this is a simplification. How tools consume files depends on the model and the tooling, and it will get better over time. But there’s no downside to structuring things this way: editors and GitHub alike render markdown links properly, so it’s just as easy for humans to navigate. Win-win.
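As a concrete sketch, an index-style entry point might look like this. All file names here are illustrative, not a standard:

```markdown
# Project knowledge base

Read this file first; follow links only as needed.

- [Architecture overview](docs/architecture.md)
- [Payments service](docs/payments.md) — includes the retry-loop constraint
- [Feature: user onboarding](docs/features/onboarding.md)
- [Design decisions](docs/decisions/) — one small file per decision
- [Hard constraints](docs/constraints.md)
```

Each linked file stays short and focused, so the agent can pull in one or two of them instead of a monolithic document.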
If you’d like a deeper tactical guide on structuring AGENTS.md and documentation specifically for AI consumption, let me know in the comments. I’m considering a dedicated post on that.
The magic bit: once you have this documentation, instruct your AI agent (in the AGENTS.md itself) to keep it up to date. Every time it makes a change, it should update the relevant docs. You tell it once, and it happens automatically. This is the part that makes the whole thing sustainable. You’re not maintaining docs manually anymore; you’re telling the agent to maintain them as part of every task.
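In practice, that instruction can be a few lines in AGENTS.md. The wording below is just one way to phrase it; adapt it to your repo layout:

```markdown
## Maintenance rules for agents

- After every change, update the spec file(s) in `docs/` that describe
  the affected feature or component.
- If you discover an undocumented constraint, add it to
  `docs/constraints.md` and write a test that enforces it.
- Keep the docs index in sync with the folder structure.
```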
Step 2: Invest in tests before features
This one feels counterintuitive. You’ve got this shiny AI tool, and instead of asking it to build features, I’m telling you to use it to write tests? Yes. Exactly.
Here’s why: tests are your safety net for everything that comes after. In the past, writing thorough tests for every subtle constraint, every concurrency edge case, every weird integration scenario was the right thing to do but often too time-consuming. Teams made a reasonable trade-off: rely on experienced developers to keep those constraints in their heads and catch problems during code review. Now the dynamic is shifting. The AI agent doesn’t have that experience in its head, and it will happily introduce subtle breakage in areas you thought were safe. But the flip side is that the same AI can help you write those previously-too-expensive tests in a fraction of the time.
Use AI to improve your test coverage before you use it for features. Focus on:
- Edge cases you never got around to
- Concurrency and race condition scenarios
- Constraint validation (the “this should never happen but what if it does” stuff)
- Integration points between services
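To make “constraint validation” concrete, here’s the kind of test that used to feel too expensive to write by hand and is now cheap to generate. The function and the 50% cap are invented for the example; the point is that a rule living in someone’s head becomes executable:

```python
# Hypothetical business rule made executable: discounts are capped at
# 50%, and a price can never go negative.

def apply_discount(price: float, discount_pct: float) -> float:
    capped = min(discount_pct, 50.0)
    return max(price * (1 - capped / 100), 0.0)

def test_discount_is_capped_at_fifty_percent():
    # The "this should never happen but what if it does" case:
    # someone passes a 90% discount. The cap must hold.
    assert apply_discount(100.0, 90.0) == 50.0

def test_discount_never_produces_negative_price():
    assert apply_discount(0.0, 30.0) == 0.0
```

Once tests like these exist, the agent can run them after every change and self-correct before you ever see the diff.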
“But I’m spending time on work that doesn’t ship features!” I hear you. And yes, there’s a real cost here. But every time you adopt new tooling, new techniques, new ways of working, there’s an investment phase. You’re betting that the upfront time spent on tests and documentation will pay off in faster, safer AI-assisted development down the line. Sometimes you need to take that bet.
One more thing while we’re talking about code quality: instruct your agent to keep comments minimal. AI agents have a tendency to over-comment code. Like, really over-comment. You don’t need a comment explaining that getUserById gets a user by ID. Put it in your AGENTS.md: comments should be reserved for non-obvious constraints, the “why” behind a surprising decision. And when a comment feels necessary to explain a piece of code, treat that as a signal to refactor the code to be more readable first. This is the kind of instruction that makes the agent behave like a thorough, responsible engineer: the same discipline you’d want from any team member, applied consistently through documentation rather than code review feedback.
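A quick before/after sketch of what that instruction buys you. Both functions are invented for illustration, including the incident reference in the comment:

```python
# What an over-commenting agent produces: the comment restates the code.
def get_user_by_id(users, user_id):
    # Get the user by ID from the users dict.
    return users.get(user_id)

# What you actually want: a comment only where the code can't speak
# for itself, explaining the "why" behind a surprising decision.
def retry_delays(attempts):
    # Cap at 3 retries: the payment provider rate-limits beyond that
    # (see docs/payments.md for the incident that taught us this).
    return [2 ** n for n in range(min(attempts, 3))]
```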
Step 3: Build the feedback loop
This is where it all comes together. Every interaction with AI should make the next interaction better.
Think of it as a closed-loop system. Every time you plan a feature, implement it, or review the output, you learn something. Maybe the AI missed a constraint. Maybe it designed something in a way that would break under load. Maybe it didn’t know about that shared library your team uses for validation. Whatever you learn, capture it.
The loop looks like this:
- Plan (use plan mode, let the agent read your docs, identify what needs to change)
- Implement (let the agent write the code, run the tests)
- Review (you review the output, catch the things it got wrong)
- Enrich (update docs and tests with what you learned during review)
That fourth step is the one everyone skips, and it’s the most important one. Every constraint you document, every test you add, every design decision you capture makes the agent smarter for next time. Not because the model itself learns (it doesn’t), but because the context it operates in gets richer and more accurate.
Put it in AGENTS.md: instruct the agent to maintain documentation and tests as part of every task. Something like: “every time you implement a feature, update the relevant spec file. Every new constraint you discover should have a corresponding test.” The docs and tests serve complementary roles: tests give the agent automatic validation so it can self-correct during implementation, while docs drive its reasoning and design decisions without having to read every test file (which would eat through context fast).
Use plan mode properly. When you’re about to implement something, don’t jump straight to coding. Let the agent plan first. During planning, it will read your documentation, look at the codebase, and propose an approach. This is where you catch problems early: “actually, that won’t work because of X” or “you need to consider Y”. And when you provide that feedback, tell it to capture those insights in the documentation too. Two birds with one stone.
This is also where well-structured, small documentation files really pay off. The agent can navigate your index, pull in just the relevant specs, and plan a solid approach without filling its context with everything it doesn’t need.
The “discard and redo” technique
Here’s one that surprised me with how well it works. Sometimes during review you find a pile of issues with the AI’s implementation. Instead of trying to patch everything, do this:
- Take all the problems you found and use them to update your documentation and tests.
- Throw away the implementation.
- Ask the agent to implement it again from scratch, now with the enriched context.
The second attempt is almost always cleaner. The agent now knows about the constraints it missed the first time, and it can design the solution with those constraints in mind from the start, rather than trying to retrofit fixes.
This also serves as a useful signal: if you’re constantly having to discard and redo, your knowledge base still isn’t rich enough. The agent needs more context to work autonomously.
The goal: shift your effort left
Everything I’ve described is really about one thing: moving your critical thinking earlier in the process.
Right now, you’re probably spending most of your effort in review, catching mistakes after the code is written. The goal is to invest so heavily in documentation, tests, and feedback loops that by the time code reaches you for review, it’s already in good shape. You shift from “writing code” to “designing systems and reviewing output”.
This won’t happen overnight. It’s a gradual process:
- Phase 1: You drive everything. Plan, implement, review, fix. AI helps with the mechanics.
- Phase 2: You plan and review. AI implements. You catch and fix fewer issues each cycle.
- Phase 3: You review. AI plans and implements. The docs and tests are rich enough that most output is already solid.
You might not get to phase 3 on every codebase, and that’s fine. If all you achieve is spending less time hand-holding the agent and more time thinking about architecture, you’ve already won. The point isn’t to reach full autonomy. It’s to notice that each cycle gets a little smoother than the last.
It’s OK to just code it yourself
One more thing. Sometimes it’s genuinely faster to just write the code yourself. If you’ve spent more time explaining what you want than it would take to type it out, stop. Write it. No shame in that.
But here’s the trick: when you do code it yourself, don’t let that knowledge disappear. Ask the AI to look at your changes, understand why you did it that way, and generate the corresponding documentation updates and tests. You can even say: “given this feature request, what would you need to know so that next time you could implement something like this yourself?”
Use AI where it complements your weaknesses. Maybe you’re great at coding but terrible at writing documentation. Let AI do that part. Maybe you’re great at architecture but hate writing tests. Let AI do that part. The best results come from honest self-assessment of where you actually need help.
Where to start
If this all sounds like a lot, here’s a minimal starting point:
- Create an `AGENTS.md` at the root of your repo. Write a paragraph about what the project does, what matters, and where the important code lives.
- Add a `docs/` folder with one file per major feature or component. Even a few bullet points per file is a start. Better yet, ask AI to generate the initial structure for you: point it at your codebase and say “create a docs folder with one spec file per major component, based on what you can see.”
- Tell the agent to maintain it. Add an instruction in `AGENTS.md`: “Keep documentation up to date with every change.”
- Next time you fix a bug: before fixing it, ask AI to write a test that reproduces it. Then fix it. Then ask AI to update docs if the bug revealed a constraint that wasn’t documented.
That’s it. Four things. You can scaffold the initial structure in an afternoon, and then refine it over time as you keep building features. Every subsequent AI interaction will be a little better for it.
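A minimal starter AGENTS.md covering the first three points can be very short. Everything below is placeholder content for an imaginary project:

```markdown
# AGENTS.md

This repo is an invoicing service. The critical code lives in
`src/billing/`; `src/legacy/` is frozen — do not modify it.

- Read `docs/index.md` before planning any change.
- Keep the documentation in `docs/` up to date with every change.
- Every new constraint you discover gets a test and a doc entry.
```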
A glimpse ahead: automating the loop
Once the manual feedback loop feels natural, the next step is automation. Imagine a CI job that runs after every PR merge and asks an AI agent: “do any new tests imply constraints that aren’t documented? Does any new documentation imply tests that don’t exist?” The agent proposes updates, you approve. Over time, you rely on the AI to handle this as a regular part of every change, and the extra manual work of spotting those gaps simply falls away.
This shifts the philosophy from “use tools to catch problems” to “use tools to automatically fix problems”. I’m still experimenting with this, and it deserves its own write-up, but the direction is clear: the less manual work you need to keep docs and tests in sync, the more sustainable the whole approach becomes.
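For a taste of the shape this could take, here’s a GitHub Actions sketch. The `ai-review` CLI is hypothetical; today you’d wire in whatever agent runner your platform provides:

```yaml
# .github/workflows/sync-check.yml — illustrative only
name: docs-tests-sync
on:
  pull_request:
    types: [closed]

jobs:
  sync-check:
    if: github.event.pull_request.merged == true
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Hypothetical step: ask an agent to diff new tests against docs
      # and open a follow-up PR with proposed documentation updates.
      - run: |
          ai-review --prompt "Do any new tests imply constraints that
          aren't documented? Does any new documentation imply tests
          that don't exist? Propose updates as a PR."
```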
Final thoughts
Onboarding AI into your codebase is a knowledge transfer problem, not a tooling problem. The model doesn’t need a bigger context window or a fancier IDE plugin. It needs to understand how your software works, what matters, and what to watch out for. And the way you give it that understanding is the same way you’d give it to any new team member: documentation, tests, and a tight feedback loop.
But there’s also a mentality shift. Working effectively with AI means spending more time thinking about what to build and why, and less time on the mechanics of how. You become more of a system designer and reviewer than a line-by-line coder. That can feel uncomfortable at first, but it’s genuinely a better use of your expertise.
The good news? AI is actually great at maintaining the very documentation it needs. You just have to get it started.
Happy coding! 🤖
