Codebase RAG for Coding Agent - OpenML's AI Blogs

Major Dev Milestones

Is it necessary to keep a folder for major dev milestones in a software project for an agent?

Maintaining a specific folder for major development milestones in a codebase is not necessary. AI agents rely on context and state, and dumping completed milestone files into the repository can actually reduce context window efficiency and cause confusion.

Instead of cluttering the working directory with physical milestone folders, we can streamline our AI workflow with the following practices:

1. Leverage Source Control and Tags

Use the Git repository’s native tools instead of manual folders:

Release Tags: Attach semantic version tags (e.g., v1.2.0-beta) to specific commits for major milestones.
Branching Strategy: Keep milestones strictly organized as branches (e.g., milestone/feature-x).
Git Commit History: Let the agent query the Git history (such as git log output or commit diffs) through tool integrations instead of relying on manually updated directories.
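As a rough illustration of the last practice, here is a minimal sketch of a tool an agent could call to list the commits made since a milestone tag. The function names (`commits_since_tag`, `parse_git_log`) and the `Commit` type are hypothetical; only the `git log` invocation and its `--pretty` placeholders are standard Git.

```python
import subprocess
from dataclasses import dataclass

# Unit separator: unlikely to appear in commit subjects, so it is a safe field delimiter.
FIELD_SEP = "\x1f"


@dataclass
class Commit:
    sha: str
    date: str
    subject: str


def parse_git_log(output: str) -> list[Commit]:
    """Parse `git log` output produced with the --pretty format used below."""
    commits = []
    for line in output.splitlines():
        if not line.strip():
            continue
        sha, date, subject = line.split(FIELD_SEP, 2)
        commits.append(Commit(sha, date, subject))
    return commits


def commits_since_tag(tag: str, repo_dir: str = ".") -> list[Commit]:
    """Return commits made after `tag` (hypothetical helper, not part of any agent SDK)."""
    result = subprocess.run(
        ["git", "log", f"{tag}..HEAD", "--date=short", "--pretty=format:%H%x1f%ad%x1f%s"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_git_log(result.stdout)
```

Calling `commits_since_tag("v1.2.0-beta")` inside a repository that has that tag would give the agent a compact, structured view of what changed since the milestone, without any milestone folder in the working tree.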

2. Implement Architectural Decision Records (ADRs)

If the milestone involves a major pivot or structural change, use an ADR instead of a milestone folder.

Store these in a single docs/adrs/ folder.
These text files give the agent direct instructions and explain the why and how behind a major milestone, which is exactly what LLMs need to make future decisions.

ADRs are increasingly recognized as a foundational component of the agentic manifest ecosystem. While a “manifest” often refers to a single technical file (like AGENTS.md), in agentic workflows it encompasses a broader set of structured knowledge artifacts that provide an agent with the “why” and “how” of a project. This “broader set of structured knowledge” is, in essence, the knowledge base of a RAG system.

ADR as Codebase RAG

Within the agentic manifest ecosystem, ADRs function as the “rationale and memory” layer, whereas files like AGENTS.md and CONTEXT.md act as the “operational briefing”.

CONTEXT.md and AGENTS.md are designed for immediate “bootstrapping” when a session starts. ADRs are better suited to Retrieval-Augmented Generation (RAG): they allow an agent to query a specific past decision only when it needs to understand a specific constraint.

Recent practice therefore suggests “implanting” these records directly into the codebase rather than isolating them in a separate wiki. By placing ADRs in a standard /docs/adrs directory or near the relevant modules, we make it easy for the agent to retrieve these decision records as part of its RAG or system context.

As an example, suppose a team decides to migrate a project from JavaScript to TypeScript, and assume the project currently uses the three.js library for 3D rendering. An ADR file might look like this:

---
id: ADR-001
title: "Migrate Frontend to TypeScript"
status: "Accepted"
date: 2023-10-27
tags: ["frontend", "dx", "typescript"]
---

ADR-001: Migration to TypeScript
================================

Context
-------

The JavaScript codebase lacked type safety, leading to runtime errors in `three.js` integrations and complex component
props. We need a way to enforce contracts between the 3D logic and the UI.

Decision
--------

We will migrate the entire React frontend to TypeScript.

Consequences (Agent Constraints)
--------------------------------

- __RULE-01__: All new components MUST be created as `.tsx`.
- __RULE-02__: Use of `any` is discouraged; specific types for `three.js` objects (like `UnrealBloomPass`) are required.
- __RULE-03__: Any legacy `.js` files encountered during feature work should be converted to `.ts/tsx`.

Key Implementation Details
--------------------------

### Phase 1: Dependencies & Configuration

1. __Install TypeScript and Type Definitions__:

   - `typescript`
   - `@types/node`
   - `@types/react`
   - `@types/react-dom`
   - `@types/jest`
   - `@types/three`

2. __Initialize `tsconfig.json`__:

   - Run `tsc --init` or create a standard React-compatible `tsconfig.json`.

3. __Add `react-app-env.d.ts`__:

   - Standard file for Create React App projects to handle asset imports (SVG, CSS, etc.).

### Phase 2: File Migration

1. __Rename Files__:

   - `src/index.js` to `src/index.tsx`
   - `src/App.js` to `src/App.tsx`
   - `src/setupTests.js` to `src/setupTests.ts`

2. __Fix Type Errors in `App.tsx`__:

   - Define types for `backgroundData`.
   - Handle `UnrealBloomPass` and `fgRef` types.
   - Ensure `styled-components` and `react-bootstrap` are correctly typed.

### Phase 3: Validation

1. __Run Development Server__: `yarn start` to ensure the project still runs.
2. __Run Build__: `yarn build` to ensure TypeScript compilation passes.

This ADR highlights three things:

The “Decision” Element: The ADR is where we record why we chose TypeScript (ideally also why not alternatives like JSDoc or Flow) and what the current status of the decision is (e.g., “Accepted”).
Explicit Rules: Agents need to know that “No new .js files are allowed” is now a hard rule.
Metadata: Adding a frontmatter block (YAML) allows an agent to “index” this decision without reading the whole file.
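To illustrate the metadata point, here is a minimal sketch of how an indexer could read only the frontmatter of an ADR and stop at the closing `---`. The function name `read_frontmatter` is hypothetical, and the parser is deliberately tiny: it assumes flat `key: value` pairs, quoted strings, and inline lists as in the example above, and is not a general YAML parser.

```python
import json
from typing import Iterable


def read_frontmatter(lines: Iterable[str]) -> dict:
    """Parse the leading `---` block of an ADR and stop at the closing `---`.

    Minimal sketch: handles only flat `key: value` pairs, quoted strings, and
    inline lists such as ["a", "b"]. It is not a general YAML parser.
    """
    it = iter(lines)
    if next(it, "").strip() != "---":
        return {}  # no frontmatter present
    meta = {}
    for raw in it:
        line = raw.strip()
        if line == "---":
            break  # end of frontmatter; the ADR body is never parsed
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        value = value.strip()
        if value.startswith(("[", '"')):
            # Inline lists and double-quoted strings in this format are also valid JSON.
            value = json.loads(value)
        meta[key.strip()] = value
    return meta
```

Because the function accepts any iterable of lines, it can be handed an open file object directly (`with open(path) as f: meta = read_frontmatter(f)`), so parsing ends as soon as the frontmatter closes.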

This TypeScript ADR is a “load-bearing” document for an agent. While AGENTS.md might simply say “Use TypeScript,” the ADR explains why we moved away from JavaScript, which prevents the agent from suggesting a revert to JS when it encounters complex type errors in three.js.

TIP
If a decision in an ADR becomes a permanent project law (e.g., “All new UI code MUST be TypeScript”), we should eventually promote that specific rule from the ADR into AGENTS.md.
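Promotion can be partly automated. The following is a sketch, assuming rules are written in the `- __RULE-NN__: text` form used in the example ADR; the helper names `extract_rules` and `promote_rules` and the "(source: ...)" suffix are illustrative choices, not an established convention.

```python
import re

# Matches lines such as "- __RULE-01__: All new components MUST be created as `.tsx`."
RULE_PATTERN = re.compile(r"^- __(RULE-\d+)__:\s*(.+)$", re.MULTILINE)


def extract_rules(adr_text: str) -> list[tuple[str, str]]:
    """Return (rule_id, rule_text) pairs found in an ADR body."""
    return [(m.group(1), m.group(2).strip()) for m in RULE_PATTERN.finditer(adr_text)]


def promote_rules(adr_id: str, adr_text: str, only: set[str]) -> list[str]:
    """Format the selected rules as bullet lines for AGENTS.md, citing the source ADR."""
    return [
        f"- {text} (source: {adr_id}, {rule_id})"
        for rule_id, text in extract_rules(adr_text)
        if rule_id in only
    ]
```

Keeping the ADR and rule identifiers in the promoted line lets the agent trace a rule in AGENTS.md back to the decision that explains it.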

How to Prompt an LLM to Generate an ADR

Here is a system prompt we can use whenever we want an LLM to generate a new Architectural Decision Record (ADR) for our frontend project. It enforces the formatting standards, frontmatter metadata, and structural constraints from the example above.

How to use it
The General Rules below vary from project to project, so adjust them to fit the project at hand.
When we paste this prompt into an LLM, we append our raw requirements at the bottom like this:

We need a new ADR for migrating our local state management to a lightweight custom hook context because prop drilling is
getting messy in our 3D visualization view. Give it ID 002.

You are an expert software architect and technical writer. Your task is to generate a new Architectural Decision Record
(ADR) for a codebase that treats ADRs as an active "Codebase RAG" and memory layer for AI coding agents.

You must strictly adhere to the following structural, formatting, and stylistic constraints based on our established
repository standards:

### 1. General Rules

- Do NOT use the `var` keyword in any code examples.
- Max line length for any code blocks is 120 characters. Long lines must be broken down cleanly.
- Do NOT introduce any third-party frameworks, external UI component libraries (e.g., Tailwind CSS, styled-components,
  react-bootstrap), or HashiCorp tools. Stick to pure, native configurations and open-source standards.

### 2. File Name Format

Output the suggested filename at the very top of your response using the format: `docs/adrs/NNNN-slugified-title.md`
(e.g., `docs/adrs/0002-implement-native-css-modules.md`).

### 3. Structural Format

The document must follow this exact Markdown structure, using identical header naming conventions:

---
id: ADR-NNN
title: "Short Title of the Decision"
status: "Accepted"
date: YYYY-MM-DD
tags: ["lowercase", "comma-separated", "tags"]
---

ADR-NNN: Short Title of the Decision
===================================

Context
-------

[Provide a clear, brief explanation of the technical problem, missing safety, or limitations prompting this change.]

Decision
--------

[State the exact architectural pivot or choice made clearly and directly.]

Consequences (Agent Constraints)
--------------------------------

[Define hard, strict rules that an AI agent reading this repository must obey. Format them exactly as follows:]
- __RULE-01__: Detailed rule text here.
- __RULE-02__: Detailed rule text here.

Key Implementation Details
--------------------------

### Phase 1: [Phase Name]
1. __Bold Title__: Description of steps or configuration.
2. __Bold Title__: Description of steps or configuration.

### Phase 2: [Phase Name]
1. __Bold Title__: Detailed action items.
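Since LLM output can drift from a template, it may help to check a generated draft before committing it. Below is a minimal sketch of such a check, assuming the filename format and section names from the prompt above; `validate_adr` is a hypothetical helper, and as a simplification it applies the 120-character limit to every line rather than only to code blocks.

```python
import re

FILENAME_PATTERN = re.compile(r"^docs/adrs/\d{4}-[a-z0-9]+(?:-[a-z0-9]+)*\.md$")
REQUIRED_HEADINGS = ["Context", "Decision", "Consequences (Agent Constraints)", "Key Implementation Details"]
MAX_LINE_LENGTH = 120  # simplification: checked on all lines, not only code blocks


def validate_adr(filename: str, text: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means the draft passed."""
    problems = []
    if not FILENAME_PATTERN.match(filename):
        problems.append(f"filename does not match docs/adrs/NNNN-slugified-title.md: {filename}")
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        problems.append("missing YAML frontmatter")
    for heading in REQUIRED_HEADINGS:
        # A section heading here is the title line followed by a line made only of dashes.
        found = any(
            line.strip() == heading and i + 1 < len(lines) and set(lines[i + 1].strip()) == {"-"}
            for i, line in enumerate(lines)
        )
        if not found:
            problems.append(f"missing section: {heading}")
    if not re.search(r"^- __RULE-\d+__: ", text, re.MULTILINE):
        problems.append("no RULE-NN entries found")
    for number, line in enumerate(lines, start=1):
        if len(line) > MAX_LINE_LENGTH:
            problems.append(f"line {number} exceeds {MAX_LINE_LENGTH} characters")
    return problems
```

A check like this could run in a pre-commit hook or CI step, so that malformed ADRs never reach the directory the agent indexes.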

How to Configure the Agent to Be Aware of the ADRs

To ensure our coding agent, such as Gemini CLI, is effectively aware of our Architectural Decision Records (ADRs) and utilizes them as a robust “Codebase RAG” system, we should implement the following strategies:

Standardized Placement and Discovery: Place ADRs in a dedicated, standard directory - specifically /docs/adrs/ - so the agent’s file-system crawler can easily index them.
Structured Metadata for Indexing: Use a YAML frontmatter block at the top of every ADR file. This allows the agent to parse the file’s id, status, and tags (e.g., ["frontend", "dx", "typescript"]) without processing the entire document, which keeps indexing cheap.
Direct System Instructions: If an ADR decision contains a “permanent project law” - such as a mandate that all new UI code must be TypeScript - promote that rule from the ADR into the AGENTS.md file. This ensures the agent treats the constraint as a core operational rule during the bootstrapping phase.
Tool-Based Retrieval: Do not force the agent to keep all ADRs in its active context window, as this reduces efficiency. Instead, ensure the agent’s workflow includes tool integrations that let it query the /docs/adrs/ directory or search the commit history via Git logs and diffs whenever it needs to understand a specific architectural constraint.
Explicit Contextual Links: Treat ADRs as the “rationale and memory” layer. When the agent starts a task, ensure the retrieval mechanism pulls the ADRs relevant to the task’s subject matter (e.g., frontend tasks triggering retrieval of the TypeScript migration ADR), which prevents the agent from making decisions that contradict past architectural pivots.

By structuring our ADRs this way, they function as a targeted knowledge retrieval system rather than just static documentation, providing the agent with the “why” and “how” behind our project’s history.
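The tag-based retrieval described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `AdrMeta` record, the `select_adrs` function, and the overlap-count ranking are hypothetical, and the index is assumed to have been built beforehand from the frontmatter of the files in /docs/adrs/.

```python
from dataclasses import dataclass, field


@dataclass
class AdrMeta:
    """Frontmatter fields of one ADR, as indexed from docs/adrs/."""
    id: str
    path: str
    status: str
    tags: list[str] = field(default_factory=list)


def select_adrs(index: list[AdrMeta], task_tags: set[str]) -> list[AdrMeta]:
    """Return accepted ADRs sharing at least one tag with the task, best match first."""
    wanted = {t.lower() for t in task_tags}
    scored = []
    for adr in index:
        if adr.status != "Accepted":
            continue  # ignore decisions that are not currently in force
        overlap = len(wanted & {t.lower() for t in adr.tags})
        if overlap:
            scored.append((overlap, adr))
    # Stable sort: ADRs with equal overlap keep their original index order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [adr for _, adr in scored]
```

Only the selected ADRs then need to be loaded into the context window, which matches the goal of retrieving decisions on demand instead of preloading all of them.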

Finally, we will bootstrap our AGENTS.md, which will be our single orchestration file in the root directory, to make our ADR-driven workflow functional. Here is an example:

Agentic Manifest & Operational Briefing
=======================================

Welcome, Agent. You are operating within a structured, self-documenting codebase. To ensure alignment with architectural
decisions, constraints, and project rules, follow this bootstrap protocol.

1. Context Ingestion Protocol
-----------------------------

Before executing any task, refactoring code, or generating new modules, you MUST discover and analyze the project's
historical rationale layer.

- __Discovery Path__: Scan the `/docs/adrs/` directory.
- __Filtering__: Index all markdown files with the status `Accepted` in the YAML frontmatter.
- __Application__: Treat the `Consequences` or `Constraints` sections of these records as hard, permanent laws for your
  execution context.

2. Core Architectural Memory
----------------------------

Do not invent architectural patterns or revert past migrations. When working on specific modules, cross-reference your
task with the tags defined in the ADR frontmatter:

- If working on __Frontend, DX, or UI rendering__, you must strictly adhere to `docs/adrs/ADR-001.md`.
- Never attempt to bypass typing restrictions or downgrade files to older formats if an active ADR forbids it.

3. Tool Execution Strategy
--------------------------

If a task results in a complex type error or architectural conflict:

1. Do not immediately attempt to rewrite the architecture.
2. Use your file-searching tools to query `/docs/adrs/` for keywords matching the error context (e.g., "three.js",
   "TypeScript").
3. Read the "Context" and "Decision" blocks to understand *why* the code was structured this way.

Once we drop this into our project root, any capable coding agent that reads AGENTS.md on startup should know how to pull, parse, and respect our decision records.
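For agents that need a dedicated tool rather than generic file search, the "Tool Execution Strategy" above could be backed by something like the following sketch. The names `extract_section` and `search_adrs` are hypothetical, and the section parsing assumes the dash-underlined headings used in the ADR example; a dash line under any other text would be misread as a heading.

```python
from pathlib import Path


def extract_section(text: str, heading: str) -> str:
    """Return the body under a dash-underlined heading, up to the next such heading."""
    lines = text.splitlines()
    body, capturing, skip_next = [], False, False
    for i, line in enumerate(lines):
        if skip_next:  # this line is the row of dashes under a heading
            skip_next = False
            continue
        following = lines[i + 1].strip() if i + 1 < len(lines) else ""
        if line.strip() and len(following) >= 3 and set(following) == {"-"}:
            capturing = line.strip() == heading  # a new section starts here
            skip_next = True
            continue
        if capturing:
            body.append(line)
    return "\n".join(body).strip()


def search_adrs(adr_dir: str, keywords: list[str]) -> dict[str, dict[str, str]]:
    """Map each ADR file mentioning any keyword to its Context and Decision text."""
    hits = {}
    for path in sorted(Path(adr_dir).glob("*.md")):
        text = path.read_text(encoding="utf-8")
        lowered = text.lower()
        if any(keyword.lower() in lowered for keyword in keywords):
            hits[str(path)] = {name: extract_section(text, name) for name in ("Context", "Decision")}
    return hits
```

Returning only the Context and Decision sections keeps the tool's output short, so the agent gets the rationale it was told to read without loading every implementation detail.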