2782 words
14 minutes
AI Agent - An Overview
2025-11-12
2026-05-05

Overview of LLM Agents#

Before divining in deep theoretically and practically, it’s essential to grasp the handful of concepts that turn a “dumb” LLM (which just completes text) into an “agent” that can reason, plan, and act. An agent is not a single technology; it’s an architecture built around an LLM.

The core of this architecture is a reasoning loop. The most fundamental and important concept to learn here is ReAct (Reasoning and Acting). This idea, first published by Google researchers, is the foundation for almost every modern agent, including those in LlamaIndex and LangChain.

The ReAct loop works as follows:

  1. Thought: The agent is given a complex task (e.g., “What’s the weather in the capital of France, and who is the president of that country?”). The LLM’s first step is to think and create a plan. It will generate an internal monologue, like: “I need to solve this in two parts. First, find the capital of France. Second, find the president of France. Third, get the weather for that capital city. Fourth, combine the answers.”
  2. Act: Based on its first thought (“find the capital of France”), the agent decides to act. It chooses a tool from a list it has been given. For example, it might choose a search tool and generate the specific query: search("capital of France").
  3. Observation: The agent executes the action and gets a result (the observation). For example, the search tool returns: “The capital of France is Paris.”
  4. Repeat: This observation is fed back into the loop as new context. The agent thinks again: “OK, the capital is Paris. My plan said the next step is to find the president. I will act by using the search tool with the query search("president of France").”
  5. …and so on: This loop continues - Thought, Act, Observation, Thought, Act, Observation - until the agent’s final thought is: “I have all the information. The president is Emmanuel Macron and the capital is Paris. I will now act by using the search_weather tool with the query weather("Paris").” After that observation, its final thought will be, “I have all the answers and can now formulate the final response.”

This “Thought, Act, Observation” cycle is the essential theory. The agent is simply an orchestration loop that provides the LLM with a system prompt, a set of tools, and a “scratchpad” to write down its thoughts and observations. More advanced concepts like Reflexion simply add a step where the agent critiques its own past actions to improve its plan, and ReWOO optimizes this process.

Agent Libraries#

There are many frameworks that make agentic systems easier to implement. One interested in agents will be glad to know that the vast majority of the most popular and well-documented AI agent development is happening in an ecosystem centered around open models. Here is a guide to the most prominent frameworks, models, and learning resources:

  • LlamaIndex: The leading framework for building powerful Retrieval-Augmented Generation (RAG) agents (agents that can query our own data). While its name is inspired by Meta’s Llama model, it is fully model-agnostic. Its documentation and tutorials heavily feature Llama, OpenAI’s GPT, and Anthropic’s Claude.

  • OpenAI Agents SDK

  • LangGraph

    TIP
    • LangChain is for building a RAG (Retrieval Augmented Generation) system, a simple chatbot, or a data extraction pipeline where the steps are known in advance
    • LangGraph is for building an Agent that needs to use tools, handle ambiguous instructions, collaborate with other agents, or requires a “human-in-the-loop” workflow.
  • (Hugging Face) smolagents

Quick Start Explore in Kaggle#

These frameworks make it easy to get started by simplifying standard low-level tasks like calling LLMs, defining and parsing tools, and chaining calls together. However, they often create extra layers of abstraction that can obscure the underlying prompts and responses, making them harder to debug. They can also make it tempting to add complexity when a simpler setup would suffice.

OpenML suggests that developers start by using LLM APIs directly: many patterns can be implemented in a few lines of code. If you do use a framework, ensure you understand the underlying code. Incorrect assumptions about what’s under the hood are a common source of customer error.

Those who want to pick up of agent basics quickly can check out this Jupyter notebook Explore in Kaggle

Learning Agent Deeply#

Although LangChain and LlamaIndex are excellent for building agent, building from first principles gives us a durable understanding that frameworks alone cannot. To do that

  1. we must learn theory behind Agent: how it works; what are all the relevant concepts; then
  2. build an agent from scratch, totally free from any frameworks such as LlamaIndex so that we can get our hands dirty with all aspects of agent with deep understanding

The ReAct paper, however, omits lots of details for those who would like to study it from ground-up because it is, unfortunately, a conference paper, not a textbook. Its primary goal was to introduce a new synthesis of ideas and prove (through benchmarks) that this synthesis was effective. It assumes the reader is already an expert in the 3 distinct, advanced fields with their own deep theory it’s combining:

The “Inner Speech” Theory (Cognitive Psychology)#

This is the deepest, most foundational layer. The reason the ReAct architecture works so well is that it computationally mimics a core human cognitive function.

  1. Vygotsky’s Thought and Language: This is the origin of the “private speech” (thinking aloud) to “inner speech” (internal monologue) theory. It explains why language is a tool for self-regulation and planning.
  2. Baddeley’s “Working Memory” Model: This provides the cognitive architecture, explaining the “Phonological Loop” (the “inner voice”) as the specific mechanism for holding and manipulating verbal information (i.e., the plan) in short-term memory.

The ReAct paper essentially created a Vygotskian agent. It forces the LLM to use “private speech” (the Thought trace) to regulate its own behavior, which is a far more robust method than simple stimulus-response.

The “Agent/Policy” Theory (Reinforcement Learning)#

This is the most important piece. ReAct assumes we are fluent in the language of Reinforcement Learning (RL). When it uses words like “policy,” “agent,” “state,” “action,” and “observation,” it’s borrowing the entire formal mathematical framework of Markov Decision Processes (MDPs).

RL is a fascinating intersection of computer science, optimal control, and advanced mathematics. To learn it “deeply” means starting with the mathematical foundations before jumping into the “deep” (neural network) part. Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto is the foundational text for the entire field and formally introduces the core mathematical concepts including:

  • Markov Decision Processes (MDPs): The mathematical framework for all RL problems.

  • Bellman Equations: The fundamental equations that all RL algorithms try to solve.

  • Dynamic Programming: The theoretical (but often impractical) “perfect” solution.

  • Monte Carlo Methods & Temporal-Difference (TD) Learning: The breakthroughs that make RL practical (this includes Q-Learning and SARSA).

    TIP

    It is not an exaggeration to say that without the Monte Carlo (MC) method, modern AI simply would not exist. The fundamental reason is that AI is essentially the study of high-dimensional probability distributions.

    To systematically study Monte Carlo method, here is the recommended resource List:

    • Statistical Mechanics: Algorithms and Computations by Werner Krauth

      Understand MC as a method for solving definite integrals (calculating areas/volumes) in high dimensions. If we want to find the area of an irregular shape, we can’t use calculus. Instead, enclose it in a box, throw 10,000 random darts at the box, and count how many land inside the shape. The ratio gives us the approximated area. This is exactly how we calculate “Expected Value” (the core of RL). We don’t solve the integral; we sample rewards and average them.

As we read it, we will have a series of “aha!” moments. We will see that the ReAct loop is essentially an implementation of an RL “policy,” where:

  • State = The history of all past thoughts, actions, and observations (the “scratchpad”).
  • Action = The choice to either generate a Thought or an Act (like calling a tool).
  • Policy = The LLM itself, which has been “prompted” to decide the best action given the current state.

OpenML provides the following hand-made study materials to assist the study of Reinforcement Learning: An Introduction:

The “Reasoning” and “Acting” Theory (Other LLM Papers)#

Indicating in its title, ReAct didn’t invent “reasoning” or “acting” in LLMs. It was the first to synergize them. The paper was a direct response to two other lines of research that were popular at the time.

  • The “Reasoning” (Chain-of-Thought) Track: The ReAct paper is building directly on the Chain-of-Thought Prompting Elicits Reasoning in Large Language Models paper, which showed that we could get an LLM to “reason” by simply prompting it to “think step-by-step.” The ReAct Thought step is a more structured version of CoT.
  • The “Acting” (Tool-Use) Track: It also builds on papers like Toolformer and other research showing that LLMs could be prompted to use external tools (like a calculator or a search API).

The ReAct paper’s core argument is that both of these tracks are flawed on their own. “Reasoning-only” agents hallucinate, and “Acting-only” agents can’t plan.

Building Agent From Scratch#

Now we are in a position suitable for implementing agentic theory. “Building from scratch” simply means we will be the one to write the Python code that:

  • Manages the loop.
  • Formats the prompt that coaxes the LLM to “think.”
  • Provides and executes the tools.

This approach is 100% model-agnostic. We just need an API from a provider like OpenAI (GPT-4o), Anthropic (Claude 3), or we can run an open-source model like Meta’s Llama 3 locally.

Here are some of the best framework-free guides to get us started:

Building the Core ReAct Agent Loop#

A perfect starting point is a tutorial that implements the ReAct loop directly. These guides show us how to build the “brain” of the agent. We will write Python code to manage the “Thought, Act, Observation” cycle using nothing but a standard LLM API.

A great resource for this is “Building an AI agent from scratch in Python” by Leonie Monigatti. It’s a clean, simple implementation that uses an API (Anthropic’s, but it’s trivial to swap for OpenAI’s) to create the agent class, memory, and tool-use logic from the ground up.

Building a Core Capability: RAG from Scratch#

Most agents need to access knowledge. The most common way to do this is Retrieval-Augmented Generation (RAG). Before an agent can use a RAG tool, it helps to build that tool ourselves. This involves:

  • Loading documents (e.g., text files, PDFs).
  • Chunking them into small pieces.
  • Using an embedding model (like one from sentence-transformers) to turn those chunks into vectors.
  • Storing those vectors in a simple vector database (like ChromaDB or even a basic Python-based FAISS index).
  • Writing a search function that embeds a user’s query and finds the most similar chunks.

The Hugging Face blog has a fantastic, concise article titled “Code a simple RAG from scratch” that does exactly this. It shows us how to build the retrieval and generation parts using standard Python libraries, giving us a fundamental agent-ready tool.

By starting with these resources, we will build the two most important parts of any agent: the reasoning loop (ReAct) and the knowledge retrieval (RAG), all from first principles.

Coding Agent Solutions#

Mission Critical Projects - Paid Google Gemini CLI#

Google Gemini CLI is one of the best solutions for mission-critical coding projects because

  • it is very fast,
  • it understands your entire codebase, and
  • it offers frictionless user experience
TIP

Google Gemini web app is also a great alternative for great amount of coding scenarios:

  • prototyping
  • coding up parts of a large project in isolation
  • etc.

We can also upload the entire codebase folder in which case the Geminin web can largely perform the same task as Germini CLI does

In such cases, a great diff tool, such as Diffchecker would come handy when we want to compare local and remote code versions

To make sure we are using the most advanced model, we must enable preview features in the CLI configuration. We only need to do this once.

  1. Open your terminal and run gemini
  2. Type /settings and press Enter
  3. Locate Preview Features and toggle it to true.
  4. Restart the CLI (type /quit and run gemini again).

Once preview features are enabled, we can pick the Pro 3 model directly from the command line using the --model flag.

Terminal window
gemini --model gemini-3-pro-preview

Alternatively, if we are already inside the tool or want to set it permanently, we can use these methods:

  • Interactive Switch: Inside the CLI, type /model. We will now see “Auto (Gemini 3)” or “Pro (gemini-3-pro-preview)” as options. Select one of them

  • Environment Variable: To make this default without typing the flag every time, set the environment variable in our shell profile (.bashrc or .zshrc):

    Terminal window
    export GEMINI_MODEL="gemini-3-pro-preview"

Experimental Projects - Mixture of Gemini Free Tier and Local Agent#

The advanced model of Gemini CLI, however, is paid which is total okay; Keeping a local, cost-effective “sandbox” for experimental coding, however, prevents burning API credits on non-critical tasks. This section discusses an experimental approach to explore the free alternatives.

Using Gemini Free Tier#

To configure our experimental projects to use the free tier while keeping our mission-critical projects on the paid “Pro” models, we should use Project-Specific Configuration, which allows us to override our global “Paid” settings only when we are working inside specific directories.

The “Free Tier” is generally tied to specific authentication methods (like a personal Google Account or a free Google AI Studio API Key) and specific models (usually the “Flash” series).

  • Mission-Critical (Global): Uses your Paid/Vertex AI credentials + Pro Model.
  • Experimental (Local): Uses Personal Auth/Free API Key + Flash Model.

Here is how to set this up:

The Gemini CLI looks for a .gemini folder inside the current project directory. Settings found here override the global system settings.

  1. Navigate to the experimental project’s root folder.

  2. Create a directory named .gemini.

  3. Create a file inside it named settings.json with path of <project-root>/.gemini/settings.json

  4. Add the following configuration to force the use of the “Flash” model (which is typically the free/low-cost tier):

    settings.json
    {
    "model": "gemini-2.0-flash"
    }
    NOTE

    Check gemini models list in terminal to verify the exact name of the latest Flash model available to us, e.g., gemini-1.5-flash or gemini-2.5-flash.

Option 2: Use a Free API Key (Specific to Project)#

If our global setup uses a paid Vertex AI credential, simply switching the model might still incur Google Cloud charges. To ensure it is completely free, use an API Key from Google AI Studio (which has a generous free tier for Flash models).

  1. Get a Free Key: Go to Google AI Studio and create a new API Key (ensure it is on the “Free of Charge” tier plan).

  2. Configure Project: In your experimental project root, create a .env file:

    .env
    GEMINI_API_KEY="<YOUR_FREE_TIER_KEY>"
NOTE

The Gemini CLI typically automatically loads .env files in the current directory, allowing this key to take precedence over global authentication.

Option 3: Quick Command-Line Switch#

If we don’t want to create config files, we can simply pass the model flag when running commands for the experimental projects:

Terminal window
gemini chat --model gemini-2.0-flash

Local Agent#

WARNING
  • Expect config overhead and possible slowness in interacting with free agents

  • List of horrible agent products that must never be used:

    • Aider: hard to config; unexceptionally slow; generates code at wrong path; commits codes without your permission

At this point it is not hard to come to the conclusion that setting up a local coding agent is essentially having

  1. a local LLM running as the “brain”. A popular one would be Ollama, which can be easily set up by following this link
  2. a software as an interface that handles prompt, proxies the “Thought, Act, Observation” cycle, context, file editing, and history

The rest of the section primarily discusses the software.

Continue#

Continue is a good option if we develop codes in IDE, such as JetBrain, and can be installed as an IDE plugin.

Once the plugin is installed, configure it in the following way:

  1. Click the Continue icon in IDE sidebar (right).

  2. Pull the following ollama models:

    Terminal window
    ollama pull starcoder2:3b
    ollama pull llama3.1
  3. Click the gear icon (⚙️) to open config.yaml and make it editable by following IDE’s prompt

  4. Replace the config.yaml contents with the following

    ~/.continue/config.yaml
    name: Local Agent Config
    version: 1.0.0
    models:
    # ---------------------------------------------------------
    # PRIMARY AGENT (Use Llama 3.1 for stability)
    # ---------------------------------------------------------
    - name: Llama 3.1 (Agent)
    provider: ollama
    model: llama3.1:latest
    roles:
    - chat
    - edit
    - embed
    # Explicitly tell Continue this model handles tools well
    capabilities:
    - tool_use
    # ---------------------------------------------------------
    # SECONDARY (gpt-oss)
    # ---------------------------------------------------------
    - name: GPT-OSS
    provider: ollama
    model: gpt-oss:20b
    roles:
    - chat
    - edit
    # ---------------------------------------------------------
    # AUTOCOMPLETE (Starcoder2)
    # ---------------------------------------------------------
    - name: Starcoder2
    provider: ollama
    model: starcoder2:3b
    roles:
    - autocomplete
    data:
    debounceDelay: 500
    requestOptions:
    max_tokens: 1024
    # ---------------------------------------------------------
    # CONTEXT PROVIDERS
    # ---------------------------------------------------------
    contextProviders:
    - name: codebase
    params:
    nRetrieve: 25
    useReranking: true
    - name: docs
    - name: open
    - name: terminal
    - name: diff
    allowAnonymousTelemetry: false

Now we can start using it. By default, Continue plugin does not “knows” the project structure; this is a significant difference with Gemini CLI and is good in some cases where we don’t want agent to take full charge of all works but only offers advice and we are still the person who put down final lines of code

(To be continued…)

AI Agent - An Overview
https://blogs.openml.io/posts/agent/
Author
OpenML Blogs
Published at
2025-11-12
License
CC BY-NC-SA 4.0