Core Concepts: Agents, Trainers, and the Lightning Server

Now that you have your environment set up, let’s explore the foundational concepts and key components that make Agentic Lightning so powerful. Understanding these building blocks is crucial for leveraging the framework effectively.

Agentic Lightning operates on a client-server architecture, decoupling your agent’s execution logic from the optimization process. The main actors in this system are:

  1. LitAgent (The Agent Client): Your AI agent, often built with another framework, wrapped so it can interact with the Lightning system.
  2. AgentLightningServer (The Server): A central hub that manages tasks, resources, and orchestrates the training loop.
  3. Trainer (The Optimization Engine): The component that runs the training algorithms, leveraging data from LitAgent instances via the AgentLightningServer.
  4. LightningStore: A central repository (often backed by a database) that holds tasks, resources, and traces, facilitating the feedback loop.
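
At a glance, a training run wires these pieces together roughly as follows. This is only a conceptual sketch: the exact call signatures are covered later in this chapter, and MyLitAgent / my_algorithm are placeholders for your own agent and algorithm.

# Conceptual sketch: how the pieces fit together.
# Step 1: start the server in its own process:
#   agentlightning server start --host 0.0.0.0 --port 8000
# Step 2: in your training script, hand your agent to a Trainer:
from agentlightning.trainer import Trainer
from my_agent import MyLitAgent  # your LitAgent subclass (placeholder module)

agent = MyLitAgent()
trainer = Trainer(n_workers=4)  # number of parallel agent workers

# The trainer dispatches tasks to agent workers via the server, collects
# rewards and traces, and lets the algorithm push updated resources.
trainer.fit(
    agent,
    backend="http://localhost:8000",  # URL of the running AgentLightningServer
    algorithm=my_algorithm,           # e.g., an RL or prompt-optimization algorithm
)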

Let’s break down each of these in detail.

1. LitAgent: Your Agent’s Interface to Learning

At the heart of Agentic Lightning’s flexibility is the LitAgent. This is the abstraction that allows any agent – whether it’s a LangChain agent, an AutoGen assistant, a custom Python script using OpenAI’s API, or even just a simple function – to become “trainable.”

The LitAgent acts as a client that communicates with the AgentLightningServer. When Agentic Lightning is tasked with training, it initiates “rollouts” in which your LitAgent is given a task and executes its logic. During this execution, the LitAgent (or an attached “tracer”) records important information about its steps, observations, and decisions.

Key Aspects of LitAgent:

  • training_rollout(self, task, rollout_id, resources): This is the primary method you’ll implement or wrap. It defines how your agent handles a single task during a training run.
    • task: The specific problem or input the agent needs to address.
    • rollout_id: A unique identifier for the current execution trace.
    • resources: Any shared resources (like prompt templates, tool definitions, or even model weights) provided by the server, which the agent might use or update.
  • Decoupled Logic: The beauty is that the training_rollout method itself contains your agent’s normal operational logic. You’re not contorting your agent into an RL loop directly; you’re just providing an entry point for the training system to observe its behavior.
  • Reward Signal: Inside training_rollout, your agent will eventually produce a result. Crucially, you need to return a reward signal, often a float, that tells the trainer how well the agent performed on the given task. This reward is the primary feedback mechanism for optimization. It can be a simple binary (0 or 1 for success/failure), a scalar representing accuracy, or a more complex value.
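
For example, a reward need not be all-or-nothing. For tasks with a numeric answer, you could grant partial credit the closer the agent gets. The helper below is a plain-Python illustration of such a shaped reward; it is not part of Agentic Lightning, and the tolerance value is an arbitrary choice:

def shaped_reward(actual: float, expected: float, tolerance: float = 10.0) -> float:
    """Return 1.0 for an exact match, decaying linearly to 0.0 as the error grows."""
    error = abs(actual - expected)
    # tolerance controls how quickly credit falls off; 10.0 is illustrative only.
    return max(0.0, 1.0 - error / tolerance)

# shaped_reward(15, 15) -> 1.0  (exact answer, full credit)
# shaped_reward(13, 15) -> 0.8  (close answer, partial credit)
# shaped_reward(40, 15) -> 0.0  (far off, no credit)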

Code Example: A Simple LitAgent

Let’s create a very basic LitAgent that simply tries to add two numbers and returns a reward based on correctness.

# Save this as simple_agent.py
import asyncio
from agentlightning.litagent import LitAgent
from agentlightning.types import AgentLightningTask, AgentResource

# We'll simulate an LLM call with a simple function for now.
# In a real scenario, this would be an actual LLM interaction.
async def mock_llm_add(input_str: str) -> str:
    """
    A mock LLM that tries to add two numbers from a string.
    Example: "Add 5 and 3" -> "Result: 8"
    """
    if "add" in input_str.lower():
        parts = input_str.lower().replace("add", "").replace("and", "").split()
        numbers = [int(s) for s in parts if s.isdigit()]
        if len(numbers) == 2:
            return f"Result: {sum(numbers)}"
    return "Result: I cannot perform that calculation."

class SimpleMathAgent(LitAgent):
    """
    A LitAgent that attempts to add two numbers from a task description
    and provides a reward based on correctness.
    """
    async def training_rollout(
        self,
        task: AgentLightningTask,
        rollout_id: str,
        resources: dict[str, AgentResource],
    ) -> float:
        print(f"[{rollout_id}] Agent received task: {task.name} - '{task.context}'")

        # In a real agent, 'resources' might contain prompt templates or tool definitions.
        # For now, we'll just simulate the agent trying to parse the context.

        # Our mock LLM call
        llm_response = await mock_llm_add(task.context)
        print(f"[{rollout_id}] Mock LLM response: {llm_response}")

        # Extract the expected answer from the task context and compare.
        # We assume tasks look like "Add X and Y (Expected: Z)".
        expected_part = ""
        if "expected:" in task.context.lower():
            start_index = task.context.lower().find("expected:") + len("expected:")
            # Strip the trailing ")" of "(Expected: Z)" so int() can parse the number.
            expected_part = task.context[start_index:].strip().rstrip(")").strip()

        try:
            expected_answer = int(expected_part)
            if "Result:" in llm_response:
                actual_result_str = llm_response.split("Result:")[1].strip()
                if actual_result_str.isdigit():
                    actual_answer = int(actual_result_str)
                    if actual_answer == expected_answer:
                        print(f"[{rollout_id}] Correct! Actual: {actual_answer}, Expected: {expected_answer}")
                        return 1.0  # Full reward for correct answer
                    else:
                        print(f"[{rollout_id}] Incorrect. Actual: {actual_answer}, Expected: {expected_answer}")
                        return 0.0  # No reward for incorrect answer
            print(f"[{rollout_id}] Could not parse actual result from LLM response.")
            return 0.0
        except ValueError:
            print(f"[{rollout_id}] Could not parse expected answer from task context or LLM result.")
            return 0.0 # No reward if parsing fails

    # A simple run method for local testing without the server
    async def run_locally(self, task_context: str, expected_answer: int) -> float:
        mock_task = AgentLightningTask(name="Local Test", context=f"{task_context} (Expected: {expected_answer})")
        return await self.training_rollout(mock_task, "local_test_id", {})

async def main():
    agent = SimpleMathAgent()
    # Test cases
    print("\n--- Local Test 1: Correct ---")
    reward1 = await agent.run_locally("Add 10 and 5", 15)
    print(f"Local Test 1 Reward: {reward1}")

    print("\n--- Local Test 2: Incorrect ---")
    reward2 = await agent.run_locally("Add 7 and 3", 11)
    print(f"Local Test 2 Reward: {reward2}")

    print("\n--- Local Test 3: Invalid input ---")
    reward3 = await agent.run_locally("Tell me a joke", 0) # Expecting 0 reward
    print(f"Local Test 3 Reward: {reward3}")

if __name__ == "__main__":
    asyncio.run(main())

To run this example:

  1. Save the code as simple_agent.py.
  2. Make sure your Agentic Lightning virtual environment is active.
  3. Run: python simple_agent.py

Expected Output:

--- Local Test 1: Correct ---
[local_test_id] Agent received task: Local Test - 'Add 10 and 5 (Expected: 15)'
[local_test_id] Mock LLM response: Result: 15
[local_test_id] Correct! Actual: 15, Expected: 15
Local Test 1 Reward: 1.0

--- Local Test 2: Incorrect ---
[local_test_id] Agent received task: Local Test - 'Add 7 and 3 (Expected: 11)'
[local_test_id] Mock LLM response: Result: 10
[local_test_id] Incorrect. Actual: 10, Expected: 11
Local Test 2 Reward: 0.0

--- Local Test 3: Invalid input ---
[local_test_id] Agent received task: Local Test - 'Tell me a joke (Expected: 0)'
[local_test_id] Mock LLM response: Result: I cannot perform that calculation.
[local_test_id] Could not parse actual result from LLM response.
Local Test 3 Reward: 0.0

Exercise 1: Enhancing Reward Logic

Modify the SimpleMathAgent’s training_rollout method to:

  1. Give a reward of 0.5 when the agent correctly recognizes that it cannot perform the calculation (e.g., for “Tell me a joke”) on tasks whose expected answer is defined as None or NaN.
  2. Keep 1.0 for correct numeric answers and 0.0 for incorrect numeric answers.
  3. Update the run_locally method to test this new None or NaN expected answer for invalid math tasks.
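
Hint:

One possible shape for the new branches (a sketch only, not a full solution; the “cannot perform” check keys off the mock LLM’s refusal text, and expected_part / llm_response are the variables already used in training_rollout):

# Inside training_rollout, before the numeric comparison.
# expected_part may now be "None" or "NaN" for non-math tasks.
if expected_part.lower() in ("none", "nan"):
    # Reward the agent for correctly declining the task.
    if "cannot perform" in llm_response.lower():
        return 0.5
    return 0.0
# ...otherwise fall through to the existing 1.0 / 0.0 numeric logic.

Note that run_locally builds the task context with an f-string, so passing None as the expected answer embeds the literal text "(Expected: None)", which the check above can detect.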

2. AgentLightningServer: The Central Brain

The AgentLightningServer is a FastAPI application that acts as the communication hub between your running LitAgent instances (clients) and the Trainer (which might run on the same or a different machine). It’s responsible for:

  • Task Management: Providing tasks (AgentLightningTask) to agents for execution.
  • Resource Management: Maintaining and versioning shared resources (AgentResource) like prompt templates, tool definitions, or even model configurations that agents use. This allows the trainer to push updated resources to agents during the learning process.
  • Rollout Collection: Receiving completed “rollouts” (execution traces, rewards) from LitAgent clients.
  • LLM Proxy: Optionally, it can host an OpenAI-compatible LLM endpoint. This is particularly useful in RL scenarios where the trainer needs to control which LLM model version the agents interact with.

You typically run the server as a standalone process.

Starting the AgentLightningServer

# In a new terminal, activate your virtual environment if not already.
# Make sure you are in the directory where you want to start the server,
# or specify the path to your server configuration.
agentlightning server start --host 0.0.0.0 --port 8000

This command starts the FastAPI server, listening on http://0.0.0.0:8000. By default, it will look for configuration files and a LightningStore.

Key Server Configurations:

  • --host and --port: Specify the network interface and port to listen on.
  • --openai-compatible-llm-uri: If you want the server to host an LLM endpoint, you can point it to a vLLM server or similar.
  • --debug: Enables debug mode for more verbose output.

Important Note: For a full training loop, the server needs to be running. It handles the queuing of tasks for agents and the collection of their results.
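
Before kicking off a training run, it is worth confirming that the server port is actually reachable from the machine where the trainer and agents will run. A minimal sanity check using plain Python sockets (no framework APIs involved; adjust the host and port to match your --host/--port settings):

# check_server.py -- quick reachability check for the AgentLightningServer.
import socket

HOST, PORT = "localhost", 8000  # match the --host/--port you started the server with

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
    sock.settimeout(2.0)
    try:
        sock.connect((HOST, PORT))
        print(f"Server is accepting connections on {HOST}:{PORT}")
    except OSError as exc:
        print(f"Could not reach {HOST}:{PORT}: {exc}")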

3. Trainer: The Optimization Orchestrator

The Trainer is the brain of the optimization process. It orchestrates the entire training loop, which typically involves:

  1. Fetching Tasks: Getting tasks from the LightningStore (via the AgentLightningServer).
  2. Dispatching Rollouts: Sending these tasks to multiple LitAgent instances (which are managed by “workers”) to execute their training_rollout method.
  3. Collecting Data: Receiving the results of these rollouts, including the reward and potentially detailed interaction traces.
  4. Applying Algorithms: Feeding this collected data into the chosen optimization algorithm (RL, APO, SFT).
  5. Updating Resources: If the algorithm suggests improvements (e.g., a better prompt template, updated model weights), the Trainer will update the AgentLightningServer’s AgentResource store. These updated resources are then fetched by agents in subsequent rollouts, closing the feedback loop.
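
Conceptually, one pass through this loop looks roughly like the pseudocode below. This is purely an illustration of the flow, not the framework’s internals: store, workers, and algorithm, along with their methods (fetch_tasks, run, step, update_resources), are placeholders rather than real Agentic Lightning APIs.

# Pseudocode for one training iteration (illustrative only).
async def training_iteration(store, workers, algorithm):
    tasks = await store.fetch_tasks(batch_size=8)                  # 1. fetch tasks
    rollouts = await workers.run(tasks)                            # 2. dispatch rollouts to LitAgent workers
    rewards = [(r.rollout_id, r.final_reward) for r in rollouts]   # 3. collect rewards and traces
    new_resources = await algorithm.step(rollouts, rewards)        # 4. apply the optimization algorithm
    if new_resources:
        await store.update_resources(new_resources)                # 5. publish updated resources for the next round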

Key Aspects of Trainer:

  • Trainer(n_workers=..., backend=...):
    • n_workers: Specifies how many parallel agent instances should be run simultaneously. This accelerates data collection.
    • backend: The URL of your running AgentLightningServer (e.g., "http://localhost:8000").
  • trainer.fit(agent, backend, algorithm, ...):
    • agent: An instance of your LitAgent. The trainer will use this to spawn workers.
    • backend: The server URL.
    • algorithm: The optimization algorithm to use (e.g., a reinforcement learning algorithm from verl, or a custom prompt optimizer).
  • trainer.dev(agent, task, ...): A useful debugging method to run a single rollout of your agent without needing a full server or complex training loop. Excellent for iterating on training_rollout logic.

Code Example: Using the Trainer

Let’s see how you’d typically instantiate and use a Trainer. This example assumes you have an AgentLightningServer running in the background.

# Save this as run_trainer.py
import asyncio
from agentlightning.trainer import Trainer
from agentlightning.types import AgentLightningTask, AgentResource
from simple_agent import SimpleMathAgent # Import our agent from simple_agent.py

# A dummy task queue for illustration.
# In a real scenario, tasks would be loaded from a database or external source.
dummy_tasks = [
    AgentLightningTask(name="Addition 1", context="Add 10 and 5 (Expected: 15)"),
    AgentLightningTask(name="Addition 2", context="Add 7 and 3 (Expected: 10)"),
    AgentLightningTask(name="Addition 3", context="Add 2 and 8 (Expected: 11)"), # Incorrect expected
    AgentLightningTask(name="Addition 4", context="Add 12 and 18 (Expected: 30)"),
    AgentLightningTask(name="Invalid Task", context="Tell me a story (Expected: 0)"), # With new reward logic
]

# A simple "Algorithm" that just logs rewards for now.
# Real algorithms would update resources.
class DummyOptimizer:
    def __init__(self):
        self.rewards = []

    async def optimize_step(self, rollout_results: list[tuple[str, float]], resources_version: str):
        """
        Simulates an optimization step.
        rollout_results is a list of (rollout_id, reward) tuples.
        """
        for _id, reward in rollout_results:
            self.rewards.append(reward)
            print(f"Optimizer received reward for {_id}: {reward}")
        print(f"Current average reward: {sum(self.rewards) / len(self.rewards):.2f}")
        # In a real optimizer, this is where you'd update prompts, fine-tune models, etc.
        # And return updated resources if any.
        return {"version": resources_version, "resources": {}} # No updates for now

async def main():
    # Ensure your AgentLightningServer is running in a separate terminal:
    # agentlightning server start --host 0.0.0.0 --port 8000

    backend_url = "http://localhost:8000"  # the running AgentLightningServer (used by trainer.fit in a real run)
    num_workers = 2  # Run 2 agent workers in parallel

    # Initialize the Trainer (not exercised in this simplified loop, shown for completeness)
    trainer = Trainer(n_workers=num_workers)

    # Instantiate our agent
    math_agent = SimpleMathAgent()

    # Initialize a dummy optimizer
    optimizer = DummyOptimizer()

    print("--- Starting Trainer.fit loop (will run dummy tasks) ---")
    # In a real scenario, you'd integrate a proper algorithm here.
    # For demonstration, we'll manually feed tasks and process rewards.

    # Simulating a training loop for N iterations
    num_iterations = 5
    for i in range(num_iterations):
        print(f"\n----- Training Iteration {i+1}/{num_iterations} -----")
        current_tasks = dummy_tasks # In reality, tasks would be dynamically generated or sampled

        # The trainer dispatches tasks and gets results from workers running `math_agent`
        # In Agentic Lightening, a backend is usually responsible for fetching tasks for agents
        # For this simplified example, we'll simulate task distribution and collection.

        # Trainer.fit() is designed for actual algorithms. For manual demonstration,
        # we'll use trainer.dev() to simulate rollouts and collect results,
        # then manually pass to our optimizer.

        # This part is simplified. Actual trainer.fit handles task loading and algorithm.
        # For a basic demo, we'll just show the interaction.
        # In a full setup, the `Trainer` would talk to the `LightningStore` for tasks.

        # Let's use trainer.dev() for individual task execution to simulate the worker
        # behavior, then feed to the dummy optimizer.
        rollout_results = []
        for j, task in enumerate(current_tasks):
            print(f"Worker processing task {j+1}: {task.name}")
            # Use trainer.dev to execute the agent locally, simulating a worker processing a task.
            # This is a simplification; actual `trainer.fit` has a more complex internal loop.
            reward = await math_agent.training_rollout(task, f"iter_{i}_task_{j}", {})
            rollout_results.append((f"iter_{i}_task_{j}", reward))

        await optimizer.optimize_step(rollout_results, "v1.0") # Pass results to our optimizer

    print("\n--- Trainer.fit loop finished ---")
    print(f"Final rewards collected by optimizer: {optimizer.rewards}")


if __name__ == "__main__":
    # Ensure your simple_agent.py file is accessible (in the same directory or PYTHONPATH)
    asyncio.run(main())

Before running run_trainer.py:

  1. Make sure simple_agent.py is in the same directory or your Python path.
  2. Start the AgentLightningServer in a separate terminal:
    (agent-lightening-env) user@host:~/$ agentlightning server start --host 0.0.0.0 --port 8000
    
  3. Then, in your current terminal (with virtual env active), run the trainer script:
    (agent-lightening-env) user@host:~/$ python run_trainer.py
    

Expected Output (simplified, many lines omitted for brevity):

--- Starting simulated training loop (dummy tasks) ---

----- Training Iteration 1/5 -----
Worker processing task 1: Addition 1
[iter_0_task_0] Agent received task: Addition 1 - 'Add 10 and 5 (Expected: 15)'
[iter_0_task_0] Mock LLM response: Result: 15
[iter_0_task_0] Correct! Actual: 15, Expected: 15
Worker processing task 2: Addition 2
[iter_0_task_1] Agent received task: Addition 2 - 'Add 7 and 3 (Expected: 10)'
[iter_0_task_1] Mock LLM response: Result: 10
[iter_0_task_1] Correct! Actual: 10, Expected: 10
Worker processing task 3: Addition 3
[iter_0_task_2] Agent received task: Addition 3 - 'Add 2 and 8 (Expected: 11)'
[iter_0_task_2] Mock LLM response: Result: 10
[iter_0_task_2] Incorrect. Actual: 10, Expected: 11
Worker processing task 4: Addition 4
[iter_0_task_3] Agent received task: Addition 4 - 'Add 12 and 18 (Expected: 30)'
[iter_0_task_3] Mock LLM response: Result: 30
[iter_0_task_3] Correct! Actual: 30, Expected: 30
Worker processing task 5: Invalid Task
[iter_0_task_4] Agent received task: Invalid Task - 'Tell me a story (Expected: 0)'
[iter_0_task_4] Mock LLM response: Result: I cannot perform that calculation.
[iter_0_task_4] Could not parse actual result from LLM response.
Optimizer received reward for iter_0_task_0: 1.0
Optimizer received reward for iter_0_task_1: 1.0
Optimizer received reward for iter_0_task_2: 0.0
Optimizer received reward for iter_0_task_3: 1.0
Optimizer received reward for iter_0_task_4: 0.0
Current average reward: 0.60

... (subsequent iterations) ...

--- Simulated training loop finished ---
Final rewards collected by optimizer: [1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, ...]

Exercise 2: Debugging with trainer.dev()

The trainer.dev() method is incredibly useful for isolated testing.

  1. Modify run_trainer.py to remove the Trainer.fit loop simulation.
  2. Instead, use trainer.dev() to run your SimpleMathAgent on a single, specific AgentLightningTask.
  3. Print the returned LitRollout object to inspect its contents (reward, traces, etc.).

Hint:

# ... (imports and SimpleMathAgent) ...

async def main_dev_test():
    # No need to start the server for trainer.dev()
    trainer = Trainer(n_workers=1)  # dev mode does not spawn workers, but the Trainer still needs n_workers

    test_agent = SimpleMathAgent()
    test_task = AgentLightningTask(name="Dev Test", context="Add 20 and 22 (Expected: 42)")

    print(f"--- Running dev test for task: {test_task.name} ---")
    rollout_result = await trainer.dev(agent=test_agent, task=test_task, resources={})

    print("\n--- Dev Test Result ---")
    print(f"Rollout ID: {rollout_result.rollout_id}")
    print(f"Final Reward: {rollout_result.final_reward}")
    print(f"Traces (if any): {rollout_result.traces}") # More on traces later
    print(f"Resources (if any): {rollout_result.resources}")

if __name__ == "__main__":
    asyncio.run(main_dev_test())

4. LightningStore: The Data Backbone

Unlike LitAgent or Trainer, the LightningStore is not a component you invoke directly during execution; it is the persistent data layer that underpins the entire Agentic Lightning ecosystem. It is typically implemented using a database (like MongoDB or Postgres).

The LightningStore serves as a central repository for:

  • Tasks: All the tasks that agents need to solve, usually organized by dataset or problem type.
  • Resources: Versioned configurations and data that agents depend on, such as prompt templates, tool descriptions, or fine-tuned model identifiers.
  • Rollout Traces: The detailed logs of agent interactions, including prompts, LLM responses, tool calls, observations, and the reward signals. These traces are critical for offline analysis and training.

The AgentLightningServer interacts with the LightningStore to pull tasks and store rollouts. The Trainer then queries the LightningStore to retrieve data for its optimization algorithms.
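
To make this concrete, a stored rollout record might look something like the sketch below. The exact schema depends on the store backend and tracer configuration; the field names are illustrative only, apart from the concepts (task, rollout_id, reward, traces, resource version) introduced in this chapter.

# Illustrative shape of a rollout record in the LightningStore (hypothetical schema).
example_rollout_record = {
    "rollout_id": "iter_0_task_0",
    "task": {"name": "Addition 1", "context": "Add 10 and 5 (Expected: 15)"},
    "resources_version": "v1.0",   # which resource snapshot the agent used
    "final_reward": 1.0,
    "traces": [                    # ordered interaction log captured by the tracer
        {"type": "llm_call", "prompt": "Add 10 and 5", "response": "Result: 15"},
    ],
}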

Understanding these core components – how your LitAgent interacts with tasks and returns rewards, how the AgentLightningServer orchestrates communication and resource management, and how the Trainer uses this feedback loop for optimization – forms the basis of building adaptive AI agents with Agentic Lightning. In the next chapters, we’ll delve deeper into integrating existing agents, designing reward functions, and implementing various optimization algorithms.