Build an LLM agent harness with a self-hosted model and Modal Sandbox tool execution

This tutorial builds an LLM agent loop from scratch using a self-hosted model served on Modal. The agent can use two tools — list a directory and read a file — and every tool call executes inside a Modal Sandbox, an isolated container with its own filesystem.

What you’ll learn:

Deploy a model with DeploymentConfig and get an OpenAI-compatible endpoint.
Use the OpenAI Python SDK pointed at your self-hosted endpoint (no real API key needed; the SDK only requires a placeholder string).
Create a Modal Sandbox with files pre-loaded via filesystem.write_text.
Define tools that run shell commands inside the sandbox with sandbox.exec.
Wire everything into an agent loop with tool calling.

The entire stack runs on Modal — model serving, tool execution, and the sandbox — so you control cost, latency, and data privacy.

import json

import modal
import openai

from modal_training_gym import (
    DeploymentConfig,
    Qwen3_8B,
)
from modal_training_gym.deploy_recipes import SglangRecipe

Define the tools

We define two tools and a dispatcher function. Each tool runs a command inside a sandbox via sandbox.exec and captures stdout/stderr. The tool definitions follow the OpenAI function-calling schema.

The dispatcher takes the sandbox as an argument so it can be called from the agent loop after the sandbox is created.

TOOL_DEFINITIONS = [
    {
        "type": "function",
        "function": {
            "name": "list_directory",
            "description": (
                "List the contents of a directory. Returns one entry per "
                "line. Directories have a trailing slash."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {
                        "type": "string",
                        "description": "Absolute path to the directory.",
                    },
                },
                "required": ["path"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read the full contents of a text file.",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {
                        "type": "string",
                        "description": "Absolute path to the file.",
                    },
                },
                "required": ["path"],
            },
        },
    },
]

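def dispatch_tool(sb, name: str, arguments: str) -> str:
    """Run one tool call inside the sandbox and return its output as text."""
    # Map each tool name to the command that implements it.
    commands = {
        "list_directory": ["ls", "-1F"],
        "read_file": ["cat"],
    }
    if name not in commands:
        return f"Unknown tool: {name}"

    # The model sends its arguments as a JSON string. Report bad input back
    # to the model instead of crashing the agent loop.
    try:
        path = json.loads(arguments)["path"]
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        return f"Error: invalid arguments ({exc})"

    proc = sb.exec(*commands[name], path)
    stdout = proc.stdout.read()
    stderr = proc.stderr.read()
    proc.wait()  # populates proc.returncode
    return stdout if proc.returncode == 0 else f"Error: {stderr}"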
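
To try the dispatcher before a sandbox exists, one option is a local stand-in that mimics the small part of the sandbox.exec interface used above (stdout.read(), stderr.read(), wait(), returncode). LocalSandbox and LocalProcess below are hypothetical helpers, not part of Modal, and they run commands on your own machine with no isolation, so treat this as a test sketch only. It assumes a Unix-like system with ls and cat, and it uses a condensed copy of the dispatcher so the block runs on its own.

```python
import io
import json
import subprocess
import tempfile
from pathlib import Path


class LocalProcess:
    """Hypothetical stand-in for the process object returned by sandbox.exec."""

    def __init__(self, completed):
        self.stdout = io.StringIO(completed.stdout)
        self.stderr = io.StringIO(completed.stderr)
        self._exit_code = completed.returncode
        self.returncode = None  # filled in by wait(), as in the tutorial code

    def wait(self):
        self.returncode = self._exit_code
        return self.returncode


class LocalSandbox:
    """Hypothetical stand-in that runs commands locally (no isolation)."""

    def exec(self, *cmd):
        return LocalProcess(subprocess.run(cmd, capture_output=True, text=True))


def dispatch_tool(sb, name, arguments):
    # Condensed copy of the dispatcher from this section.
    commands = {"list_directory": ["ls", "-1F"], "read_file": ["cat"]}
    if name not in commands:
        return f"Unknown tool: {name}"
    proc = sb.exec(*commands[name], json.loads(arguments)["path"])
    stdout, stderr = proc.stdout.read(), proc.stderr.read()
    proc.wait()
    return stdout if proc.returncode == 0 else f"Error: {stderr}"


sb = LocalSandbox()
with tempfile.TemporaryDirectory() as root:
    (Path(root) / "tests").mkdir()
    (Path(root) / "fib.py").write_text("print('hi')\n")
    listing = dispatch_tool(sb, "list_directory", json.dumps({"path": root}))
    contents = dispatch_tool(sb, "read_file", json.dumps({"path": f"{root}/fib.py"}))
    missing = dispatch_tool(sb, "read_file", json.dumps({"path": f"{root}/nope.txt"}))

print(listing)
print(missing)
```

The failing read_file call shows the error path: the tool returns a string starting with "Error:" rather than raising, so the model can see the failure and try something else.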

Deploy the model

DeploymentConfig.serve() launches an sglang-backed inference server on Modal and returns a ModelDeployment with a live URL. The server exposes an OpenAI-compatible /v1/chat/completions endpoint, so we point the standard OpenAI Python SDK at it.

We pass extra_server_args={"--tool-call-parser": "qwen25"} to the SglangRecipe so the server parses Qwen3’s tool-call format into structured tool_calls in the response. Without this flag, the model's tool calls come back as raw text in the message content and the agent loop has nothing structured to dispatch.

recipe = SglangRecipe(
    extra_server_args={"--tool-call-parser": "qwen25"},
)
deployment = DeploymentConfig(
    model=Qwen3_8B(),
    recipe=recipe,
).serve()
deployment.wait_until_ready()
print(f"Model URL: {deployment.url}")

client = openai.OpenAI(
    base_url=f"{deployment.url}/v1",
    api_key="not-needed",
)
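
To make the role of the tool-call parser concrete, here is a rough sketch of the conversion it performs. It assumes the model emits each call as a JSON object wrapped in <tool_call> and </tool_call> tags, which is the format described in Qwen's chat template. The function name parse_raw_tool_calls, the generated ids, and the sample string are made up for illustration; the real server-side parser also deals with streaming and malformed output.

```python
import json
import re

# Assumed raw format: <tool_call>{"name": ..., "arguments": {...}}</tool_call>
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def parse_raw_tool_calls(text):
    """Convert raw <tool_call> blocks into OpenAI-style tool_calls entries."""
    calls = []
    for index, match in enumerate(TOOL_CALL_RE.finditer(text)):
        payload = json.loads(match.group(1))
        calls.append(
            {
                "id": f"call_{index}",  # illustrative id, not what a server would issue
                "type": "function",
                "function": {
                    "name": payload["name"],
                    # The OpenAI schema carries arguments as a JSON string.
                    "arguments": json.dumps(payload.get("arguments", {})),
                },
            }
        )
    return calls


raw = (
    "<tool_call>\n"
    '{"name": "list_directory", "arguments": {"path": "/repo"}}\n'
    "</tool_call>"
)
calls = parse_raw_tool_calls(raw)
print(calls[0]["function"]["name"], calls[0]["function"]["arguments"])
```

With the parser flag set on the server, none of this is needed on the client side: the response already carries the structured entries.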

Create a sandbox with sample files

We spin up a long-lived Sandbox running sleep infinity so it stays alive while the agent issues commands. After creation we write a small project tree into the sandbox’s filesystem.

sandbox_app = modal.App.lookup("agent-sandbox-tutorial", create_if_missing=True)

sandbox = modal.Sandbox.create(
    "sleep", "infinity",
    app=sandbox_app,
    image=modal.Image.debian_slim(python_version="3.12"),
    timeout=600,
)

FILES = {
    "/repo/README.md": (
        "# My Project\n\n"
        "A small Python utility that computes Fibonacci numbers.\n\n"
        "## Usage\n\n"
        "```bash\n"
        "python fib.py 10\n"
        "```\n"
    ),
    "/repo/fib.py": (
        "import sys\n\n\n"
        "def fibonacci(n: int) -> list[int]:\n"
        '    """Return the first n Fibonacci numbers."""\n'
        "    if n <= 0:\n"
        "        return []\n"
        "    seq = [0, 1]\n"
        "    while len(seq) < n:\n"
        "        seq.append(seq[-1] + seq[-2])\n"
        "    return seq[:n]\n\n\n"
        'if __name__ == "__main__":\n'
        "    count = int(sys.argv[1]) if len(sys.argv) > 1 else 10\n"
        "    print(fibonacci(count))\n"
    ),
    "/repo/tests/test_fib.py": (
        "from fib import fibonacci\n\n\n"
        "def test_empty():\n"
        "    assert fibonacci(0) == []\n\n\n"
        "def test_one():\n"
        "    assert fibonacci(1) == [0]\n\n\n"
        "def test_ten():\n"
        "    result = fibonacci(10)\n"
        "    assert len(result) == 10\n"
        "    assert result[-1] == 34\n"
    ),
    "/repo/pyproject.toml": (
        '[project]\nname = "fib"\nversion = "0.1.0"\n'
        'requires-python = ">=3.12"\n'
    ),
}

for path, content in FILES.items():
    sandbox.filesystem.write_text(content, path)

print(f"Sandbox created: {sandbox.object_id}")
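
Whether write_text creates missing parent directories may depend on your Modal version. If writing a nested path such as /repo/tests/test_fib.py fails, one workaround is to create the directories first. The helper parent_dirs below is a hypothetical name, and the list of paths repeats the keys of FILES so the sketch runs on its own.

```python
from pathlib import PurePosixPath


def parent_dirs(paths):
    """Return the unique parent directories of the given paths, sorted."""
    # PurePosixPath keeps the paths POSIX-style even when run on Windows.
    return sorted({str(PurePosixPath(p).parent) for p in paths})


sample_paths = [
    "/repo/README.md",
    "/repo/fib.py",
    "/repo/tests/test_fib.py",
    "/repo/pyproject.toml",
]
dirs = parent_dirs(sample_paths)
print(dirs)

# With a live sandbox, the directories could then be created before the
# write loop, for example:
#   sandbox.exec("mkdir", "-p", *dirs).wait()
```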

The agent loop

The loop uses the OpenAI SDK’s tool-calling protocol:

Send messages + tool definitions to the self-hosted model.
If the model returns tool_calls, execute each one in the sandbox and append the results as tool messages.
Repeat until the model produces a final text response.

We cap iterations at 10 to avoid runaway loops. We also pass enable_thinking=False in chat_template_kwargs so Qwen3 skips its internal chain-of-thought block and responds directly — this keeps tool-call parsing clean.

MODEL = deployment.deployment_config.served_model_name
MAX_ITERATIONS = 10

messages = [
    {
        "role": "user",
        "content": (
            "Explore the /repo directory. List what files exist, read "
            "each one, and give me a summary of the project — what it "
            "does, how the code is structured, and whether the tests "
            "look correct."
        ),
    },
]

print("Starting agent loop...\n")

for i in range(MAX_ITERATIONS):
    response = client.chat.completions.create(
        model=MODEL,
        max_tokens=4096,
        tools=TOOL_DEFINITIONS,
        messages=messages,
        extra_body={"chat_template_kwargs": {"enable_thinking": False}},
    )

    choice = response.choices[0]

    # No tool calls means the model has given its final answer
    # (or was cut off by max_tokens), so there is nothing left to execute.
    if not choice.message.tool_calls:
        print(f"Agent response:\n{choice.message.content}")
        break

    # Keep the assistant turn in the history so that every tool result
    # below has a matching tool call.
    messages.append(choice.message)

    for tc in choice.message.tool_calls:
        print(f"  [{i+1}] Calling {tc.function.name}({tc.function.arguments})")
        result = dispatch_tool(sandbox, tc.function.name, tc.function.arguments)
        print(f"       → {len(result)} chars returned")
        messages.append(
            {
                "role": "tool",
                "tool_call_id": tc.id,
                "content": result,
            }
        )

else:
    print("Reached max iterations without a final response.")
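
The same loop can be wrapped in a function so it can be exercised without a deployed model. The sketch below is one possible shape, not the tutorial's own code: run_agent, ScriptedClient, assistant, and tool_call are hypothetical names, and the stub only imitates the attributes the loop reads (choices[0].message, tool_calls, function.name, function.arguments, id). Extra request options such as max_tokens and extra_body are left out for brevity.

```python
import json
from types import SimpleNamespace


def run_agent(client, model, messages, tools, execute_tool, max_iterations=10):
    """Run the tool-calling loop; return the final text, or None if the cap is hit."""
    for _ in range(max_iterations):
        response = client.chat.completions.create(
            model=model, messages=messages, tools=tools
        )
        message = response.choices[0].message
        if not message.tool_calls:
            return message.content
        messages.append(message)  # assistant turn that requested the tools
        for tc in message.tool_calls:
            result = execute_tool(tc.function.name, tc.function.arguments)
            messages.append(
                {"role": "tool", "tool_call_id": tc.id, "content": result}
            )
    return None


class ScriptedClient:
    """Hypothetical test double that replays canned assistant messages."""

    def __init__(self, scripted_messages):
        self._messages = iter(scripted_messages)
        self.chat = SimpleNamespace(
            completions=SimpleNamespace(create=self._create)
        )

    def _create(self, **kwargs):
        message = next(self._messages)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


def assistant(content=None, tool_calls=None):
    return SimpleNamespace(role="assistant", content=content, tool_calls=tool_calls)


def tool_call(call_id, name, **arguments):
    function = SimpleNamespace(name=name, arguments=json.dumps(arguments))
    return SimpleNamespace(id=call_id, type="function", function=function)


script = [
    assistant(tool_calls=[tool_call("call_0", "list_directory", path="/repo")]),
    assistant(content="The repo contains fib.py."),
]
history = [{"role": "user", "content": "What is in /repo?"}]
answer = run_agent(
    ScriptedClient(script),
    "stub-model",
    history,
    tools=[],
    execute_tool=lambda name, arguments: "fib.py\n",
)
print(answer)
```

With the real objects from this tutorial, execute_tool could be a lambda that forwards to dispatch_tool with the sandbox bound.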

Clean up

Terminate the sandbox so it doesn’t keep running (and billing) after we’re done.

sandbox.terminate()
print("Sandbox terminated.")
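
If the agent loop raises partway through, the terminate call above is never reached. One way to guard against that is a small context manager; terminating and FakeSandbox below are hypothetical names, and the fake object exists only so the sketch runs without Modal.

```python
from contextlib import contextmanager


@contextmanager
def terminating(sandbox):
    """Yield the sandbox and call terminate() on exit, even if the body raises."""
    try:
        yield sandbox
    finally:
        sandbox.terminate()


class FakeSandbox:
    """Stand-in used only to demonstrate the behavior without Modal."""

    def __init__(self):
        self.terminated = False

    def terminate(self):
        self.terminated = True


fake = FakeSandbox()
try:
    with terminating(fake):
        raise RuntimeError("agent loop crashed")
except RuntimeError:
    pass

print(fake.terminated)
```

In the tutorial's flow, the agent loop would sit inside a with terminating(sandbox): block in place of the explicit call at the end.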

Next steps

This tutorial showed how to combine a self-hosted model with sandbox tool execution — no external API keys required. The model runs on Modal, the tools run on Modal, and everything is under your control.

Ideas to extend this:

Add a run_command tool so the agent can execute arbitrary shell commands (run tests, install packages).
Add a write_file tool using sandbox.filesystem.write_text so the agent can modify code.
Swap models — try Qwen3_32B for harder tasks, or Qwen3_4B for lower cost.
Snapshot the filesystem with sandbox.snapshot_filesystem() to create a reusable modal.Image from the sandbox state.
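
As a starting point for the first two ideas, here is a sketch of what the extra tools might look like. The names EXTRA_TOOL_DEFINITIONS and dispatch_extra_tool are hypothetical, write_file reuses the write_text(content, path) call shown earlier in this tutorial, and RecordingSandbox is a fake used only to show the calls being made. A tool that runs arbitrary commands gives the model a lot of power, so this assumes the sandbox holds nothing you cannot afford to lose.

```python
import json
from types import SimpleNamespace

EXTRA_TOOL_DEFINITIONS = [
    {
        "type": "function",
        "function": {
            "name": "run_command",
            "description": "Run a shell command and return its exit code and output.",
            "parameters": {
                "type": "object",
                "properties": {"command": {"type": "string"}},
                "required": ["command"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "write_file",
            "description": "Write text to a file, replacing any existing contents.",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string"},
                    "content": {"type": "string"},
                },
                "required": ["path", "content"],
            },
        },
    },
]


def dispatch_extra_tool(sb, name, arguments):
    args = json.loads(arguments)
    if name == "run_command":
        proc = sb.exec("bash", "-c", args["command"])
        stdout, stderr = proc.stdout.read(), proc.stderr.read()
        proc.wait()
        # Return the exit code too, so the model can tell failure from success.
        return f"exit code {proc.returncode}\n{stdout}{stderr}"
    if name == "write_file":
        sb.filesystem.write_text(args["content"], args["path"])
        return f"Wrote {len(args['content'])} characters to {args['path']}"
    return f"Unknown tool: {name}"


class RecordingSandbox:
    """Fake sandbox that records calls instead of running anything."""

    def __init__(self):
        self.commands = []
        self.files = {}
        self.filesystem = SimpleNamespace(write_text=self._write_text)

    def _write_text(self, content, path):
        self.files[path] = content

    def exec(self, *cmd):
        self.commands.append(cmd)
        return SimpleNamespace(
            stdout=SimpleNamespace(read=lambda: "ok\n"),
            stderr=SimpleNamespace(read=lambda: ""),
            wait=lambda: 0,
            returncode=0,
        )


fake = RecordingSandbox()
run_result = dispatch_extra_tool(fake, "run_command", json.dumps({"command": "echo ok"}))
write_result = dispatch_extra_tool(
    fake, "write_file", json.dumps({"path": "/repo/notes.txt", "content": "hello"})
)
print(run_result)
print(write_result)
```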

Source: tutorials/agent/000_agent_sandbox/000_agent_sandbox.py