Build an agent harness with a self-hosted model + Modal Sandboxes
Build an LLM agent harness with a self-hosted model and Modal Sandbox tool execution
This tutorial builds an LLM agent loop from scratch using a self-hosted model served on Modal. The agent can use two tools — list a directory and read a file — and every tool call executes inside a Modal Sandbox, an isolated container with its own filesystem.
What you’ll learn:
- Deploy a model with DeploymentConfig and get an OpenAI-compatible endpoint.
- Use the OpenAI Python SDK pointed at your self-hosted endpoint (no API key needed).
- Create a Modal Sandbox with files pre-loaded via filesystem.write_text.
- Define tools that run shell commands inside the sandbox with sandbox.exec.
- Wire everything into an agent loop with tool calling.
The entire stack runs on Modal — model serving, tool execution, and the sandbox — so you control cost, latency, and data privacy.
import json

import modal
import openai
from modal_training_gym import (
    DeploymentConfig,
    Qwen3_8B,
)
from modal_training_gym.deploy_recipes import SglangRecipe

Define the tools
We define two tools and a dispatcher function. Each tool runs a
command inside a sandbox via sandbox.exec and captures
stdout/stderr. The tool definitions follow the OpenAI
function-calling schema.
The dispatcher takes the sandbox as an argument so it can be called from the agent loop after the sandbox is created.
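Because the model writes the arguments string itself, it can arrive as malformed JSON or without the required path key. The sketch below shows one way a dispatcher could guard against that before touching the sandbox. parse_tool_args is a hypothetical helper introduced here for illustration; it is not part of the tutorial code or of any library. It returns an error string that could be sent back to the model as the tool result, so the model has a chance to correct its call.

```python
from __future__ import annotations

import json


def parse_tool_args(arguments: str, required: list[str]) -> tuple[dict | None, str | None]:
    """Hypothetical helper: parse model-supplied JSON arguments.

    Returns (args, None) on success or (None, error_message) on failure.
    """
    try:
        args = json.loads(arguments)
    except json.JSONDecodeError as exc:
        return None, f"Error: arguments are not valid JSON ({exc.msg})"
    # Tool arguments are expected to be a JSON object, not a list or scalar.
    if not isinstance(args, dict):
        return None, "Error: arguments must be a JSON object"
    missing = [key for key in required if key not in args]
    if missing:
        return None, f"Error: missing required argument(s): {', '.join(missing)}"
    return args, None


# Usage: a well-formed call and a call that forgot the path.
ok_args, ok_err = parse_tool_args('{"path": "/repo"}', ["path"])
bad_args, bad_err = parse_tool_args("{}", ["path"])
```

Returning the error as text rather than raising keeps the agent loop running when the model makes a single bad call.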
TOOL_DEFINITIONS = [
    {
        "type": "function",
        "function": {
            "name": "list_directory",
            "description": (
                "List the contents of a directory. Returns one entry per "
                "line. Directories have a trailing slash."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {
                        "type": "string",
                        "description": "Absolute path to the directory.",
                    },
                },
                "required": ["path"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read the full contents of a text file.",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {
                        "type": "string",
                        "description": "Absolute path to the file.",
                    },
                },
                "required": ["path"],
            },
        },
    },
]
def dispatch_tool(sb, name: str, arguments: str) -> str:
    """Run the named tool inside the sandbox and return its output as text."""
    # The model supplies arguments as a JSON-encoded string.
    args = json.loads(arguments)

    if name == "list_directory":
        # -1: one entry per line; -F: mark directories with a trailing slash.
        proc = sb.exec("ls", "-1F", args["path"])
        stdout = proc.stdout.read()
        stderr = proc.stderr.read()
        proc.wait()
        return stdout if proc.returncode == 0 else f"Error: {stderr}"

    elif name == "read_file":
        proc = sb.exec("cat", args["path"])
        stdout = proc.stdout.read()
        stderr = proc.stderr.read()
        proc.wait()
        return stdout if proc.returncode == 0 else f"Error: {stderr}"

    return f"Unknown tool: {name}"

Deploy the model
DeploymentConfig.serve() launches an sglang-backed inference
server on Modal and returns a ModelDeployment with a live URL.
The server exposes an OpenAI-compatible /v1/chat/completions
endpoint, so we point the standard OpenAI Python SDK at it.
We pass extra_server_args={"--tool-call-parser": "qwen25"} to
the SglangRecipe so the server parses Qwen3’s tool-call
format into structured tool_calls in the response. Without
this, the model emits tool calls as raw text.
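To make "raw text" concrete, the sketch below shows the kind of conversion a tool-call parser performs. It assumes the model wraps each call in <tool_call>...</tool_call> tags around a JSON object with name and arguments keys; that format is an assumption made for illustration. extract_tool_calls is a hypothetical function written for this example and is not sglang's actual parser.

```python
import json
import re

# Assumed raw format: <tool_call>{"name": ..., "arguments": {...}}</tool_call>
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def extract_tool_calls(text: str) -> list[dict]:
    """Hypothetical illustration: pull structured calls out of raw model text."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        payload = json.loads(match.group(1))
        calls.append(
            {
                "name": payload["name"],
                # The OpenAI schema carries arguments as a JSON-encoded string.
                "arguments": json.dumps(payload.get("arguments", {})),
            }
        )
    return calls


raw = (
    "I will look at the repo.\n"
    "<tool_call>\n"
    '{"name": "list_directory", "arguments": {"path": "/repo"}}\n'
    "</tool_call>"
)
calls = extract_tool_calls(raw)
```

With the server-side parser enabled, this conversion happens before the response reaches the client, so the agent loop only ever reads the structured tool_calls field.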
recipe = SglangRecipe(
    extra_server_args={"--tool-call-parser": "qwen25"},
)
deployment = DeploymentConfig(
    model=Qwen3_8B(),
    recipe=recipe,
).serve()
deployment.wait_until_ready()
print(f"Model URL: {deployment.url}")

client = openai.OpenAI(
    base_url=f"{deployment.url}/v1",
    api_key="not-needed",
)

Create a sandbox with sample files
We spin up a long-lived Sandbox running sleep infinity so it
stays alive while the agent issues commands. After creation we
write a small project tree into the sandbox’s filesystem.
sandbox_app = modal.App.lookup("agent-sandbox-tutorial", create_if_missing=True)

sandbox = modal.Sandbox.create(
    "sleep",
    "infinity",
    app=sandbox_app,
    image=modal.Image.debian_slim(python_version="3.12"),
    timeout=600,
)

FILES = {
    "/repo/README.md": (
        "# My Project\n\n"
        "A small Python utility that computes Fibonacci numbers.\n\n"
        "## Usage\n\n"
        "```bash\n"
        "python fib.py 10\n"
        "```\n"
    ),
    "/repo/fib.py": (
        "import sys\n\n\n"
        "def fibonacci(n: int) -> list[int]:\n"
        '    """Return the first n Fibonacci numbers."""\n'
        "    if n <= 0:\n"
        "        return []\n"
        "    seq = [0, 1]\n"
        "    while len(seq) < n:\n"
        "        seq.append(seq[-1] + seq[-2])\n"
        "    return seq[:n]\n\n\n"
        'if __name__ == "__main__":\n'
        "    count = int(sys.argv[1]) if len(sys.argv) > 1 else 10\n"
        "    print(fibonacci(count))\n"
    ),
    "/repo/tests/test_fib.py": (
        "from fib import fibonacci\n\n\n"
        "def test_empty():\n"
        "    assert fibonacci(0) == []\n\n\n"
        "def test_one():\n"
        "    assert fibonacci(1) == [0]\n\n\n"
        "def test_ten():\n"
        "    result = fibonacci(10)\n"
        "    assert len(result) == 10\n"
        "    assert result[-1] == 34\n"
    ),
    "/repo/pyproject.toml": (
        '[project]\nname = "fib"\nversion = "0.1.0"\n'
        'requires-python = ">=3.12"\n'
    ),
}

for path, content in FILES.items():
    sandbox.filesystem.write_text(content, path)

print(f"Sandbox created: {sandbox.object_id}")

The agent loop
The loop uses the OpenAI SDK’s tool-calling protocol:
- Send messages + tool definitions to the self-hosted model.
- If the model returns tool_calls, execute each one in the sandbox and append the results as tool messages.
- Repeat until the model produces a final text response.
We cap iterations at 10 to avoid runaway loops. We also pass
enable_thinking=False in chat_template_kwargs so Qwen3
skips its internal chain-of-thought block and responds
directly — this keeps tool-call parsing clean.
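The control flow can be exercised without a model or a sandbox. The sketch below is a simplified stand-in under stated assumptions: run_agent and make_scripted_model are hypothetical names, messages are plain dicts, and each tool call is flattened to id, name, and arguments keys rather than the SDK's nested objects. It also stops when a reply carries no tool calls, whereas the tutorial's loop checks finish_reason.

```python
from __future__ import annotations

from typing import Callable


def run_agent(
    ask_model: Callable[[list[dict]], dict],
    run_tool: Callable[[str, str], str],
    messages: list[dict],
    max_iterations: int = 10,
) -> str | None:
    """Hypothetical loop: call the model until it answers without tool calls."""
    for _ in range(max_iterations):
        reply = ask_model(messages)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            return reply.get("content")  # final text response
        # Record the assistant turn, then one tool message per call.
        messages.append(reply)
        for call in tool_calls:
            result = run_tool(call["name"], call["arguments"])
            messages.append(
                {"role": "tool", "tool_call_id": call["id"], "content": result}
            )
    return None  # iteration cap reached without a final answer


def make_scripted_model(replies: list[dict]) -> Callable[[list[dict]], dict]:
    """Stand-in for the model: returns canned replies in order."""
    remaining = iter(replies)
    return lambda messages: next(remaining)


scripted = [
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {"id": "call_1", "name": "list_directory", "arguments": '{"path": "/repo"}'}
        ],
    },
    {"role": "assistant", "content": "The repo contains fib.py."},
]
history = [{"role": "user", "content": "Explore /repo."}]
answer = run_agent(make_scripted_model(scripted), lambda name, args: "fib.py\n", history)
```

After the run, history holds the user message, the assistant's tool-call turn, and the tool result, which is the same shape the real loop sends back to the model on the next request.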
MODEL = deployment.deployment_config.served_model_name
MAX_ITERATIONS = 10

messages = [
    {
        "role": "user",
        "content": (
            "Explore the /repo directory. List what files exist, read "
            "each one, and give me a summary of the project — what it "
            "does, how the code is structured, and whether the tests "
            "look correct."
        ),
    },
]

print("Starting agent loop...\n")

for i in range(MAX_ITERATIONS):
    response = client.chat.completions.create(
        model=MODEL,
        max_tokens=4096,
        tools=TOOL_DEFINITIONS,
        messages=messages,
        extra_body={"chat_template_kwargs": {"enable_thinking": False}},
    )

    choice = response.choices[0]

    # A "stop" finish reason means the model produced its final answer.
    if choice.finish_reason == "stop":
        print(f"Agent response:\n{choice.message.content}")
        break

    # Keep the assistant turn in the history so the tool results have context.
    messages.append(choice.message)

    if choice.message.tool_calls:
        for tc in choice.message.tool_calls:
            print(f"  [{i+1}] Calling {tc.function.name}({tc.function.arguments})")
            result = dispatch_tool(sandbox, tc.function.name, tc.function.arguments)
            print(f"    → {len(result)} chars returned")
            messages.append(
                {
                    "role": "tool",
                    "tool_call_id": tc.id,
                    "content": result,
                }
            )
else:
    # for/else: runs only if the loop finished without hitting break.
    print("Reached max iterations without a final response.")

Clean up
Terminate the sandbox so it doesn’t keep running (and billing) after we’re done.
sandbox.terminate()
print("Sandbox terminated.")

Next steps
This tutorial showed how to combine a self-hosted model with sandbox tool execution, with no external API keys required. The model runs on Modal, the tools run on Modal, and everything is under your control.
Ideas to extend this:
- Add a run_command tool so the agent can execute arbitrary shell commands (run tests, install packages).
- Add a write_file tool using sandbox.filesystem.write_text so the agent can modify code.
- Swap models: try Qwen3_32B for harder tasks, or Qwen3_4B for lower cost.
- Snapshot the filesystem with sandbox.snapshot_filesystem() to create a reusable modal.Image from the sandbox state.
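As a starting point for the first idea, here is a minimal sketch of a run_command tool. It assumes the sandbox's exec returns a process object with the same stdout, stderr, wait, and returncode members used by dispatch_tool. RUN_COMMAND_TOOL, run_command, the max_chars limit, and fake_sandbox are all hypothetical names introduced for this example; fake_sandbox is a local stand-in so the sketch can be tried without Modal.

```python
import json
from types import SimpleNamespace

# Hypothetical tool definition, following the same schema as TOOL_DEFINITIONS.
RUN_COMMAND_TOOL = {
    "type": "function",
    "function": {
        "name": "run_command",
        "description": "Run a shell command in the sandbox and return its output.",
        "parameters": {
            "type": "object",
            "properties": {
                "command": {"type": "string", "description": "Shell command to run."},
            },
            "required": ["command"],
        },
    },
}


def run_command(sb, arguments: str, max_chars: int = 4000) -> str:
    """Run the command through bash in the sandbox and cap the returned text."""
    args = json.loads(arguments)
    proc = sb.exec("bash", "-c", args["command"])
    stdout = proc.stdout.read()
    stderr = proc.stderr.read()
    proc.wait()
    if proc.returncode == 0:
        output = stdout
    else:
        output = f"Error (exit {proc.returncode}): {stderr}"
    # Long outputs would crowd the model's context, so truncate them.
    if len(output) > max_chars:
        output = output[:max_chars] + "\n[output truncated]"
    return output


def fake_sandbox(stdout: str, stderr: str = "", returncode: int = 0) -> SimpleNamespace:
    """Local stand-in for a sandbox: every exec returns the same canned result."""
    proc = SimpleNamespace(
        stdout=SimpleNamespace(read=lambda: stdout),
        stderr=SimpleNamespace(read=lambda: stderr),
        wait=lambda: returncode,
        returncode=returncode,
    )
    return SimpleNamespace(exec=lambda *args: proc)


result = run_command(fake_sandbox("3 passed\n"), '{"command": "pytest -q"}')
```

Arbitrary shell access is much broader than the two read-only tools, so it is worth deciding up front which files and credentials the sandbox image exposes.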
Source: tutorials/agent/000_agent_sandbox/000_agent_sandbox.py