Skip to content
GitHub
View on GitHub

Training Gym SDK

Open-source Python SDK for GRPO and RL post-training of LLMs on Modal GPU clusters — tutorials, API reference, and runnable examples.

📖 Documentation · API Reference

Modal Training Gym is a Python SDK for RL post-training on Modal—so you don’t have to hand-roll a launcher every time.

Pick a base model, a dataset, and an RL framework; the gym handles cluster topology, Ray/NCCL bring-up, volume mounts, checkpointing, and serving for eval and rollouts.

Install with pip:

Terminal window
pip install -q git+https://github.com/modal-projects/training-gym.git@main

Or pin it in pyproject.toml for uv:

training-gym = { git = "https://github.com/modal-projects/training-gym.git", branch = "main" }

Then import the building blocks from your own script:

from modal_training_gym import TrainConfig

This repository includes an AGENTS.md and a skills/ directory (symlinked to .claude/skills/) that teach Claude Code how to navigate the framework — W&B configuration, custom rollouts and generate functions, custom eval functions, and more.

Clone the repo and run claude from its root; the skills load automatically based on what you ask for.

Training Gym ships a dashboard that aggregates training runs, deployments, and eval results in one place. Deploy your own copy:

Terminal window
training-gym setup

Modal prints a URL where you can watch jobs in progress.

Gym Observability Dashboard

The fastest path through the API is the tutorials. Each one ships as a runnable .py and a paired .ipynb narrated cell-by-cell — the notebook is the canonical walkthrough. Each tutorial below has a one-click Launch button that opens the .ipynb in a fresh Modal Notebook; the first code cell pip-installs modal-training-gym into the notebook kernel, so the rest of the cells run as-is.

Difficulty is a rough self-assessed signal for where to start:

  • Beginner — single-node, introduces one framework concept.
  • Intermediate — 1–2 nodes, or wires up something non-default (custom reward, external script).
  • Advanced — ≥2 nodes with non-trivial parallelism (tensor-parallel, colocated RL, long context); assumes familiarity with the underlying framework.
TutorialSummaryDifficultyFrameworkLaunch
000_rl_basicsQwen3-4B haiku evaluation with verifiable rewards — serve, evaluate, train, compareBeginnerslimeOpen in Modal
001_sandboxesCode RL with Harbor hello-world and sandboxed verificationIntermediateslimeOpen in Modal
002_multiturnMulti-turn number-guessing RL with custom generate and reward functionsIntermediateslimeOpen in Modal
003_on_policy_distillationOn-policy distillation on math — Qwen3-8B teacher, Qwen3-4B studentIntermediateslimeOpen in Modal
005_dapoDAPO on math with Qwen3-4BAdvancedslimeOpen in Modal
006_audio_asrAudio GRPO on Qwen3-ASR-1.7B — transcribe LibriSpeech, reward −WERIntermediateslimeOpen in Modal
TutorialSummaryDifficultyFrameworkLaunch
001_qwen27bTrain Qwen3.6-27B on DAPO-math with GRPOAdvancedslimeOpen in Modal
000_qwen35bTrain Qwen3.6-35B-A3B on DAPO-math with GRPOAdvancedslimeOpen in Modal
TutorialSummaryDifficultyFrameworkLaunch
000_agent_sandboxBuild an LLM agent harness with a self-hosted model and Modal Sandbox tool executionBeginnerModal SandboxOpen in Modal
TutorialSummaryDifficultyFrameworkLaunch
000_kimi_k25Kimi K2.5 LoRA GRPO training on 128 GPUs with DAPO-Math-17kAdvancedmilesOpen in Modal
001_kimi_k26Kimi K2.6 LoRA GRPO training on 128 GPUs with DAPO-Math-17kAdvancedmilesOpen in Modal
002_glm_4_7GLM-4.7 355B MoE full-weight GSPO training on 64 GPUs with DAPO-Math-17kAdvancedslimeOpen in Modal

See tutorials/README.md for how to run the .py companions from the CLI and how to author a new tutorial.

Architecture diagram

Full docs are hosted at gym.modal.dev:

  • API Reference — every public class documented with types and defaults

Modal platform references:

MIT.