HarborFrameworkConfig

API reference for HarborFrameworkConfig

from modal_training_gym.frameworks.harbor.config import HarborFrameworkConfig

Harbor + Miles configuration for sandbox-based RL training.

Inherits from: MilesFrameworkConfig

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `agent_import_path` | `str` | `""` | Python import path for the Harbor agent class. Default "". |
| `agent_model_name` | `str` | `"model"` | Model name passed to the agent. Default "model". |
| `agent_kwargs` | `dict[str, Any]` | `{}` | Extra keyword arguments for agent construction. Default {}. |
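
As a rough illustration of how an import path and a kwargs dict can be turned into an agent instance, here is a minimal sketch. The helper `load_agent_class` is hypothetical, and the accepted path formats (`pkg.module:ClassName` or `pkg.module.ClassName`) are assumptions; this page does not state which format Harbor expects. A standard-library class stands in for an agent class so the snippet runs on its own.

```python
import importlib
from typing import Any


def load_agent_class(import_path: str) -> type:
    """Resolve 'pkg.module:ClassName' or 'pkg.module.ClassName' to a class.

    Hypothetical helper: both path formats are assumptions, not Harbor's API.
    """
    if ":" in import_path:
        module_name, _, attr = import_path.partition(":")
    else:
        module_name, _, attr = import_path.rpartition(".")
    if not module_name or not attr:
        raise ValueError(f"not a valid import path: {import_path!r}")
    return getattr(importlib.import_module(module_name), attr)


# A standard-library class stands in for a real agent class here.
agent_cls = load_agent_class("collections:OrderedDict")

# Stand-in for agent_kwargs: forwarded as keyword arguments on construction.
agent_kwargs: dict[str, Any] = {"model_name": "model"}
agent = agent_cls(**agent_kwargs)
```

Keeping the class reference as a string lets the configuration stay serializable while the import itself happens inside the training container.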

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `environment_import_path` | `str \| None` | `None` | |
| `sandbox_timeout_secs` | `int` | `1800` | Maximum sandbox execution time in seconds. Default 1800 (30 min). |
| `sandbox_idle_timeout_secs` | `int` | `300` | Sandbox idle timeout in seconds. Default 300 (5 min). |

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `task_root` | `str` | `"/data/tasks"` | Root directory for Harbor task directories on the data volume. Default "/data/tasks". |
| `task_glob` | `str` | `"*"` | Glob pattern for discovering task directories. Default "*". |
| `instruction_path` | `str` | `"instruction.md"` | Relative path to the instruction file within each task dir. Default "instruction.md". |
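
The three task fields can be read together as root, pattern, and per-task file. The sketch below shows one plausible way they combine; `discover_tasks` is a hypothetical helper written for illustration, not the library's discovery code, and it assumes directories without an instruction file are skipped.

```python
import tempfile
from pathlib import Path


def discover_tasks(
    task_root: str,
    task_glob: str = "*",
    instruction_path: str = "instruction.md",
) -> dict[str, str]:
    """Map each matching task directory name to its instruction text."""
    tasks: dict[str, str] = {}
    for task_dir in sorted(Path(task_root).glob(task_glob)):
        instruction_file = task_dir / instruction_path
        # Skip plain files and directories that lack an instruction file.
        if task_dir.is_dir() and instruction_file.is_file():
            tasks[task_dir.name] = instruction_file.read_text()
    return tasks


# Demo on a throwaway directory instead of the real data volume.
with tempfile.TemporaryDirectory() as root:
    for name in ("task-a", "task-b"):
        (Path(root) / name).mkdir()
        (Path(root) / name / "instruction.md").write_text(f"Solve {name}\n")
    (Path(root) / "notes.txt").write_text("not a task")
    found = discover_tasks(root)
```

Narrowing `task_glob` (for example to `"task-a*"`) restricts a run to a subset of tasks without moving any directories.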

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `input_key` | `str` | `"prompt"` | Dataset column key for model input prompts. Default "prompt". |
| `label_key` | `str` | `"metadata"` | Dataset column key for metadata/labels. Default "metadata". |
| `apply_chat_template` | `bool` | `True` | Apply the model’s chat template to inputs. Default True. |
| `enable_thinking` | `bool` | `True` | Enable thinking/reasoning mode in the model. Default True. |
| `rollout_shuffle` | `bool` | `True` | Shuffle data during rollout generation. Default True. |

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `recompute_granularity` | `str` | `"selective"` | Activation recomputation granularity. Default "selective". |

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `max_position_embeddings` | `int` | `32768` | Maximum sequence position embeddings. Default 32768. |
| `untie_embeddings_and_output_weights` | `bool` | `True` | Untie input embeddings from output projection. Default True. |
| `no_masked_softmax_fusion` | `bool` | `True` | Disable masked softmax fusion. Default True. |

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `num_rollout` | `int` | `200` | Number of rollout episodes per training step. Default 200. |
| `rollout_batch_size` | `int` | `64` | Batch size for rollout generation. Default 64. |
| `rollout_max_response_len` | `int` | `1024` | Maximum response length during rollout. Default 1024. |
| `sglang_mem_fraction_static` | `float` | `0.7` | SGLang static memory fraction. Default 0.7. |

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `train_iters` | `int` | `50` | Total training iterations. Default 50. |
| `global_batch_size` | `int` | `512` | Global batch size across all ranks. Default 512. |

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `eval_interval` | `int` | `10` | Evaluation interval in iterations. Default 10. |
| `save_interval` | `int` | `10` | Checkpoint save interval in iterations. Default 10. |

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `miles_image` | `str` | `"radixark/miles:dev-202604201238"` | Docker image with Miles trainer. Default "radixark/miles:dev-202604201238". |
| `miles_src_commit` | `str` | `"9a003644739f4e6dd509e2e8337e8ae7e571941c"` | Miles source commit with `--custom-agent-function-path` support. |
| `harbor_install_command` | `str` | `"uv pip install --system git+https://github.com/laude-institute/harbor.git"` | Command to install Harbor in the image. |

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `image_run_commands` | `list[str]` | `[]` | Extra commands appended to the image build. Default []. |
| `n_nodes` | `int` | `1` | Number of cluster nodes. Default 1. |
| `app_tags` | `dict` | `{}` | Extra Modal app tags. Default {}. |

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `recipe_args` | `str` | `""` | Raw Miles CLI flag block. Model architecture flags and per-run overrides go here. Values override typed defaults. Default "". |
| `extra_args` | `str` | `""` | Extra flags appended after `recipe_args`. Default "". |
| `custom_config_yaml` | `str` | `""` | Inline YAML overrides passed via `--custom-config-path`. Default "". |

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `colocate` | `bool` | `True` | Reuse a single compute pool for actor + rollout. Default True. |
| `actor_nodes` | `int \| None` | `None` | |
| `rollout_num_gpus` | `int \| None` | `None` | |

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `advantage_estimator` | `str` | `"grpo"` | Advantage estimation method. Default "grpo". |
| `eps_clip` | `float` | `0.2` | PPO clipping epsilon. Default 0.2. |
| `clip_grad` | `float` | `1.0` | Gradient clipping norm. Default 1.0. |
| `kl_coef` | `float` | `0.0` | KL divergence penalty coefficient. Default 0.0. |
| `normalize_advantages` | `bool` | `False` | Normalize advantages. Default False. |
| `seed` | `int` | `1234` | Random seed. Default 1234. |

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `lr` | `float` | `1e-06` | Learning rate. Default 1e-6. |
| `lr_decay_style` | `str` | `"constant"` | LR decay schedule. Default "constant". |
| `weight_decay` | `float` | `0.0` | Weight decay. Default 0.0. |
| `adam_beta1` | `float` | `0.9` | Adam beta1. Default 0.9. |
| `adam_beta2` | `float` | `0.95` | Adam beta2. Default 0.95. |

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `micro_batch_size` | `int` | `1` | Micro batch size per GPU. Default 1. |
| `n_samples_per_prompt` | `int` | `8` | Rollout samples per prompt. Default 8. |
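
The batch-size defaults appear to line up with each other. The arithmetic below is a sketch under the assumption that every prompt in a rollout batch is sampled `n_samples_per_prompt` times and that all resulting samples are consumed in optimizer steps of `global_batch_size`; this page does not confirm that Miles schedules them this way.

```python
# Documented defaults from the tables on this page.
rollout_batch_size = 64      # prompts per rollout batch
n_samples_per_prompt = 8     # responses sampled per prompt
global_batch_size = 512      # samples per optimizer step, across all ranks

# Assumption: one rollout yields prompts x samples-per-prompt responses.
samples_per_rollout = rollout_batch_size * n_samples_per_prompt

# Assumption: those samples are consumed in chunks of global_batch_size.
optimizer_steps_per_rollout = samples_per_rollout // global_batch_size

print(samples_per_rollout, optimizer_steps_per_rollout)
```

Under these assumptions the defaults give 512 samples per rollout and one optimizer step per rollout, so changing one of the three values may call for adjusting the others.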

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `bf16` | `bool` | `True` | Enable BF16 training. Default True. |
| `attention_softmax_in_fp32` | `bool` | `True` | Compute attention softmax in FP32. Default True. |

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `attention_dropout` | `float` | `0.0` | Attention dropout rate. Default 0.0. |
| `hidden_dropout` | `float` | `0.0` | Hidden layer dropout rate. Default 0.0. |

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `rollout_temperature` | `float` | `1.0` | Sampling temperature for rollouts. Default 1.0. |

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `no_save_optim` | `bool` | `True` | Skip saving optimizer state. Default True. |

Emit Miles CLI flags, skipping Harbor-specific fields.
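
To make the idea of emitting flags while skipping Harbor-specific fields concrete, here is a minimal sketch on a stand-in dataclass. Everything in it is an assumption made for illustration: the `DemoConfig` class, the `HARBOR_ONLY` set, the `--kebab-case` flag naming, and the treatment of `True` booleans as bare switches are not taken from the library's source.

```python
from dataclasses import dataclass, fields
from typing import Optional

# Hypothetical: field names treated as Harbor-only and never forwarded.
HARBOR_ONLY = {"agent_import_path", "task_root"}


@dataclass
class DemoConfig:
    """Stand-in with a handful of the documented fields and defaults."""
    agent_import_path: str = ""
    task_root: str = "/data/tasks"
    rollout_batch_size: int = 64
    lr: float = 1e-6
    bf16: bool = True
    normalize_advantages: bool = False
    actor_nodes: Optional[int] = None


def to_cli_flags(config: DemoConfig) -> list[str]:
    """Turn dataclass fields into a flat argv list of CLI flags."""
    argv: list[str] = []
    for f in fields(config):
        value = getattr(config, f.name)
        # Drop Harbor-only fields and unset optional values.
        if f.name in HARBOR_ONLY or value is None:
            continue
        flag = "--" + f.name.replace("_", "-")  # assumed naming scheme
        if isinstance(value, bool):
            if value:  # assumed: True becomes a bare switch, False is omitted
                argv.append(flag)
        else:
            argv += [flag, str(value)]
    return argv


flags = to_cli_flags(DemoConfig())
```

With the stand-in defaults, `flags` holds only the trainer-facing values; the two Harbor-only fields and the unset `actor_nodes` do not appear.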

Shlex-parse recipe_args + extra_args into a flat argv list.
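
The parsing step can be approximated with the standard library's `shlex.split`. In the sketch below, `build_extra_argv` is a hypothetical name and the `--example-*` flags are placeholders rather than real Miles flags; only `--custom-config-path` is mentioned on this page.

```python
import shlex


def build_extra_argv(recipe_args: str, extra_args: str) -> list[str]:
    """Split both flag blocks with shell quoting rules, recipe_args first."""
    return shlex.split(recipe_args) + shlex.split(extra_args)


# Placeholder flags; newlines count as ordinary whitespace for shlex.
recipe_args = """
    --example-size 2048
    --example-name demo
"""
# Quoting keeps a value with spaces together as a single argv entry.
extra_args = '--custom-config-path "/tmp/my overrides.yaml"'

argv = build_extra_argv(recipe_args, extra_args)
```

Because `extra_args` is appended after `recipe_args`, a parser that keeps the last occurrence of a repeated flag would let `extra_args` take precedence; whether Miles behaves that way is not stated here.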

resolved_rollout_num_gpus(self) -> 'int | None'

Source: modal_training_gym/frameworks/harbor/config.py