
SlimeConfig

API reference for SlimeConfig

from modal_training_gym.frameworks.slime.config import SlimeConfig

slime GRPO training configuration.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `environment` | `dict` | `{'PYTHONPATH': '/root/Megatron-LM/', 'CUDA_DEVICE_MAX_CONNECTIONS': '1', 'NCCL_NVLS_ENABLE': '1'}` | Injected into the Ray job runtime env. |
| `async_mode` | `bool` | `False` | When `True`, uses `train_async.py`. |
| `slime_model_script` | `str` | `""` | Shell script path relative to `/root/slime`. |
| `model` | `ModelConfiguration \| None` | `None` | |

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `dataset` | `DatasetConfig \| None` | `None` | |
| `wandb` | `WandbConfig \| None` | `None` | |
| `modal` | `ModalConfig \| None` | `None` | |
| `app_tags` | `dict` | `{}` | Extra Modal app tags. |
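
For orientation, a minimal construction sketch. It assumes the usual keyword-argument constructor; the script path and tag values are invented for illustration:

```python
from modal_training_gym.frameworks.slime.config import SlimeConfig

config = SlimeConfig(
    async_mode=False,                         # run train.py rather than train_async.py
    slime_model_script="scripts/example.sh",  # hypothetical path, relative to /root/slime
    app_tags={"team": "rl"},                  # extra Modal app tags
)
```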

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `actor_num_nodes` | `int \| None` | `None` | |
| `actor_num_gpus_per_node` | `int \| None` | `None` | |
| `colocate` | `bool \| None` | `None` | |
| `rollout_num_gpus` | `int \| None` | `None` | |
| `rollout_num_gpus_per_engine` | `int` | `1` | GPUs per SGLang rollout engine. |
| `tensor_model_parallel_size` | `int \| None` | `None` | |
| `use_critic` | `bool` | `False` | Use a separate critic network. |
| `critic_num_nodes` | `int \| None` | `None` | |
| `critic_num_gpus_per_node` | `int \| None` | `None` | |
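
Reading these fields together: `rollout_num_gpus` divided by `rollout_num_gpus_per_engine` implies the number of SGLang engines serving rollouts. A quick sketch of that arithmetic (an inference from the field descriptions, not code from this module):

```python
# With 8 rollout GPUs and 2 GPUs per engine, 4 SGLang engines serve rollouts.
rollout_num_gpus = 8
rollout_num_gpus_per_engine = 2
num_engines = rollout_num_gpus // rollout_num_gpus_per_engine
assert num_engines == 4
```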

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `advantage_estimator` | `str` | `"grpo"` | Advantage estimation method. |
| `n_samples_per_prompt` | `int` | `2` | Rollout samples per prompt. |
| `eps_clip` | `float` | `0.2` | PPO clipping epsilon. |
| `eps_clip_high` | `float` | `0.28` | Asymmetric high-side PPO clip. |
| `use_kl_loss` | `bool` | `True` | Enable KL divergence loss. |
| `kl_loss_type` | `str` | `"low_var_kl"` | KL loss variant. |
| `kl_loss_coef` | `float` | `0.0` | KL loss coefficient. |
| `entropy_coef` | `float` | `0.0` | Entropy bonus coefficient. |
| `ref_load` | `str` | `""` | Reference model checkpoint path. |
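
Two of these knobs benefit from a worked example. `eps_clip`/`eps_clip_high` define an asymmetric PPO trust region, and `kl_loss_type="low_var_kl"` typically refers to the low-variance "k3" KL estimator. The sketch below shows the standard formulations; it is illustrative, not slime's actual implementation:

```python
import math

def clipped_surrogate(ratio: float, advantage: float,
                      eps_clip: float = 0.2, eps_clip_high: float = 0.28) -> float:
    # Asymmetric PPO clip: the importance ratio is bounded to
    # [1 - eps_clip, 1 + eps_clip_high], i.e. [0.8, 1.28] with the defaults.
    clipped = min(max(ratio, 1.0 - eps_clip), 1.0 + eps_clip_high)
    return min(ratio * advantage, clipped * advantage)

def low_var_kl(logprob: float, ref_logprob: float) -> float:
    # "k3" estimator: exp(d) - d - 1 with d = ref_logprob - logprob.
    # Non-negative, and lower-variance than the naive -d estimator.
    d = ref_logprob - logprob
    return math.exp(d) - d - 1.0

# A token whose probability rose 40% with positive advantage is credited
# for only +28%, per eps_clip_high.
print(clipped_surrogate(1.4, 1.0))  # 1.28
```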

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `num_rollout` | `int` | `1` | Number of rollout episodes per step. |
| `rollout_batch_size` | `int` | `8` | Batch size for rollout generation. |
| `rollout_max_response_len` | `int` | `8192` | Maximum response length during rollout. |
| `rollout_temperature` | `float` | `1.0` | Sampling temperature for rollouts. |
| `sglang_mem_fraction_static` | `float` | `0.7` | SGLang static memory fraction. |
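
Back-of-envelope with these defaults: each rollout step produces `rollout_batch_size * n_samples_per_prompt` responses, which happens to equal the default `global_batch_size` of 16 below (assuming one optimizer step consumes one rollout step, which this reference does not state):

```python
rollout_batch_size = 8
n_samples_per_prompt = 2   # from the GRPO table above
samples_per_step = rollout_batch_size * n_samples_per_prompt
assert samples_per_step == 16
```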

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `global_batch_size` | `int` | `16` | Global batch size. |
| `lr` | `float` | `1e-06` | Learning rate. |
| `lr_decay_style` | `str` | `"constant"` | LR decay schedule. |
| `weight_decay` | `float` | `0.1` | Weight decay. |
| `adam_beta1` | `float` | `0.9` | Adam beta1. |
| `adam_beta2` | `float` | `0.98` | Adam beta2. |
| `optimizer` | `str` | `"adam"` | Optimizer name. |
| `name` | `str` | `""` | |

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `attention_dropout` | `float` | `0.0` | Attention dropout. |
| `hidden_dropout` | `float` | `0.0` | Hidden layer dropout. |
| `attention_softmax_in_fp32` | `bool` | `True` | Compute softmax in FP32. |
| `accumulate_allreduce_grads_in_fp32` | `bool` | `True` | Accumulate allreduce grads in FP32. |
| `recompute_granularity` | `str` | `"full"` | Activation recomputation granularity. |
| `recompute_method` | `str` | `"uniform"` | Activation recomputation method. |
| `recompute_num_layers` | `int` | `1` | Number of layers to recompute. |

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `use_dynamic_batch_size` | `bool` | `True` | Use dynamic batch sizing. |
| `max_tokens_per_gpu` | `int` | `9216` | Max tokens per GPU for dynamic batching. |
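
With dynamic batch sizing, micro-batches are typically formed by a token budget rather than a fixed sample count, so long and short responses pack efficiently. A minimal greedy-packing sketch of the idea (not slime's actual batching code):

```python
def pack_by_tokens(seq_lens: list[int], max_tokens_per_gpu: int = 9216) -> list[list[int]]:
    """Greedily group sequences so each micro-batch stays under the token budget."""
    batches, current, used = [], [], 0
    for n in seq_lens:
        if current and used + n > max_tokens_per_gpu:
            batches.append(current)
            current, used = [], 0
        current.append(n)
        used += n
    if current:
        batches.append(current)
    return batches

# Uneven response lengths pack into variable-size micro-batches instead of
# padding every sequence to the longest one.
print(pack_by_tokens([8000, 1200, 4000, 5000, 300]))
# [[8000, 1200], [4000, 5000], [300]]
```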

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `eval_interval` | `int` | `20` | Evaluation interval in training steps. |
| `n_samples_per_eval_prompt` | `int` | `4` | Eval samples per prompt. |
| `eval_max_response_len` | `int` | `16384` | Max response length for eval. |
| `eval_top_p` | `float` | `1.0` | Top-p sampling for eval. |
| `eval_config` | `dict \| None` | `None` | |

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `save` | `str` | `"/checkpoints"` | Checkpoint output directory. |
| `save_interval` | `int` | `1000` | Checkpoint save interval. |
| `megatron_to_hf_mode` | `str` | `"bridge"` | Checkpoint conversion mode (`"bridge"` or `"raw"`). |
| `use_fault_tolerance` | `bool` | `True` | Enable fault tolerance. |

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `custom_rm_path` | `str` | `""` | Python import path for custom reward function. |
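
A sketch of what such a module might look like. The function name, module, and signature here are hypothetical; consult slime's documentation for the exact interface it expects at `custom_rm_path`:

```python
# my_rewards.py -- hypothetical module, referenced as
# custom_rm_path="my_rewards.exact_match"
def exact_match(prompt: str, response: str, label: str) -> float:
    """Reward 1.0 when the final answer matches the label (assumed '####' delimiter)."""
    answer = response.split("####")[-1].strip()
    return 1.0 if answer == label.strip() else 0.0
```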

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `sglang_config` | `dict \| None` | `None` | |
| `custom_config_path` | `dict \| None` | `None` | |
| `apply_chat_template_kwargs` | `str` | `""` | Extra kwargs for chat template. |

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `image_run_commands` | `list[str]` | `[]` | |
| `local_python_sources` | `list[str]` | `[]` | |
| `gpu_type` | `str \| None` | `None` | |
| `sequence_parallel` | `bool \| None` | `None` | |
| `rm_type` | `str` | `"math"` | |

build_app(self, *, name: str | None = None, modal: modal_training_gym.frameworks.slime.config.ModalConfig | None = None) -> 'App'

slime CLI arguments derived from this config.
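
The exact flags are not reproduced in this reference, but the mapping presumably follows the usual convention of turning field names into dashed flags. An illustrative (not authoritative) sketch:

```python
def to_flag(name: str, value) -> list[str]:
    # use_kl_loss=True -> ['--use-kl-loss']
    # rollout_batch_size=8 -> ['--rollout-batch-size', '8']
    flag = "--" + name.replace("_", "-")
    if isinstance(value, bool):
        return [flag] if value else []
    return [flag, str(value)]

print(to_flag("rollout_batch_size", 8))
print(to_flag("use_kl_loss", True))
```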

Materialize the training data.

to_dataset_config(self) -> 'DatasetConfig'

Extract dataset-related slime flags back into a DatasetConfig.

Extract model-related slime flags back into a ModelConfiguration.

Extract wandb-related slime flags back into a WandbConfig.

Total Modal cluster nodes required by this config.
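
The accounting is not spelled out here, but one plausible reading of the cluster fields above (treat this as a guess; the authoritative logic is in the source file below) is actor nodes, plus critic nodes when `use_critic` is set, plus enough nodes for the rollout GPUs when not colocated:

```python
def total_nodes(actor_num_nodes=1, actor_num_gpus_per_node=8,
                colocate=True, rollout_num_gpus=8,
                use_critic=False, critic_num_nodes=0):
    nodes = actor_num_nodes
    if use_critic:
        nodes += critic_num_nodes
    if not colocate:
        nodes += -(-rollout_num_gpus // actor_num_gpus_per_node)  # ceil division
    return nodes

print(total_nodes())                                     # 1: rollout colocated with actor
print(total_nodes(colocate=False, rollout_num_gpus=16))  # 3: 1 actor node + 2 rollout nodes
```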

Source: modal_training_gym/frameworks/slime/config.py