SlimeRecipe

Recipe dataclass for configuring slime GRPO training on Modal.

from modal_training_gym.train_recipes.slime_recipe.recipe import SlimeRecipe

Inherits from: BaseTrainRecipe

Fields

Field	Type	Default	Description
`gpu_type`	`str`	(required)
`colocate`	`bool`	(required)
`tensor_model_parallel_size`	`int`	(required)
`sequence_parallel`	`bool`	(required)
`rollout_num_gpus_per_engine`	`int`	(required)
`num_rollout`	`int`	(required)
`rollout_batch_size`	`int`	(required)
`rollout_max_response_len`	`int`	(required)
`rollout_temperature`	`float`	(required)
`save_interval`	`int`	(required)
`recipe_type`	`RecipeType`	`slime`
`name`	`str`	`""`
`app_tags`	`dict`	`{}`
`environment`	`dict`	`{'PYTHONPATH': '/root/Megatron-LM/', 'CUDA_DEVICE_MAX_CONNECTIONS': '1', 'NCCL_NVLS_ENABLE': '1'}`
`async_mode`	`bool`	`False`
`wandb`	`WandbConfig \| None`	`None`
`image_overlay`	`collections.abc.Callable[[modal.image.Image], modal.image.Image] \| None`	`None`
`local_slime`	`str \| None`	`None`
`memory`	`int \| tuple[int, int] \| None`	`None`
`cloud`	`str \| None`	`None`
`region`	`str \| None`	`None`
`slime_model_script`	`str`	`""`
`source_hf_checkpoint`	`str \| None`	`None`
`megatron_conversion_hf_checkpoint`	`str \| None`	`None`
`patch_files`	`list[str]`	`[]`
`image_run_commands`	`list[str]`	`[]`
`image_env`	`dict[str, str]`	`{}`
`train_function_kwargs`	`dict[str, Any]`	`{}`
`actor_num_nodes`	`int`	`1`
`actor_num_gpus_per_node`	`int`	`8`
`rollout_num_gpus`	`int \| None`	`None`
`use_critic`	`bool`	`False`
`critic_num_nodes`	`int \| None`	`None`
`critic_num_gpus_per_node`	`int \| None`	`None`
`advantage_estimator`	`str`	`"grpo"`
`n_samples_per_prompt`	`int`	`2`
`eps_clip`	`float`	`0.2`
`eps_clip_high`	`float`	`0.28`
`use_kl_loss`	`bool`	`False`
`kl_loss_type`	`str`	`"low_var_kl"`
`kl_loss_coef`	`float`	`0.0`
`kl_coef`	`float`	`0.0`
`entropy_coef`	`float`	`0.0`
`calculate_per_token_loss`	`bool`	`False`
`ref_load`	`str`	`""`
`over_sampling_batch_size`	`int \| None`	`None`
`dynamic_sampling_filter_path`	`str \| None`	`None`
`balance_data`	`bool`	`False`
`rollout_shuffle`	`bool`	`True`
`rollout_top_p`	`float`	`1.0`
`rollout_stop_token_ids`	`list[int] \| None`	`None`
`sglang_mem_fraction_static`	`float`	`0.75`
`global_batch_size`	`int`	`16`
`lr`	`float`	`1e-06`
`lr_decay_style`	`str`	`"constant"`
`weight_decay`	`float`	`0.1`
`adam_beta1`	`float`	`0.9`
`adam_beta2`	`float`	`0.98`
`optimizer`	`str`	`"adam"`
`attention_dropout`	`float`	`0.0`
`hidden_dropout`	`float`	`0.0`
`attention_softmax_in_fp32`	`bool`	`True`
`accumulate_allreduce_grads_in_fp32`	`bool`	`True`
`use_distributed_optimizer`	`bool`	`False`
`recompute_granularity`	`str`	`"full"`
`recompute_method`	`str`	`"uniform"`
`recompute_num_layers`	`int`	`1`
`use_dynamic_batch_size`	`bool`	`True`
`max_tokens_per_gpu`	`int`	`9216`
`eval_interval`	`int \| None`	`None`
`n_samples_per_eval_prompt`	`int`	`4`
`eval_max_response_len`	`int`	`16384`
`eval_top_p`	`float`	`1.0`
`eval_config`	`dict \| None`	`None`
`save`	`str`	`"/checkpoints"`
`load`	`str`	`""`
`no_save_optim`	`bool`	`False`
`megatron_to_hf_mode`	`str`	`""`
`use_fault_tolerance`	`bool`	`True`
`update_weight_mode`	`str`	`"full"`
`update_weight_transport`	`str`	`"nccl"`
`update_weight_encoding`	`str`	`"indices"`
`update_weight_disk_dir`	`str`	`""`
`rm_type`	`str \| None`	`None`
`custom_rm_function`	`collections.abc.Callable \| None`	`None`
`custom_generate_function`	`collections.abc.Callable \| None`	`None`
`custom_rollout_log_function`	`collections.abc.Callable \| str \| None`	`None`
`custom_eval_rollout_log_function`	`collections.abc.Callable \| str \| None`	`None`
`rollout_function`	`collections.abc.Callable \| str \| None`	`None`
`custom_megatron_before_log_prob_hook`	`collections.abc.Callable \| str \| None`	`None`
`custom_megatron_before_train_step_hook`	`collections.abc.Callable \| str \| None`	`None`
`sglang_enable_dp_attention`	`bool`	`False`
`sglang_dp_size`	`int \| None`	`None`
`sglang_ep_size`	`int \| None`	`None`
`sglang_enable_dp_lm_head`	`bool`	`False`
`sglang_disable_custom_all_reduce`	`bool`	`False`
`sglang_cuda_graph_bs`	`list[int] \| None`	`None`
`sglang_max_running_requests`	`int \| None`	`None`
`extra_config`	`dict \| None`	`None`
`sglang_config`	`dict \| None`	`None`
`sglang_request_params`	`dict \| None`	`None`
`apply_chat_template_kwargs`	`dict \| str`	`""`
`train_env_vars`	`dict \| str \| None`	`None`
`multimodal_keys`	`dict \| str \| None`	`None`
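
The first ten fields in the table are listed without a default, while every other field falls back to the value shown. The sketch below illustrates how that split behaves at construction time. It uses a stand-in dataclass rather than the real `SlimeRecipe`, so it runs without `modal_training_gym` installed. `RecipeSketch` and `required_field_names` are hypothetical names, only a handful of fields are copied from the table, and the values passed in (including the GPU type string) are illustrative rather than recommended.

```python
from dataclasses import MISSING, dataclass, fields


@dataclass
class RecipeSketch:
    """Hypothetical stand-in mirroring a few SlimeRecipe fields from the table."""

    # Listed without a default in the table, so they must be passed explicitly.
    gpu_type: str
    colocate: bool
    num_rollout: int
    rollout_batch_size: int
    # Defaults copied from the table.
    actor_num_nodes: int = 1
    actor_num_gpus_per_node: int = 8
    n_samples_per_prompt: int = 2
    global_batch_size: int = 16
    lr: float = 1e-06


def required_field_names(cls) -> list[str]:
    """Return the names of dataclass fields that have no default value."""
    return [
        f.name
        for f in fields(cls)
        if f.default is MISSING and f.default_factory is MISSING
    ]


# Illustrative values only.
recipe = RecipeSketch(
    gpu_type="H100",
    colocate=True,
    num_rollout=10,
    rollout_batch_size=8,
)

print(required_field_names(RecipeSketch))
# Nodes multiplied by GPUs per node, using the table defaults.
print(recipe.actor_num_nodes * recipe.actor_num_gpus_per_node)
```

Omitting any of the fields without a default raises a `TypeError` when the dataclass is instantiated, which is standard Python dataclass behavior.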

Methods

`cli_args(self, dataset: 'DatasetConfig | None' = None, model: 'ModelConfig | None' = None) -> list[str]`
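
`cli_args` returns a `list[str]`, but the source does not show how that list is built. The following is only a sketch of one common way to flatten a dataclass into command-line tokens, assuming that underscores become hyphens, `True` booleans become bare switches, and `None` or `False` values are skipped. `MiniRecipe` and `to_cli_args` are hypothetical and are not part of `modal_training_gym`; the real method may format its flags differently and also takes `dataset` and `model` arguments that are not modeled here.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass
class MiniRecipe:
    """Hypothetical stand-in with a few SlimeRecipe-style fields."""

    num_rollout: int
    rollout_temperature: float
    colocate: bool
    use_kl_loss: bool = False
    rollout_num_gpus: int | None = None


def to_cli_args(recipe) -> list[str]:
    """Flatten a dataclass instance into CLI tokens (assumed convention)."""
    args: list[str] = []
    for f in fields(recipe):
        value = getattr(recipe, f.name)
        flag = "--" + f.name.replace("_", "-")
        if value is None:
            continue  # unset optional field: emit nothing
        if isinstance(value, bool):
            if value:
                args.append(flag)  # True becomes a bare switch; False is skipped
            continue
        args.extend([flag, str(value)])
    return args


argv = to_cli_args(MiniRecipe(num_rollout=10, rollout_temperature=0.8, colocate=True))
print(argv)
```

Returning a list of tokens rather than a single string avoids shell-quoting problems when the arguments are handed to a subprocess.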

`get_base_recipe(model_config: modal_training_gym.common.models.base.ModelConfig) -> 'SlimeRecipe | None'`

Source: modal_training_gym/train_recipes/slime_recipe/recipe.py