Qwen3_6_27b_Recipe
Qwen3.6-27B dense hybrid model on 1×8×H100 (one node with eight H100 GPUs), using TP4×PP2 (tensor-parallel size 4, pipeline-parallel size 2) and colocated GRPO training.
from modal_training_gym.train_recipes.slime_recipe.qwen3_6_27b import Qwen3_6_27b_Recipe
Inherits from: SlimeRecipe, BaseTrainRecipe
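As a rough sanity check on the "1×8×H100 with TP4×PP2, colocated" description, the defaults in the Fields table imply the layout computed below. This is a sketch under stated assumptions: `LayoutSketch` and `derive` are hypothetical helpers written for this page, not part of modal_training_gym. It assumes the usual Megatron rule that data-parallel size equals world size / (TP × PP × CP), that with `colocate=True` the rollout engines share the same eight GPUs as training, and that optimizer steps per rollout equal `rollout_batch_size × n_samples_per_prompt / global_batch_size`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayoutSketch:
    # Hypothetical stand-in holding a few defaults copied from the Fields table.
    actor_num_nodes: int = 1
    actor_num_gpus_per_node: int = 8
    tensor_model_parallel_size: int = 4
    pipeline_model_parallel_size: int = 2
    context_parallel_size: int = 1
    rollout_num_gpus_per_engine: int = 4
    rollout_batch_size: int = 32
    n_samples_per_prompt: int = 4
    global_batch_size: int = 128


def derive(cfg: LayoutSketch) -> dict[str, int]:
    world = cfg.actor_num_nodes * cfg.actor_num_gpus_per_node
    model_parallel = (
        cfg.tensor_model_parallel_size
        * cfg.pipeline_model_parallel_size
        * cfg.context_parallel_size
    )
    if world % model_parallel:
        raise ValueError("GPU count must be divisible by TP x PP x CP")
    # Assumption: colocated rollout engines are carved out of the same GPUs.
    if world % cfg.rollout_num_gpus_per_engine:
        raise ValueError("GPU count must be divisible by rollout_num_gpus_per_engine")
    samples = cfg.rollout_batch_size * cfg.n_samples_per_prompt
    if samples % cfg.global_batch_size:
        raise ValueError("samples per rollout must be a multiple of global_batch_size")
    return {
        "world_size": world,
        "data_parallel_size": world // model_parallel,
        "rollout_engines": world // cfg.rollout_num_gpus_per_engine,
        "samples_per_rollout": samples,
        "optimizer_steps_per_rollout": samples // cfg.global_batch_size,
    }


print(derive(LayoutSketch()))
```

Under these assumptions the data-parallel size is 1, so all 128 samples of a step presumably pass through the single TP4×PP2 replica, split into micro-batches according to `use_dynamic_batch_size` and `max_tokens_per_gpu`.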
Fields
| Field | Type | Default |
|---|---|---|
| gpu_type | str | "H100" |
| colocate | bool | True |
| tensor_model_parallel_size | int | 4 |
| sequence_parallel | bool | True |
| rollout_num_gpus_per_engine | int | 4 |
| num_rollout | int | 1 |
| rollout_batch_size | int | 32 |
| rollout_max_response_len | int | 8192 |
| rollout_temperature | float | 1.0 |
| save_interval | int | 20 |
| recipe_type | RecipeType | slime |
| name | str | "" |
| app_tags | dict | {} |
| environment | dict | {'PYTHONPATH': '/root/Megatron-LM/', 'CUDA_DEVICE_MAX_CONNECTIONS': '1', 'NCCL_NVLS_ENABLE': '1'} |
| async_mode | bool | False |
| wandb | WandbConfig \| None | None |
| image_overlay | collections.abc.Callable[[modal.image.Image], modal.image.Image] \| None | None |
| local_slime | str \| None | None |
| memory | int \| tuple[int, int] \| None | None |
| cloud | str \| None | None |
| region | str \| None | None |
| slime_model_script | str | "" |
| source_hf_checkpoint | str \| None | None |
| megatron_conversion_hf_checkpoint | str \| None | None |
| patch_files | list[str] | [] |
| image_run_commands | list[str] | [] |
| image_env | dict[str, str] | {} |
| train_function_kwargs | dict[str, int] | {'ephemeral_disk': 1048576} |
| actor_num_nodes | int | 1 |
| actor_num_gpus_per_node | int | 8 |
| rollout_num_gpus | int \| None | None |
| use_critic | bool | False |
| critic_num_nodes | int \| None | None |
| critic_num_gpus_per_node | int \| None | None |
| advantage_estimator | str | "grpo" |
| n_samples_per_prompt | int | 4 |
| eps_clip | float | 0.2 |
| eps_clip_high | float | 0.28 |
| use_kl_loss | bool | False |
| kl_loss_type | str | "low_var_kl" |
| kl_loss_coef | float | 0.0 |
| kl_coef | float | 0.0 |
| entropy_coef | float | 0.0 |
| calculate_per_token_loss | bool | True |
| ref_load | str | "" |
| over_sampling_batch_size | int \| None | None |
| dynamic_sampling_filter_path | str \| None | None |
| balance_data | bool | True |
| rollout_shuffle | bool | True |
| rollout_top_p | float | 1.0 |
| rollout_stop_token_ids | list[int] \| None | None |
| sglang_mem_fraction_static | float | 0.75 |
| global_batch_size | int | 128 |
| lr | float | 1e-06 |
| lr_decay_style | str | "constant" |
| weight_decay | float | 0.1 |
| adam_beta1 | float | 0.9 |
| adam_beta2 | float | 0.98 |
| optimizer | str | "adam" |
| attention_dropout | float | 0.0 |
| hidden_dropout | float | 0.0 |
| attention_softmax_in_fp32 | bool | True |
| accumulate_allreduce_grads_in_fp32 | bool | False |
| use_distributed_optimizer | bool | True |
| recompute_granularity | str | "full" |
| recompute_method | str | "uniform" |
| recompute_num_layers | int | 1 |
| use_dynamic_batch_size | bool | True |
| max_tokens_per_gpu | int | 8192 |
| eval_interval | int \| None | 20 |
| n_samples_per_eval_prompt | int | 4 |
| eval_max_response_len | int | 4096 |
| eval_top_p | float | 1.0 |
| eval_config | dict \| None | None |
| save | str | "/checkpoints" |
| load | str | "" |
| no_save_optim | bool | False |
| megatron_to_hf_mode | str | "" |
| use_fault_tolerance | bool | True |
| update_weight_mode | str | "full" |
| update_weight_transport | str | "nccl" |
| update_weight_encoding | str | "indices" |
| update_weight_disk_dir | str | "" |
| rm_type | str \| None | None |
| custom_rm_function | collections.abc.Callable \| None | None |
| custom_generate_function | collections.abc.Callable \| None | None |
| custom_rollout_log_function | collections.abc.Callable \| str \| None | None |
| custom_eval_rollout_log_function | collections.abc.Callable \| str \| None | None |
| rollout_function | collections.abc.Callable \| str \| None | None |
| custom_megatron_before_log_prob_hook | collections.abc.Callable \| str \| None | None |
| custom_megatron_before_train_step_hook | collections.abc.Callable \| str \| None | None |
| sglang_enable_dp_attention | bool | False |
| sglang_dp_size | int \| None | None |
| sglang_ep_size | int \| None | None |
| sglang_enable_dp_lm_head | bool | False |
| sglang_disable_custom_all_reduce | bool | True |
| sglang_cuda_graph_bs | list[int] \| None | None |
| sglang_max_running_requests | int \| None | 512 |
| extra_config | dict \| None | None |
| sglang_config | dict \| None | None |
| sglang_request_params | dict \| None | None |
| apply_chat_template_kwargs | dict \| str | "" |
| train_env_vars | dict \| str \| None | None |
| multimodal_keys | dict \| str \| None | None |
| pipeline_model_parallel_size | int | 2 |
| context_parallel_size | int | 1 |
| expert_model_parallel_size | int | 1 |
| expert_tensor_parallel_size | int | 1 |
| optimizer_cpu_offload | bool | True |
| overlap_cpu_optimizer_d2h_h2d | bool | True |
| use_precision_aware_optimizer | bool | True |
| attention_backend | str | "flash" |
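The GRPO-related defaults (`advantage_estimator="grpo"`, `n_samples_per_prompt=4`, `eps_clip=0.2`, `eps_clip_high=0.28`, `use_kl_loss=False`, `entropy_coef=0.0`) correspond to the textbook group-relative objective sketched below. This is a minimal per-sample illustration written for this page, not slime's implementation: it assumes each advantage is the reward minus the group mean, divided by the group standard deviation, and that `eps_clip_high` widens only the upper clip bound (the "clip-higher" variant). With the KL and entropy terms disabled, the loss reduces to the clipped surrogate alone; slime's actual loss is computed per token (`calculate_per_token_loss=True`) and may differ in normalization details.

```python
import statistics


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    # Normalize each reward against the other samples drawn for the same prompt.
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate_loss(
    ratio: float, advantage: float, eps_clip: float = 0.2, eps_clip_high: float = 0.28
) -> float:
    # Asymmetric clip: the ratio is bounded to [1 - eps_clip, 1 + eps_clip_high].
    clipped = min(max(ratio, 1.0 - eps_clip), 1.0 + eps_clip_high)
    # PPO-style pessimistic bound, negated so that a lower loss is better.
    return -min(ratio * advantage, clipped * advantage)


# n_samples_per_prompt = 4: four responses to one prompt, two of them rewarded.
adv = group_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 3) for a in adv])
print(clipped_surrogate_loss(1.5, 1.0))
print(clipped_surrogate_loss(0.5, -1.0))
```

The asymmetric bound lets the probability of a positively rewarded response grow further (up to a ratio of 1.28) than the symmetric 0.2 clip would allow, while the lower bound stays at 0.8.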
Methods
cli_args(self, dataset: 'DatasetConfig | None' = None, model: 'ModelConfig | None' = None) -> list[str]

get_base_recipe(model_config: modal_training_gym.common.models.base.ModelConfig) -> 'SlimeRecipe | None'

Source: modal_training_gym/train_recipes/slime_recipe/qwen3_6_27b.py
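cli_args returns the recipe as a flat list of command-line strings. The sketch below only guesses at the general shape of such a conversion: `to_cli_args` is a hypothetical helper written for this page, and its rules (snake_case field names become `--kebab-case` flags, `True` becomes a bare flag, `False`/`None`/empty values are omitted) are assumptions, not the behavior of the real method, which may rename, skip, or add arguments (for example from the dataset and model configs it accepts).

```python
from typing import Any


def to_cli_args(fields: dict[str, Any]) -> list[str]:
    # Hypothetical rendering rules, for illustration only.
    args: list[str] = []
    for name, value in fields.items():
        flag = "--" + name.replace("_", "-")
        if value is None or value is False:
            continue  # unset options and disabled switches are omitted
        if isinstance(value, (str, list, tuple)) and len(value) == 0:
            continue  # empty strings and lists are treated as unset
        if value is True:
            args.append(flag)  # enabled switches become bare flags
        elif isinstance(value, (list, tuple)):
            args.append(flag)
            args.extend(str(v) for v in value)
        else:
            args.extend([flag, str(value)])
    return args


print(to_cli_args({
    "tensor_model_parallel_size": 4,
    "pipeline_model_parallel_size": 2,
    "colocate": True,
    "use_kl_loss": False,
    "rollout_stop_token_ids": None,
}))
```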