
# MsSwiftFrameworkConfig

API reference for `MsSwiftFrameworkConfig`.

```python
from modal_training_gym.frameworks.ms_swift.config import MsSwiftFrameworkConfig
```

ms-swift Megatron SFT configuration, including Modal infrastructure.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `image` | `str` | `"modelscope-registry.us-west-1.cr.aliyuncs.com/modelscope-repo/modelscope:ubuntu22.04-cuda12.8.1-py311-torch2.8.0-vllm0.11.0-modelscope1.31.0-swift3.10.3"` | Docker image for the training container. |
| `transformers_version` | `str` | `"4.57.3"` | Transformers version to reinstall for compatibility. |
| `app_tags` | `dict` | `{}` | Extra Modal app tags. |
| `environment` | `dict[str, str]` | `{'PYTORCH_CUDA_ALLOC_CONF': 'expandable_segments:True', 'CUDA_DEVICE_MAX_CONNECTIONS': '1'}` | Environment variables for the training container. |
| `n_nodes` | `int` | `4` | Number of Modal cluster nodes. |
| `gpus_per_node` | `int` | `8` | GPUs per node. |
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `tuner_type` | `str` | `"lora"` | Fine-tuning method (e.g. `"lora"`). |
| `split_dataset_ratio` | `float` | `0.01` | Train/eval split ratio. |
| `perform_initialization` | `bool` | `True` | Emit `--perform_initialization` flag. |
| `use_distributed_optimizer` | `bool` | `True` | Enable distributed optimizer. |
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `global_batch_size` | `int` | `8` | Global batch size across all ranks. |
| `packing` | `bool` | `True` | Enable sequence packing. |
| `padding_free` | `bool` | `True` | Enable padding-free attention. |
| `use_precision_aware_optimizer` | `bool` | `True` | Enable precision-aware optimizer. |
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `train_iters` | `int \| None` | `None` | Total training iterations; when `None`, the run length follows `num_train_epochs`. |
| `num_train_epochs` | `int` | `4` | Number of training epochs. |
| `lr` | `float` | `0.0001` | Learning rate. |
| `lr_warmup_fraction` | `float` | `0.05` | Fraction of training for LR warmup. |
| `lr_decay_style` | `str` | `"cosine"` | LR decay schedule. |
| `min_lr` | `float` | `1e-05` | Minimum learning rate. |
| `weight_decay` | `float` | `0.1` | Weight decay coefficient. |
| `clip_grad` | `float` | `1.0` | Gradient clipping norm. |
| `adam_beta1` | `float` | `0.9` | Adam beta1. |
| `adam_beta2` | `float` | `0.95` | Adam beta2. |
| `seed` | `int` | `42` | Random seed. |
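The defaults above describe a cosine decay schedule with linear warmup (`lr=1e-4`, `min_lr=1e-5`, `lr_warmup_fraction=0.05`). As a rough sketch of how such a schedule behaves (the `lr_at` helper is illustrative, not part of the library):

```python
import math

def lr_at(step, total_steps, lr=1e-4, min_lr=1e-5, warmup_fraction=0.05):
    """Cosine decay with linear warmup, mirroring the defaults above."""
    warmup_steps = int(total_steps * warmup_fraction)
    if step < warmup_steps:
        # Linear warmup from near 0 up to the peak learning rate.
        return lr * (step + 1) / warmup_steps
    # Cosine decay from lr down to min_lr over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (lr - min_lr) * (1 + math.cos(math.pi * progress))
```

With these defaults over 1000 iterations, the rate ramps up for the first 50 steps, peaks at `1e-4`, and decays toward `1e-5` by the end of training.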
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `bf16` | `bool` | `True` | Enable BF16 training. |
| `recompute_granularity` | `str` | `"selective"` | Activation recomputation granularity. |
| `recompute_modules` | `str` | `"core_attn"` | Modules to recompute. |
| `attention_softmax_in_fp32` | `bool` | `True` | Compute attention softmax in FP32. |
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `max_length` | `int` | `2048` | Maximum sequence length. |
| `attention_backend` | `str` | `"flash"` | Attention implementation. |
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `dataset_num_proc` | `int` | `8` | Dataset preprocessing workers. |
| `dataloader_num_workers` | `int` | `4` | DataLoader workers. |
| `save_interval` | `int` | `50` | Checkpoint save interval (iterations). |
| `no_save_optim` | `bool` | `True` | Skip saving optimizer state. |
| `no_save_rng` | `bool` | `True` | Skip saving RNG state. |
| `save_safetensors` | `bool` | `True` | Save in safetensors format. |
| `use_hf` | `int` | `1` | Use HuggingFace checkpoint format. |
| `add_version` | `bool` | `False` | Add version sub-directory to save path. |
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `log_interval` | `int` | `1` | Logging interval (iterations). |
| `eval_iters` | `int` | `10` | Evaluation iterations. |
| `eval_interval` | `int` | `50` | Evaluation interval (iterations). |
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `lora_dropout` | `float` | `0.05` | LoRA dropout. |
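Since the config exposes a `from_toml` constructor, overrides for the fields above can presumably be kept in a TOML file. A hypothetical overrides file might look like this; the flat key layout is an assumption, so check `config.py` for the exact schema:

```toml
# Hypothetical overrides for MsSwiftFrameworkConfig.from_toml;
# keys mirror the field names documented above.
n_nodes = 2
gpus_per_node = 8
tuner_type = "lora"
lr = 1e-4
max_length = 4096
save_interval = 100
```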

## `from_toml(path: str | Path) -> MsSwiftFrameworkConfig`


Source: modal_training_gym/frameworks/ms_swift/config.py