Custom HuggingFace model with ms-swift LoRA SFT on Modal
Custom HuggingFace model (SmolLM2-135M) LoRA SFT — inline `ModelConfiguration` subclass, no catalog entry
What this tutorial teaches. How to train a model that isn’t in
the built-in catalog. The built-in classes (Qwen3_4B, GLM_4_7,
Llama2_7B, KimiK2_5, Qwen3_32B) are concrete
HFModelConfiguration subclasses, but nothing stops you from
writing your own inline — no registry, no library fork, just a
subclass in your tutorial file.
What it actually runs. LoRA SFT of
HuggingFaceTB/SmolLM2-135M
(a 135M-param Llama-family model) on a 4-example slice of GSM8K on
1×H100. Not a serious run — the goal is to prove end-to-end that the
custom-model seam works, cheaply. Bump split="train[:4]" and
num_train_epochs for a real run.
What to watch. W&B project custom-hf-smoke. train/loss
should drop within the first 1–2 steps since the dataset is tiny.
See the quickstart for the shared primitives; see `ms_swift_glm_4_7_gsm8k`
for a full-scale ms-swift run with 4-D parallelism.
```python
import modal

from modal_training_gym.common.dataset import DatasetConfig
from modal_training_gym.common.models import ModelConfiguration
from modal_training_gym.common.wandb import WandbConfig
from modal_training_gym.frameworks.ms_swift import (
    MsSwiftConfig,
    MsSwiftFrameworkConfig,
)
from modal_training_gym.frameworks.ms_swift.config import HF_CACHE_PATH
```

## Define the custom model

Subclass `ModelConfiguration` with:

- `model_name` — the HF repo id. Every framework reads this off your subclass to know what to tokenize/train.
- `download_model()` — how to materialize the weights. For HF-hosted models this is one line: `snapshot_download(repo_id=...)`. (If you’d rather not write this body, subclass `HFModelConfiguration` instead — it implements `download_model()` for you.)
That’s the whole seam. The subclass is a regular Python class; you can stack architecture metadata, tokenizer overrides, or custom download logic on top as needed.
```python
from modal_training_gym.common.models import ModelTrainingConfig


class SmolLM2_135M(ModelConfiguration):
    model_name = "HuggingFaceTB/SmolLM2-135M"
    training = ModelTrainingConfig(
        gpu_type="H100",
        tensor_model_parallel_size=1,
        pipeline_model_parallel_size=1,
        lora_rank=8,
        lora_alpha=16,
    )

    def download_model(self) -> None:
        from huggingface_hub import snapshot_download

        snapshot_download(repo_id=self.model_name)
```

## Define the dataset
ms-swift wants a JSONL file of chat messages
(`{"messages": [{"role": "user", ...}, {"role": "assistant", ...}]}`).
The `train[:4]` slice keeps this run cheap — four examples are enough
to prove the custom-model seam wires up end-to-end.
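For concreteness, one line of the resulting `training.jsonl` is a single JSON object in that chat schema. A minimal sketch of the round trip (the question/answer strings below are made up, not real GSM8K rows):

```python
import json

# One training example in the chat-messages schema ms-swift expects.
# Illustrative content only — prepare() below writes one such line per
# dataset row.
example = {
    "messages": [
        {"role": "user", "content": "What is 48 + 24?"},
        {"role": "assistant", "content": "48 + 24 = 72."},
    ]
}

line = json.dumps(example)  # exactly one JSON object per JSONL line
parsed = json.loads(line)
roles = [m["role"] for m in parsed["messages"]]
print(roles)  # → ['user', 'assistant']
```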
```python
class TinyGSM8KDataset(DatasetConfig):
    def __init__(
        self,
        hf_cache_root,
        data_folder="gsm8k_tiny",
        hf_dataset="openai/gsm8k",
        split="train[:4]",
        input_col="question",
        output_col="answer",
    ):
        self._hf_cache_root = str(hf_cache_root)
        self._data_folder = data_folder
        self._hf_dataset = hf_dataset
        self._split = split
        self._input_col = input_col
        self._output_col = output_col
        self.prompt_data = (
            f"{self._hf_cache_root}/msswift-data/{data_folder}/training.jsonl"
        )

    def prepare(self):
        import json
        import os

        from datasets import load_dataset

        output_dir = f"{self._hf_cache_root}/msswift-data/{self._data_folder}"
        os.makedirs(output_dir, exist_ok=True)

        try:
            ds = load_dataset(self._hf_dataset, split=self._split)
        except ValueError:
            # openai/gsm8k requires an explicit config name; "main" is the
            # standard one.
            ds = load_dataset(self._hf_dataset, "main", split=self._split)

        out_path = f"{output_dir}/training.jsonl"
        with open(out_path, "w") as f:
            for row in ds:
                messages = [
                    {"role": "user", "content": row[self._input_col]},
                    {"role": "assistant", "content": row[self._output_col]},
                ]
                f.write(json.dumps({"messages": messages}) + "\n")
        print(f"Wrote {out_path}")
```

## Define the experiment
Section titled “Define the experiment”The custom SmolLM2_135M plugs into MsSwiftConfig exactly like a
built-in — the launcher only reads config.model.model_name,
which the subclass provides. Everything is single-parallel (1
node, 1 GPU, all parallelism axes = 1) since the model is tiny.
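The "everything single-parallel" claim can be sanity-checked with the usual decomposition world_size = tp × pp × dp. A sketch under that assumption (`data_parallel_size` is a hypothetical helper for illustration, not part of modal-training-gym's API):

```python
def data_parallel_size(n_nodes: int, gpus_per_node: int, tp: int, pp: int) -> int:
    """Data-parallel degree implied by the other axes.

    Assumes the standard decomposition world_size = tp * pp * dp.
    """
    world = n_nodes * gpus_per_node
    assert world % (tp * pp) == 0, "tp * pp must divide the world size"
    return world // (tp * pp)


# This tutorial's shape: 1 node x 1 GPU with tp = pp = 1, so dp = 1 too.
print(data_parallel_size(1, 1, 1, 1))  # → 1
```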
Note on image=. The default ms-swift image pins Python 3.11,
but modal-training-gym requires the local and remote Python to
match for serialized functions (serialized=True). We pin the
NGC PyTorch 25.01 image instead — it ships py312, which matches
this repo’s .python-version.
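A cheap way to catch this mismatch before launching is to compare interpreter versions up front. A minimal sketch (`pythons_match` is a hypothetical helper; the `(3, 12)` pin reflects the py312 note above):

```python
import sys

# Assumption: the remote image (nvcr.io/nvidia/pytorch:25.01-py3)
# ships Python 3.12.
REMOTE_PY = (3, 12)


def pythons_match(local=None, remote=REMOTE_PY) -> bool:
    """True when the local (major, minor) matches the remote interpreter."""
    if local is None:
        local = sys.version_info[:2]
    return tuple(local) == tuple(remote)


# Fail fast locally before paying for a GPU launch.
if not pythons_match():
    print(f"warning: local Python {sys.version_info[:2]} != remote {REMOTE_PY}")
```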
```python
swift_framework_config = MsSwiftFrameworkConfig(
    image="nvcr.io/nvidia/pytorch:25.01-py3",
    gpu="H100",
    n_nodes=1,
    gpus_per_node=1,
    num_train_epochs=1,
    global_batch_size=1,
    max_length=512,
    save_interval=10,
    eval_iters=0,
)

my_training_run = MsSwiftConfig(
    dataset=TinyGSM8KDataset(HF_CACHE_PATH),
    model=SmolLM2_135M(),
    wandb=WandbConfig(project="custom-hf-smoke"),
    framework_config=swift_framework_config,
)
```

## Build and run
Section titled “Build and run”See quickstart for the
pattern.
app = my_training_run.build_app()Related API Reference
Source: `tutorials/sft/ms_swift_custom_hf/ms_swift_custom_hf.py`