# DatasetConfig

```python
from modal_training_gym.common.dataset import DatasetConfig
```

Dataset configuration shared across training frameworks.
## Fields

| Field | Type | Default | Description |
|---|---|---|---|
| prompt_data | str | "" | Path to the training data file (e.g. a .parquet file on the data volume). |
| eval_prompt_data | `list[str] \| str \| None` | None | Path(s) to evaluation data file(s). |
| input_key | str | "" | Column/key name for model input in the dataset. |
| label_key | str | "" | Column/key name for labels/targets in the dataset. |
| apply_chat_template | bool | True | Whether to apply the model’s chat template to inputs. |
| rollout_shuffle | bool | True | Whether to shuffle data during rollout generation. |
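As a quick illustration, constructing a config from these fields might look like the sketch below. Only the import path comes from this page; the paths and column names are hypothetical placeholders.

```python
from modal_training_gym.common.dataset import DatasetConfig

# All values below are illustrative assumptions, not documented defaults.
dataset = DatasetConfig(
    prompt_data="/data/gsm8k/train.parquet",  # hypothetical training file on the data volume
    input_key="question",                     # column holding the model input
    label_key="answer",                       # column holding the label/target
    apply_chat_template=True,                 # wrap inputs in the model's chat template
    rollout_shuffle=True,                     # shuffle data during rollout generation
)
```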
## Methods

### `prepare(self) -> None`

Download and/or preprocess the dataset into the data volume.
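In a typical pipeline, `prepare()` would be called once before training so the processed data lands on the data volume. A minimal sketch, assuming the constructor accepts the keyword fields listed above and that the path is a hypothetical example:

```python
from modal_training_gym.common.dataset import DatasetConfig

cfg = DatasetConfig(prompt_data="/data/train.parquet")  # hypothetical path
cfg.prepare()  # download/preprocess into the data volume; returns None
```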
## Related Tutorials

- Shared concepts: config containers, framework factories, volume layout, running the pipeline
- Custom HuggingFace model (SmolLM2-135M) LoRA SFT — inline `ModelConfiguration` subclass, no catalog entry
- Qwen3-4B GRPO on GSM8K (colocated)
- Customizing your slime run — scaling nodes, parallelism, and throughput
- Qwen3-4B GRPO on haiku poems — structure score + LLM judge
- Qwen3-4B RL code-golf on MBPP with Harbor sandboxes
- GLM-4.7 LoRA SFT on GSM8K (Megatron)