from modal_training_gym.common.dataset import DatasetConfig

Dataset configuration shared across training frameworks.
Fields
| Field | Type | Default | Description |
|---|---|---|---|
| dataset_id | str | "" | |
| input_key | str | "" | |
| label_key | str | "" | |
| apply_chat_template | bool | True | |
| always_prepare | bool | False | |
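
To make the defaults in the table concrete, here is a minimal sketch using a local stand-in dataclass that mirrors the documented fields. It is not the library's class: `DatasetConfigSketch` and the example values (`"my-org/my-dataset"`, `"prompt"`, `"answer"`) are hypothetical, and whether the real `DatasetConfig` is a dataclass constructed with keyword arguments like this is an assumption.

```python
from dataclasses import asdict, dataclass


@dataclass
class DatasetConfigSketch:
    """Hypothetical stand-in mirroring the documented fields and defaults."""

    dataset_id: str = ""
    input_key: str = ""
    label_key: str = ""
    apply_chat_template: bool = True
    always_prepare: bool = False


# Override only what differs from the defaults; all values are illustrative.
config = DatasetConfigSketch(
    dataset_id="my-org/my-dataset",
    input_key="prompt",
    label_key="answer",
)

print(asdict(config))
```

Fields left unset keep the defaults from the table, so `apply_chat_template` stays `True` and `always_prepare` stays `False` here.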
Methods
load(self, split: "Literal['all', 'train', 'eval']" = 'all') -> 'Any'
Load raw examples, optionally filtered by split.
prepare(self, path: 'str', eval_paths: 'dict[str, str] | None' = None) -> 'None'
Materialize training data to path (and eval splits to eval_paths).
validate_prepared(self, path: 'str') -> 'None'
Sniff what prepare() wrote and confirm the columns the framework will index.
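
The three methods suggest a load, prepare, validate sequence. The sketch below walks through that sequence with a toy stand-in class; it reuses the method signatures listed above, but everything else is an assumption made for illustration: the class name `ToyDatasetConfig`, the in-memory rows, the JSONL output format, and the choice to write the eval split to every entry in `eval_paths`. The real implementation may differ on all of these.

```python
import json
import os
import tempfile
from dataclasses import dataclass
from typing import Any, Dict, List, Literal, Optional


@dataclass
class ToyDatasetConfig:
    """Hypothetical stand-in; only the method signatures follow the reference."""

    input_key: str = "prompt"
    label_key: str = "answer"

    def load(self, split: Literal["all", "train", "eval"] = "all") -> Any:
        # Toy in-memory examples tagged with their split (illustrative data).
        rows = [
            {"split": "train", "prompt": "2+2?", "answer": "4"},
            {"split": "train", "prompt": "3+3?", "answer": "6"},
            {"split": "eval", "prompt": "1+1?", "answer": "2"},
        ]
        if split == "all":
            return rows
        return [row for row in rows if row["split"] == split]

    def prepare(self, path: str, eval_paths: Optional[Dict[str, str]] = None) -> None:
        # Assumption: JSONL output; the real on-disk format may differ.
        self._write(path, self.load("train"))
        for eval_path in (eval_paths or {}).values():
            self._write(eval_path, self.load("eval"))

    def validate_prepared(self, path: str) -> None:
        # Read the first record and check the columns a trainer would index.
        with open(path) as f:
            first = json.loads(f.readline())
        missing = {self.input_key, self.label_key} - set(first)
        if missing:
            raise ValueError(f"{path} is missing columns: {sorted(missing)}")

    @staticmethod
    def _write(path: str, rows: List[dict]) -> None:
        with open(path, "w") as f:
            for row in rows:
                f.write(json.dumps(row) + "\n")


config = ToyDatasetConfig()
with tempfile.TemporaryDirectory() as tmp:
    train_path = os.path.join(tmp, "train.jsonl")
    eval_path = os.path.join(tmp, "eval.jsonl")
    config.prepare(train_path, eval_paths={"eval": eval_path})
    # Each call raises ValueError if an expected column is absent.
    config.validate_prepared(train_path)
    config.validate_prepared(eval_path)
```

Validating right after preparing surfaces a misnamed `input_key` or `label_key` before any training job starts, which is presumably the purpose of `validate_prepared`.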