DatasetConfig

Dataset configuration shared across training frameworks.

from modal_training_gym.common.dataset import DatasetConfig

| Field | Type | Default | Description |
|---|---|---|---|
| dataset_id | str | "" | |
| input_key | str | "" | |
| label_key | str | "" | |
| apply_chat_template | bool | True | |
| always_prepare | bool | False | |
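
The fields above suggest a dataclass-style constructor. The sketch below uses a local stand-in class (`DatasetConfigSketch`, a hypothetical name) that mirrors only the documented field names, types, and defaults, so it runs without the package installed. The dataset ID and column names are made-up placeholders, and the real `DatasetConfig` may validate or interpret these values differently.

```python
from dataclasses import dataclass


@dataclass
class DatasetConfigSketch:
    """Local stand-in mirroring the documented fields; not the library class."""

    dataset_id: str = ""
    input_key: str = ""
    label_key: str = ""
    apply_chat_template: bool = True
    always_prepare: bool = False


# Placeholder values chosen for illustration only.
config = DatasetConfigSketch(
    dataset_id="example-org/example-dataset",
    input_key="question",
    label_key="answer",
)
print(config)
```

Fields that are not passed keep the defaults listed in the table, so `apply_chat_template` stays `True` and `always_prepare` stays `False` here.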

load(self, split: Literal['all', 'train', 'eval'] = 'all') -> Any

Load raw examples, optionally filtered by split.
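
How the library assigns examples to splits is not documented here. As a rough illustration of the `split` argument only, the sketch below assumes each example carries a `"split"` tag and filters an in-memory list; `load_examples` and `EXAMPLES` are hypothetical names, not part of the package.

```python
from typing import Any, Literal

# Hypothetical in-memory examples; the "split" tag is an assumption.
EXAMPLES = [
    {"question": "2 + 2?", "answer": "4", "split": "train"},
    {"question": "3 + 3?", "answer": "6", "split": "train"},
    {"question": "5 + 5?", "answer": "10", "split": "eval"},
]


def load_examples(
    split: Literal["all", "train", "eval"] = "all",
) -> list[dict[str, Any]]:
    """Return every example for "all", otherwise only the matching split."""
    if split not in ("all", "train", "eval"):
        raise ValueError(f"unknown split: {split!r}")
    if split == "all":
        return list(EXAMPLES)
    return [ex for ex in EXAMPLES if ex["split"] == split]


train_rows = load_examples("train")
eval_rows = load_examples("eval")
all_rows = load_examples()  # default is "all"
```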

prepare(self, path: str, eval_paths: dict[str, str] | None = None) -> None

Materialize training data to path (and eval splits to eval_paths).
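
The on-disk format that `prepare()` writes is not stated on this page. The sketch below shows the general shape of the step under the assumption that data is written as JSON Lines, with one file for training and one file per named eval split; `prepare_sketch` and `write_jsonl` are hypothetical helpers, not the library's implementation.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def write_jsonl(rows: list[dict], path: str) -> None:
    """Write one JSON object per line, creating parent directories as needed."""
    Path(path).parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")


def prepare_sketch(
    train_rows: list[dict],
    eval_rows_by_name: dict[str, list[dict]],
    path: str,
    eval_paths: Optional[dict[str, str]] = None,
) -> None:
    """Write training rows to `path` and each named eval split to its own path."""
    write_jsonl(train_rows, path)
    for name, eval_path in (eval_paths or {}).items():
        write_jsonl(eval_rows_by_name[name], eval_path)


# Use a temporary directory so the example leaves the working tree untouched.
workdir = Path(tempfile.mkdtemp())
train_path = str(workdir / "train.jsonl")
eval_paths = {"eval": str(workdir / "eval.jsonl")}

prepare_sketch(
    train_rows=[{"question": "2 + 2?", "answer": "4"}],
    eval_rows_by_name={"eval": [{"question": "5 + 5?", "answer": "10"}]},
    path=train_path,
    eval_paths=eval_paths,
)
```

When `eval_paths` is `None`, the sketch writes only the training file, matching the optional parameter in the signature.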

validate_prepared(self, path: str) -> None

Inspect what prepare() wrote to path and confirm it contains the columns the framework will index.
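
A check of this kind can be sketched as reading the first prepared record and verifying that the expected columns exist. The version below assumes a JSON Lines file and takes the required column names as an explicit argument; `validate_prepared_sketch` is a hypothetical helper, and the real method may inspect other formats or derive the columns from the config.

```python
import json
import tempfile
from pathlib import Path


def validate_prepared_sketch(path: str, required_keys: list[str]) -> None:
    """Read the first record and raise if any required column is missing."""
    with open(path, encoding="utf-8") as f:
        first_line = f.readline()
    if not first_line.strip():
        raise ValueError(f"{path} is empty")
    record = json.loads(first_line)
    missing = [key for key in required_keys if key not in record]
    if missing:
        raise KeyError(f"{path} is missing columns: {missing}")


# Write a one-row file to a temporary directory for the demonstration.
prepared = Path(tempfile.mkdtemp()) / "train.jsonl"
prepared.write_text(
    json.dumps({"question": "2 + 2?", "answer": "4"}) + "\n",
    encoding="utf-8",
)

# Passes silently: both columns are present.
validate_prepared_sketch(str(prepared), ["question", "answer"])

# Fails: "label" was never written, so the check raises KeyError.
try:
    validate_prepared_sketch(str(prepared), ["question", "label"])
    missing_detected = False
except KeyError:
    missing_detected = True
```

Checking only the first record keeps the validation cheap, at the cost of missing rows later in the file that lack a column.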

Source: modal_training_gym/common/dataset.py