from modal_training_gym.common.dataset import HuggingFaceDatasetDataset backed by a HuggingFace datasets repo.
Inherits from: DatasetConfig
Fields
Section titled “Fields”| Field | Type | Default | Description |
|---|---|---|---|
dataset_id | str | "" | |
input_key | str | "" | |
label_key | str | "label" | |
apply_chat_template | bool | True | |
always_prepare | bool | False | |
hf_repo | str | "" | |
hf_split | str | "train" | |
hf_config | str | None | None | |
output_format | str | "parquet" | |
input_column | str | "" | |
output_column | str | "" | |
system_prompt | str | "" | |
prompt_template | str | "{input}" | |
n_rows | int | 0 |
Methods
Section titled “Methods”load(self, split: "Literal['all', 'train', 'eval']" = 'all') -> 'Any'
Section titled “load(self, split: "Literal['all', 'train', 'eval']" = 'all') -> 'Any'”Load raw examples, optionally filtered by split.
prepare(self, path: 'str', eval_paths: 'dict[str, str] | None' = None) -> 'None'
Section titled “prepare(self, path: 'str', eval_paths: 'dict[str, str] | None' = None) -> 'None'”Materialize training data to path (and eval splits to eval_paths).
to_pandas(self, *, formatted: 'bool' = False)
Section titled “to_pandas(self, *, formatted: 'bool' = False)”validate_prepared(self, path: 'str') -> 'None'
Section titled “validate_prepared(self, path: 'str') -> 'None'”Sniff what prepare() wrote and confirm the columns the framework will index.