DatasetConfig

API reference for DatasetConfig

from modal_training_gym.common.dataset import DatasetConfig

Dataset configuration shared across training frameworks.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| prompt_data | str | "" | Path to the training data file (e.g. a .parquet file on the data volume). Default "". |
| eval_prompt_data | `list[str] \| str \| None` | | |
| input_key | str | "" | Column/key name for model input in the dataset. Default "". |
| label_key | str | "" | Column/key name for labels/targets in the dataset. Default "". |
| apply_chat_template | bool | True | Whether to apply the model’s chat template to inputs. Default True. |
| rollout_shuffle | bool | True | Whether to shuffle data during rollout generation. Default True. |
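
The fields above can be illustrated with a minimal sketch. Because the real class may not be importable outside the training environment, the example uses a hypothetical stand-in dataclass, `DatasetConfigSketch`, that only mirrors the documented field names, types, and defaults. The `None` default for `eval_prompt_data`, the file paths, the column names, and the `eval_paths` helper are all assumptions made for illustration and are not part of the documented API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DatasetConfigSketch:
    """Hypothetical stand-in mirroring the documented fields; not the real class."""

    prompt_data: str = ""
    # Assumption: the table does not show a default for this field; None is used here.
    eval_prompt_data: list[str] | str | None = None
    input_key: str = ""
    label_key: str = ""
    apply_chat_template: bool = True
    rollout_shuffle: bool = True


def eval_paths(cfg: DatasetConfigSketch) -> list[str]:
    """Hypothetical helper: normalize eval_prompt_data to a list of paths."""
    if cfg.eval_prompt_data is None:
        return []
    if isinstance(cfg.eval_prompt_data, str):
        return [cfg.eval_prompt_data]
    return list(cfg.eval_prompt_data)


# Example values are made up for illustration.
cfg = DatasetConfigSketch(
    prompt_data="/data/train.parquet",
    eval_prompt_data="/data/eval.parquet",
    input_key="prompt",
    label_key="answer",
)
print(cfg)
print(eval_paths(cfg))
```

With the real class, the same keyword arguments would presumably be passed to `DatasetConfig` after the import shown earlier; the normalization helper is only one way to handle the three accepted shapes of `eval_prompt_data`.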

Download and/or preprocess the dataset into the data volume.

Source: modal_training_gym/common/dataset.py