from modal_training_gym.common.eval import EvalConfig

Evaluate a deployed model on a dataset config.
Fields
| Field | Type | Default | Description |
|---|---|---|---|
| dataset | 'DatasetConfig' | | |
| eval_fn | EvalFn \| None | None | |
| eval_response_fn | EvalResponseFn \| None | None | |
| prompt_column | str \| None | None | |
| eval_config_id | str \| None | None | |
| generate_kwargs | dict[str, Any] | {} | |
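
The shape of these fields can be illustrated with a stand-in dataclass. This is a minimal sketch, not the library's implementation: `EvalConfigSketch` is a hypothetical name, the `EvalFn` and `EvalResponseFn` aliases are placeholders because their real signatures are not shown on this page, and the dataset value and `max_tokens` key are made up for illustration. It assumes the `{}` default for `generate_kwargs` behaves like a per-instance empty dict.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

# Hypothetical placeholders: the real EvalFn / EvalResponseFn signatures
# are defined in modal_training_gym and are not documented on this page.
EvalFn = Callable[..., Any]
EvalResponseFn = Callable[..., Any]


@dataclass
class EvalConfigSketch:
    """Stand-in that mirrors the documented field names and defaults."""

    dataset: Any  # the only field without a default, so it is required
    eval_fn: Optional[EvalFn] = None
    eval_response_fn: Optional[EvalResponseFn] = None
    prompt_column: Optional[str] = None
    eval_config_id: Optional[str] = None
    # A mutable default must be built per instance via default_factory.
    generate_kwargs: dict[str, Any] = field(default_factory=dict)


# Illustrative values only; "my-dataset" stands in for a DatasetConfig.
a = EvalConfigSketch(dataset="my-dataset", prompt_column="question")
b = EvalConfigSketch(dataset="my-dataset", generate_kwargs={"max_tokens": 64})
```

Only `dataset` has to be supplied in this sketch; every other field falls back to the default listed in the table.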
Methods
build_prompt(self, row: 'DatasetRow') -> 'str'

evaluate(self, deployment: "'ModelDeployment'", debug: 'bool' = False, max_concurrency: 'int' = 1) -> 'EvalResult'

save(self) -> 'EvalConfigDurable'

to_durable(self) -> 'EvalConfigDurable'
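
To make the role of `max_concurrency` concrete, here is a self-contained sketch of an evaluation loop that builds a prompt per row, generates a response, scores it, and averages the scores. Everything in it is an assumption for illustration: the free functions `build_prompt` and `evaluate`, the `generate` and `score` callables, the row layout, and the use of a thread pool are hypothetical and may differ from how `EvalConfig.evaluate` actually works.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def build_prompt(row: dict[str, Any], prompt_column: str) -> str:
    # Assumption: the prompt is read from a single column of the row.
    return str(row[prompt_column])


def evaluate(
    rows: list[dict[str, Any]],
    generate: Callable[[str], str],
    score: Callable[[dict[str, Any], str], float],
    prompt_column: str,
    max_concurrency: int = 1,
) -> float:
    """Return the mean score over rows, using up to max_concurrency workers."""

    def run_one(row: dict[str, Any]) -> float:
        response = generate(build_prompt(row, prompt_column))
        return score(row, response)

    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        scores = list(pool.map(run_one, rows))  # map keeps input order
    return sum(scores) / len(scores) if scores else 0.0


# Fake deployment: a lookup table instead of a served model.
canned = {"2+2": "4", "3+3": "7"}
rows = [{"question": "2+2", "answer": "4"}, {"question": "3+3", "answer": "6"}]

mean_score = evaluate(
    rows,
    generate=lambda prompt: canned[prompt],
    score=lambda row, response: 1.0 if response == row["answer"] else 0.0,
    prompt_column="question",
    max_concurrency=2,
)
```

With the default of 1, rows are processed one at a time in this sketch; raising the value only changes how many requests are in flight, not the resulting scores.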
Related Tutorials
- Qwen3-4B haiku evaluation with verifiable rewards — serve, evaluate, train, compare
- Multi-turn number-guessing RL with custom generate and reward functions
- On-policy distillation on math — Qwen3-8B teacher, Qwen3-4B student
- DAPO on math with Qwen3-4B
- Audio GRPO on Qwen3-ASR-1.7B — transcribe LibriSpeech, reward −WER