HarborEval

Evaluate a deployed model on a Harbor dataset using sandbox execution.

from modal_training_gym.common.harbor.eval import HarborEval

Inherits from: EvalConfig

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| dataset | 'DatasetConfig' | | |
| eval_fn | EvalFn \| None | None | |
| eval_response_fn | EvalResponseFn \| None | None | |
| prompt_column | str \| None | None | |
| eval_config_id | str \| None | None | |
| generate_kwargs | dict[str, Any] | {} | |
| model | 'ModelConfig \| None' | None | |
| test_cases | list[dict[str, str]] \| None | None | |
| sandbox_timeout | int | 60 | |
| sandbox_cpu | float | 1.0 | |
| sandbox_memory | int | 1024 | |
| sandbox_cpu_policy | str | "limit" | |
| sandbox_memory_policy | str | "limit" | |
| sandbox_python_version | str | "3.11" | |
| extract_code_fn | Callable[[str], str] \| None | None | |
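
The extract_code_fn field is typed Callable[[str], str], so any function that maps a string to a string fits the signature. A minimal sketch of one plausible choice follows: pulling the first fenced code block out of a model response and falling back to the whole response when no fence is present. The function name extract_first_code_block and the fallback behaviour are assumptions made for illustration; this page does not document what HarborEval does with the returned string or what it does when the field is left as None.

```python
import re

# The fence marker is built programmatically so this example can itself
# sit inside a fenced block without ending it early.
FENCE = "`" * 3

# Opening fence, optional language tag up to the end of the line,
# then the shortest body that ends at the next fence.
_CODE_BLOCK = re.compile(FENCE + r"[^\n]*\n(.*?)" + FENCE, re.DOTALL)


def extract_first_code_block(response: str) -> str:
    """Return the body of the first fenced code block in `response`.

    Hypothetical helper: if no fenced block is found, the stripped
    response is returned unchanged (an assumption, not library behaviour).
    """
    match = _CODE_BLOCK.search(response)
    if match is None:
        return response.strip()
    return match.group(1).strip()
```

A function like this could then be supplied as extract_code_fn=extract_first_code_block when constructing the config, assuming the field is accepted as a keyword argument in the same way as the other fields listed above.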

build_prompt(self, row: 'DatasetRow') -> 'str'


evaluate(self, deployment: 'ModelDeployment', debug: 'bool' = False, max_concurrency: 'int' = 1) -> 'EvalResult'


Source: modal_training_gym/common/harbor/eval.py