TrainResult
`from modal_training_gym.common.train_result import TrainResult`

One completed training run’s checkpoint handle.
Constructor
`TrainResult(app_name, framework, run_id, checkpoint_dir, base_model, model_class='', checkpoints_volume_name='', checkpoints_mount_path='', iteration_prefix='', wandb_project='', wandb_entity='', wandb_run_id='', extra=<factory>)`

| Parameter | Type | Default | Description |
|---|---|---|---|
| `app_name` | `str` | required | |
| `framework` | `str` | required | |
| `run_id` | `str` | required | |
| `checkpoint_dir` | `str` | required | |
| `base_model` | `str` | required | |
| `model_class` | `str` | `""` | |
| `checkpoints_volume_name` | `str` | `""` | |
| `checkpoints_mount_path` | `str` | `""` | |
| `iteration_prefix` | `str` | `""` | |
| `wandb_project` | `str` | `""` | |
| `wandb_entity` | `str` | `""` | |
| `wandb_run_id` | `str` | `""` | |
| `extra` | `dict[str, Any]` | `<factory>` | |
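The `<factory>` default shown for `extra` is how Python renders a dataclass field built with `default_factory`, so the sketch below assumes `TrainResult` is a dataclass. `TrainResultSketch` is a hypothetical local stand-in that mirrors the documented parameters (it is not the library class), and every argument value is made up for illustration. It shows which arguments are required and that each instance gets its own `extra` dict.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class TrainResultSketch:
    """Hypothetical stand-in mirroring the documented constructor."""

    # Required fields (no default in the table above).
    app_name: str
    framework: str
    run_id: str
    checkpoint_dir: str
    base_model: str
    # Optional fields default to the empty string.
    model_class: str = ""
    checkpoints_volume_name: str = ""
    checkpoints_mount_path: str = ""
    iteration_prefix: str = ""
    wandb_project: str = ""
    wandb_entity: str = ""
    wandb_run_id: str = ""
    # A default_factory gives every instance a fresh dict.
    extra: dict[str, Any] = field(default_factory=dict)


# Illustrative values only; none of these names come from the library.
first = TrainResultSketch(
    "demo-app", "demo-framework", "run-001", "/checkpoints/run-001", "demo/base-model"
)
second = TrainResultSketch(
    "demo-app", "demo-framework", "run-002", "/checkpoints/run-002", "demo/base-model"
)

# Mutating one instance's extra dict does not affect the other.
first.extra["note"] = "only on the first run"
```

If `extra` were declared with a literal `{}` default instead, `dataclasses` would reject the class at definition time, which is why a factory is the usual choice for mutable defaults.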
Attributes
| Attribute | Type | Default | Description |
|---|---|---|---|
| `app_name` | `str` | | |
| `framework` | `str` | | |
| `run_id` | `str` | | |
| `checkpoint_dir` | `str` | | |
| `base_model` | `str` | | |
| `model_class` | `str` | `""` | |
| `checkpoints_volume_name` | `str` | `""` | |
| `checkpoints_mount_path` | `str` | `""` | |
| `iteration_prefix` | `str` | `""` | |
| `wandb_project` | `str` | `""` | |
| `wandb_entity` | `str` | `""` | |
| `wandb_run_id` | `str` | `""` | |
| `extra` | `dict[str, Any]` | `{}` | |
Methods
`build_serve_app(self, *, served_model_name: str | None = None, checkpoint_path: str | None = None, **vllm_kwargs: Any) -> App`

Build a vLLM serving app pointing at a trained checkpoint.

`dashboard_url(self) -> str`

URL for browsing the checkpoints volume in the Modal dashboard.

`latest_checkpoint_path(self) -> str`

Absolute in-volume path of the latest checkpoint.

`list_checkpoints(self) -> list[str]`

Return per-iteration checkpoint directory names under
`list_runs(app_name: str) -> list[str]`

Return all run_ids saved for app_name, sorted oldest

`load(app_name: str, run_id: str | None = None) -> TrainResult`

Load a completed run’s result from the shared store.

`save(self) -> None`

Persist this result to the shared `modal.Dict`.
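The `save`, `list_runs`, and `load` methods together form a small persistence round trip. The sketch below illustrates that pattern with a plain in-memory `dict` standing in for the shared `modal.Dict`. Everything here is an assumption made for illustration: the `RunRecord` class, the `STORE` layout keyed by app name and run ID, and in particular the choice that `run_id=None` selects the most recently saved run, which the reference above does not state.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass


@dataclass
class RunRecord:
    """Hypothetical, trimmed-down record; not the library's TrainResult."""

    app_name: str
    run_id: str
    checkpoint_dir: str


# Plain dict standing in for the shared store: {app_name: {run_id: fields}}.
STORE: dict[str, dict[str, dict]] = {}


def save(record: RunRecord) -> None:
    """Persist a record under its app name and run ID."""
    STORE.setdefault(record.app_name, {})[record.run_id] = asdict(record)


def list_runs(app_name: str) -> list[str]:
    """Return run IDs in insertion order, used here as a proxy for oldest first."""
    return list(STORE.get(app_name, {}))


def load(app_name: str, run_id: str | None = None) -> RunRecord:
    """Load one record; with no run_id, assume the most recently saved run."""
    runs = STORE.get(app_name, {})
    if not runs:
        raise KeyError(f"no runs saved for app {app_name!r}")
    if run_id is None:
        run_id = list_runs(app_name)[-1]
    return RunRecord(**runs[run_id])


# Illustrative round trip with made-up values.
save(RunRecord("demo-app", "run-001", "/checkpoints/run-001"))
save(RunRecord("demo-app", "run-002", "/checkpoints/run-002"))
latest = load("demo-app")
specific = load("demo-app", "run-001")
```

Keying the store by app name first keeps `list_runs` a cheap lookup; the real layout may differ.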
`volume(self) -> Volume`

Return a handle to the checkpoints `modal.Volume`.

`wandb_metrics(self, keys: list[str] | None = None, samples: int = 500) -> list[dict[str, Any]]`

Fetch training metrics from W&B.
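The `keys` and `samples` parameters match the arguments of `Run.history` in the W&B public API, so a fetch of this kind could plausibly look like the sketch below. This is an assumption about the approach, not the library's implementation; `wandb_run_path` and `fetch_metrics_sketch` are hypothetical helper names. The `wandb` import is deferred so the module loads without the package installed, and calling `fetch_metrics_sketch` needs network access and W&B credentials.

```python
from __future__ import annotations

from typing import Any


def wandb_run_path(entity: str, project: str, run_id: str) -> str:
    """Build the 'entity/project/run_id' path accepted by wandb.Api().run()."""
    return f"{entity}/{project}/{run_id}"


def fetch_metrics_sketch(
    entity: str,
    project: str,
    run_id: str,
    keys: list[str] | None = None,
    samples: int = 500,
) -> list[dict[str, Any]]:
    """Fetch sampled history rows for one run as a list of dicts."""
    import wandb  # deferred: only needed when metrics are actually fetched

    run = wandb.Api().run(wandb_run_path(entity, project, run_id))
    # pandas=False makes history() return a list of row dicts
    # rather than a DataFrame.
    return run.history(samples=samples, keys=keys, pandas=False)


# Made-up identifiers, used only to show the path format.
example_path = wandb_run_path("demo-team", "demo-project", "abc123")
```

Note that `samples` caps how many rows W&B returns by downsampling, so long runs will not come back with every logged step.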
`wandb_summary(self) -> dict[str, Any]`

Fetch the W&B run summary (final metric values).

`wandb_url(self) -> str | None`

Return the W&B run URL, or `None` if W&B info is not set.
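A minimal sketch of the `None`-when-unset behaviour described for `wandb_url`. It assumes "W&B info is set" means that `wandb_entity`, `wandb_project`, and `wandb_run_id` are all non-empty, and it uses the standard `wandb.ai` run URL layout. `wandb_url_sketch` is a hypothetical standalone function, not the library method.

```python
from __future__ import annotations


def wandb_url_sketch(entity: str, project: str, run_id: str) -> str | None:
    """Return a W&B run URL, or None when any identifying field is empty."""
    if not (entity and project and run_id):
        # The documented defaults are empty strings, so a result created
        # without W&B details would fall through to this branch.
        return None
    return f"https://wandb.ai/{entity}/{project}/runs/{run_id}"


# Made-up identifiers for illustration.
with_info = wandb_url_sketch("demo-team", "demo-project", "abc123")
without_info = wandb_url_sketch("", "", "")
```

Callers should check for `None` before opening or printing the URL.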