from modal_training_gym.deploy_recipes.sglang_recipe import SglangRecipe

SGLang serving configuration.
Inherits from: BaseDeployRecipe
Fields
| Field | Type | Default | Description |
|---|---|---|---|
| `recipe_type` | `DeployRecipeType` | `sglang` | |
| `gpu` | `Literal['H100', 'H200', 'B200', 'B300']` | `"H100"` | GPU type for the serving container. |
| `tp` | `int \| None` | `None` | Tensor parallelism degree. When `None`, SGLang infers it from the GPU count. |
| `dp` | `int \| None` | `None` | Data parallelism degree. Enables `--enable-dp-attention` when set. |
| `context_length` | `int \| None` | `None` | Maximum context length. When `None`, the model default is used. |
| `mem_fraction_static` | `float \| None` | `None` | Fraction of GPU memory reserved for static allocation (model weights plus the KV cache pool). When `None`, the SGLang default is used. |
| `chunked_prefill_size` | `int \| None` | `None` | Chunked prefill token budget. |
| `max_running_requests` | `int \| None` | `None` | Maximum number of concurrent requests per worker. |
| `sglang_image` | `str` | `"lmsysorg/sglang:v0.5.12"` | Docker image tag for the SGLang container. The default is a pinned version tag. |
| `extra_server_args` | `dict[str, str] \| None` | `None` | Additional `--flag value` pairs passed to `sglang.launch_server`. Use an empty string value for boolean flags (e.g. `{"--trust-remote-code": ""}`). |
| `environment_name` | `str \| None` | `None` | Modal environment to deploy into. |
| `deploy_strategy` | `str` | `"rolling"` | Modal deployment strategy. |
| `startup_timeout` | `int` | `1200` | Seconds the server container may spend in startup before Modal kills it. This value gates both Modal’s container `startup_timeout` and the SGLang health-check poll. Increase it for very large models whose weight load exceeds the default of 20 minutes (e.g. GLM-4.7 at 355B, Kimi-K2.5 at ~1T). |
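
The empty-string convention for boolean flags in `extra_server_args` can be illustrated with a small sketch. The helper `flags_to_argv` below is hypothetical and not part of `modal_training_gym`; it only shows one plausible way a `--flag value` dict could be flattened into command-line tokens, assuming an empty value means "emit the flag alone".

```python
import shlex


def flags_to_argv(flags: dict[str, str]) -> list[str]:
    """Flatten a {flag: value} dict into argv tokens (hypothetical helper).

    An empty-string value marks a boolean flag, so only the flag is emitted.
    """
    argv: list[str] = []
    for flag, value in flags.items():
        argv.append(flag)
        if value != "":
            argv.append(value)
    return argv


# Example values; both are existing SGLang server flags.
extra_server_args = {
    "--trust-remote-code": "",            # boolean flag, no value token
    "--attention-backend": "flashinfer",  # flag with a value
}

command = ["python", "-m", "sglang.launch_server", *flags_to_argv(extra_server_args)]
print(shlex.join(command))
```

Keeping the flags in a dict rather than a raw string lets later entries override earlier ones by key, which is one reason a recipe might prefer this shape.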
Methods
`server_args(self, *, served_model_name: str) -> dict[str, str]`

Build the `--flag value` dict for the SGLang launch command.
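
The actual body of `server_args` is not shown in this reference. As a rough sketch of what such a method might do, the stand-in class below (hypothetical, named `RecipeSketch`, covering only a subset of the fields) maps set fields to SGLang flags, skips `None` values, and merges `extra_server_args` last. The field-to-flag mapping is an assumption, not the library's confirmed behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecipeSketch:
    """Hypothetical stand-in for SglangRecipe with a subset of its fields."""

    tp: Optional[int] = None
    dp: Optional[int] = None
    context_length: Optional[int] = None
    mem_fraction_static: Optional[float] = None
    extra_server_args: Optional[dict[str, str]] = None

    def server_args(self, *, served_model_name: str) -> dict[str, str]:
        # Assumed mapping from recipe fields to SGLang server flags.
        args: dict[str, str] = {"--served-model-name": served_model_name}
        optional = {
            "--tp-size": self.tp,
            "--dp-size": self.dp,
            "--context-length": self.context_length,
            "--mem-fraction-static": self.mem_fraction_static,
        }
        for flag, value in optional.items():
            if value is not None:  # None means "leave it to SGLang's default"
                args[flag] = str(value)
        if self.dp is not None:
            # Boolean flag, encoded with an empty-string value.
            args["--enable-dp-attention"] = ""
        # User-supplied flags are merged last so they can override the above.
        args.update(self.extra_server_args or {})
        return args


recipe = RecipeSketch(tp=8, dp=2, extra_server_args={"--trust-remote-code": ""})
for flag, value in recipe.server_args(served_model_name="my-model").items():
    print(flag, value)
```

Returning a dict of strings keeps the result easy to inspect and to merge with `extra_server_args` before it is turned into a command line.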