SglangRecipe

SGLang serving configuration.

from modal_training_gym.deploy_recipes.sglang_recipe import SglangRecipe

Inherits from: BaseDeployRecipe

Fields

Field	Type	Default	Description
`recipe_type`	`DeployRecipeType`	`sglang`
`gpu`	`Literal['H100', 'H200', 'B200', 'B300']`	`"H100"`	GPU type for the serving container. Default `"H100"`.
`tp`	`int \| None`	`None`	Tensor parallelism degree. Default `None` (SGLang infers from GPU count).
`dp`	`int \| None`	`None`	Data parallelism degree. Enables `--enable-dp-attention` when set. Default `None`.
`context_length`	`int \| None`	`None`	Maximum context length. Default `None` (model default).
`mem_fraction_static`	`float \| None`	`None`	Fraction of GPU memory reserved for static allocation (model weights plus the KV cache pool). Default `None` (SGLang default).
`chunked_prefill_size`	`int \| None`	`None`	Chunked prefill token budget. Default `None`.
`max_running_requests`	`int \| None`	`None`	Max concurrent requests per worker. Default `None`.
`sglang_image`	`str`	`"lmsysorg/sglang:v0.5.12"`	Docker image reference (repository and tag) for the SGLang container. Default pins the `v0.5.12` tag.
`extra_server_args`	`dict[str, str] \| None`	`None`	Additional `--flag value` pairs passed to `sglang.launch_server`. Use an empty string value for boolean flags (e.g. `{"--trust-remote-code": ""}`). Default `None`.
`environment_name`	`str \| None`	`None`	Modal environment to deploy into. Default `None`.
`deploy_strategy`	`str`	`"rolling"`	Modal deployment strategy. Default `"rolling"`.
`startup_timeout`	`int`	`1200`	Seconds the server container may spend starting up before Modal kills it. The value bounds both Modal’s container `startup_timeout` and the SGLang health-check poll. Increase it for very large models whose weight load exceeds the default (e.g. GLM-4.7 at 355B, Kimi-K2.5 at ~1T). Default `1200` (20 minutes).
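
The `extra_server_args` convention (an empty string marks a boolean flag) can be illustrated with a small sketch. `flags_to_argv` is a hypothetical helper written for this example, not part of `modal_training_gym`, and the recipe may flatten the dict differently internally. `--log-level` is used only as an example of a value-bearing SGLang flag.

```python
import shlex


def flags_to_argv(flags: dict[str, str]) -> list[str]:
    """Flatten a {"--flag": "value"} dict into an argv-style list.

    Hypothetical helper: an empty-string value is treated as a boolean
    flag, so only the flag name is emitted.
    """
    argv: list[str] = []
    for flag, value in flags.items():
        argv.append(flag)
        if value != "":
            argv.append(value)
    return argv


extra_server_args = {
    "--trust-remote-code": "",  # boolean flag: empty string, no value emitted
    "--log-level": "info",      # value-bearing flag
}

# Assumed launch prefix, shown only to make the resulting command line visible.
argv = ["python", "-m", "sglang.launch_server", *flags_to_argv(extra_server_args)]
print(shlex.join(argv))
```

Keeping every value a string, including the empty string for booleans, means the whole mapping stays a plain `dict[str, str]` that serializes cleanly.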

Methods

`server_args(self, *, served_model_name: 'str') -> 'dict[str, str]'`

Build the `--flag value` dict for the SGLang launch command.
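
The following is a rough sketch of how a method with this signature might assemble its result; it is not the library's implementation. `SglangRecipeSketch` is a hypothetical stand-in covering only a few fields, and the field-to-flag mapping (`--tp`, `--dp`, `--context-length`, `--mem-fraction-static`) is an assumption based on SGLang's server flags, as is the choice to let `extra_server_args` override earlier entries.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SglangRecipeSketch:
    """Hypothetical stand-in for SglangRecipe; not the library class."""

    tp: int | None = None
    dp: int | None = None
    context_length: int | None = None
    mem_fraction_static: float | None = None
    extra_server_args: dict[str, str] | None = None

    def server_args(self, *, served_model_name: str) -> dict[str, str]:
        args: dict[str, str] = {"--served-model-name": served_model_name}
        # Assumed field-to-flag mapping; a None field is omitted so that
        # SGLang falls back to its own default.
        optional = {
            "--tp": self.tp,
            "--dp": self.dp,
            "--context-length": self.context_length,
            "--mem-fraction-static": self.mem_fraction_static,
        }
        for flag, value in optional.items():
            if value is not None:
                args[flag] = str(value)
        # Per the field table, setting dp enables DP attention (boolean flag).
        if self.dp is not None:
            args["--enable-dp-attention"] = ""
        # Assumption: user-supplied extras are merged last and win on conflict.
        args.update(self.extra_server_args or {})
        return args


recipe = SglangRecipeSketch(
    tp=8,
    dp=8,
    context_length=32768,
    extra_server_args={"--trust-remote-code": ""},
)
args = recipe.server_args(served_model_name="my-model")
print(args)
```

Fields left at `None`, such as `mem_fraction_static` here, produce no entry at all, which matches the table's "`None` means SGLang default" convention.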

Source: modal_training_gym/deploy_recipes/sglang_recipe.py