Qwen3_4B

from modal_training_gym.common.models.qwen3_4b import Qwen3_4B

Qwen3-4B (4 billion parameters) from Alibaba.

Inherits from: HFModelConfiguration, ModelConfiguration
Fields
| Field | Type | Default | Description |
|---|---|---|---|
| model_name | str | "Qwen/Qwen3-4B" | HuggingFace repo ID or other model identifier. |
| model_path | `str \| None` | None | |
| architecture | `ModelArchitecture \| None` | ModelArchitecture(num_layers=36, hidden_size=2560, ffn_hidden_size=9728, num_attention_heads=32, group_query_attention=True, num_query_groups=8, kv_channels=128, vocab_size=151936, normalization='RMSNorm', norm_epsilon=1e-06, swiglu=True, disable_bias_linear=True, qk_layernorm=True, use_rotary_position_embeddings=True, rotary_base=1000000) | |
| training | `ModelTrainingConfig \| None` | None | |
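To make the field schema concrete, here is a minimal sketch using plain dataclasses as stand-ins for the real configuration hierarchy. The field names and defaults follow the table above; the class definitions themselves are illustrative, not the library's actual implementation:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ModelArchitecture:
    # Subset of the architecture defaults listed in the Fields table.
    num_layers: int = 36
    hidden_size: int = 2560
    ffn_hidden_size: int = 9728
    num_attention_heads: int = 32
    num_query_groups: int = 8
    kv_channels: int = 128
    vocab_size: int = 151936
    rotary_base: int = 1_000_000

@dataclass
class Qwen3_4B:
    # Illustrative stand-in for modal_training_gym's Qwen3_4B config.
    model_name: str = "Qwen/Qwen3-4B"
    model_path: Optional[str] = None
    architecture: Optional[ModelArchitecture] = field(
        default_factory=ModelArchitecture
    )

cfg = Qwen3_4B()
print(cfg.model_name)               # Qwen/Qwen3-4B
print(cfg.architecture.num_layers)  # 36
```

Because every field has a default, the config is usable as-is; individual fields (e.g. model_path) can be overridden at construction time.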
Methods
download_model(self) -> None

Download or materialize weights into the model volume.
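A download-into-volume step like this is typically idempotent: if the weights are already materialized, it returns without touching the network. The following self-contained sketch shows that pattern; the marker-file convention and the `fetch` callable are assumptions for illustration, not the library's actual implementation:

```python
from pathlib import Path

def download_model(model_name: str, volume_root: str, fetch=None) -> Path:
    """Materialize weights under volume_root, skipping work if already present.

    `fetch` is a hypothetical downloader callable (e.g. wrapping a
    huggingface_hub download); it is injected so the sketch stays
    self-contained and testable.
    """
    target = Path(volume_root) / model_name.replace("/", "--")
    marker = target / ".complete"
    if marker.exists():
        return target  # already materialized; skip the download entirely
    target.mkdir(parents=True, exist_ok=True)
    if fetch is not None:
        fetch(model_name, target)  # download weights into the target dir
    marker.touch()  # record that the snapshot finished successfully
    return target
```

Calling it twice with the same volume performs the download only once, which matters when the volume is shared across training runs.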
Related Tutorials
- Shared concepts: config containers, framework factories, volume layout, running the pipeline
- Qwen3-4B GRPO on GSM8K (colocated)
- Customizing your slime run — scaling nodes, parallelism, and throughput
- Qwen3-4B GRPO on haiku poems — structure score + LLM judge
- Qwen3-4B RL code-golf on MBPP with Harbor sandboxes