`from modal_training_gym.common.models.qwen3_6_27b import Qwen3_6_27B`

Qwen3.6-27B, a 27B-parameter dense model from Alibaba.
Inherits from: HFModelConfiguration, ModelConfig
Fields
| Field | Type | Default | Description |
|---|---|---|---|
| `model_name` | `str` | `"Qwen/Qwen3.6-27B"` | |
| `model_path` | `str \| None` | `None` | |
| `architecture` | `ModelArchitecture \| None` | `ModelArchitecture(num_layers=64, hidden_size=5120, ffn_hidden_size=17408, num_attention_heads=24, group_query_attention=True, num_query_groups=4, kv_channels=256, vocab_size=248320, normalization='RMSNorm', norm_epsilon=1e-06, swiglu=True, disable_bias_linear=True, qk_layernorm=True, untie_embeddings_and_output_weights=True, num_experts=0, moe_ffn_hidden_size=0, moe_shared_expert_intermediate_size=0, moe_grouped_gemm=False, moe_shared_expert_gate=False, moe_router_topk=0, moe_router_score_function='', moe_token_drop_policy='', moe_router_dtype='', moe_permute_fusion=False, moe_aux_loss_coeff=None, megatron_spec=['slime_plugins.models.qwen3_5', 'get_qwen3_5_spec'], megatron_model_type='qwen3.5-27B', apply_layernorm_1p=True, use_gated_attention=True, attention_output_gate=True, use_rotary_position_embeddings=True, rotary_base=10000000, rotary_percent=0.25)` | |
| `response_parser` | `Optional[Callable[[str], ParsedResponse]]` | `parse_qwen3_6_response` | |
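
The attention-related defaults in `architecture` imply a few derived sizes that are easy to check by hand. The sketch below copies five values from the table into a hypothetical stand-in dataclass (`AttentionShape` and `derive` are illustrative names, not part of `modal_training_gym`) and assumes the usual Megatron-style reading: `kv_channels` is the per-head dimension, and `rotary_percent` is the fraction of that dimension that receives rotary position embeddings. It ignores any extra width that `use_gated_attention` may add to the projections.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionShape:
    """Hypothetical stand-in; values copied from the `architecture` default."""
    num_attention_heads: int = 24
    num_query_groups: int = 4
    kv_channels: int = 256        # assumed to be the per-head dimension
    rotary_percent: float = 0.25  # assumed fraction of each head that is rotated


def derive(shape: AttentionShape) -> dict:
    """Compute derived attention sizes under the assumptions stated above."""
    return {
        # Total width of all query heads concatenated.
        "query_width": shape.num_attention_heads * shape.kv_channels,
        # Total width of the shared key (or value) heads under grouped-query attention.
        "kv_width": shape.num_query_groups * shape.kv_channels,
        # Number of query heads that share one key/value head.
        "heads_per_group": shape.num_attention_heads // shape.num_query_groups,
        # Per-head channels that receive rotary embeddings.
        "rotary_dims": int(shape.kv_channels * shape.rotary_percent),
    }


sizes = derive(AttentionShape())
```

Under these assumptions, each key/value head serves six query heads, and the concatenated query width (6144) differs from `hidden_size` (5120), which is possible because `kv_channels` is set explicitly rather than derived from `hidden_size / num_attention_heads`.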
Methods
download(self) -> 'None'
Download or materialize weights into the model volume.
parse_response(self, text: 'str') -> 'ParsedResponse'
Parse raw model output into structured content.
response_parser(text: 'str') -> 'ParsedResponse'
Parse Qwen3.5/3.6-family model output into structured content.
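
This page does not show the fields of `ParsedResponse` or the rules that `parse_qwen3_6_response` applies, so the following is only a rough sketch of what a `Callable[[str], ParsedResponse]` could look like. It assumes the model wraps its reasoning in `<think>...</think>` tags and that the structured result separates reasoning from the final answer. `ParsedResponseSketch` and `parse_think_tags` are hypothetical names used for illustration and may not match the library's actual types or behavior.

```python
import re
from dataclasses import dataclass


@dataclass
class ParsedResponseSketch:
    """Hypothetical result type; the real ParsedResponse may differ."""
    reasoning: str
    content: str


# Assumed convention: reasoning is wrapped in a single <think>...</think> block.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def parse_think_tags(text: str) -> ParsedResponseSketch:
    """Split raw output into reasoning and final content (illustrative only)."""
    match = _THINK_RE.search(text)
    if match is None:
        # No reasoning block found: treat the whole output as the answer.
        return ParsedResponseSketch(reasoning="", content=text.strip())
    reasoning = match.group(1).strip()
    # Everything outside the first <think> block is kept as the answer.
    content = (text[:match.start()] + text[match.end():]).strip()
    return ParsedResponseSketch(reasoning=reasoning, content=content)


parsed = parse_think_tags("<think>\nadd 2 and 2\n</think>\n\n4")
plain = parse_think_tags("hello ")
```

A function with this shape matches the `response_parser` field's declared type, so a custom parser could in principle be supplied in its place; whether `parse_response` simply delegates to `response_parser` is not stated here and is an assumption.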