
LlmJudge

API reference for LlmJudge

from modal_training_gym.common.llm_judge import LlmJudge

LLM-as-judge client for an OpenAI-compatible chat-completions endpoint.

LlmJudge(model_name, base_url, max_score=10.0, max_tokens=100)
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `model_name` | `str` | required | Model identifier to pass in the chat-completions request (e.g. `"qwen3-4b"`). |
| `base_url` | `str` | required | Base URL of the OpenAI-compatible endpoint (e.g. `"http://localhost:8000"`). |
| `max_score` | `float` | `10.0` | Upper bound of the judge's numeric scale. Scores are normalized to [0, 1] by dividing by this value. |
| `max_tokens` | `int` | `100` | Maximum number of tokens in the judge response. |

build_prompt(self, prompt: str, response: str, **kwargs: Any) -> str


Return the judge prompt for one (prompt, response) pair.
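The reference does not show the library's actual prompt template, but a judge prompt of this kind typically embeds the original question and the candidate response along with scoring instructions. A hypothetical sketch (the function name, wording, and template here are illustrative, not the library's):

```python
def build_judge_prompt(prompt: str, response: str, max_score: float = 10.0) -> str:
    # Hypothetical template: ask the judge model to reply with a single
    # number on the 0..max_score scale described above.
    return (
        f"Rate the following response on a scale of 0 to {max_score:g}.\n"
        f"Reply with a single number and nothing else.\n\n"
        f"Question:\n{prompt}\n\n"
        f"Response:\n{response}\n"
    )

print(build_judge_prompt("What is 2+2?", "4"))
```

The `**kwargs` in the real signature suggests subclasses can thread extra context (e.g. a rubric or reference answer) into the template.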

score(self, session: aiohttp.ClientSession, prompt: str, response: str, **kwargs: Any) -> float


Score one (prompt, response) pair; returns a float in [0, 1].
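The normalization step described above (divide the judge's raw number by `max_score`, yielding a value in [0, 1]) can be sketched in isolation. This is an illustrative reimplementation, not the library's actual parsing code; the regex fallback and clamping behavior are assumptions:

```python
import re

def parse_judge_reply(text: str, max_score: float = 10.0) -> float:
    # Extract the first number from the judge's completion text.
    match = re.search(r"-?\d+(?:\.\d+)?", text)
    if match is None:
        return 0.0  # assumed fallback when the reply contains no number
    raw = float(match.group())
    # Normalize to [0, 1] and clamp, in case the model goes out of range.
    return min(max(raw / max_score, 0.0), 1.0)

print(parse_judge_reply("Score: 8/10"))  # 0.8
```

Because `score` takes an `aiohttp.ClientSession`, callers are expected to open one session and reuse it across many concurrent `score` calls rather than creating a connection per pair.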

Source: modal_training_gym/common/llm_judge.py