Windowed-FIFO rollout scheduling for over-sampled RL
When you over-sample an RL rollout — generate more prompt groups than you
keep, as DAPO does — slime collects the first rollout_batch_size groups to
finish and discards the rest.
This greedy “first-finished-first-out” (FFFO) collection has a subtle failure mode.
In agent RL, completion time tracks task difficulty: hard prompts produce longer generations, so they take longer to finish.
The fastest-finishing groups are therefore the easy ones, and the kept batch skews easy while harder prompts are systematically dropped. As a result, the training distribution drifts and gradients oscillate (MiniMax Forge §3.1).
One fix among several is a windowed FIFO: keep a sliding window of width
W = ratio × N over the generation queue, where N is the generation batch size.
A completed group may be collected only while it sits inside the window. Groups past the window stay blocked even when they are finished, until the window advances as the head is consumed.
This keeps collection close to queue order, which is independent of difficulty, so the kept batch tracks the true population.
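To make the window rule concrete, here is a minimal pure-Python sketch of the collection logic described above. It is not the library's WindowedFIFOCollector: the class name WindowSketch, its internals, and the exact drain behavior are assumptions for illustration. Only the mark_completed/drain call shape mirrors how the collector is used later in this tutorial.

```python
from typing import Any, Dict, List, Set


class WindowSketch:
    """Illustrative windowed-FIFO collector (hypothetical, not the library class)."""

    def __init__(self, total: int, window_size: int) -> None:
        self.total = total
        self.window_size = max(1, window_size)  # assumed: the window is at least 1 wide
        self.head = 0  # lowest queue position not yet collected
        self.ready: Dict[int, Any] = {}  # finished but not yet collected
        self.collected: Set[int] = set()

    def mark_completed(self, pos: int, item: Any) -> None:
        self.ready[pos] = item

    def drain(self) -> List[Any]:
        """Collect every finished group that currently sits inside the window."""
        out: List[Any] = []
        progressed = True
        while progressed:
            progressed = False
            end = min(self.head + self.window_size, self.total)
            for pos in range(self.head, end):
                if pos in self.ready:
                    out.append(self.ready.pop(pos))
                    self.collected.add(pos)
                    progressed = True
            # Slide the window past every consumed position at the head.
            while self.head in self.collected:
                self.head += 1
        return out


c = WindowSketch(total=6, window_size=2)
c.mark_completed(3, "g3")
print(c.drain())  # [] : position 3 is past the window [0, 2)
c.mark_completed(1, "g1")
print(c.drain())  # ['g1'] : inside the window, collected ahead of the head
c.mark_completed(0, "g0")
print(c.drain())  # ['g0', 'g3'] : head consumed, window slides to [2, 4)
```

With window_size=1 this sketch degenerates to strict FIFO, and with window_size=total nothing is ever blocked, which matches the two endpoints of the ratio.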
Note that you only need this if you are using over-sampling (or dynamic filtering). If every
generated group is kept, there’s nothing to reorder, so set over_sampling_batch_size > rollout_batch_size for the knob to take effect.
See the bias (no GPU needed)
Before launching anything, let’s watch the problem and the fix on the actual
scheduler. WindowedFIFOCollector is the pure-Python core of the rollout: no
slime, no GPU. We simulate 256 prompt groups (2x over-sampling for a batch of
128) where harder prompts finish later, then collect a batch under FFFO vs.
windowed FIFO and compare how many hard prompts survive.

    import numpy as np

    from modal_training_gym.frameworks.slime.windowed_fifo_rollout import (
        WindowedFIFOCollector,
    )

    N, target = 256, 128


    def kept_batch(window_size):
        rng = np.random.default_rng(0)
        # Half easy, half hard prompts; latency rises with difficulty.
        d = np.where(rng.random(N) < 0.5, rng.beta(2, 5, N), rng.beta(5, 2, N))
        finish_order = np.argsort(d + 0.03 * rng.normal(0, 1, N))

        collector = WindowedFIFOCollector(total=N, window_size=window_size)
        kept = []
        for pos in finish_order:  # groups finish fastest-first
            collector.mark_completed(int(pos), int(pos))
            for g in collector.drain():
                if len(kept) < target:
                    kept.append(g)
            if len(kept) >= target:
                break
        hard_frac = (d[np.array(kept)] > 0.6).mean()
        return hard_frac, (d > 0.6).mean()


    fffo_hard, pop_hard = kept_batch(window_size=N)  # ratio = 1.0
    win_hard, _ = kept_batch(window_size=int(0.3 * N))  # ratio = 0.3

    print(f"population hard-prompt fraction: {pop_hard:.0%}")
    print(f" greedy FFFO (ratio=1.0): {fffo_hard:.0%} <- hard prompts dropped")
    print(f" windowed (ratio=0.3): {win_hard:.0%} <- matches population")

Turn it on in training
Setting windowed_fifo_ratio does two things for you: it selects the
windowed-FIFO rollout function (--rollout-function-path) and ships the ratio
to slime. Pair it with over_sampling_batch_size, which is what gives the
scheduler completed groups to choose between. Here we over-sample 2x (32 → 16)
on a short Qwen3-4B math run.
Async training (async_mode=True) is the natural home for this: it overlaps
generation with training, so the scheduler is continuously deciding which
completed groups to admit.
We use windowed_fifo_ratio=0.3, which is Forge’s recommended balance.
windowed_fifo_ratio is the ratio of the window size to the generation batch size N:
- windowed_fifo_ratio=0.0 → window of 1 → strict FIFO (a slow head blocks everything)
- windowed_fifo_ratio=1.0 → window of N → greedy FFFO (no blocking at all)
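As a quick worked example of that mapping, the helper below turns a ratio into a window width. The name window_width is hypothetical, and the flooring plus the minimum of 1 are assumptions chosen to agree with the two endpoints listed above and with the int(0.3 * N) used in the simulation; slime's own rounding may differ. It also assumes the generation batch is the over-sampled batch.

```python
def window_width(ratio: float, generation_batch_size: int) -> int:
    """Assumed rule: W = floor(ratio * N), with a minimum of one slot."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    return max(1, int(ratio * generation_batch_size))


# Widths for an over-sampled batch of 32 (the training run) and 256 (the simulation).
for ratio in (0.0, 0.3, 0.5, 1.0):
    print(f"ratio={ratio:.1f}  W(N=32)={window_width(ratio, 32):>3}  "
          f"W(N=256)={window_width(ratio, 256):>3}")
```

Under these assumptions, ratio 0.3 on a generation batch of 32 gives a window of 9 groups, so a completed group can be admitted at most 8 positions ahead of the current head.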

    from typing import Any

    from modal_training_gym import (
        HuggingFaceDataset,
        Qwen3_4B,
        SlimeRecipe,
        TrainConfig,
    )


    class MathDataset(HuggingFaceDataset):
        hf_repo = "zhuzilin/dapo-math-17k"
        input_key = "prompt"
        label_key = "label"
        output_format = "jsonl"
        apply_chat_template = True
        always_prepare = True

        def load(self, split: str = "all") -> Any:
            from datasets import load_dataset

            ds = load_dataset(self.hf_repo, self.hf_config, split=self.hf_split)
            stop = len(ds) if not self.n_rows else min(self.n_rows, len(ds))
            return ds.select(range(stop))


    train_dataset = MathDataset(n_rows=2_000)
    base_model = Qwen3_4B()
    training_run = TrainConfig(
        model=base_model,
        dataset=train_dataset,
        recipe=SlimeRecipe(
            rm_type="dapo",
            gpu_type="H100",
            colocate=True,
            actor_num_nodes=1,
            actor_num_gpus_per_node=8,
            tensor_model_parallel_size=2,
            sequence_parallel=True,
            rollout_num_gpus_per_engine=1,
            async_mode=True,
            num_rollout=15,
            rollout_batch_size=16,
            # Over-sample 2x so the windowed-FIFO scheduler has groups to choose
            # between; without this, the ratio below is a no-op.
            over_sampling_batch_size=32,
            windowed_fifo_ratio=0.3,
            n_samples_per_prompt=8,
            rollout_max_response_len=8192,
            rollout_temperature=1.0,
            global_batch_size=32,
            lr=1e-6,
            advantage_estimator="grpo",
            use_kl_loss=False,
            kl_coef=0.0,
            eps_clip=0.2,
            eps_clip_high=0.28,
            use_dynamic_batch_size=True,
            max_tokens_per_gpu=9216,
            sglang_mem_fraction_static=0.75,
            save_interval=10,
            apply_chat_template_kwargs='{"enable_thinking": true}',
        ),
    )
    train_result = training_run.train()
    print(f"Training run id: {train_result.training_run_id}")

Tuning
| windowed_fifo_ratio | behavior | when |
|---|---|---|
| 0.0 | strict FIFO: max consistency, but a slow head blocks the batch | low-variance tasks |
| 0.3 | Forge’s balance | general agent RL |
| 0.5 | more straggler-tolerant | high-variance tasks (coding, web agents) |
| 1.0 | greedy FFFO (slime’s default) | uniform difficulty |
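The trade-off in the table can be explored with a toy sweep before spending GPU time. The sketch below is a standard-library stand-in, not the library's collector: windowed_collect is a hypothetical function, the difficulty model (a mix of two beta distributions with a little latency noise) mirrors the earlier simulation, and the width rule max(1, int(ratio * N)) is an assumption. For each ratio it reports the hard-prompt fraction of the kept batch and how many completions had to arrive before the batch filled, a rough proxy for time spent waiting on stragglers.

```python
import random
from typing import List, Tuple


def windowed_collect(finish_order: List[int], window: int, target: int) -> Tuple[List[int], int]:
    """Return (kept queue positions, completions observed before the batch filled)."""
    total = len(finish_order)
    done, taken, kept, head = set(), set(), [], 0
    for seen, pos in enumerate(finish_order, start=1):
        done.add(pos)
        progressed = True
        while progressed and len(kept) < target:
            progressed = False
            for p in range(head, min(head + window, total)):
                if p in done and p not in taken and len(kept) < target:
                    taken.add(p)
                    kept.append(p)
                    progressed = True
            while head in taken:  # slide past consumed head positions
                head += 1
        if len(kept) >= target:
            return kept, seen
    return kept, total


rng = random.Random(0)
N, TARGET = 256, 128
# Half easy, half hard prompts; queue position is independent of difficulty.
difficulty = [
    rng.betavariate(2, 5) if rng.random() < 0.5 else rng.betavariate(5, 2)
    for _ in range(N)
]
# Groups finish roughly easiest-first, with a little latency noise.
finish_order = sorted(range(N), key=lambda i: difficulty[i] + rng.gauss(0, 0.03))


def hard_fraction(positions) -> float:
    return sum(difficulty[i] > 0.6 for i in positions) / len(positions)


results = {}
print(f"population hard fraction: {hard_fraction(range(N)):.0%}")
for ratio in (0.0, 0.3, 0.5, 1.0):
    window = max(1, int(ratio * N))  # assumed width rule
    kept, seen = windowed_collect(finish_order, window, TARGET)
    results[ratio] = (hard_fraction(kept), seen)
    print(f"ratio={ratio:.1f}  hard kept={results[ratio][0]:.0%}  completions waited={seen}")
```

In this toy model a ratio of 1.0 fills the batch after exactly TARGET completions and a ratio of 0.0 keeps queue positions 0 through TARGET - 1 in order; intermediate ratios sit between those extremes. The numbers are only as meaningful as the assumed latency model, so treat them as a starting point for choosing a ratio rather than a prediction.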
Windowed FIFO composes cleanly with the rest of the stack: over_sampling_batch_size
feeds it candidates, dynamic_sampling_filter_path still drops zero-variance
groups (now in windowed order), and balance_data independently controls how the
kept batch is spread across DP ranks. It changes which groups train, not how
they’re optimized.
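For readers unfamiliar with the zero-variance filter mentioned above, here is a minimal sketch of the idea. With a group-normalized estimator such as GRPO, a group whose samples all receive the same reward yields zero advantage for every sample, so it contributes no gradient. The function name has_reward_variance and its signature are hypothetical; the callable that dynamic_sampling_filter_path points to in slime may take different arguments.

```python
from typing import Dict, List, Sequence


def has_reward_variance(rewards: Sequence[float], tol: float = 1e-8) -> bool:
    """True when the rewards in a prompt group are not all (nearly) identical."""
    return max(rewards) - min(rewards) > tol


# Hypothetical groups of 4 samples each, already admitted in windowed order.
groups: Dict[str, List[float]] = {
    "all-correct": [1.0, 1.0, 1.0, 1.0],
    "mixed": [1.0, 0.0, 1.0, 0.0],
    "all-wrong": [0.0, 0.0, 0.0, 0.0],
}

kept = [name for name, rewards in groups.items() if has_reward_variance(rewards)]
print(kept)  # ['mixed']
```

Because filtering happens after a group is admitted, the window only decides the order in which groups reach this check; it does not change which groups the filter would reject.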
Related API Reference
Source: tutorials/rl/007_windowed_fifo/007_windowed_fifo.py