Posts by Category

Reinforcement Learning

What’s in Pass@K?

11 minute read

Pass@k is ubiquitous in evaluating reasoning models, but the metric is more subtle than it appears. Computing it correctly requires the unbiased estimator, and the nonlinearity of pass@k means it effectively upweights hard problems compared to pass@1.

Implementing Training-Free Process Rewards in VeRL

9 minute read

A training-free approach to process rewards: estimate V(prefix) via log-probability, then compute marginal utility across episodes. Also covers VeRL implementation pitfalls to avoid.

Understanding Length Dynamics in RL Training

37 minute read

An empirical investigation into what drives output length growth during RL training, revealing that dataset difficulty composition is the primary driver behind the ‘overthinking’ phenomenon.

Research

Understanding Length Dynamics in RL Training

37 minute read

An empirical investigation into what drives output length growth during RL training, revealing that dataset difficulty composition is the primary driver behind the ‘overthinking’ phenomenon.