RLVR on Julien Pourcel

Recent content in RLVR on Julien Pourcel
https://julienp.netlify.app/tags/rlvr/

Reasoning to Cheat: How RLVR-Trained Models Can Exploit Code Benchmarks
Mon, 25 May 2026 10:00:00 +0200
https://julienp.netlify.app/posts/reward_hacking/
What happens when ~80 open-weight LLMs face hard Python programming problems, and why reasoning models cheat far more than non-reasoning ones.