May 11, 2026 BenchCAD — evaluating LLMs on the part of code where output is physical
Mar 15, 2026 A practitioner's tour of RL for LLMs — DPO, GRPO, GSPO, AReaL