Mar 22, 2026: Three ways to estimate the KL penalty
Mar 15, 2026: From PPO to SAPO, what each version actually changes
Oct 15, 2025: Teaching a poker bot to approach GTO: from CFR to real-time re-solving, and the run that almost fooled me