
Details on how DeepSeek used GRPO for RL rewards

https://medium.com/@sahin.samia/the-math-behind-deepseek-a-d...



Thanks!



