mtkd 10 months ago | on: TL;DR of Deep Dive into LLMs Like ChatGPT by Andre...

Details on how DS used GRPO for RL rewards: https://medium.com/@sahin.samia/the-math-behind-deepseek-a-d...

Thanks!