Any ideas are much appreciated.
Oral exams graded by LLMs? That would scale with the improving models. Based on GPQA Diamond results, they're mostly at PhD level for subject trivia anyway.
In the end, will we build a GAN loop?
Why am I now reminded of corewars?