Hacker News
Releasing Vending-Bench 2 for measuring model performance on running a business (andonlabs.com)
2 points by lr0 1 day ago | past | discuss
Bengt Hires a Human–Towards a Happy Future with AI Employers (andonlabs.com)
2 points by lukaspetersson 10 days ago | past | 1 comment
The Evolution of Bengt Betjänt (andonlabs.com)
54 points by lukaspetersson 17 days ago | past | 7 comments
Vending-Bench 2 (andonlabs.com)
2 points by samdung 17 days ago | past
Opus 4.6 on Vending-Bench – Not Just a Helpful Assistant (andonlabs.com)
5 points by lukaspetersson 21 days ago | past | 1 comment
Gemini 3 is #1 on Vending-Bench 2 (andonlabs.com)
1 point by lukaspetersson 3 months ago | past
Our LLM-controlled office robot can't pass butter (andonlabs.com)
229 points by lukaspetersson 4 months ago | past | 117 comments
Misaligned Vending Machines [pdf] (andonlabs.com)
1 point by bulla 5 months ago | past
Vending-Bench: Testing long-term coherence in agents (andonlabs.com)
3 points by andromaton 8 months ago | past | 1 comment
Vending-Bench: Testing long-term coherence in agents (andonlabs.com)
1 point by vector_spaces 10 months ago | past
Vending-Bench: Testing long-term coherence in agents (andonlabs.com)
5 points by tosh 10 months ago | past | 2 comments
Vending-Bench: Testing long-term coherence in agents (andonlabs.com)
1 point by gdeglin 11 months ago | past
Claude isn't the best Computer-use agent (andonlabs.com)
2 points by lukaspetersson on Jan 10, 2025 | past
