Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I'm responsible for multiple LLM apps with hundreds of thousands of DAU total. I have built and am using promptfoo to iterate: https://github.com/promptfoo/promptfoo

My workflow is based on testing: start by defining a set of representative test cases and using them to guide prompting. I tend to prefer programmatic test cases over LLM-based evals, but LLM evals seem popular these days. Then, I create a hypothesis, run an eval, and if the results show improvement I share them with the team. In some of my projects, this is integrated with CI.

The next step is closing the feedback loop and gathering real-world examples for your evals. This can be difficult to do if you respect the privacy of your users, which is why I prefer a local, open-source CLI. You'll have to set up the appropriate opt-ins etc. to gather this data, if at all.



Great workflow and tool, I'll do some tests with promptfoo tonight.


promptfoo looks excellent. will be digging into this later, thanks for sharing!




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: