I'm responsible for multiple LLM apps with hundreds of thousands of DAU total. I... | Hacker News


typpo on July 19, 2023 | on: Ask HN: How are you improving your use of LLMs in ...

I'm responsible for multiple LLM apps with hundreds of thousands of DAU total. I built and use promptfoo to iterate: https://github.com/promptfoo/promptfoo

My workflow is based on testing: start by defining a set of representative test cases and use them to guide prompting. I tend to prefer programmatic test cases over LLM-based evals, but LLM evals seem popular these days.

Then I form a hypothesis, run an eval, and if the results show improvement I share them with the team. In some of my projects, this is integrated with CI.

The next step is closing the feedback loop by gathering real-world examples for your evals. This can be difficult if you respect the privacy of your users, which is why I prefer a local, open-source CLI. You'll have to set up the appropriate opt-ins etc. to gather this data, if you gather it at all.
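To make the workflow above concrete, here is a minimal sketch of a programmatic eval loop in plain Python. It is not promptfoo's API: `TestCase`, `run_eval`, `fake_model`, and both prompt templates are hypothetical names made up for illustration, and `fake_model` is a deterministic stand-in for a real LLM call so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical lookup used only by the stand-in model below.
CAPITALS = {"France": "Paris", "Japan": "Tokyo"}


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption): terse only when asked to be."""
    answer = next((cap for country, cap in CAPITALS.items() if country in prompt), "unknown")
    if "one word" in prompt.lower():
        return answer
    return f"The capital is {answer}, as far as I know."


@dataclass
class TestCase:
    variables: dict[str, str]      # values substituted into the prompt template
    check: Callable[[str], bool]   # programmatic assertion on the model output


def run_eval(template: str, cases: list[TestCase], model: Callable[[str], str]) -> float:
    """Return the fraction of test cases whose check passes for this prompt template."""
    passed = sum(case.check(model(template.format(**case.variables))) for case in cases)
    return passed / len(cases)


# Representative test cases: the output must be exactly the capital, nothing else.
CASES = [
    TestCase({"country": "France"}, lambda out: out == "Paris"),
    TestCase({"country": "Japan"}, lambda out: out == "Tokyo"),
]

BASELINE = "What is the capital of {country}?"
CANDIDATE = "Answer in one word. What is the capital of {country}?"  # the hypothesis

if __name__ == "__main__":
    baseline_score = run_eval(BASELINE, CASES, fake_model)
    candidate_score = run_eval(CANDIDATE, CASES, fake_model)
    print(f"baseline={baseline_score:.2f} candidate={candidate_score:.2f}")
    # In CI, a regression could fail the build (one possible policy, not the only one).
    assert candidate_score >= baseline_score, "candidate prompt regressed"
```

Swapping `fake_model` for a real client call and growing `CASES` from opted-in real-world examples would follow the same shape; the checks stay ordinary functions, which is what makes them cheap to run on every change.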

meiraleal on July 19, 2023

Great workflow and tool, I'll do some tests with promptfoo tonight.

skyfallsin on July 19, 2023

promptfoo looks excellent. Will be digging into this later, thanks for sharing!
