f_k's comments

https://citellm.com

Working on CiteLLM, an API that extracts structured data from PDFs and returns citations for each field (page + coordinates + source snippet + confidence).

Instead of blindly trusting the LLM, you can verify every value by linking it back to its exact location in the original PDF.


I'm working on this exact problem with https://citellm.com.

Every extracted field comes with a precise citation back to the source document (page + snippet + bounding box + confidence score) so reviewers can verify where each value came from.

Hallucinated values get flagged automatically because no supporting text for them exists in the source.

The goal is to make human-in-the-loop (HITL) review fast, so reviewers don't have to read through the whole document.
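
To make the idea concrete, here is a minimal Python sketch of how a citation-based check could work. It is not CiteLLM's actual API or implementation: the `Citation` and `ExtractedField` types, the `flag_unsupported` helper, and the sample invoice text are all hypothetical, and it assumes the text of each page has already been extracted into a dict keyed by page number.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Citation:
    page: int                                # 1-based page number (assumed convention)
    snippet: str                             # source text the value was read from
    bbox: Tuple[float, float, float, float]  # (x0, y0, x1, y1) on the page
    confidence: float                        # 0.0 to 1.0


@dataclass
class ExtractedField:
    name: str
    value: str
    citation: Citation


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line breaks don't defeat matching."""
    return " ".join(text.lower().split())


def is_supported(field: ExtractedField, pages: Dict[int, str]) -> bool:
    """True if the cited snippet really occurs on the cited page."""
    page_text = normalize(pages.get(field.citation.page, ""))
    snippet = normalize(field.citation.snippet)
    return bool(snippet) and snippet in page_text


def flag_unsupported(fields: List[ExtractedField], pages: Dict[int, str]) -> List[str]:
    """Names of fields whose citation has no supporting text in the source."""
    return [f.name for f in fields if not is_supported(f, pages)]


# Hypothetical sample data: one page of invoice text and two extracted fields.
pages = {1: "Invoice No. 1042\nTotal due: $310.00"}
fields = [
    ExtractedField("total", "310.00",
                   Citation(1, "Total due: $310.00", (72.0, 120.0, 210.0, 134.0), 0.97)),
    ExtractedField("po_number", "7781",
                   Citation(1, "PO Number: 7781", (72.0, 150.0, 190.0, 164.0), 0.41)),
]
flagged = flag_unsupported(fields, pages)
```

With this sample, only `po_number` ends up in `flagged`, since its snippet never appears on the page. A reviewer would then only need to look at the flagged fields and at the highlighted bounding boxes for the rest.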


> verifying their claims ends up taking time.

I've been working on this problem with https://citellm.com, specifically for PDFs.

Instead of relying on the LLM answer alone, each extracted field links to its source in the original document (page number + highlighted snippet + confidence score).

Checking any claim becomes simple: click and see the exact source.


I'm working on SuperCurate (https://getsupercurate.com), which is geared towards note retrieval and curation rather than note creation. Think filing cabinet for your notes, web clippings, images and PDFs.

I wanted fast search and filters for my Evernote archive so I could drill down and surface exactly what I was looking for.

There's also a Web Clipper extension for Chrome.

Demos:

Search and curation: https://www.youtube.com/watch?v=z4QSIoUL4Uk

Web Clipper: https://www.youtube.com/watch?v=8F7QoC7X3fs

Search inside PDFs (jumps to page + highlights snippet): https://www.youtube.com/watch?v=t0X9sD-938Q

It's free while in beta. I'd love feedback if you try it.




We've built an app like that but for PDF table extraction, https://table2xl.com


Looks great! Do you mind talking about your tech stack? Do you build on top of Tesseract or do you use a custom model?


Shameless plug: https://getsearchablepdf.com

There's a free trial so you can check if it works for your handwriting.


If you're on Windows, try https://table2xl.com (disclosure: I'm the founder). It's more accurate than Excel's camera import. No API, though.

