
Have you tried Tabula, the free program? I use it for some finance work reading filings.

http://tabula.technology/



Yep! It's great, but is maybe 60% there, so I'm looking for something that can extract much more structure from a document. I doubt what I'm looking for will exist for another 10 years, though.


Is it feasible to create loose templates for where the data is and extract that way? I have a mothballed project that did pretty well. It was able to discern different templates from a mass of documents.
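To make the "loose template" idea concrete, here is a rough sketch, not the mothballed project's actual code. It assumes some earlier step (OCR or a PDF text extractor) has already produced words with page positions normalized to 0..1. The names `Word`, `Region`, `extract`, and `INVOICE_TEMPLATE`, along with the sample data, are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # horizontal centre, as a fraction of page width (0..1)
    y: float  # vertical centre, as a fraction of page height (0..1)


@dataclass
class Region:
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, w: Word) -> bool:
        return self.x0 <= w.x <= self.x1 and self.y0 <= w.y <= self.y1


def extract(words, template):
    """Return {field: text} by joining the words inside each field's region."""
    result = {}
    for field, region in template.items():
        hits = [w for w in words if region.contains(w)]
        # Crude reading order: top to bottom, then left to right.
        hits.sort(key=lambda w: (round(w.y, 2), w.x))
        result[field] = " ".join(w.text for w in hits)
    return result


# Hypothetical template: generous boxes, so small layout shifts still match.
INVOICE_TEMPLATE = {
    "date": Region(0.6, 0.0, 1.0, 0.1),
    "total": Region(0.6, 0.8, 1.0, 0.9),
}

# Made-up sample words standing in for real extractor output.
words = [
    Word("2016-03-01", 0.80, 0.05),
    Word("Widget", 0.20, 0.40),
    Word("Total:", 0.65, 0.85),
    Word("$1,234.00", 0.85, 0.85),
]

fields = extract(words, INVOICE_TEMPLATE)
```

The regions are deliberately wide, which is what makes the template "loose". Telling different templates apart across a mass of documents would be a separate step that this sketch does not attempt.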


I'm curious. If you email me a sample, I can tell you what's possible.



