In this case, the use is extracting words that can be used in full-text search, so structural extraction isn't a key criterion.
In case someone wants to know more, the former is known as "full page OCR" and the latter as "data capture"/"document processing" (or IDP, intelligent document processing).
Full page OCR for machine printed text is considered a solved problem (but not for handwritten text).
Data capture is hard to do and involves extracting specific fields from documents.
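To make the distinction concrete, here is a rough Python sketch. The sample text, the field names, and the regex patterns are all made up for illustration; real data capture products deal with layout, tables, and noisy OCR output, which simple regexes like these don't handle.

```python
import re

# Hypothetical OCR output for a one-page invoice (made-up sample text).
ocr_text = """ACME Corp
Invoice No: INV-1042
Date: 2019-06-03
Total due: $1,250.00"""


def index_terms(text):
    """Full page OCR use case: a set of lowercase tokens for full-text search."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def capture_fields(text):
    """Data capture use case: pull specific named fields out of the same text."""
    # Illustrative patterns only; they assume the exact labels used above.
    patterns = {
        "invoice_number": r"Invoice No:\s*(\S+)",
        "date": r"Date:\s*(\d{4}-\d{2}-\d{2})",
        "total": r"Total due:\s*\$([\d,]+\.\d{2})",
    }
    fields = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, text)
        fields[name] = match.group(1) if match else None
    return fields


terms = index_terms(ocr_text)      # unordered words, no structure needed
fields = capture_fields(ocr_text)  # structured record keyed by field name
```

For search you only need `terms`; it doesn't matter where on the page a word came from. For data capture you need `fields`, and getting those right across many document layouts is the hard part.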
The first big cloud company to move into data capture territory was Amazon with AWS Textract (calling it OCR++). There's also Document Understanding AI (Google) and Azure Form Recognizer (in beta), as mentioned by others in this thread.
The big 3 RPA companies (UiPath, Automation Anywhere, Blue Prism) have also gone into data capture (calling it cognitive or intelligent RPA).
ABBYY (with FlexiCapture) and Kofax (which recently acquired Nuance's imaging division, whose OCR engine is the 2nd most popular after ABBYY's) are the traditional IDP players.