Textract

  • Amazon Textract is a machine learning (ML) service that automatically extracts text, handwriting, and data from scanned documents.

  • It goes beyond simple optical character recognition (OCR) to identify, understand, and extract data from forms and tables.

  • Extraction is automatic: printed text, handwriting, and data are pulled out without manual data entry or per-document templates

  • Features:

    • Optical character recognition (OCR)

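    • Identifies a document's structure (pages, lines, words, form fields, table cells) and the relationships between those elements, not just the raw text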

    • Uses machine learning to extract both plain text and structured data such as key-value pairs and tables

    • Recognizes handwriting as well as printed text

    • Accepts PDF and image files (JPEG, PNG, TIFF) and can extract content such as forms and tables from them

    • Understands context: for example, it knows which data to extract from a receipt or invoice (Textract provides a dedicated AnalyzeExpense operation for these)
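
To make the point about structure and relationships concrete, here is a minimal Python sketch. It assumes the response shape returned by Textract's AnalyzeDocument operation (in boto3, analyze_document called with FeatureTypes=["FORMS", "TABLES"]), where the result is a flat list of Blocks linked to each other by Id through their Relationships. The sketch follows those links to rebuild form key-value pairs and table grids. The sample response and every function name below are hypothetical illustrations, and the sample is heavily trimmed (real responses also include Geometry, Confidence, PAGE, and LINE blocks).

```python
# Hypothetical, trimmed response in the shape AnalyzeDocument returns with
# FeatureTypes=["FORMS", "TABLES"]. Real responses contain many more fields.
SAMPLE_RESPONSE = {
    "Blocks": [
        # A form field: the KEY block points to its VALUE block and its words.
        {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
         "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                           {"Type": "CHILD", "Ids": ["w1"]}]},
        {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
         "Relationships": [{"Type": "CHILD", "Ids": ["w2", "w3"]}]},
        {"Id": "w1", "BlockType": "WORD", "Text": "Name:"},
        {"Id": "w2", "BlockType": "WORD", "Text": "Jane"},
        {"Id": "w3", "BlockType": "WORD", "Text": "Doe"},
        # A 2x2 table: the TABLE block points to CELL blocks, which point to words.
        {"Id": "t1", "BlockType": "TABLE",
         "Relationships": [{"Type": "CHILD", "Ids": ["c1", "c2", "c3", "c4"]}]},
        {"Id": "c1", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 1,
         "Relationships": [{"Type": "CHILD", "Ids": ["w4"]}]},
        {"Id": "c2", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 2,
         "Relationships": [{"Type": "CHILD", "Ids": ["w5"]}]},
        {"Id": "c3", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 1,
         "Relationships": [{"Type": "CHILD", "Ids": ["w6"]}]},
        {"Id": "c4", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 2,
         "Relationships": [{"Type": "CHILD", "Ids": ["w7"]}]},
        {"Id": "w4", "BlockType": "WORD", "Text": "Item"},
        {"Id": "w5", "BlockType": "WORD", "Text": "Price"},
        {"Id": "w6", "BlockType": "WORD", "Text": "Coffee"},
        {"Id": "w7", "BlockType": "WORD", "Text": "3.50"},
    ]
}


def index_blocks(blocks):
    """Map each block Id to its block so relationships can be followed."""
    return {b["Id"]: b for b in blocks}


def related_ids(block, rel_type):
    """Return the Ids this block links to through relationships of rel_type."""
    ids = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == rel_type:
            ids.extend(rel["Ids"])
    return ids


def text_of(block, by_id):
    """Join the text of the WORD blocks that are CHILD of this block."""
    words = [by_id[i] for i in related_ids(block, "CHILD")]
    return " ".join(w["Text"] for w in words if w["BlockType"] == "WORD")


def key_value_pairs(blocks):
    """Rebuild form fields as {key text: value text}."""
    by_id = index_blocks(blocks)
    pairs = {}
    for block in blocks:
        if block["BlockType"] == "KEY_VALUE_SET" and "KEY" in block.get("EntityTypes", []):
            values = [text_of(by_id[v], by_id) for v in related_ids(block, "VALUE")]
            pairs[text_of(block, by_id)] = " ".join(values)
    return pairs


def tables(blocks):
    """Rebuild each TABLE block as a list of rows (RowIndex/ColumnIndex are 1-based)."""
    by_id = index_blocks(blocks)
    result = []
    for block in blocks:
        if block["BlockType"] != "TABLE":
            continue
        cells = [by_id[c] for c in related_ids(block, "CHILD")
                 if by_id[c]["BlockType"] == "CELL"]
        if not cells:
            continue
        n_rows = max(c["RowIndex"] for c in cells)
        n_cols = max(c["ColumnIndex"] for c in cells)
        grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
        for cell in cells:
            grid[cell["RowIndex"] - 1][cell["ColumnIndex"] - 1] = text_of(cell, by_id)
        result.append(grid)
    return result


if __name__ == "__main__":
    blocks = SAMPLE_RESPONSE["Blocks"]
    print(key_value_pairs(blocks))  # {'Name:': 'Jane Doe'}
    print(tables(blocks))           # [[['Item', 'Price'], ['Coffee', '3.50']]]
```

Indexing blocks by Id first keeps each relationship lookup constant-time, which matters because a multi-page response can contain thousands of blocks. This sketch ignores merged cells, checkboxes (SELECTION_ELEMENT blocks), and confidence scores.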
