Wipro Limited
Method and device for extracting images from portable document format (PDF) documents
Abstract:
A method and device for extracting images from PDF documents are disclosed. The method includes performing a text recognition process on a PDF document that includes one or more images. The text recognition process replaces the one or more images with a plurality of contiguous newlines. The method further includes storing a location of each of the one or more images within the PDF document based on occurrence of the plurality of contiguous newlines within the PDF document. The method includes converting each page of the PDF document to an image format in order to generate an image document corresponding to the PDF document. The method further includes extracting each of the one or more images from the image document based on the location stored for each of the one or more images within the PDF document.
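The location step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. It assumes the text recognition output for one page is already available as a string, and that text lines are spaced evenly down the page, which is a simplification made here for illustration. The function names `find_blank_runs` and `to_pixel_band`, and the `min_blank_lines` threshold, are hypothetical.

```python
from typing import List, Tuple


def find_blank_runs(page_text: str, min_blank_lines: int = 2) -> List[Tuple[int, int]]:
    """Return (first_blank_line_index, blank_line_count) for each run of
    contiguous blank lines, i.e. contiguous newlines, in the recognized text.

    Assumption: a run of at least `min_blank_lines` blank lines marks a spot
    where the text recognition step replaced an image.
    """
    lines = page_text.split("\n")
    runs: List[Tuple[int, int]] = []
    run_start = None
    for index, line in enumerate(lines):
        if line.strip() == "":
            if run_start is None:
                run_start = index  # a new run of blank lines begins here
        else:
            if run_start is not None and index - run_start >= min_blank_lines:
                runs.append((run_start, index - run_start))
            run_start = None
    # Handle a run that extends to the end of the page text.
    if run_start is not None and len(lines) - run_start >= min_blank_lines:
        runs.append((run_start, len(lines) - run_start))
    return runs


def to_pixel_band(run: Tuple[int, int], total_lines: int, page_height_px: int) -> Tuple[int, int]:
    """Map a blank-line run to a (top, bottom) pixel band on the page image.

    Assumption: lines are evenly spaced over the page height.
    """
    start, count = run
    top = start * page_height_px // total_lines
    bottom = (start + count) * page_height_px // total_lines
    return top, bottom


if __name__ == "__main__":
    text = "Title\nIntro line\n\n\n\nCaption\nMore text"
    total = len(text.split("\n"))
    for run in find_blank_runs(text):
        print(run, to_pixel_band(run, total, 700))
```

The resulting band could then be used to crop the page image produced by the conversion step, for example with an image library's crop function. The even-spacing mapping is only a stand-in for whatever location bookkeeping the actual method uses.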
Utility
11 Jul 2017
15 Oct 2019