Under normal circumstances, you'll be able to download your PDFdeconstruct software within one business day after you purchase it.
When you buy this product, you'll receive an executable package that can be installed on one Linux computer. If you need a source-code license, please contact us.
PDFdeconstruct™ decomposes PDF files into XML files. The XML output includes:
- text – Unicode text with font, color and position data for each word (or each character)
- images – in PNG, TIFF or JPEG format
- vector graphics – complete path information for fills and strokes
- form fields – with field names and values
PDFdeconstruct can be used for:
- document-format conversion: convert PDF to other formats
- document analysis: examine the content of a PDF page
- complex content extraction: e.g., input to further processing based on text with position information
The PDFdeconstruct output format is described in the manual.
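Because the output schema is documented in the manual rather than here, a first look at a generated XML file can be taken without assuming any element names. The following is a minimal sketch in Python (standard library only) that counts how often each element tag occurs. The function name `tag_histogram` is hypothetical, and `SAMPLE` is a made-up document used purely for illustration; it is not real PDFdeconstruct output.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Made-up XML for illustration only; NOT the real PDFdeconstruct schema.
SAMPLE = b"<a><b/><b/><c/></a>"


def tag_histogram(source):
    """Count occurrences of each element tag.

    `source` may be a file name or a binary file object,
    exactly as accepted by xml.etree.ElementTree.parse().
    """
    tree = ET.parse(source)
    return Counter(elem.tag for elem in tree.iter())


# Usage with the made-up sample; pass a path to an actual output file instead.
histogram = tag_histogram(io.BytesIO(SAMPLE))
```

Once the tag names are known, the manual can be consulted for the meaning of each element and its attributes.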
PDFdeconstruct is a cross-platform command-line tool, suitable for use on servers or for batch-mode processing. Basic usage looks like this:
pdfdeconstruct test.pdf testout
will create a directory called "testout", containing an XML file along with any extracted fonts and images. Various options are available for controlling and enriching the XML output.
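For batch-mode processing, the basic invocation can be wrapped in a small script. The sketch below, in Python, assumes only the `pdfdeconstruct <input.pdf> <output-dir>` form from the usage example; no other options are used. The helper names `build_command` and `deconstruct_all`, and the one-output-directory-per-PDF layout, are illustrative choices and not part of the product.

```python
import subprocess
from pathlib import Path


def build_command(pdf_path, out_dir, exe="pdfdeconstruct"):
    """Return the argument list for one run: <exe> <input.pdf> <output-dir>."""
    return [exe, str(pdf_path), str(out_dir)]


def deconstruct_all(src_dir, dest_root, exe="pdfdeconstruct"):
    """Run the tool once per PDF in src_dir; return {pdf file name: exit code}."""
    dest_root = Path(dest_root)
    # Assumption: the tool creates the output directory itself, but the
    # parent directory is created here in case it must already exist.
    dest_root.mkdir(parents=True, exist_ok=True)
    results = {}
    for pdf in sorted(Path(src_dir).glob("*.pdf")):
        out_dir = dest_root / pdf.stem  # hypothetical layout: one directory per PDF
        proc = subprocess.run(build_command(pdf, out_dir, exe))
        results[pdf.name] = proc.returncode
    return results
```

Calling `deconstruct_all("incoming", "xml-out")` would then process every PDF in a directory named `incoming` (both directory names are placeholders). A non-zero exit code in the returned mapping flags a file worth checking by hand.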
- Mac OS X
- 32-bit and 64-bit versions available for all platforms
- other platforms: portable C++ source code is available. Contact us for details.
For conversion to plain text (instead of XML), consider XpdfText.
To arrange for an evaluation copy of PDFdeconstruct, please contact us.