PdfDataExtractor

6 Feb 2026

Represents a utility for extracting data from a PDF document.

// Load an existing PDF document
let document: PdfDocument = new PdfDocument(data, password);
// Initialize a new instance of the `PdfDataExtractor` class
let extractor: PdfDataExtractor = new PdfDataExtractor(document);
// Extract the `TextLine` collection from every page of the PDF document
let textLines: Array<TextLine> = extractor.extractTextLines({ startPageIndex: 0, endPageIndex: document.pageCount - 1 });
// Save the document
document.save('output.pdf');
// Destroy the document
document.destroy();
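
The example above passes a page range as `{ startPageIndex, endPageIndex }`, with `document.pageCount - 1` as the last index, which suggests the range is zero-based and inclusive. A minimal sketch of a helper that builds such a range and clamps it to the pages that exist might look like this. `PageRangeOptions` and `toPageRange` are hypothetical names introduced for illustration and are not part of the library.

```typescript
// Shape of the page-range options as used in the example above.
// Assumption: only these two fields are modelled here.
interface PageRangeOptions {
    startPageIndex: number;
    endPageIndex: number;
}

// Hypothetical helper (not a library API): builds a zero-based, inclusive
// page range and clamps both ends to the pages that actually exist.
function toPageRange(pageCount: number, start: number = 0, end: number = pageCount - 1): PageRangeOptions {
    if (!Number.isInteger(pageCount) || pageCount < 1) {
        throw new RangeError('pageCount must be a positive integer');
    }
    const last = pageCount - 1;
    // Clamp the start into [0, last].
    const startPageIndex = Math.min(Math.max(Math.trunc(start), 0), last);
    // Clamp the end into [startPageIndex, last] so the range is never inverted.
    const endPageIndex = Math.min(Math.max(Math.trunc(end), startPageIndex), last);
    return { startPageIndex, endPageIndex };
}
```

The result can be passed wherever the methods below accept an `options` object, for example `extractor.extractTextLines(toPageRange(document.pageCount, 0, 2))`.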

Methods

extractImages

Extract all image information from the PDF document.

Returns Promise

extractImages

Extract image information from the selected page range of the PDF document.

Parameter Type Description
options Object The options to specify the page range to be selected.

Returns Promise
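
Both `extractImages` overloads return a `Promise`, so the result has to be awaited before it can be used. The sketch below shows one way to do that with basic error handling. This page does not state what the promise resolves to, so the value is typed as `unknown`. `ImageExtractorLike`, `ImageResult`, and `extractImagesSafely` are hypothetical names used only for illustration.

```typescript
interface PageRangeOptions {
    startPageIndex: number;
    endPageIndex: number;
}

// Structural stand-in for the extractor. Assumption: the resolved value's
// shape is not documented here, so it is left as `unknown`.
interface ImageExtractorLike {
    extractImages(options?: PageRangeOptions): Promise<unknown>;
}

type ImageResult =
    | { ok: true; value: unknown }
    | { ok: false; error: string };

// Hypothetical wrapper: awaits the extraction and turns a rejected promise
// into a plain result object instead of an unhandled rejection.
async function extractImagesSafely(
    extractor: ImageExtractorLike,
    options?: PageRangeOptions
): Promise<ImageResult> {
    try {
        const value = await extractor.extractImages(options);
        return { ok: true, value };
    } catch (err) {
        return { ok: false, error: err instanceof Error ? err.message : String(err) };
    }
}
```

Omitting `options` corresponds to the overload without parameters. Passing a range corresponds to the overload with `options`.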

extractText

Extract text from the PDF document

Returns string

extractText

Extract text from the page range specified by the start and end page indexes.

Parameter Type Description
options Object Options to specify the page range to be selected and to extract the text.

Returns string
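
Because `extractText` accepts a page range and returns a plain string, a long document can be processed a few pages at a time instead of in one call. The following is a sketch under stated assumptions: the range is zero-based and inclusive, and `TextExtractorLike` and `extractTextInBatches` are hypothetical names that do not exist in the library.

```typescript
interface PageRangeOptions {
    startPageIndex: number;
    endPageIndex: number;
}

// Structural stand-in for the extractor: only the overload that takes
// a page range is modelled here.
interface TextExtractorLike {
    extractText(options: PageRangeOptions): string;
}

// Hypothetical helper: walks the document in fixed-size page batches and
// returns one extracted string per batch, in page order.
function extractTextInBatches(
    extractor: TextExtractorLike,
    pageCount: number,
    batchSize: number
): string[] {
    if (!Number.isInteger(batchSize) || batchSize < 1) {
        throw new RangeError('batchSize must be a positive integer');
    }
    const chunks: string[] = [];
    for (let start = 0; start < pageCount; start += batchSize) {
        // The last batch may be shorter than batchSize.
        const end = Math.min(start + batchSize, pageCount) - 1;
        chunks.push(extractor.extractText({ startPageIndex: start, endPageIndex: end }));
    }
    return chunks;
}
```

With a real extractor the call would look like `extractTextInBatches(extractor, document.pageCount, 10)`.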

extractTextLines

Extract the TextLine collection from the PDF document.

Returns TextLine[]

extractTextLines

Extract the TextLine collection from the selected page range of the PDF document.

Parameter Type Description
options Object The options to specify the page range to be selected.

Returns TextLine[]

getStructureElement

Gets the root structure element of the PDF document.

Returns PdfStructureElement

getStructureElements

Gets the structure elements of the specified PDF page.

Parameter Type Description
page PdfPage The PDF page whose structure elements are returned.

Returns PdfStructureElement[]
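
Since `getStructureElements` works on one page at a time, collecting the elements for a whole document means calling it once per page. The sketch below does this generically. It assumes the document exposes `pageCount` (as in the example at the top) and a `getPage(index)` accessor, which this page does not mention. `PageSourceLike`, `StructureExtractorLike`, and `collectStructureElements` are hypothetical names used only for illustration.

```typescript
// Structural stand-in for the document.
// Assumption: pages are reachable by zero-based index through getPage.
interface PageSourceLike<P> {
    pageCount: number;
    getPage(index: number): P;
}

// Structural stand-in for the extractor method documented above.
interface StructureExtractorLike<P, E> {
    getStructureElements(page: P): E[];
}

// Hypothetical helper: returns one array of structure elements per page,
// indexed by page position.
function collectStructureElements<P, E>(
    document: PageSourceLike<P>,
    extractor: StructureExtractorLike<P, E>
): E[][] {
    const perPage: E[][] = [];
    for (let i = 0; i < document.pageCount; i++) {
        perPage.push(extractor.getStructureElements(document.getPage(i)));
    }
    return perPage;
}
```

The generic parameters keep the sketch independent of the actual `PdfPage` and `PdfStructureElement` types, whose members are not described here.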