PdfDataExtractor

6 Feb 2026 · 2 minutes to read

Represents a utility for extracting data from a PDF document.

// Load an existing PDF document
const document: PdfDocument = new PdfDocument(data, password);
// Initialize a new instance of the `PdfDataExtractor` class
const extractor: PdfDataExtractor = new PdfDataExtractor(document);
// Extract the `TextLine` collection from all pages (page indexes are zero-based)
const textLines: Array<TextLine> = extractor.extractTextLines({ startPageIndex: 0, endPageIndex: document.pageCount - 1 });
// Save the document
document.save('output.pdf');
// Destroy the document
document.destroy();

Methods

extractImages

Extract all image information from the PDF document.

Returns Promise

extractImages

Extract image information from the pages of the PDF document selected by the given options.

Parameter	Type	Description
options	`Object`	The options to specify the page range to be selected.

Returns Promise
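Because `extractImages` returns a Promise, the result is only available once that Promise settles. The sketch below shows one way to await it and release the document afterwards. It assumes the document should stay alive until extraction finishes, which this reference does not state explicitly. The `ImageExtractorLike` and `DestroyableLike` interfaces and the `extractImagesThenDestroy` helper are hypothetical names introduced for illustration; they describe only what the reference confirms (a Promise-returning `extractImages()` and a `destroy()` method), and the resolved type is left generic because it is not documented here.

```typescript
// Hypothetical minimal shapes (not library types). The reference only says
// that extractImages() returns a Promise, so the resolved type stays generic.
interface ImageExtractorLike<T> {
  extractImages(): Promise<T>;
}

interface DestroyableLike {
  destroy(): void;
}

// Hypothetical helper: waits for the extraction to settle, then destroys
// the document whether the extraction succeeded or failed.
async function extractImagesThenDestroy<T>(
  extractor: ImageExtractorLike<T>,
  document: DestroyableLike
): Promise<T> {
  try {
    // `return await` keeps the try/finally active until the Promise settles.
    return await extractor.extractImages();
  } finally {
    document.destroy();
  }
}
```

With the real library, the intent is to pass a `PdfDataExtractor` and its `PdfDocument`; whether they satisfy these structural types exactly is an assumption to verify against the package typings.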

extractText

Extract text from the PDF document.

Returns string

extractText

Extract text from the page range specified by the start and end page numbers.

Parameter	Type	Description
options	`Object`	The options to specify the page range to be selected and to control text extraction.

Returns string

extractTextLines

Extract the collection of TextLine objects from the PDF document.

Returns TextLine[]

extractTextLines

Extract TextLine objects from the pages of the PDF document selected by the given options.

Parameter	Type	Description
options	`Object`	The options to specify the page range to be selected.

Returns TextLine[]
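The options object used in the example at the top of this page carries `startPageIndex` and `endPageIndex`. A minimal sketch of building such an object safely follows. It assumes the indexes are zero-based and inclusive, which the `document.pageCount - 1` upper bound in that example suggests but this reference does not spell out. `PageRangeOptions` and `toPageRange` are hypothetical names, not part of the library.

```typescript
// Option shape taken from the extractTextLines call in the example above.
// Assumption: both indexes are zero-based and inclusive.
interface PageRangeOptions {
  startPageIndex: number;
  endPageIndex: number;
}

// Hypothetical helper (not part of the library): clamps the requested
// range to the pages that exist and swaps the bounds if they are reversed.
function toPageRange(
  pageCount: number,
  start: number = 0,
  end: number = pageCount - 1
): PageRangeOptions {
  if (!Number.isInteger(pageCount) || pageCount < 1) {
    throw new RangeError('pageCount must be a positive integer');
  }
  const last = pageCount - 1;
  const clamp = (value: number): number =>
    Math.min(Math.max(Math.trunc(value), 0), last);
  const a = clamp(start);
  const b = clamp(end);
  return a <= b
    ? { startPageIndex: a, endPageIndex: b }
    : { startPageIndex: b, endPageIndex: a };
}
```

A call such as `extractor.extractTextLines(toPageRange(document.pageCount, 2, 5))` would then request pages 3 through 6 of a sufficiently long document under the zero-based assumption.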

getStructureElement

Gets the root structure element of the PDF document.

Returns PdfStructureElement

getStructureElements

Gets the structure elements of the specified PDF page.

Parameter	Type	Description
page	`PdfPage`	The PDF page whose structure elements are returned.

Returns PdfStructureElement[]
