PdfDataExtractor
6 Feb 2026 · 2 minutes to read
Represents a utility for extracting data from a PDF document.
// Load an existing PDF document
let document: PdfDocument = new PdfDocument(data, password);
// Initialize a new instance of the `PdfDataExtractor` class
let extractor: PdfDataExtractor = new PdfDataExtractor(document);
// Extract `TextLine` from the PDF document.
let textLines: Array<TextLine> = extractor.extractTextLines({ startPageIndex: 0, endPageIndex: document.pageCount - 1 });
// Save the document
document.save('output.pdf');
// Destroy the document
document.destroy();
Methods
extractImages
Extract all image information from the PDF document.
Returns Promise
extractImages
Extract all image information from the PDF document
| Parameter | Type | Description |
|---|---|---|
| options | Object | The options to specify the page range to be selected. |
Returns Promise
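Because `extractImages` returns a Promise, the result has to be awaited before it can be used. The following is a minimal sketch of one way to do that, not the library's own sample. `PageRange`, `ImageSource`, and `extractImagesSafely` are hypothetical names introduced here; the interface only mirrors the signature documented above. The option names `startPageIndex` and `endPageIndex` are taken from the `extractTextLines` example at the top of this page and are assumed to apply to `extractImages` as well. The resolved value is kept generic because this page does not state its type.

```typescript
// Hypothetical shape of the page-range options (names taken from the sample above).
interface PageRange {
  startPageIndex: number;
  endPageIndex: number;
}

// Hypothetical structural view of the documented method; T is left generic
// because the resolved type of the Promise is not stated on this page.
interface ImageSource<T> {
  extractImages(options?: PageRange): Promise<T>;
}

// Await the extraction and return undefined instead of throwing on failure.
async function extractImagesSafely<T>(
  source: ImageSource<T>,
  range: PageRange
): Promise<T | undefined> {
  try {
    return await source.extractImages(range);
  } catch (error) {
    console.error('Image extraction failed:', error);
    return undefined;
  }
}
```

An actual `PdfDataExtractor` instance could be passed as `source` if its `extractImages` signature matches the one assumed here.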
extractText
Extract text from the PDF document
Returns string
extractText
Extract text from the page range specified by the start and end page numbers.
| Parameter | Type | Description |
|---|---|---|
| options | Object | Options to specify the page range to be selected and to extract the text. |
Returns string
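When the page range comes from user input, it can help to clamp it to valid indices before calling `extractText`. The sketch below assumes zero-based page indices, as in the sample at the top of this page (`0` to `document.pageCount - 1`), and assumes that `extractText` accepts the same `startPageIndex` and `endPageIndex` option names. `PageRange`, `TextSource`, `clampRange`, and `extractTextInRange` are hypothetical names introduced for illustration.

```typescript
// Hypothetical shape of the page-range options (names taken from the sample above).
interface PageRange {
  startPageIndex: number;
  endPageIndex: number;
}

// Hypothetical structural view of the documented extractText overloads.
interface TextSource {
  extractText(options?: PageRange): string;
}

// Clamp a requested range to the valid zero-based indices [0, pageCount - 1]
// and make sure the end index is never smaller than the start index.
function clampRange(start: number, end: number, pageCount: number): PageRange {
  if (pageCount <= 0) {
    throw new RangeError('The document has no pages.');
  }
  const last = pageCount - 1;
  const startPageIndex = Math.min(Math.max(0, Math.floor(start)), last);
  const endPageIndex = Math.min(Math.max(startPageIndex, Math.floor(end)), last);
  return { startPageIndex, endPageIndex };
}

// Extract text for a requested range after clamping it.
function extractTextInRange(
  source: TextSource,
  start: number,
  end: number,
  pageCount: number
): string {
  return source.extractText(clampRange(start, end, pageCount));
}
```

With a real document, `pageCount` would come from `document.pageCount`, which the sample above already uses.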
extractTextLines
Extract TextLine collection from the PDF document.
Returns TextLine[]
extractTextLines
Extract TextLine from the PDF document.
| Parameter | Type | Description |
|---|---|---|
| options | Object | The options to specify the page range to be selected. |
Returns TextLine[]
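To keep text lines grouped by page, one option is to call `extractTextLines` once per page with a single-page range. This is a sketch under assumptions: it presumes that a range whose start and end index are equal is accepted and selects exactly one page, which this page does not confirm. `PageRange`, `TextLineSource`, and `textLinesByPage` are hypothetical names, and the line type is kept generic so that no `TextLine` members are assumed.

```typescript
// Hypothetical shape of the page-range options (names taken from the sample above).
interface PageRange {
  startPageIndex: number;
  endPageIndex: number;
}

// Hypothetical structural view of the documented extractTextLines overloads.
interface TextLineSource<L> {
  extractTextLines(options?: PageRange): L[];
}

// Return one array of lines per page, in page order.
function textLinesByPage<L>(source: TextLineSource<L>, pageCount: number): L[][] {
  const perPage: L[][] = [];
  for (let i = 0; i < pageCount; i++) {
    perPage.push(source.extractTextLines({ startPageIndex: i, endPageIndex: i }));
  }
  return perPage;
}
```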
getStructureElement
Gets the root structure element of the PDF document.
Returns PdfStructureElement
getStructureElements
Gets the structure elements of a PDF page.
| Parameter | Type | Description |
|---|---|---|
| page | PdfPage | The PDF page whose structure elements are returned. |
Returns PdfStructureElement[]
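Since `getStructureElements` works on one page at a time, collecting elements for several pages takes a loop. The sketch below shows one possible way to do that. `StructureSource` and `structureElementsByPage` are hypothetical names, and the page and element types are generic so that no members of `PdfPage` or `PdfStructureElement` are assumed. How the page objects are obtained from the document is outside the scope of this page and is left to the caller.

```typescript
// Hypothetical structural view of the documented getStructureElements method.
interface StructureSource<P, E> {
  getStructureElements(page: P): E[];
}

// Build a map from each page to the structure elements reported for it.
function structureElementsByPage<P, E>(
  source: StructureSource<P, E>,
  pages: P[]
): Map<P, E[]> {
  const result = new Map<P, E[]>();
  for (const page of pages) {
    result.set(page, source.getStructureElements(page));
  }
  return result;
}
```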