How to use the setExtractUniqueInlineImagesOnly method in org.apache.tika.parser.pdf.PDFPureJavaParserConfig

Best Java code snippets using org.apache.tika.parser.pdf.PDFPureJavaParserConfig.setExtractUniqueInlineImagesOnly (Showing top 2 results out of 315)

@Field
void setExtractUniqueInlineImagesOnly(boolean extractUniqueInlineImagesOnly) {
  defaultConfig.setExtractUniqueInlineImagesOnly(extractUniqueInlineImagesOnly);
}

setExtractInlineImages(
    getBooleanProp(props.getProperty("extractInlineImages"),
        getExtractInlineImages()));
setExtractUniqueInlineImagesOnly(
    getBooleanProp(props.getProperty("extractUniqueInlineImagesOnly"),
        getExtractUniqueInlineImagesOnly()));
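
The fragment above reads each setting from a Properties object and falls back to the current getter value when the key is absent. The helper getBooleanProp is not shown in the snippet, so the following is only a minimal sketch of how such a helper might behave. The class name BooleanPropSketch and the handling of unrecognized values are assumptions, not the library's confirmed implementation.

```java
import java.util.Locale;
import java.util.Properties;

// Hypothetical stand-in for the helper used in the snippet above.
final class BooleanPropSketch {

    // Returns the parsed value, or defaultValue when the property is
    // missing or is neither "true" nor "false" (assumed behavior).
    static boolean getBooleanProp(String value, boolean defaultValue) {
        if (value == null) {
            return defaultValue;
        }
        String normalized = value.trim().toLowerCase(Locale.ROOT);
        if (normalized.equals("true")) {
            return true;
        }
        if (normalized.equals("false")) {
            return false;
        }
        return defaultValue;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("extractUniqueInlineImagesOnly", "false");

        // Key present: the property value overrides the default.
        System.out.println(getBooleanProp(
                props.getProperty("extractUniqueInlineImagesOnly"), true));

        // Key absent: the default (here standing in for the getter) is kept.
        System.out.println(getBooleanProp(
                props.getProperty("extractInlineImages"), true));
    }
}
```

Passing the getter result as the default means a properties file only needs to list the settings it wants to change.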

Javadoc

Multiple pages within a PDF file might refer to the same underlying image. If extractUniqueInlineImagesOnly is set to false, the parser will call the EmbeddedExtractor each time the image appears on a page. This might be desired for some use cases. However, to avoid duplication of extracted images, set this to true. The default is true.

Note that uniqueness is determined only by the underlying PDF COSObject id, not by file hash or a similar equality metric. If the PDF actually contains multiple copies of the same image, all with different object ids, then all of those copies will be extracted.

For this parameter to have any effect, extractInlineImages must be set to true.

Because of TIKA-1742, and to avoid infinite recursion, the extractor will only pull out one copy of each image per page, no matter the setting of this parameter. This parameter tries to capture uniqueness across the entire document.
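
The rules described in the Javadoc can be modeled without the parser itself. The sketch below is not Tika code: the class UniqueImageSketch, the method countExtractions, and the representation of pages as lists of integer object ids are all hypothetical, chosen only to illustrate how per-page and per-document deduplication by object id would interact. It assumes extractInlineImages is already enabled.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the deduplication rules; not the parser's implementation.
final class UniqueImageSketch {

    // Each inner list holds the object ids of the images referenced on one page.
    // Returns how many times an image would be handed to the extractor.
    static int countExtractions(List<List<Integer>> pages, boolean uniqueOnly) {
        Set<Integer> seenInDocument = new HashSet<>();
        int extracted = 0;
        for (List<Integer> page : pages) {
            Set<Integer> seenOnPage = new HashSet<>();
            for (int objectId : page) {
                // At most one copy per page, regardless of the setting.
                if (!seenOnPage.add(objectId)) {
                    continue;
                }
                // With uniqueOnly, also skip ids already seen on earlier pages.
                if (uniqueOnly && !seenInDocument.add(objectId)) {
                    continue;
                }
                extracted++;
            }
        }
        return extracted;
    }

    public static void main(String[] args) {
        // Three pages; ids 7 and 9 are each referenced on more than one page.
        List<List<Integer>> pages =
                List.of(List.of(7, 7, 9), List.of(7), List.of(9, 12));
        System.out.println(countExtractions(pages, true));  // prints 3
        System.out.println(countExtractions(pages, false)); // prints 5
    }
}
```

Because the model keys on object id alone, two byte-identical images stored under different ids would both be counted, which matches the caveat in the Javadoc.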

