How to use getDocId method in de.julielab.jules.types.Header

Best Java code snippets using de.julielab.jules.types.Header.getDocId (Showing top 3 results out of 315)

@Override
public void process(JCas jCas) throws AnalysisEngineProcessException {
  Header header = selectSingle(jCas, Header.class);
  File pdfFile = new File(header.getSource());
  checkFileExists(pdfFile);
  LOG.debug("extracting {}", pdfFile.getName());
  try {
    PDFTextStream pdf;
    if (pdfFile.getName().endsWith("zip")) {
      InputStream is = unzipUniqueFileAsStream(pdfFile);
      pdf = new PDFTextStream(is, removeExtension(pdfFile.getName()));
    } else {
      pdf = new PDFTextStream(pdfFile);
    }
    BlockHandler blueHandler = new BlockHandler();
    pdf.pipe(blueHandler);
    pdf.close();
    PdfCollectionReader.extractText(jCas, blueHandler.getDoc(),
        header.getDocId(), expandAbbrevs);
    if (extractTables)
      PdfCollectionReader
          .extractTables(tableExtractor, pdfFile, jCas);
    // if (extractReferences)
    // extractReferences(f, jcas);
  } catch (Throwable t) {
    LOG.error("error extracting " + header.getSource(), t);
    // throw new AnalysisEngineProcessException(e);
  }
}

@Override
public void getNext(JCas jcas) throws IOException, CollectionException {
  File f = fileIterator.next();
  Header header = new Header(jcas);
  // .* removes the tmp part
  header.setDocId(f.getName().replaceAll("\\.pdf.*", ""));
  header.setSource(f.getAbsolutePath());
  header.addToIndexes();
  PDFTextStream pdf = new PDFTextStream(f);
  BlockHandler blueHandler = new BlockHandler();
  pdf.pipe(blueHandler);
  pdf.close();
  extractText(jcas, blueHandler.getDoc(), header.getDocId(),
      expandAbbrevs);
  if (extractTables)
    extractTables(tableExtractor, f, jcas);
  // printHtml(jcas, new File("target/" + header.getDocId() + ".html"));
}
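
The reader above derives the document ID by stripping everything from the first ".pdf" onward in the file name. The following is a minimal standalone sketch of that one step, using only java.lang.String. The class name DocIdFromFileName, the helper toDocId, and the sample file names are hypothetical and not part of the original project.

```java
public class DocIdFromFileName {

    // Same regex as in getNext above: drop ".pdf" and anything after it.
    static String toDocId(String fileName) {
        return fileName.replaceAll("\\.pdf.*", "");
    }

    public static void main(String[] args) {
        System.out.println(toDocId("12345.pdf"));        // 12345
        System.out.println(toDocId("12345.pdf.tmp678")); // 12345
        System.out.println(toDocId("notes.txt"));        // notes.txt (no ".pdf", left unchanged)
    }
}
```

Note that a name without ".pdf" passes through unchanged, so a caller that needs a clean ID may want to check the extension first.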

  || typeName.equals(TypeSystem.PUBMED_HEADER)) {
Header h = (Header) a;
doc.put(ID, h.getDocId()); // LATER set prefix
doc.put(PM_ID, parseInt(h.getDocId()));
doc.put(TITLE, h.getTitle());
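
The fragment above passes the document ID to parseInt (presumably Integer.parseInt via a static import), which only succeeds when the ID is purely numeric, as a PMID is. Below is a hedged sketch of guarding that conversion. The class PmidParser, the helper parsePmid, and the choice of OptionalInt are assumptions made for illustration; they are not part of the Header API.

```java
import java.util.OptionalInt;

public class PmidParser {

    // Hypothetical helper: returns the numeric ID, or empty if docId is not an int.
    static OptionalInt parsePmid(String docId) {
        if (docId == null) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(docId.trim()));
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parsePmid("12345").isPresent());    // true
        System.out.println(parsePmid("PMC12345").isPresent()); // false
    }
}
```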

Javadoc

Getter for docId: gets the identifier of the document with respect to its source, e.g. the PMID in PubMed. In combination with the source, this is a unique identifier for a document.
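
Because the Javadoc says docId is unique only in combination with the source, code that deduplicates documents may want to key on both values. The sketch below uses a hypothetical stand-in class, DocumentKey, rather than the real UIMA-generated Header (which needs a JCas to instantiate); its two fields stand for whatever getSource() and getDocId() return.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class DocumentKeyDemo {

    // Hypothetical composite key built from a header's source and docId.
    static final class DocumentKey {
        final String source;
        final String docId;

        DocumentKey(String source, String docId) {
            this.source = source;
            this.docId = docId;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof DocumentKey)) {
                return false;
            }
            DocumentKey other = (DocumentKey) o;
            return Objects.equals(source, other.source)
                    && Objects.equals(docId, other.docId);
        }

        @Override
        public int hashCode() {
            return Objects.hash(source, docId);
        }
    }

    public static void main(String[] args) {
        Set<DocumentKey> seen = new HashSet<>();
        seen.add(new DocumentKey("PubMed", "12345"));
        // The same docId from a different source counts as a different document.
        System.out.println(seen.add(new DocumentKey("PMC", "12345")));    // true
        System.out.println(seen.add(new DocumentKey("PubMed", "12345"))); // false
    }
}
```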

Popular methods of Header

<init>
addToIndexes
getSource
Getter for source: gets the source of the document (e.g. WWW, Database).
setDocId
Setter for docId: sets the identifier of the document with respect to its source. E.g.: PMID in Pub
getTitle
Getter for title: gets the title of the document.
readObject
Write your own initialization here
setBegin
setComponentId
setEnd
setSource
Setter for source: sets the source of the document (e.g. WWW, Database).
setTitle
Setter for title: sets the title of the document.
