Implements VocabularyExtension to annotate each VocabularyInputTerm from the vocabularies identified by
#getTargetVocabularyIds with data from #getAnnotationSource. The default
behavior implemented in this base class is to gather data from the named columns in the file, and add this data to
the respective terms when reindexing a supported vocabulary. Setting up the names of the columns is done by the
concrete class, either by configuring the CSV parser (in #setupCSVParser) to treat the first row as the header
definition, or by explicitly assigning names to columns.
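To let the first row be parsed as the column names, override
protected CSVFormat setupCSVParser(Vocabulary vocabulary)
to return a format that reads its header from the first record (for instance, one built with a no-argument withHeader() call).
To explicitly name columns, override
protected CSVFormat setupCSVParser(Vocabulary vocabulary)
to return a format built with the column names listed in order, such as withHeader("id", null, "symptom", null, "frequency").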
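The difference between the two naming modes can be pictured with a small library-free sketch. HeaderModeSketch and resolveColumnNames are hypothetical names introduced only for illustration; in the real class the choice is expressed through the CSVFormat returned by setupCSVParser, not through code like this.

```java
import java.util.Deque;

/** Hypothetical illustration of the two header modes; not part of the real API. */
final class HeaderModeSketch
{
    /**
     * Returns the column names to apply to the rows still pending.
     * With no explicit names, the first pending row is consumed as the header.
     * With explicit names, every pending row is left in place and treated as data;
     * a null name marks a column that should stay unnamed.
     */
    static String[] resolveColumnNames(Deque<String[]> pendingRows, String... explicitNames)
    {
        if (explicitNames.length == 0) {
            if (pendingRows.isEmpty()) {
                throw new IllegalStateException("No row available to use as the header");
            }
            return pendingRows.removeFirst();
        }
        return explicitNames;
    }
}
```

Under these assumptions, calling resolveColumnNames(rows) removes the first row and uses it as the header, while resolveColumnNames(rows, "id", null, "symptom") leaves all rows untouched and names only the first and third columns.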
With the default implementation of
#processCSVRecordRow, having a column named
id is mandatory.
Columns that are not named are ignored.
Missing, empty, or whitespace-only cells will be ignored.
If one or more of the parsed fields already have values in the term being extended, then the
existing values will be discarded and replaced with the data read from the input file.
If multiple rows for the same term identifier exist, then the values are accumulated in lists of values. If in the
schema definition a field is set as non-multi-valued, then it is the responsibility of the user to make sure that only
one value will be specified for such fields. If a value is specified multiple times in the input file, then it will
be added multiple times to the field.
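The rules above can be summarized in a minimal sketch. RowAccumulatorSketch, processRow and extend are hypothetical names, and plain maps stand in for the real term objects; the actual logic lives in #processCSVRecordRow and may differ in detail. In particular, silently skipping a row whose id cell is blank is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the default row handling described above; not the real class. */
final class RowAccumulatorSketch
{
    /** Term identifier -> field name -> values, in the order they were read. */
    private final Map<String, Map<String, List<String>>> parsed = new LinkedHashMap<>();

    private static boolean isBlank(String cell)
    {
        return cell == null || cell.trim().isEmpty();
    }

    /** Reads one row; a null entry in columnNames marks an unnamed column. */
    void processRow(String[] columnNames, String[] cells)
    {
        int idColumn = Arrays.asList(columnNames).indexOf("id");
        if (idColumn < 0) {
            throw new IllegalArgumentException("A column named \"id\" is mandatory");
        }
        if (idColumn >= cells.length || isBlank(cells[idColumn])) {
            return; // assumption: a row without a term identifier is skipped
        }
        Map<String, List<String>> fields =
            this.parsed.computeIfAbsent(cells[idColumn], key -> new LinkedHashMap<>());
        for (int i = 0; i < columnNames.length && i < cells.length; i++) {
            // Unnamed columns and missing, empty or whitespace-only cells contribute nothing
            if (i == idColumn || columnNames[i] == null || isBlank(cells[i])) {
                continue;
            }
            // Values accumulate across rows; repeated values are kept as repeats
            fields.computeIfAbsent(columnNames[i], key -> new ArrayList<>()).add(cells[i]);
        }
    }

    /** Replaces, rather than merges with, any values the term already holds for the parsed fields. */
    void extend(String id, Map<String, List<String>> termFields)
    {
        Map<String, List<String>> fields = this.parsed.get(id);
        if (fields == null) {
            return;
        }
        for (Map.Entry<String, List<String>> field : fields.entrySet()) {
            termFields.put(field.getKey(), new ArrayList<>(field.getValue()));
        }
    }
}
```

In this sketch, fields that never appear in the input file are left untouched on the term, while any field that was read overwrites whatever the term held before.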
Example: for the following parser set-up:
CSVFormat.DEFAULT.withHeader("id", null, "symptom", null, "frequency")
and the following input file:
MIM:162200,"NEUROFIBROMATOSIS, TYPE I",HP:0009737,"Lisch nodules",HP:0040284,HPO:curators
the following fields will be added:
"symptom": "HP:0009737", "HP:0001256"
"frequency": "HP:0040284", "HP:0040283", "HP:0040284"