Returns the total amount of parallelism in the unprocessed part of this reader's current
BoundedSource (as would be returned by
#getCurrentSource). This corresponds
to all unprocessed split point records (see
RangeTracker), including the last split
point returned, in the remainder part of the source.
This function should be implemented only in addition to
#getSplitPointsConsumed() and only if an exact value can be returned.
Consider the following examples: (1) An input that can be read in parallel down to the
individual records, such as
CountingSource#upTo, is called "perfectly splittable".
(2) A "block-compressed" file format such as
AvroIO, in which a block of records has
to be read as a whole, but different blocks can be read in parallel. (3) An "unsplittable"
input such as a cursor in a database.
Assume for examples (1) and (2) that the number of records or blocks remaining is known:
- Any
BoundedReader for which the last call to
#start or
#advance has returned true should not return 0, because this reader itself
represents parallelism at least 1. This condition holds independent of whether the
input is splittable.
- A finished reader (one for which
#start or
#advance has returned false)
should return a value of 0. This condition holds independent of whether the input is
splittable.
- For example (1): After returning record #30 (starting at 1) out of 50 in a perfectly
splittable 50-record input, this value should be 21 (20 remaining + 1 current) if the
total number of records is known.
- For example (2): After returning a record in block 3 in a block-compressed file
consisting of 5 blocks, this value should be 3 (since blocks 4 and 5 can be processed
in parallel by new readers produced via dynamic work rebalancing, while the current
reader continues processing block 3) if the total number of blocks is known.
- For example (3): A reader for any non-empty unsplittable input should return 1 until
it is finished, at which point it should return 0.
- For any reader: After returning the last split point in a file (e.g., the last record
in example (1), the first record in the last block for example (2), or the first record
in the file for example (3)), this value should be 1: apart from the current task, no
additional remainder can be split off.
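The arithmetic behind the cases above can be sketched as follows. This is a minimal, standalone illustration rather than the actual Beam implementation: the class SplitPointsSketch, the method splitPointsRemaining, and its parameters are hypothetical names, and the constant is defined locally as a negative value only to mirror the documented convention that negative results mean "unknown". It assumes split points are counted from 1 and that the reader has already returned at least one split point.

```java
// Hypothetical sketch; not part of the Beam API.
final class SplitPointsSketch {
    // Local stand-in for the "unknown" marker: any negative value is treated as unknown.
    static final long SPLIT_POINTS_UNKNOWN = -1;

    /**
     * @param totalSplitPoints  total number of split points (records or blocks) in the
     *                          source, or a negative value if the total is not known
     * @param currentSplitPoint 1-based index of the last split point returned
     * @param finished          true once start/advance has returned false
     */
    static long splitPointsRemaining(
            long totalSplitPoints, long currentSplitPoint, boolean finished) {
        if (finished) {
            return 0; // a finished reader represents no remaining parallelism
        }
        if (totalSplitPoints < 0) {
            return SPLIT_POINTS_UNKNOWN; // only report a value when it is exact
        }
        // Split points after the current one, plus the current one itself.
        return totalSplitPoints - currentSplitPoint + 1;
    }

    public static void main(String[] args) {
        System.out.println(splitPointsRemaining(50, 30, false)); // example (1): 21
        System.out.println(splitPointsRemaining(5, 3, false));   // example (2): 3
        System.out.println(splitPointsRemaining(1, 1, false));   // example (3): 1
        System.out.println(splitPointsRemaining(1, 1, true));    // finished reader: 0
    }
}
```

An unsplittable input is modeled here as a source with a single split point, which is why it yields 1 until the reader finishes and 0 afterwards.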
Defaults to
#SPLIT_POINTS_UNKNOWN. Any value less than 0 will be interpreted as
unknown.
Thread safety
See the javadoc on
BoundedReader for information about thread safety.