Tells the reader to narrow the range of the input it's going to read and give up the
remainder, so that the new range would contain approximately the given fraction of the amount
of data in the current range.
Returns a BoundedSource representing the remainder.
Detailed description
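Assuming the following sequence of calls:
BoundedSource<T> initial = reader.getCurrentSource();
BoundedSource<T> residual = reader.splitAtFraction(fraction);
BoundedSource<T> primary = reader.getCurrentSource();
the following should hold: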
- The "primary" and "residual" sources, when read, should together cover the same set of
records as "initial".
- The current reader should continue to be in a valid state, and continuing to read from
it should, together with the records it already read, yield the same records as would
have been read by "primary".
- The amount of data read by "primary" should ideally represent approximately the given
fraction of the amount of data read by "initial".
For example, a reader that reads a range of offsets [A, B) in a file might implement this
method by truncating the current range to [A, A + fraction*(B-A)) and returning a Source
representing the range [A + fraction*(B-A), B).
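The offset arithmetic above can be sketched as follows. This is a minimal illustration only: OffsetRangeSketch is a hypothetical stand-in for the reader's range, not a Beam class, and it ignores the reader's current position and thread safety.

```java
// Hypothetical, simplified stand-in for a reader's offset range [A, B).
// It is NOT part of the Beam SDK; it only illustrates the arithmetic.
final class OffsetRangeSketch {
  long start; // A, inclusive
  long end;   // B, exclusive

  OffsetRangeSketch(long start, long end) {
    this.start = start;
    this.end = end;
  }

  // Truncates this range to [A, A + fraction*(B-A)) and returns the
  // residual range [A + fraction*(B-A), B).
  OffsetRangeSketch splitAtFraction(double fraction) {
    long split = start + (long) (fraction * (end - start));
    OffsetRangeSketch residual = new OffsetRangeSketch(split, end);
    end = split; // this object now plays the role of "primary"
    return residual;
  }
}
```

With a range of [100, 200) and a fraction of 0.25, the split offset is 100 + 0.25 * 100 = 125, so the primary becomes [100, 125) and the residual is [125, 200). Together they cover exactly the initial range.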
This method should return null if the split cannot be performed for this fraction while
satisfying the semantics above. E.g., a reader that reads a range of offsets in a file should
return null if it is already past the position in its range corresponding to the given
fraction. In this case, the method MUST have no effect (the reader must behave as if the
method hadn't been called at all).
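One way to honor the "return null and change nothing" rule is to validate the split offset before mutating any state. The sketch below assumes non-negative offsets and a hypothetical class PositionAwareSplitSketch that tracks the offset of the last record handed out; neither the class nor its method names come from the Beam SDK.

```java
// Hypothetical sketch, not a Beam class. Offsets are assumed to be >= 0.
final class PositionAwareSplitSketch {
  private final long start;       // inclusive
  private long end;               // exclusive
  private long lastReturned = -1; // offset of the last record returned, -1 if none

  PositionAwareSplitSketch(long start, long end) {
    this.start = start;
    this.end = end;
  }

  // Records that the reader has returned the record at the given offset.
  void recordReturned(long offset) {
    lastReturned = offset;
  }

  long end() {
    return end;
  }

  // Returns the residual range as {from, to}, or null if the split is impossible.
  long[] splitAtFraction(double fraction) {
    long split = start + (long) (fraction * (end - start));
    // Reject before touching any field, so a failed call has no effect.
    if (split <= lastReturned || split >= end) {
      return null;
    }
    long[] residual = {split, end};
    end = split;
    return residual;
  }
}
```

For a range of [0, 100) whose reader has already returned the record at offset 60, a request for fraction 0.5 maps to offset 50, which is behind the reader, so the call returns null and the range stays [0, 100). A later request for fraction 0.8 maps to offset 80 and succeeds, leaving [0, 80) as the primary.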
Statefulness
Since this method (if successful) affects the reader's source, in subsequent invocations
"fraction" should be interpreted relative to the new current source.
Thread safety and blocking
This method will be called concurrently with other methods (although there will never be
multiple concurrent invocations of this method itself), so it is critical that it be
implemented in a thread-safe way (otherwise data loss is possible).
It is also very important that this method always completes quickly. In particular, it
should not perform or wait on any blocking operations such as I/O, RPCs etc. Violating this
requirement may stall completion of the work item or even cause it to fail.
It is incorrect to make both this method and #start/#advance synchronized, because those
methods can perform blocking operations, and then this method would have to wait for those
calls to complete.
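A common alternative, sketched here under stated assumptions, is to guard only the small piece of shared range state with a lock and to perform any blocking work outside it. The class NonBlockingSplitSketch and its placeholder method readRecordBlocking are hypothetical and are not Beam APIs; a real reader would normally delegate this bookkeeping to a range tracker.

```java
// Hypothetical sketch, not a Beam class. Offsets are assumed to be >= 0.
final class NonBlockingSplitSketch {
  private final Object lock = new Object(); // guards end and lastClaimed only
  private final long start;                 // inclusive
  private long end;                         // exclusive
  private long lastClaimed = -1;            // last offset claimed by the reading thread

  NonBlockingSplitSketch(long start, long end) {
    this.start = start;
    this.end = end;
  }

  // Called from the reading thread. Returns false if the offset is outside the range.
  boolean advance(long nextOffset) {
    readRecordBlocking(nextOffset); // slow work happens WITHOUT holding the lock
    synchronized (lock) {           // short critical section: no I/O inside
      if (nextOffset >= end) {
        return false; // the range was truncated by a split; this record is not ours
      }
      lastClaimed = nextOffset;
      return true;
    }
  }

  // May be called from another thread. Holds the lock only for a few field accesses.
  long[] splitAtFraction(double fraction) {
    synchronized (lock) {
      long split = start + (long) (fraction * (end - start));
      if (split <= lastClaimed || split >= end) {
        return null; // cannot split here; state is left untouched
      }
      long[] residual = {split, end};
      end = split;
      return residual;
    }
  }

  // Placeholder for blocking I/O; intentionally empty in this sketch.
  private void readRecordBlocking(long offset) {}
}
```

Because the lock is never held during readRecordBlocking, a split request waits at most for a handful of field reads and writes, regardless of how long the reading thread spends on I/O.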
org.apache.beam.sdk.io.range.RangeTracker makes it easy to implement this method
safely and correctly.
By default, returns null to indicate that splitting is not possible.