Base class for Hadoop jobs that consume time-partitioned data
in a non-incremental way. Typically this is only used for comparing incremental
jobs against a non-incremental baseline.
It is essentially the same as
AbstractPartitionCollapsingIncrementalJob, but without the incremental features.
Jobs extending this class consume input data partitioned according to the
yyyy/MM/dd format. Only a single input path is supported. The output is written
to a directory under the output path whose name has the format yyyyMMdd,
derived from the end of the time window that is consumed.
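The naming scheme above can be sketched with a small helper. The method name outputFolderName and the example date are hypothetical; only the yyyyMMdd format and its derivation from the window's end date come from the description above.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class OutputPathExample {
    // Hypothetical helper mirroring the described naming scheme: the output
    // directory name is the end date of the consumed window, as yyyyMMdd.
    static String outputFolderName(LocalDate windowEnd) {
        return windowEnd.format(DateTimeFormatter.ofPattern("yyyyMMdd"));
    }

    public static void main(String[] args) {
        // A window ending 2013-03-15 yields an output directory "20130315".
        System.out.println(outputFolderName(LocalDate.of(2013, 3, 15)));
    }
}
```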
This class has the same configuration and methods as
TimeBasedJob. In addition, it recognizes the following properties:
- combine.inputs - True if inputs should be combined (defaults to false)
- num.reducers.bytes.per.reducer - Number of input bytes per reducer
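As a sketch, the two properties above might be supplied through a plain Properties object; the property names come from the list above, while the 256 MB value is an arbitrary illustrative choice, not a documented default.

```java
import java.util.Properties;

public class JobPropsExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Combine inputs so one map task can read multiple files.
        props.setProperty("combine.inputs", "true");
        // Target bytes of input per reducer (256 MB here, chosen arbitrarily).
        props.setProperty("num.reducers.bytes.per.reducer", "268435456");
        System.out.println(props.getProperty("combine.inputs"));
    }
}
```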
When combine.inputs is true, CombinedAvroKeyInputFormat is used
instead of AvroKeyInputFormat, which enables a single map task to consume more
than one file.
The num.reducers.bytes.per.reducer property controls the number of reducers
based on the input size: the total size of the input files in bytes is divided
by this value and the result is rounded up.
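The divide-and-round-up rule can be expressed as a one-line calculation. The method numReducers and the byte counts below are hypothetical; only the formula (total input bytes divided by bytes-per-reducer, rounded up) comes from the description above.

```java
public class ReducerCountExample {
    // Ceiling division implementing the described rule: reducers =
    // ceil(totalInputBytes / bytesPerReducer).
    static int numReducers(long totalInputBytes, long bytesPerReducer) {
        return (int) ((totalInputBytes + bytesPerReducer - 1) / bytesPerReducer);
    }

    public static void main(String[] args) {
        // 1.5 GB of input at 256 MB per reducer divides evenly into 6 reducers.
        System.out.println(numReducers(1_610_612_736L, 268_435_456L));
        // 1.7 GB does not divide evenly (about 6.33), so it rounds up to 7.
        System.out.println(numReducers(1_700_000_000L, 268_435_456L));
    }
}
```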