Base class for Hadoop jobs that consume time-partitioned data
in a non-incremental way. Typically this is only used for comparing incremental
jobs against a non-incremental baseline.
It is essentially the same as
AbstractPartitionCollapsingIncrementalJob, but without the incremental features.
Jobs extending this class consume input data partitioned according to the
yyyy/MM/dd format. Only a single input path is supported. The output is written
to a directory under the output path whose name has the format yyyyMMdd,
derived from the end of the time window that is consumed.
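The naming scheme above can be sketched with a small helper. The method name outputFolderName and the example date are hypothetical; only the yyyyMMdd format and its derivation from the window's end date come from the description above.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class OutputPathExample {
    // Hypothetical helper mirroring the described naming scheme: the output
    // directory name is the end date of the consumed window, as yyyyMMdd.
    static String outputFolderName(LocalDate windowEnd) {
        return windowEnd.format(DateTimeFormatter.ofPattern("yyyyMMdd"));
    }

    public static void main(String[] args) {
        // A window ending 2013-03-15 yields an output directory "20130315".
        System.out.println(outputFolderName(LocalDate.of(2013, 3, 15)));
    }
}
```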
This class has the same configuration and methods as
TimeBasedJob. In addition, it recognizes the following properties:
- combine.inputs - True if inputs should be combined (defaults to false)
- num.reducers.bytes.per.reducer - Number of input bytes per reducer
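As a sketch, the two properties above might be supplied through a plain Properties object; the property names come from the list above, while the 256 MB value is an arbitrary illustrative choice, not a documented default.

```java
import java.util.Properties;

public class JobPropsExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Combine inputs so one map task can read multiple files.
        props.setProperty("combine.inputs", "true");
        // Target bytes of input per reducer (256 MB here, chosen arbitrarily).
        props.setProperty("num.reducers.bytes.per.reducer", "268435456");
        System.out.println(props.getProperty("combine.inputs"));
    }
}
```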
When combine.inputs is true, CombinedAvroKeyInputFormat is used
instead of AvroKeyInputFormat, which enables a single map task to consume more
than one file.
The num.reducers.bytes.per.reducer property controls the number of reducers
based on the input size: the total size of the input files in bytes is divided
by this value and the result is rounded up.
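The divide-and-round-up rule can be expressed as a one-line calculation. The method numReducers and the byte counts below are hypothetical; only the formula (total input bytes divided by bytes-per-reducer, rounded up) comes from the description above.

```java
public class ReducerCountExample {
    // Ceiling division implementing the described rule: reducers =
    // ceil(totalInputBytes / bytesPerReducer).
    static int numReducers(long totalInputBytes, long bytesPerReducer) {
        return (int) ((totalInputBytes + bytesPerReducer - 1) / bytesPerReducer);
    }

    public static void main(String[] args) {
        // 1.5 GB of input at 256 MB per reducer divides evenly into 6 reducers.
        System.out.println(numReducers(1_610_612_736L, 268_435_456L));
        // 1.7 GB does not divide evenly (about 6.33), so it rounds up to 7.
        System.out.println(numReducers(1_700_000_000L, 268_435_456L));
    }
}
```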