The abstract Combiner class is used to build combiners for the Job. Combiners are distributed across the cluster and run alongside the Mapper implementations on the same node. Combiners are called in a thread-safe way, so internal locking is not required.

Combiners are normally used to build intermediate results on the mapping nodes, lowering the traffic between nodes before the reducing phase. A Combiner must be able to combine data in multiple chunks, which yields a more streaming-like internal behavior.

A simple Combiner implementation, in combination with a Reducer, could look like this avg-function implementation:
public class AvgCombiner
    extends Combiner<Integer, Tuple<Long, Long>>
{
  private long count;
  private long amount;

  @Override
  public void combine( Integer value )
  {
    count++;
    amount += value;
  }

  @Override
  public Tuple<Long, Long> finalizeChunk()
  {
    Tuple<Long, Long> tuple = new Tuple<>( count, amount );
    // Reset the internal state so the next chunk starts fresh
    count = 0;
    amount = 0;
    return tuple;
  }
}
public class SumReducer
    extends Reducer<Tuple<Long, Long>, Integer>
{
  private long count;
  private long amount;

  @Override
  public void reduce( Tuple<Long, Long> value )
  {
    count += value.getFirst();
    amount += value.getSecond();
  }

  @Override
  public Integer finalizeReduce()
  {
    // Cast is required: a long cannot be autoboxed into an Integer
    return (int) ( amount / count );
  }
}
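To make the chunked interplay between the two classes concrete, the following self-contained sketch drives a combiner over two chunks of mapped values and feeds each partial result into a reducer. The minimal Tuple class and the plain (framework-free) versions of the two classes are assumptions made here for illustration only; in a real job, the framework supplies these base types and invokes the callbacks itself.

```java
import java.util.Arrays;
import java.util.List;

// Minimal stand-in for the framework's Tuple type, assumed for illustration.
final class Tuple<A, B> {
  private final A first;
  private final B second;
  Tuple( A first, B second ) { this.first = first; this.second = second; }
  A getFirst() { return first; }
  B getSecond() { return second; }
}

// Same logic as the AvgCombiner above, without the framework base class.
class AvgCombiner {
  private long count;
  private long amount;
  public void combine( Integer value ) { count++; amount += value; }
  public Tuple<Long, Long> finalizeChunk() {
    Tuple<Long, Long> tuple = new Tuple<>( count, amount );
    count = 0;   // reset so the next chunk starts fresh
    amount = 0;
    return tuple;
  }
}

// Same logic as the SumReducer above, without the framework base class.
class SumReducer {
  private long count;
  private long amount;
  public void reduce( Tuple<Long, Long> value ) {
    count += value.getFirst();
    amount += value.getSecond();
  }
  public Integer finalizeReduce() { return (int) ( amount / count ); }
}

public class ChunkedAvgDemo {
  public static void main( String[] args ) {
    AvgCombiner combiner = new AvgCombiner();
    SumReducer reducer = new SumReducer();
    // Two chunks of mapped values, as they might arrive on a mapping node.
    List<List<Integer>> chunks = Arrays.asList(
        Arrays.asList( 1, 2, 3 ),
        Arrays.asList( 4, 5, 6 ) );
    for ( List<Integer> chunk : chunks ) {
      for ( Integer value : chunk ) {
        combiner.combine( value );
      }
      // finalizeChunk() emits the partial (count, sum) pair and resets
      // the combiner, so each chunk produces an independent partial result.
      reducer.reduce( combiner.finalizeChunk() );
    }
    // Integer average of 1..6: (1+2+3+4+5+6) / 6
    System.out.println( reducer.finalizeReduce() );
  }
}
```

Note how only one small (count, sum) tuple per chunk crosses the node boundary instead of every mapped value, which is exactly the traffic reduction the Combiner is meant to provide.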