org.apache.flink.api.java.DataSet.aggregate Java code examples

/**
 * Syntactic sugar for aggregate (SUM, field).
 * @param field The index of the Tuple field on which the aggregation function is applied.
 * @return An AggregateOperator that represents the summed DataSet.
 *
 * @see org.apache.flink.api.java.operators.AggregateOperator
 */
public AggregateOperator<T> sum(int field) {
  return aggregate(Aggregations.SUM, field);
}

/**
 * Syntactic sugar for {@link #aggregate(Aggregations, int)} using {@link Aggregations#MAX} as
 * the aggregation function.
 *
 * <p><strong>Note:</strong> This operation is not to be confused with {@link #maxBy(int...)},
 * which selects one element with maximum value at the specified field positions.
 *
 * @param field The index of the Tuple field on which the aggregation function is applied.
 * @return An AggregateOperator that represents the max'ed DataSet.
 *
 * @see #aggregate(Aggregations, int)
 * @see #maxBy(int...)
 */
public AggregateOperator<T> max(int field) {
  return aggregate(Aggregations.MAX, field);
}

/**
 * Syntactic sugar for {@link #aggregate(Aggregations, int)} using {@link Aggregations#MIN} as
 * the aggregation function.
 *
 * <p><strong>Note:</strong> This operation is not to be confused with {@link #minBy(int...)},
 * which selects one element with the minimum value at the specified field positions.
 *
 * @param field The index of the Tuple field on which the aggregation function is applied.
 * @return An AggregateOperator that represents the min'ed DataSet.
 *
 * @see #aggregate(Aggregations, int)
 * @see #minBy(int...)
 */
public AggregateOperator<T> min(int field) {
  return aggregate(Aggregations.MIN, field);
}

@Test
public void testAggregationTypes() {
  try {
    final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    DataSet<Tuple5<Integer, Long, String, Long, Integer>> tupleDs = env.fromCollection(emptyTupleData, tupleTypeInfo);

    // should work: multiple aggregates
    tupleDs.aggregate(Aggregations.SUM, 0).and(Aggregations.MIN, 4);

    // should work: nested aggregates
    tupleDs.aggregate(Aggregations.MIN, 2).aggregate(Aggregations.SUM, 1);

    // should not work: sum on a String field (field 2)
    try {
      tupleDs.aggregate(Aggregations.SUM, 2);
      Assert.fail();
    } catch (UnsupportedAggregationTypeException iae) {
      // we're good here
    }
  }
  catch (Exception e) {
    System.err.println(e.getMessage());
    e.printStackTrace();
    Assert.fail(e.getMessage());
  }
}

@Test
public void testFieldsAggregate() {
  final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
  DataSet<Tuple5<Integer, Long, String, Long, Integer>> tupleDs = env.fromCollection(emptyTupleData, tupleTypeInfo);
  // should work
  try {
    tupleDs.aggregate(Aggregations.SUM, 1);
  } catch (Exception e) {
    Assert.fail();
  }
  // should not work: index out of bounds
  try {
    tupleDs.aggregate(Aggregations.SUM, 10);
    Assert.fail();
  } catch (IllegalArgumentException iae) {
    // we're good here
  } catch (Exception e) {
    Assert.fail();
  }
  // should not work: not applied to tuple dataset
  DataSet<Long> longDs = env.fromCollection(emptyLongData, BasicTypeInfo.LONG_TYPE_INFO);
  try {
    longDs.aggregate(Aggregations.MIN, 1);
    Assert.fail();
  } catch (InvalidProgramException uoe) {
    // we're good here
  } catch (Exception e) {
    Assert.fail();
  }
}

@Test
public void testFullAggregateOfMutableValueTypes() throws Exception {
  /*
   * Full Aggregate of mutable value types
   */
  final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
  DataSet<Tuple3<IntValue, LongValue, StringValue>> ds = ValueCollectionDataSets.get3TupleDataSet(env);
  DataSet<Tuple2<IntValue, LongValue>> aggregateDs = ds
      .aggregate(Aggregations.SUM, 0)
      .and(Aggregations.MAX, 1)
      .project(0, 1);
  List<Tuple2<IntValue, LongValue>> result = aggregateDs.collect();
  String expected = "231,6\n";
  compareResultAsTuples(result, expected);
}

@Test
public void testFullAggregate() throws Exception {
  /*
   * Full Aggregate
   */
  final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
  DataSet<Tuple3<Integer, Long, String>> ds = CollectionDataSets.get3TupleDataSet(env);
  DataSet<Tuple2<Integer, Long>> aggregateDs = ds
      .aggregate(Aggregations.SUM, 0)
      .and(Aggregations.MAX, 1)
      .project(0, 1);
  List<Tuple2<Integer, Long>> result = aggregateDs.collect();
  String expected = "231,6\n";
  compareResultAsTuples(result, expected);
}

private void createAggregationOperation(OperationInfo info) throws IOException {
  DataSet op = (DataSet) sets.get(info.parentID);
  // The first aggregation creates the AggregateOperator...
  AggregateOperator ao = op.aggregate(info.aggregates[0].agg, info.aggregates[0].field);
  // ...and each further aggregation is chained onto it with and().
  for (int x = 1; x < info.count; x++) {
    ao = ao.and(info.aggregates[x].agg, info.aggregates[x].field);
  }
  sets.put(info.setID, ao.name("Aggregation"));
}

Javadoc

Applies an Aggregate transformation on a non-grouped Tuple DataSet.

Note: Only Tuple DataSets can be aggregated. The transformation applies a built-in aggregation function (one of the Aggregations constants) to a specified field of a Tuple DataSet. Additional aggregation functions can be added to the resulting AggregateOperator by calling AggregateOperator#and(Aggregations, int).
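
The chaining described above, a first aggregation followed by further ones added with and, can be sketched in plain Java as follows. This is an illustrative model and not Flink's implementation: the class AggregateChainSketch, its Agg enum, the use of long[] rows in place of tuples, and the collect method that returns only the aggregated slots are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class AggregateChainSketch {
    /** Hypothetical stand-in for the SUM, MIN and MAX aggregations named in the snippets. */
    enum Agg { SUM, MIN, MAX }

    private final List<long[]> rows;
    private final List<Agg> aggs = new ArrayList<>();
    private final List<Integer> fields = new ArrayList<>();

    private AggregateChainSketch(List<long[]> rows) {
        this.rows = rows;
    }

    /** Starts a chain with the first aggregation on the given field index. */
    static AggregateChainSketch aggregate(List<long[]> rows, Agg agg, int field) {
        return new AggregateChainSketch(rows).and(agg, field);
    }

    /** Registers one more aggregation on another field and returns the same chain. */
    AggregateChainSketch and(Agg agg, int field) {
        aggs.add(agg);
        fields.add(field);
        return this;
    }

    /** Folds all rows into a single result with one slot per registered aggregation. */
    long[] collect() {
        if (rows.isEmpty()) {
            throw new IllegalStateException("no rows to aggregate");
        }
        long[] out = new long[aggs.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = rows.get(0)[fields.get(i)]; // seed each slot from the first row
        }
        for (long[] row : rows.subList(1, rows.size())) {
            for (int i = 0; i < out.length; i++) {
                long v = row[fields.get(i)];
                switch (aggs.get(i)) {
                    case SUM: out[i] += v; break;
                    case MIN: out[i] = Math.min(out[i], v); break;
                    case MAX: out[i] = Math.max(out[i], v); break;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<long[]> rows = Arrays.asList(new long[]{1, 10}, new long[]{2, 30}, new long[]{3, 20});
        long[] result = aggregate(rows, Agg.SUM, 0).and(Agg.MAX, 1).collect();
        System.out.println(Arrays.toString(result)); // prints [6, 30]
    }
}
```

Because the data set is not grouped, all rows collapse into one result: the sum of field 0 and the maximum of field 1. This mirrors the shape of the aggregate(SUM, 0).and(MAX, 1).project(0, 1) calls in the test snippets.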

Popular methods of DataSet

map
Applies a Map transformation on this DataSet. The transformation calls a org.apache.flink.api.common.
flatMap
Applies a FlatMap transformation on a DataSet. The transformation calls a org.apache.flink.api.common
output
Emits a DataSet using an OutputFormat. This method adds a data sink to the program. Programs may hav
groupBy
Groups a DataSet using field expressions. A field expression is either the name of a public field or
filter
Applies a Filter transformation on a DataSet. The transformation calls a org.apache.flink.api.common.
join
Initiates a Join transformation. A Join transformation joins the elements of two DataSet on key equal
collect
Convenience method to get the elements of a DataSet as a List. As DataSet can contain a lot of data,
getType
Returns the TypeInformation for the type of this DataSet.
union
Creates a union of this DataSet with another DataSet. The other DataSet must be of the same data ty
iterate
Initiates an iterative part of the program that executes multiple times and feeds back data sets. Th
writeAsCsv
Writes a Tuple DataSet as CSV file(s) to the specified location. Note: Only a Tuple DataSet can writt
writeAsText
Writes a DataSet as text file(s) to the specified location. For each element of the DataSet the resul
