PCollection.by

How to use the by method in org.apache.crunch.PCollection

Best Java code snippets using org.apache.crunch.PCollection.by (Showing top 8 results out of 315)

origin: spotify/crunch-lib

/**
 * Key a PCollection of Avro records by a String field name. This is less safe than writing a custom MapFn, but it
 * could significantly reduce code volume in cases that need a lot of disparate collections to be joined or processed
 * according to key values.
 * @param collection PCollection of Avro records to process
 * @param fieldPath The Avro schema field name of the field to key on. Use . separated names for nested records
 * @param fieldType PType of the field you wish to extract from the Avro record.
 * @param <T> record type
 * @param <F> type of the extracted key field
 * @return the supplied collection keyed by the field at fieldPath
 */
public static <T extends SpecificRecord, F> PTable<F, T> keyByAvroField(PCollection<T> collection, String fieldPath, PType<F> fieldType) {
 Class<T> recordType = collection.getPType().getTypeClass();
 return collection.by(new AvroExtractMapFn<T, F>(recordType, fieldPath), fieldType);
}
origin: cloudera/search

/** Randomizes the order of the items in the collection via a MapReduce job */
private static <T> PCollection<T> randomize(PCollection<T> items) {
 PTable<Long, T> table = items.by("randomize", new RandomizeFn<T>(), Writables.longs());
 table = Sort.sort(table, Sort.Order.ASCENDING);
 return table.values();
}
origin: org.apache.crunch/crunch-core

/**
 * Creates a {@code PCollection<T>} that has the same contents as its input argument but will
 * be written to a fixed number of output files. This is useful for map-only jobs that process
 * lots of input files but only write out a small amount of output per task.
 * 
 * @param pc The {@code PCollection<T>} to rebalance
 * @param numPartitions The number of output partitions to create
 * @return A rebalanced {@code PCollection<T>} with the same contents as the input
 */
public static <T> PCollection<T> shard(PCollection<T> pc, int numPartitions) {
 return pc.by(new ShardFn<T>(), pc.getTypeFamily().ints())
   .groupByKey(numPartitions)
   .ungroup()
   .values();
}

origin: kite-sdk/kite-examples

.by(new GetSessionKey(), Avros.strings())
.groupByKey()
.parallelDo(new MakeSession(), Avros.specifics(Session.class));
origin: org.apache.crunch/crunch-core

/**
 * Sorts the {@code PCollection} of {@link TupleN}s using the specified column
 * ordering and a client-specified number of reducers.
 * 
 * @return a {@code PCollection} representing the sorted collection.
 */
public static <T extends Tuple> PCollection<T> sortTuples(PCollection<T> collection, int numReducers,
  ColumnOrder... columnOrders) {
 PType<T> pType = collection.getPType();
 SortFns.KeyExtraction<T> ke = new SortFns.KeyExtraction<T>(pType, columnOrders);
 PTable<Object, T> pt = collection.by(ke.getByFn(), ke.getKeyType());
 Configuration conf = collection.getPipeline().getConfiguration();
 GroupingOptions options = buildGroupingOptions(pt, conf, numReducers, columnOrders);
 return pt.groupByKey(options).ungroup().values();
}
origin: kite-sdk/kite

GetStorageKey<E> getKey = new GetStorageKey<E>(view, numPartitionWriters);
PTable<Pair<GenericData.Record, Integer>, E> table = collection
  .by(getKey, Avros.pairs(Avros.generics(getKey.schema()), Avros.ints()));
PGroupedTable<Pair<GenericData.Record, Integer>, E> grouped =
  numWriters > 0 ? table.groupByKey(numWriters) : table.groupByKey();
origin: org.apache.crunch/crunch-hbase

PTable<ByteBuffer, C> cellsByRow = cells.by(new ExtractRowFn<C>(), bytes());
final int versions = scan.getMaxVersions();
return cellsByRow.groupByKey().parallelDo("CombineKeyValueIntoRow",
origin: apache/crunch

PTable<ByteBuffer, C> cellsByRow = cells.by(new ExtractRowFn<C>(), bytes());
final int versions = scan.getMaxVersions();
return cellsByRow.groupByKey().parallelDo("CombineKeyValueIntoRow",

Javadoc

Apply the given map function to each element of this instance in order to create a PTable.
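
To make the one-line description concrete, here is a minimal plain-JDK sketch of the same idea: a key-extraction function is applied to every element, and each element is paired with its key. This is only an analogy. The class `KeyBySketch` and its `by` method are hypothetical names, not part of Apache Crunch; a real `PCollection.by` call also takes a `PType` for the key and returns a lazily evaluated `PTable`, as the snippets above show.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Plain-JDK analogy for PCollection.by. KeyBySketch and its by method are
// hypothetical names used for illustration; they are not Crunch APIs.
class KeyBySketch {

  // Pairs every element with the key computed by extractKey, in input order.
  // Elements that share a key are all kept; nothing is grouped or merged here.
  static <K, T> List<Map.Entry<K, T>> by(List<T> items, Function<T, K> extractKey) {
    List<Map.Entry<K, T>> table = new ArrayList<>();
    for (T item : items) {
      table.add(new SimpleImmutableEntry<>(extractKey.apply(item), item));
    }
    return table;
  }

  public static void main(String[] args) {
    // Key each word by its length and print the resulting key/value pairs.
    List<String> words = Arrays.asList("crunch", "by", "key");
    for (Map.Entry<Integer, String> e : by(words, String::length)) {
      System.out.println(e.getKey() + " -> " + e.getValue());
    }
  }
}
```

In the Crunch snippets above, the keyed result is typically passed straight to `groupByKey()` or a sort, which is why `by` so often appears as the first step of a chain.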

Popular methods of PCollection

  • parallelDo
    Applies the given doFn to the elements of this PCollection and returns a new PCollection that is the
  • getPType
    Returns the PType of this PCollection.
  • write
    Write the contents of this PCollection to the given Target, using the given Target.WriteMode to hand
  • materialize
    Returns a reference to the data set represented by this PCollection that may be used by the client t
  • getPipeline
    Returns the Pipeline associated with this PCollection.
  • getTypeFamily
    Returns the PTypeFamily of this PCollection.
  • count
    Returns a PTable instance that contains the counts of each unique element of this PCollection.
  • aggregate
    Returns a PCollection that contains the result of aggregating all values in this instance.
  • asReadable
  • cache
    Marks this data as cached using the given CachingOptions. Cached PCollections will only be processed
  • filter
    Apply the given filter function to this instance and return the resulting PCollection.
  • first
  • getName
  • getSize
  • union
