PCollection.count

How to use the count method in org.apache.crunch.PCollection

Best Java code snippets using org.apache.crunch.PCollection.count (Showing top 5 results out of 315)

origin: org.apache.crunch/crunch-core

/**
 * Create a list of unique items in the input collection with their count, sorted descending by their frequency.
 * @param input input collection
 * @param <X> record type
 * @return global toplist
 */
public static <X> PTable<X, Long> globalToplist(PCollection<X> input) {
 return negateCounts(negateCounts(input.count()).groupByKey(1).ungroup());
}
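
The snippet above appears to obtain a descending order by negating each count, sorting through a single-reducer `groupByKey(1)`, and negating back. A minimal in-memory sketch of that idea in plain Java follows; it does not use Crunch, and the class name `ToplistSketch` and its list-of-entries return type are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java analogue of a "global toplist"; hypothetical helper, not Crunch code.
class ToplistSketch {
    // Returns (item, count) entries ordered from most to least frequent.
    static <X> List<Map.Entry<X, Long>> globalToplist(List<X> input) {
        // Step 1: count each unique item (the role played by input.count()).
        Map<X, Long> counts = new HashMap<>();
        for (X item : input) {
            counts.merge(item, 1L, Long::sum);
        }
        // Step 2: an ascending sort on the negated count is a descending sort by frequency.
        List<Map.Entry<X, Long>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Comparator.comparingLong(e -> -e.getValue()));
        return entries;
    }

    public static void main(String[] args) {
        List<String> words = List.of("x", "y", "x", "z", "x", "y");
        // Counts are x=3, y=2, z=1, so x is listed first and z last.
        System.out.println(globalToplist(words));
    }
}
```

Items with equal counts come out in no particular order in this sketch.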
origin: spotify/crunch-lib

/**
 * Create a list of unique items in the input collection with their count, sorted descending by their frequency.
 * @param input input collection
 * @param <X> record type
 * @return global toplist
 */
public static <X> PTable<X, Long> globalToplist(PCollection<X> input) {
 return SPTables.negateCounts(SPTables.negateCounts(input.count()).groupByKey(1).ungroup());
}
origin: apache/crunch

PTable<String, Long> counts = words.count();
origin: spotify/crunch-lib

/**
 * Calculate a set of percentiles for each key in a numerically-valued table.
 *
 * Percentiles are calculated on a per-key basis by counting, joining and sorting. This is highly scalable, but takes
 * 2 more map-reduce cycles than if you can guarantee that the value set will fit into memory. Use inMemory
 * if you have less than the order of 10M values per key.
 *
 * The percentile definition that we use here is the "nearest rank" defined here:
 * http://en.wikipedia.org/wiki/Percentile#Definition
 *
 * @param table numerically-valued PTable
 * @param p1 First percentile (in the range 0.0 - 1.0)
 * @param pn More percentiles (in the range 0.0 - 1.0)
 * @param <K> Key type of the table
 * @param <V> Value type of the table (must extend java.lang.Number)
 * @return PTable of each key with a collection of pairs of the percentile provided and its result.
 */
public static <K, V extends Number> PTable<K, Result<V>> distributed(PTable<K, V> table,
    double p1, double... pn) {
 final List<Double> percentileList = createListFromVarargs(p1, pn);
 PTypeFamily ptf = table.getTypeFamily();
 PTable<K, Long> totalCounts = table.keys().count();
 PTable<K, Pair<Long, V>> countValuePairs = totalCounts.join(table);
 PTable<K, Pair<V, Long>> valueCountPairs =
     countValuePairs.mapValues(new SwapPairComponents<Long, V>(), ptf.pairs(table.getValueType(), ptf.longs()));
 return SecondarySort.sortAndApply(
     valueCountPairs,
     new DistributedPercentiles<K, V>(percentileList),
     ptf.tableOf(table.getKeyType(), Result.pType(table.getValueType())));
}
origin: org.apache.crunch/crunch-core

/**
 * Calculate a set of quantiles for each key in a numerically-valued table.
 *
 * Quantiles are calculated on a per-key basis by counting, joining and sorting. This is highly scalable, but takes
 * 2 more map-reduce cycles than if you can guarantee that the value set will fit into memory. Use inMemory
 * if you have less than the order of 10M values per key.
 *
 * The quantile definition that we use here is the "nearest rank" defined here:
 * http://en.wikipedia.org/wiki/Percentile#Definition
 *
 * @param table numerically-valued PTable
 * @param p1 First quantile (in the range 0.0 - 1.0)
 * @param pn More quantiles (in the range 0.0 - 1.0)
 * @param <K> Key type of the table
 * @param <V> Value type of the table (must extend java.lang.Number)
 * @return PTable of each key with a collection of pairs of the quantile provided and its result.
 */
public static <K, V extends Number> PTable<K, Result<V>> distributed(PTable<K, V> table,
    double p1, double... pn) {
 final List<Double> quantileList = createListFromVarargs(p1, pn);
 PTypeFamily ptf = table.getTypeFamily();
 PTable<K, Long> totalCounts = table.keys().count();
 PTable<K, Pair<Long, V>> countValuePairs = totalCounts.join(table);
 PTable<K, Pair<V, Long>> valueCountPairs =
     countValuePairs.mapValues(new SwapPairComponents<Long, V>(), ptf.pairs(table.getValueType(), ptf.longs()));
 return SecondarySort.sortAndApply(
     valueCountPairs,
     new DistributedQuantiles<K, V>(quantileList),
     ptf.tableOf(table.getKeyType(), Result.pType(table.getValueType())));
}
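
Both `distributed` snippets cite the "nearest rank" definition: for N sorted values and a fraction p, take the value at 1-based rank ceil(p * N). The sketch below works through that definition in plain Java for a single key whose values fit in memory. It is not the Crunch implementation; the class name `NearestRankSketch` and the handling of p = 0.0 (mapped to the smallest value) are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// In-memory illustration of the nearest-rank definition; hypothetical helper, not Crunch code.
class NearestRankSketch {
    // Returns the nearest-rank quantile of the values for p in the range 0.0 - 1.0.
    static double nearestRank(List<Double> values, double p) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("values must not be empty");
        }
        if (p < 0.0 || p > 1.0) {
            throw new IllegalArgumentException("p must be in the range 0.0 - 1.0");
        }
        List<Double> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        // 1-based rank: ceil(p * N); the total count N is what table.keys().count() supplies per key.
        int rank = (int) Math.ceil(p * sorted.size());
        // Assumption: p = 0.0 yields rank 0, which is clamped to the smallest value.
        int index = Math.max(rank, 1) - 1;
        return sorted.get(index);
    }

    public static void main(String[] args) {
        List<Double> values = List.of(50.0, 15.0, 40.0, 20.0, 35.0);
        // Sorted: 15, 20, 35, 40, 50. For p = 0.5 the rank is ceil(2.5) = 3, so the result is 35.0.
        System.out.println(nearestRank(values, 0.5));
    }
}
```

The distributed version needs the per-key total before it can locate these ranks in a sorted stream, which is why the snippets count the keys first and join the totals back onto the table.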

Javadoc

Returns a PTable instance that contains the counts of each unique element of this PCollection.
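
To make the Javadoc concrete, here is a minimal plain-Java sketch of what the result looks like: each distinct element mapped to the number of times it occurs. It is an in-memory analogue only and does not call Crunch; the class name `CountSketch` and the use of a `Map` in place of a `PTable` are assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java, in-memory analogue of counting unique elements; hypothetical helper, not Crunch code.
class CountSketch {
    // Returns each distinct element mapped to its number of occurrences.
    static <X> Map<X, Long> count(List<X> input) {
        // LinkedHashMap keeps first-seen order so the printed result is predictable.
        Map<X, Long> counts = new LinkedHashMap<>();
        for (X item : input) {
            counts.merge(item, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> words = List.of("a", "b", "a", "c", "a");
        System.out.println(count(words)); // prints {a=3, b=1, c=1}
    }
}
```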

Popular methods of PCollection

  • parallelDo
    Applies the given doFn to the elements of this PCollection and returns a new PCollection that is the
  • getPType
    Returns the PType of this PCollection.
  • by
    Apply the given map function to each element of this instance in order to create a PTable.
  • write
    Write the contents of this PCollection to the given Target, using the given Target.WriteMode to hand
  • materialize
    Returns a reference to the data set represented by this PCollection that may be used by the client t
  • getPipeline
    Returns the Pipeline associated with this PCollection.
  • getTypeFamily
    Returns the PTypeFamily of this PCollection.
  • aggregate
    Returns a PCollection that contains the result of aggregating all values in this instance.
  • asReadable
  • cache
    Marks this data as cached using the given CachingOptions. Cached PCollections will only be processed
  • filter
    Apply the given filter function to this instance and return the resulting PCollection.
  • first
  • getName
  • getSize
  • union
