@Test
public void testHoodieWriteSupport() throws Exception {
  List<String> rowKeys = new ArrayList<>();
  for (int i = 0; i < 1000; i++) {
    rowKeys.add(UUID.randomUUID().toString());
  }
  String filePath = basePath + "/test.parquet";
  writeParquetFile(filePath, rowKeys);

  // Read and verify
  List<String> rowKeysInFile = new ArrayList<>(
      ParquetUtils.readRowKeysFromParquet(HoodieTestUtils.getDefaultHadoopConf(), new Path(filePath)));
  Collections.sort(rowKeysInFile);
  Collections.sort(rowKeys);
  assertEquals("Did not read back the expected list of keys", rowKeys, rowKeysInFile);

  BloomFilter filterInFile =
      ParquetUtils.readBloomFilterFromParquetMetadata(HoodieTestUtils.getDefaultHadoopConf(), new Path(filePath));
  for (String rowKey : rowKeys) {
    assertTrue("key should be found in bloom filter", filterInFile.mightContain(rowKey));
  }
}
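// The test above calls a writeParquetFile helper that is not part of this excerpt.
// Below is a minimal sketch of what such a helper could look like, assuming a single-field
// record-key schema (e.g. HoodieAvroUtils.getRecordKeySchema()), the
// HoodieAvroWriteSupport(MessageType, Schema, BloomFilter) constructor, the
// BloomFilter(numEntries, errorRate) constructor, and the usual parquet-avro imports
// (AvroSchemaConverter, ParquetWriter, CompressionCodecName, GenericData). Exact
// signatures vary across Hudi versions, so treat this as an illustration only.
private void writeParquetFile(String filePath, List<String> rowKeys) throws Exception {
  Schema schema = HoodieAvroUtils.getRecordKeySchema();
  // Size the bloom filter for the keys being written; HoodieAvroWriteSupport serializes it
  // into the parquet footer, which is what readBloomFilterFromParquetMetadata reads back.
  BloomFilter filter = new BloomFilter(1000, 0.0001);
  HoodieAvroWriteSupport writeSupport =
      new HoodieAvroWriteSupport(new AvroSchemaConverter().convert(schema), schema, filter);
  ParquetWriter<GenericRecord> writer = new ParquetWriter<>(new Path(filePath), writeSupport,
      CompressionCodecName.GZIP, ParquetWriter.DEFAULT_BLOCK_SIZE, ParquetWriter.DEFAULT_PAGE_SIZE);
  for (String rowKey : rowKeys) {
    GenericRecord record = new GenericData.Record(schema);
    record.put(HoodieRecord.RECORD_KEY_METADATA_FIELD, rowKey);
    writer.write(record);
    // Add each key to the filter before close() so the footer metadata is complete.
    filter.add(rowKey);
  }
  writer.close();
}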
assertEquals("file should contain 100 records", ParquetUtils.readRowKeysFromParquet(jsc.hadoopConfiguration(), new Path(basePath, testPartitionPath + "/" + FSUtils.makeDataFileName(commitTime1, 0, file1))).size(), 100); Path newFile = new Path(basePath, testPartitionPath + "/" + FSUtils.makeDataFileName(commitTime2, 0, file1)); assertEquals("file should contain 140 records", ParquetUtils.readRowKeysFromParquet(jsc.hadoopConfiguration(), newFile).size(), 140);
assertEquals("file should contain 100 records", ParquetUtils.readRowKeysFromParquet(jsc.hadoopConfiguration(), new Path(basePath, testPartitionPath + "/" + FSUtils.makeDataFileName(commitTime1, 0, file1))).size(), 100); Path newFile = new Path(basePath, testPartitionPath + "/" + FSUtils.makeDataFileName(commitTime2, 0, file1)); assertEquals("file should contain 140 records", ParquetUtils.readRowKeysFromParquet(jsc.hadoopConfiguration(), newFile).size(), 140);