/**
 * Sets up the job for reading from a table snapshot. It bypasses HBase servers and reads
 * directly from snapshot files.
 * @param snapshotName The name of the snapshot (of a table) to read from.
 * @param scan The scan instance with the columns, time range, etc.
 * @param mapper The mapper class to use.
 * @param outputKeyClass The class of the output key.
 * @param outputValueClass The class of the output value.
 * @param job The current job to adjust. Make sure the passed job is carrying all necessary
 *          HBase configuration.
 * @param addDependencyJars upload HBase jars and jars for any of the configured job classes
 *          via the distributed cache (tmpjars).
 * @param tmpRestoreDir a temporary directory to copy the snapshot files into. The current
 *          user should have write permissions to this directory, and it should not be a
 *          subdirectory of rootdir. The restore directory can be deleted after the job is
 *          finished.
 * @throws IOException When setting up the details fails.
 * @see TableSnapshotInputFormat
 */
public static void initTableSnapshotMapperJob(String snapshotName, Scan scan,
    Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass,
    Job job, boolean addDependencyJars, Path tmpRestoreDir) throws IOException {
  TableSnapshotInputFormat.setInput(job, snapshotName, tmpRestoreDir);
  initTableMapperJob(snapshotName, scan, mapper, outputKeyClass, outputValueClass, job,
    addDependencyJars, false, TableSnapshotInputFormat.class);
  resetCacheConfig(job.getConfiguration());
}
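A typical invocation might look like the following sketch; the snapshot name, restore path, and MySnapshotMapper are hypothetical placeholders, not part of the snippet above.

Job job = Job.getInstance(HBaseConfiguration.create(), "snapshot-scan");
Scan scan = new Scan();
scan.addFamily(Bytes.toBytes("cf"));                  // hypothetical column family
TableMapReduceUtil.initTableSnapshotMapperJob(
    "sales_snapshot",                                 // hypothetical snapshot name
    scan,
    MySnapshotMapper.class,                           // hypothetical TableMapper subclass
    ImmutableBytesWritable.class,                     // output key class
    Result.class,                                     // output value class
    job,
    true,                                             // ship dependency jars via tmpjars
    new Path("/tmp/snapshot-restore"));               // must not live under the HBase rootdir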
private void verifyWithMockedMapReduce(Job job, int numRegions, int expectedNumSplits,
    byte[] startRow, byte[] stopRow) throws IOException, InterruptedException {
  TableSnapshotInputFormat tsif = new TableSnapshotInputFormat();
  List<InputSplit> splits = tsif.getSplits(job);
  // Mock the task context so the record reader sees the job's configuration.
  TaskAttemptContext taskAttemptContext = mock(TaskAttemptContext.class);
  when(taskAttemptContext.getConfiguration()).thenReturn(job.getConfiguration());
  for (InputSplit split : splits) {
    RecordReader<ImmutableBytesWritable, Result> rr =
      tsif.createRecordReader(split, taskAttemptContext);
    rr.initialize(split, taskAttemptContext);
  }
}
@Override
public List<InputSplit> getSplits(final JobContext jobContext)
    throws IOException, InterruptedException {
  return this.tableSnapshotInputFormat.getSplits(jobContext);
}
@Override
public RecordReader<StaticBuffer, Iterable<Entry>> createRecordReader(final InputSplit inputSplit,
    final TaskAttemptContext taskAttemptContext) throws IOException, InterruptedException {
  // Wrap HBase's snapshot record reader so rows come back as JanusGraph
  // StaticBuffer/Entry pairs, restricted to the edge store column family.
  tableReader = tableSnapshotInputFormat.createRecordReader(inputSplit, taskAttemptContext);
  janusgraphRecordReader = new HBaseBinaryRecordReader(tableReader, edgeStoreFamily);
  return janusgraphRecordReader;
}
// Overload of the method above that also takes a split algorithm and a per-region split count.
public static void initTableSnapshotMapperJob(String snapshotName, Scan scan,
    Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass,
    Job job, boolean addDependencyJars, Path tmpRestoreDir,
    RegionSplitter.SplitAlgorithm splitAlgo, int numSplitsPerRegion) throws IOException {
  TableSnapshotInputFormat.setInput(job, snapshotName, tmpRestoreDir, splitAlgo,
    numSplitsPerRegion);
  initTableMapperJob(snapshotName, scan, mapper, outputKeyClass, outputValueClass, job,
    addDependencyJars, false, TableSnapshotInputFormat.class);
  resetCacheConfig(job.getConfiguration());
}
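The split-aware overload can be driven the same way; in this hedged sketch, RegionSplitter.UniformSplit (a stock HBase split algorithm) fans each region out into four input splits, and the names are again placeholders.

TableMapReduceUtil.initTableSnapshotMapperJob(
    "sales_snapshot", scan, MySnapshotMapper.class,
    ImmutableBytesWritable.class, Result.class, job,
    true, new Path("/tmp/snapshot-restore"),
    new RegionSplitter.UniformSplit(),                // split the key space into equal byte ranges
    4);                                               // hypothetical splits per region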
TableSnapshotInputFormat.setInput(job, snapshotName, restoreDir);
// setInput records the snapshot name and restore directory in the job's
// configuration; mirror both values into the caller-supplied config.
config.set(SNAPSHOT_NAME_KEY, job.getConfiguration().get(SNAPSHOT_NAME_KEY));
config.set(RESTORE_DIR_KEY, job.getConfiguration().get(RESTORE_DIR_KEY));
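On the read side, the propagated settings can be recovered from the task's configuration; a minimal sketch using the same constants, assuming a TaskAttemptContext is in scope:

Configuration conf = taskAttemptContext.getConfiguration();
String snapshotName = conf.get(SNAPSHOT_NAME_KEY);
Path restoreDir = new Path(conf.get(RESTORE_DIR_KEY));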