org.apache.gobblin.util.HadoopUtils.copyPath java code examples

/**
 * Copies data from a src {@link Path} to a dst {@link Path}.
 *
 * <p>
 *   This method should be used in preference to
 *   {@link FileUtil#copy(FileSystem, Path, FileSystem, Path, boolean, boolean, Configuration)}, which does not
 *   clean up incomplete files if an error occurs while copying data.
 * </p>
 *
 * <p>
 *   TODO this method does not clean up any local files left over by writing to S3.
 * </p>
 *
 * @param srcFs the source {@link FileSystem} where the src {@link Path} exists
 * @param src the {@link Path} to copy from the source {@link FileSystem}
 * @param dstFs the destination {@link FileSystem} where the dst {@link Path} should be created
 * @param dst the {@link Path} to copy data to
 * @param conf the Hadoop {@link Configuration} passed through to the copy
 * @throws IOException if the copy fails
 */
public static void copyPath(FileSystem srcFs, Path src, FileSystem dstFs, Path dst, Configuration conf)
  throws IOException {
 copyPath(srcFs, src, dstFs, dst, false, false, conf);
}

/**
 * Copies data from a src {@link Path} to a dst {@link Path}.
 *
 * <p>
 *   This method should be used in preference to
 *   {@link FileUtil#copy(FileSystem, Path, FileSystem, Path, boolean, boolean, Configuration)}, which does not
 *   clean up incomplete files if an error occurs while copying data.
 * </p>
 *
 * <p>
 *   TODO this method does not clean up any local files left over by writing to S3.
 * </p>
 *
 * @param srcFs the source {@link FileSystem} where the src {@link Path} exists
 * @param src the {@link Path} to copy from the source {@link FileSystem}
 * @param dstFs the destination {@link FileSystem} where the dst {@link Path} should be created
 * @param dst the {@link Path} to copy data to
 * @param overwrite true if the destination should be overwritten; otherwise, false
 * @param conf the Hadoop {@link Configuration} passed through to the copy
 * @throws IOException if the copy fails
 */
public static void copyPath(FileSystem srcFs, Path src, FileSystem dstFs, Path dst, boolean overwrite,
  Configuration conf) throws IOException {
 copyPath(srcFs, src, dstFs, dst, false, overwrite, conf);
}
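The two overloads above differ only in the flags they forward to a fuller copyPath that is not shown here. The shortest one forwards (false, false), and the overwrite variant forwards (false, overwrite).

The sketch below mirrors that delegation using java.nio.file on the local filesystem instead of Hadoop's FileSystem. It also mirrors the cleanup-on-failure behavior the Javadoc describes. It is an illustration under assumptions, not Gobblin's implementation:

- The class name CopyWithCleanupSketch is hypothetical.
- Reading the first forwarded flag as "delete the source" is an assumption. It is based on movePath passing true in that position.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical local-filesystem analogue of HadoopUtils.copyPath; NOT Gobblin's implementation.
final class CopyWithCleanupSketch {

  // Mirrors copyPath(srcFs, src, dstFs, dst, conf): forwards (false, false).
  static void copyPath(Path src, Path dst) throws IOException {
    copyPath(src, dst, false, false);
  }

  // Mirrors copyPath(srcFs, src, dstFs, dst, overwrite, conf): forwards (false, overwrite).
  static void copyPath(Path src, Path dst, boolean overwrite) throws IOException {
    copyPath(src, dst, false, overwrite);
  }

  // ASSUMPTION: the first flag means "delete the source after a successful copy".
  static void copyPath(Path src, Path dst, boolean deleteSource, boolean overwrite) throws IOException {
    try (InputStream in = Files.newInputStream(src)) {
      // With overwrite == false, CREATE_NEW throws FileAlreadyExistsException before
      // anything is written, so an existing destination is left untouched.
      OutputStream out = overwrite
          ? Files.newOutputStream(dst)
          : Files.newOutputStream(dst, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
      // From here on dst was created or truncated by this call, so it is ours to clean up.
      try (out) {
        in.transferTo(out);
      } catch (IOException e) {
        try {
          Files.deleteIfExists(dst); // remove the incomplete destination file
        } catch (IOException cleanupFailure) {
          e.addSuppressed(cleanupFailure);
        }
        throw e;
      }
    }
    if (deleteSource) {
      Files.delete(src);
    }
  }
}
```

Opening the source before creating the destination means a missing source never leaves an empty destination file behind.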

public static void moveSelectFiles(String extension, String source, String destination) throws IOException {
  // getFileSystem() and getConfiguration() are helpers of the enclosing class (not shown).
  FileSystem fs = getFileSystem();
  Path destinationDir = new Path(destination);
  fs.mkdirs(destinationDir);
  for (FileStatus fileStatus : fs.listStatus(new Path(source))) {
    Path path = fileStatus.getPath();
    // Select regular files whose name ends with the extension (case-insensitive).
    if (!fileStatus.isDirectory() && path.getName().toLowerCase().endsWith(extension.toLowerCase())) {
      // Replace only the same-named file in the destination, not the whole destination directory.
      Path target = new Path(destinationDir, path.getName());
      HadoopUtils.deleteIfExists(fs, target, true);
      HadoopUtils.copyPath(fs, path, fs, target, getConfiguration());
    }
  }
}
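Here is a local-filesystem sketch of the same selection loop. It assumes java.nio.file in place of Hadoop's FileSystem, and SelectFilesSketch and copySelectFiles are hypothetical names.

It copies rather than deleting the sources, matching the copyPath call above. It resolves each target as destination/<file name>, so only a same-named file is replaced.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical java.nio analogue of moveSelectFiles; not part of Gobblin.
final class SelectFilesSketch {

  // Copies every regular file in source whose name ends with extension (case-insensitive)
  // into destination, replacing only a same-named file there. Returns the copied targets.
  static List<Path> copySelectFiles(String extension, Path source, Path destination) throws IOException {
    Files.createDirectories(destination);
    String suffix = extension.toLowerCase(Locale.ROOT);
    List<Path> copied = new ArrayList<>();
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(source)) {
      for (Path entry : entries) {
        String name = entry.getFileName().toString();
        if (Files.isRegularFile(entry) && name.toLowerCase(Locale.ROOT).endsWith(suffix)) {
          // Target is destination/<name>, never the destination directory itself.
          Path target = destination.resolve(name);
          Files.copy(entry, target, StandardCopyOption.REPLACE_EXISTING);
          copied.add(target);
        }
      }
    }
    return copied;
  }
}
```

Locale.ROOT keeps the case-insensitive comparison stable regardless of the JVM's default locale.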

/**
 * Moves a src {@link Path} from a srcFs {@link FileSystem} to a dst {@link Path} on a dstFs {@link FileSystem}. If
 * the srcFs and the dstFs have the same scheme, and neither of them is a non-atomic scheme such as S3, then the
 * {@link Path} is simply renamed. Otherwise, the data is copied from the src {@link Path} to the dst {@link Path},
 * so this method can handle moving data between different {@link FileSystem} implementations.
 *
 * @param srcFs the source {@link FileSystem} where the src {@link Path} exists
 * @param src the source {@link Path} which will be moved
 * @param dstFs the destination {@link FileSystem} where the dst {@link Path} should be created
 * @param dst the {@link Path} to move data to
 * @param overwrite true if the destination should be overwritten; otherwise, false
 * @param conf the Hadoop {@link Configuration} passed through to the copy when a rename is not possible
 * @throws IOException if the rename or the copy fails
 */
public static void movePath(FileSystem srcFs, Path src, FileSystem dstFs, Path dst, boolean overwrite,
  Configuration conf) throws IOException {
 if (srcFs.getUri().getScheme().equals(dstFs.getUri().getScheme())
   && !FS_SCHEMES_NON_ATOMIC.contains(srcFs.getUri().getScheme())
   && !FS_SCHEMES_NON_ATOMIC.contains(dstFs.getUri().getScheme())) {
  renamePath(srcFs, src, dst);
 } else {
  copyPath(srcFs, src, dstFs, dst, true, overwrite, conf);
 }
}
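The branch in movePath reduces to a small decision on the two filesystem URIs. The sketch below isolates it as a pure function, so the rule can be checked without a cluster. A few points are assumptions or departures from the original:

- The contents of NON_ATOMIC_SCHEMES are an assumption standing in for Gobblin's FS_SCHEMES_NON_ATOMIC.
- MoveStrategySketch and its Strategy enum are hypothetical.
- Unlike the original condition, it treats a missing scheme as "not the same scheme" instead of throwing a NullPointerException.

```java
import java.net.URI;
import java.util.Set;

// Hypothetical extraction of movePath's rename-vs-copy decision; not part of Gobblin.
final class MoveStrategySketch {

  enum Strategy { RENAME, COPY }

  // ASSUMPTION: stand-in for Gobblin's FS_SCHEMES_NON_ATOMIC; the real set may differ.
  static final Set<String> NON_ATOMIC_SCHEMES = Set.of("s3", "s3a", "s3n");

  static Strategy choose(URI srcFsUri, URI dstFsUri) {
    String srcScheme = srcFsUri.getScheme();
    String dstScheme = dstFsUri.getScheme();
    // A null scheme never matches, which also keeps null out of Set.contains (Set.of rejects null).
    boolean sameScheme = srcScheme != null && srcScheme.equals(dstScheme);
    if (sameScheme
        && !NON_ATOMIC_SCHEMES.contains(srcScheme)
        && !NON_ATOMIC_SCHEMES.contains(dstScheme)) {
      return Strategy.RENAME;
    }
    return Strategy.COPY;
  }
}
```

Only schemes are compared, so two hdfs:// URIs on different NameNodes still select RENAME. That is exactly what the original condition does.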

 @Override
 public Void call() throws Exception {
  Path convertedFilePath = MRCompactorJobRunner.this.outputRecordCountProvider.convertPath(
    LateFileRecordCountProvider.restoreFilePath(filePath),
    MRCompactorJobRunner.this.outputExtension,
    MRCompactorJobRunner.this.inputRecordCountProvider);
  String targetFileName = convertedFilePath.getName();
  Path outPath = MRCompactorJobRunner.this.lateOutputRecordCountProvider.constructLateFilePath(targetFileName,
    MRCompactorJobRunner.this.fs, outputDirectory);
  HadoopUtils.copyPath(MRCompactorJobRunner.this.fs, filePath, MRCompactorJobRunner.this.fs, outPath, true,
    MRCompactorJobRunner.this.fs.getConf());
  LOG.debug(String.format("Copied %s to %s.", filePath, outPath));
  return null;
 }
});
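The closing }); suggests that call() belongs to an anonymous Callable<Void> submitted to an executor, one task per late file. That context is not shown, so treat it as an assumption.

The sketch below shows the same pattern with java.util.concurrent and java.nio.file:

- one task per file;
- existing targets are overwritten, mirroring the overwrite=true argument;
- Future.get() surfaces failures.

ParallelCopySketch, copyAll, and the keep-the-file-name rule are hypothetical. The real snippet derives outPath through its record-count providers.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration of per-file copy tasks; not MRCompactorJobRunner's code.
final class ParallelCopySketch {

  // Copies each file into outputDirectory in parallel, overwriting existing targets.
  static List<Path> copyAll(List<Path> files, Path outputDirectory, int threads)
      throws InterruptedException, ExecutionException {
    ExecutorService executor = Executors.newFixedThreadPool(threads);
    try {
      List<Future<Path>> futures = new ArrayList<>();
      for (Path filePath : files) {
        Callable<Path> task = () -> {
          // Hypothetical naming rule: keep the original file name.
          Path outPath = outputDirectory.resolve(filePath.getFileName().toString());
          return Files.copy(filePath, outPath, StandardCopyOption.REPLACE_EXISTING);
        };
        futures.add(executor.submit(task));
      }
      List<Path> copied = new ArrayList<>();
      for (Future<Path> future : futures) {
        copied.add(future.get()); // rethrows a task's IOException wrapped in ExecutionException
      }
      return copied;
    } finally {
      executor.shutdown();
    }
  }
}
```

Waiting on every future before returning means a single failed copy fails the whole batch, rather than being silently dropped.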


Javadoc

Copies data from a src Path to a dst Path.

This method should be used in preference to FileUtil#copy(FileSystem,Path,FileSystem,Path,boolean,boolean,Configuration), which does not clean up incomplete files if an error occurs while copying data.

TODO this method does not clean up any local files left over by writing to S3.

Popular methods of HadoopUtils

getConfFromState
Provides Hadoop configuration given state. It also supports decrypting values on "encryptedPath". No…
newConfiguration
renamePath
A wrapper around FileSystem#rename(Path,Path) which throws IOException if FileSystem#rename(Path,Pat…
deleteIfExists
A wrapper around FileSystem#delete(Path,boolean) that only deletes a given Path if it is present on…
deletePath
A wrapper around FileSystem#delete(Path,boolean) which throws IOException if the given Path exists,…
sanitizePath
Remove illegal HDFS path characters from the given path. Illegal characters will be replaced with th…
getOptionallyThrottledFileSystem
Calls #getOptionallyThrottledFileSystem(FileSystem,int) parsing the qps from the input State at key #…
getStateFromConf
movePath
Moves a src Path from a srcFs FileSystem to a dst Path on a dstFs FileSystem. If the srcFs and the d…
getSourceFileSystem
Get a FileSystem object for the uri specified at ConfigurationKeys#SOURCE_FILEBASED_FS_URI.
addGobblinSite
Add "gobblin-site.xml" as a Configuration resource.
copyFile
Copy a file from a srcFs FileSystem to a dstFs FileSystem. The src Path must be a file, that is File…


How to use copyPath method in org.apache.gobblin.util.HadoopUtils

Best Java code snippets using org.apache.gobblin.util.HadoopUtils.copyPath (Showing top 10 results out of 315)
