/**
 * Prepares this stage for record processing at batch runtime.
 * No stage-specific setup is needed here beyond the base class initialization.
 */
@Override public void initialize(BatchRuntimeContext context) throws Exception { super.initialize(context); }
/**
 * Registers the datasets this stage writes to so they are created at deploy time.
 *
 * <p>Dataset names that contain macros cannot be resolved at configure time, so a
 * dataset is only created here when its name is fully known. The original code
 * created the Table unconditionally, which breaks when {@code tableName} is
 * macro-enabled; it is now guarded the same way as {@code runtimeDatasetName}.
 * When no macro is present, {@code containsMacro} returns false and behavior is
 * unchanged.
 *
 * @param pipelineConfigurer configurer used to register datasets for this stage
 */
@Override
public void configurePipeline(PipelineConfigurer pipelineConfigurer) {
  super.configurePipeline(pipelineConfigurer);
  if (!config.containsMacro("tableName")) {
    pipelineConfigurer.createDataset(config.tableName, Table.class);
  }
  if (!config.containsMacro("runtimeDatasetName")) {
    pipelineConfigurer.createDataset(config.runtimeDatasetName, KeyValueTable.class.getName(),
                                     DatasetProperties.EMPTY);
  }
}
/**
 * Creates the backing Table dataset and, when an output schema string is
 * configured, parses it and publishes it as this stage's output schema.
 *
 * @param pipelineConfigurer configurer used to register the dataset and schema
 * @throws IllegalArgumentException if the configured schema is not valid JSON
 */
@Override
public void configurePipeline(PipelineConfigurer pipelineConfigurer) {
  super.configurePipeline(pipelineConfigurer);
  pipelineConfigurer.createDataset(config.tableName, Table.class);
  if (config.schema == null) {
    // No schema configured; nothing to publish.
    return;
  }
  try {
    Schema parsedSchema = Schema.parseJson(config.schema);
    pipelineConfigurer.getStageConfigurer().setOutputSchema(parsedSchema);
  } catch (IOException e) {
    // Surface the malformed schema to the user, preserving the parse failure cause.
    throw new IllegalArgumentException("Could not parse schema " + config.schema, e);
  }
}
/**
 * Validates the Excel reader configuration, rejects the case where both the
 * column list and the field mapping are empty, registers datasets, and
 * publishes the derived output schema.
 *
 * @param pipelineConfigurer configurer used to register datasets and schema
 * @throws IllegalArgumentException if neither columns nor a field mapping are given
 */
@Override
public void configurePipeline(PipelineConfigurer pipelineConfigurer) {
  super.configurePipeline(pipelineConfigurer);
  excelInputreaderConfig.validate();

  // At least one of the two inputs must be supplied to derive a schema.
  boolean columnsMissing = Strings.isNullOrEmpty(excelInputreaderConfig.columnList);
  boolean mappingMissing = Strings.isNullOrEmpty(excelInputreaderConfig.outputSchema);
  if (columnsMissing && mappingMissing) {
    throw new IllegalArgumentException(
        "'Field Name Schema Type Mapping' input cannot be empty when the empty "
            + "input value of 'Columns To Be Extracted' is provided.");
  }

  createDatasets(pipelineConfigurer, null);
  // init() and getOutputSchema() populate the outputSchema field used below.
  init();
  getOutputSchema();
  pipelineConfigurer.getStageConfigurer().setOutputSchema(outputSchema);
}
@Override public void initialize(BatchRuntimeContext context) throws Exception { super.initialize(context); if (config.schema != null) { // should never happen, just done to test App correctness in unit tests Schema outputSchema = Schema.parseJson(config.schema); if (!outputSchema.equals(context.getOutputSchema())) { throw new IllegalStateException("Output schema does not match what was set at configure time."); } } }
@Override public void configurePipeline(PipelineConfigurer pipelineConfigurer) { super.configurePipeline(pipelineConfigurer); streamBatchConfig.validate(); if (!streamBatchConfig.containsMacro(Properties.Stream.NAME)) { pipelineConfigurer.addStream(new Stream(streamBatchConfig.name)); } // if no format is specified then default schema is used, if otherwise its based on format spec. if (streamBatchConfig.format == null) { pipelineConfigurer.getStageConfigurer().setOutputSchema(DEFAULT_SCHEMA); } else if (streamBatchConfig.getFormatSpec() != null && streamBatchConfig.getFormatSpec().getSchema() != null) { List<Schema.Field> fields = Lists.newArrayList(); fields.add(Schema.Field.of("ts", Schema.of(Schema.Type.LONG))); fields.add(Schema.Field.of("headers", Schema.mapOf(Schema.of(Schema.Type.STRING), Schema.of(Schema.Type.STRING)))); fields.addAll(streamBatchConfig.getFormatSpec().getSchema().getFields()); pipelineConfigurer.getStageConfigurer().setOutputSchema(Schema.recordOf("event", fields)); } }
/**
 * Transforms one key/value pair from the source into emitted records.
 *
 * <p>The transform chain is built lazily on the first call — presumably so the
 * plugin is instantiated where this function actually runs rather than at
 * construction time (TODO confirm: typical Spark-executor lazy-init pattern,
 * given the Tuple2 input). When data tracing is enabled, the source is wrapped
 * in a LimitingTransform capped at {@code numOfRecordsPreview} records.
 */
@Override public Iterable<RecordInfo<Object>> call(Tuple2<Object, Object> input) throws Exception {
  // One-time setup: create and initialize the plugin, wire metrics/tracing/stats.
  if (transform == null) {
    BatchSource<Object, Object, Object> batchSource = pluginFunctionContext.createPlugin();
    batchSource.initialize(pluginFunctionContext.createBatchRuntimeContext());
    transform = new TrackedTransform<>(pluginFunctionContext.getDataTracer().isEnabled()
        ? new LimitingTransform<>(batchSource, numOfRecordsPreview)
        : batchSource,
        pluginFunctionContext.createStageMetrics(),
        pluginFunctionContext.getDataTracer(),
        pluginFunctionContext.getStageStatisticsCollector());
    emitter = new CombinedEmitter<>(pluginFunctionContext.getStageName());
  }
  // The emitter is reused across calls, so clear any records from the previous input.
  emitter.reset();
  KeyValue<Object, Object> inputKV = new KeyValue<>(input._1(), input._2());
  transform.transform(inputKV, emitter);
  return emitter.getEmitted();
}
}