The grid task interface defines a task that can be executed on the grid. A grid task
is responsible for splitting business logic into multiple grid jobs, receiving
results from individual grid jobs executing on remote nodes, and reducing
(aggregating) the received job results into the final grid task result.
Grid Task Execution Sequence
-
Upon a request to execute a grid task with a given task name, the system finds the
deployed task with that name. A task needs to be deployed prior to execution
(see the
GridCompute#localDeployTask(Class,ClassLoader) method). However, if a task does not specify
its name explicitly via the
GridComputeTaskName annotation, it
is auto-deployed the first time it gets executed.
-
System will create new distributed task session (see
GridComputeTaskSession).
-
System will inject all annotated resources (including task session) into grid task instance.
See org.gridgain.grid.resources
package for the list of injectable resources.
-
The system calls
#map(List,Object). This
method is responsible for splitting the business logic of the grid task into
multiple grid jobs (units of execution) and mapping them to
grid nodes. It returns
a map with grid jobs as keys and grid nodes as values.
-
System will send mapped grid jobs to their respective nodes.
-
Upon arrival on the remote node a grid job will be handled by collision SPI
(see
GridCollisionSpi) which will determine how a job will be executed
on the remote node (immediately, buffered or canceled).
-
Once job execution results become available, the
#result(GridComputeJobResult,List) method is called for each received job result. The policy returned by this method
determines how the task reacts to each job result:
-
If
GridComputeJobResultPolicy#WAIT policy is returned, task will continue to wait
for other job results. If this result is the last job result, then
#reduce(List) method will be called.
-
If
GridComputeJobResultPolicy#REDUCE policy is returned, then method
#reduce(List) will be called right away without waiting for
other jobs' completion (all remaining jobs will receive a cancel request).
-
If
GridComputeJobResultPolicy#FAILOVER policy is returned, then job will
be failed over to another node for execution. The node to which job will get
failed over is decided by
GridFailoverSpi SPI implementation.
Note that if you use the
GridComputeTaskAdapter adapter for your
GridComputeTask implementation, it automatically fails jobs over to another node in 2
known failure cases:
-
The job has failed due to a node crash. In this case the
GridComputeJobResult#getException() method returns an instance of
GridTopologyException.
-
Job execution was rejected, i.e. remote node has cancelled job before it got
a chance to execute, while it still was on the waiting list. In this case
GridComputeJobResult#getException() method will return an instance of
GridComputeExecutionRejectedException exception.
-
Once all results are received, or the
#result(GridComputeJobResult,List) method returns the
GridComputeJobResultPolicy#REDUCE policy, the
#reduce(List) method is called to aggregate the received results into one final result. Once this method finishes, the
execution of the grid task is complete. The result is returned to the user through the
GridComputeTaskFuture#get() method.
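The sequence above can be sketched with a small, single-threaded model. Everything in it (Policy, Job, JobResult, Task, EchoJob and the "try the next node" failover rule) is a hypothetical stand-in written for illustration, not a GridGain type. It is also an assumption of the sketch that a result is added to the received list only once it is no longer being failed over.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in model only: none of these types are GridGain classes.
final class TaskLifecycleSketch {
    enum Policy { WAIT, REDUCE, FAILOVER }

    interface Job { String execute(String node) throws Exception; }

    static final class JobResult {
        final String data;
        final Exception error;
        JobResult(String data, Exception error) { this.data = data; this.error = error; }
    }

    interface Task {
        Map<Job, String> map(List<String> nodes, String arg);
        Policy result(JobResult res, List<JobResult> received);
        String reduce(List<JobResult> results);
    }

    /** Runs map, result and reduce in the calling thread; remote execution is simulated. */
    static String execute(Task task, List<String> nodes, String arg) {
        List<JobResult> received = new ArrayList<>();
        for (Map.Entry<Job, String> e : task.map(nodes, arg).entrySet()) {
            String node = e.getValue();
            Policy policy = Policy.WAIT;
            for (int attempt = 1; ; attempt++) {
                JobResult res = run(e.getKey(), node);
                policy = task.result(res, received);
                // Keep the result unless the job can still be failed over.
                if (policy != Policy.FAILOVER || attempt >= nodes.size()) {
                    received.add(res);
                    break;
                }
                // Assumed failover rule: try the next node in the list.
                node = nodes.get((nodes.indexOf(node) + 1) % nodes.size());
            }
            if (policy == Policy.REDUCE)
                break; // Reduce right away; remaining jobs are never started here.
        }
        return task.reduce(received);
    }

    private static JobResult run(Job job, String node) {
        try {
            return new JobResult(job.execute(node), null);
        } catch (Exception ex) {
            return new JobResult(null, ex);
        }
    }

    /** Demo job that fails on node "A" and succeeds anywhere else. */
    static final class EchoJob implements Job {
        private final String name;
        EchoJob(String name) { this.name = name; }
        @Override public String execute(String node) {
            if (node.equals("A"))
                throw new IllegalStateException("node A is down");
            return name + "@" + node;
        }
    }

    static String demo() {
        Task task = new Task() {
            @Override public Map<Job, String> map(List<String> nodes, String arg) {
                Map<Job, String> jobs = new LinkedHashMap<>();
                for (int i = 0; i < nodes.size(); i++)
                    jobs.put(new EchoJob(arg + i), nodes.get(i));
                return jobs;
            }
            @Override public Policy result(JobResult res, List<JobResult> received) {
                return res.error != null ? Policy.FAILOVER : Policy.WAIT;
            }
            @Override public String reduce(List<JobResult> results) {
                StringBuilder buf = new StringBuilder();
                for (JobResult r : results)
                    if (r.error == null)
                        buf.append(buf.length() == 0 ? "" : ",").append(r.data);
                return buf.toString();
            }
        };
        return execute(task, Arrays.asList("A", "B"), "foo");
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "foo0@B,foo1@B"
    }
}
```

In the demo the job mapped to node "A" fails, the task answers FAILOVER, and the job is rerun on node "B"; the second job succeeds on its first attempt.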
Continuous Job Mapper
For cases when the jobs within a split are too large to fit in memory at once, or when
not all jobs in the task are known during the
#map(List,Object) step,
use
GridComputeTaskContinuousMapper to continuously stream jobs from the task even after the
map(...) step is complete. With a continuous mapper the number of jobs within a task
may grow very large; in this case it may make sense to use it in combination with the
GridComputeTaskNoResultCache annotation.
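The idea of streaming jobs after the map step can be illustrated with a toy model. The Mapper interface and the in-memory queue below are hypothetical stand-ins, not the GridComputeTaskContinuousMapper API: the map step sends only a first window of jobs, and each processed result triggers sending of one more job, so only a bounded number of jobs exists at any time.

```java
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.Queue;
import java.util.stream.IntStream;

// Stand-in model only: Mapper is a hypothetical interface, not a GridGain type.
final class ContinuousMapperSketch {
    /** Accepts jobs at any time, including after the map step has returned. */
    interface Mapper { void send(int job); }

    /** Returns {sum of squares of 1..totalJobs, peak number of queued jobs}. */
    static long[] run(int totalJobs, int window) {
        // Jobs are produced lazily, so they never all exist at once.
        Iterator<Integer> source = IntStream.rangeClosed(1, totalJobs).iterator();
        Queue<Integer> pending = new ArrayDeque<>(); // Jobs sent but not yet executed.
        Mapper mapper = job -> pending.add(job);

        // "map" step: send only the first window of jobs.
        for (int i = 0; i < window && source.hasNext(); i++)
            mapper.send(source.next());

        long sum = 0;
        int peak = 0;
        while (!pending.isEmpty()) {
            peak = Math.max(peak, pending.size());
            int job = pending.remove();
            sum += (long) job * job; // "result" step: fold the job result in immediately.
            if (source.hasNext())
                mapper.send(source.next()); // Stream one more job after map has finished.
        }
        return new long[] {sum, peak};
    }

    public static void main(String[] args) {
        long[] r = run(10, 3);
        System.out.println(r[0] + " " + r[1]); // prints "385 3"
    }
}
```

Because results are folded into a running total as they arrive, nothing in this model needs the full list of job results, which is the situation where disabling result caching makes sense.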
Task Result Caching
Sometimes job results are too large, or a task simply has too many jobs to keep track
of, which may hinder performance. In such cases it may make sense to disable task
result caching by attaching the
GridComputeTaskNoResultCache annotation to the task class and
processing all results as they arrive in the
#result(GridComputeJobResult,List) method.
When GridGain sees this annotation, it disables tracking of job results, and the
list of all job results passed into the
#result(GridComputeJobResult,List) or
#reduce(List) methods will always be empty. Note that the list of
job siblings on
GridComputeTaskSession will also be empty, to prevent the number
of job siblings from growing as well.
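A minimal sketch of what processing results as they arrive can look like when the result lists are always empty. SumTask is a hypothetical, simplified class used for illustration and does not implement the GridComputeTask interface: the task keeps its own running aggregate in result() and reports it from reduce().

```java
import java.util.Collections;
import java.util.List;

// Stand-in model only: SumTask is hypothetical and not a GridGain type.
final class NoResultCacheSketch {
    /** Aggregates inside result() instead of relying on the list passed to reduce(). */
    static final class SumTask {
        private long sum;  // Running aggregate kept by the task itself.
        private int count; // Number of job results seen so far.

        void result(int jobResult, List<Integer> received) {
            // With result caching disabled, 'received' is empty, so it is ignored.
            sum += jobResult;
            count++;
        }

        String reduce(List<Integer> results) {
            // 'results' is empty as well; report what result() accumulated.
            return count + " results, sum=" + sum;
        }
    }

    static String demo() {
        SumTask task = new SumTask();
        List<Integer> empty = Collections.emptyList();
        for (int r : new int[] {3, 5, 7})
            task.result(r, empty);
        return task.reduce(empty);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "3 results, sum=15"
    }
}
```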
Resource Injection
A grid task implementation can have grid resources injected into it using IoC (dependency
injection). Both field-based and method-based injection are supported.
The following grid resources can be injected:
-
GridTaskSessionResource
-
GridInstanceResource
-
GridLoggerResource
-
GridHomeResource
-
GridExecutorServiceResource
-
GridLocalNodeIdResource
-
GridMBeanServerResource
-
GridMarshallerResource
-
GridSpringApplicationContextResource
-
GridSpringResource
Refer to corresponding resource documentation for more information.
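To make the difference between field-based and method-based injection concrete, here is a small reflection sketch. The HomeResource annotation and the inject helper are hypothetical and written only for this illustration; they show the general mechanism, not how GridGain implements injection internally.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Method;

// Stand-in model only: HomeResource is a hypothetical annotation, not a GridGain one.
final class InjectionSketch {
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.FIELD, ElementType.METHOD})
    @interface HomeResource {}

    /** Receives the resource through an annotated field. */
    static final class FieldTask {
        @HomeResource private String home;
        String home() { return home; }
    }

    /** Receives the resource through an annotated setter method. */
    static final class MethodTask {
        private String home;
        @HomeResource void setHome(String home) { this.home = home; }
        String home() { return home; }
    }

    /** Injects 'value' into every annotated field and annotated single-argument method. */
    static void inject(Object target, Object value) {
        try {
            for (Field f : target.getClass().getDeclaredFields()) {
                if (f.isAnnotationPresent(HomeResource.class)) {
                    f.setAccessible(true);
                    f.set(target, value);
                }
            }
            for (Method m : target.getClass().getDeclaredMethods()) {
                if (m.isAnnotationPresent(HomeResource.class)) {
                    m.setAccessible(true);
                    m.invoke(target, value);
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Injection failed", e);
        }
    }

    static String demo() {
        FieldTask a = new FieldTask();
        MethodTask b = new MethodTask();
        inject(a, "/opt/grid");
        inject(b, "/opt/grid");
        return a.home() + " " + b.home();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "/opt/grid /opt/grid"
    }
}
```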
Grid Task Adapters
GridComputeTask comes with several convenience adapters to make the usage easier:
-
GridComputeTaskAdapter provides a default implementation of the
GridComputeTask#result(GridComputeJobResult,List) method, which provides automatic failover to another node if a remote job has failed
due to a node crash (detected by
GridTopologyException) or due to job
execution rejection (detected by
GridComputeExecutionRejectedException).
Here is an example of how you would implement your task using
GridComputeTaskAdapter:
import java.util.HashMap;
import java.util.List;
import java.util.Map;
// Plus the GridGain imports for the types used below.

public class MyFooBarTask extends GridComputeTaskAdapter<String, String> {
    // Inject load balancer.
    @GridLoadBalancerResource
    GridComputeLoadBalancer balancer;

    // Map jobs to grid nodes.
    @Override
    public Map<? extends GridComputeJob, GridNode> map(List<GridNode> subgrid, String arg) throws GridException {
        Map<MyFooBarJob, GridNode> jobs = new HashMap<MyFooBarJob, GridNode>(subgrid.size());

        // In more complex cases, you can actually do
        // more complicated assignments of jobs to nodes.
        for (int i = 0; i < subgrid.size(); i++) {
            // Pick the next best balanced node for the job.
            jobs.put(new MyFooBarJob(arg), balancer.getBalancedNode());
        }

        return jobs;
    }

    // Aggregate results into one compound result.
    @Override
    public String reduce(List<GridComputeJobResult> results) throws GridException {
        // For the purpose of this example we simply
        // concatenate the string representation of every
        // job result.
        StringBuilder buf = new StringBuilder();

        for (GridComputeJobResult res : results) {
            // Append the string representation of the result
            // returned by every job.
            buf.append(res.getData().toString());
        }

        return buf.toString();
    }
}
-
GridComputeTaskSplitAdapter hides the job-to-node mapping logic from the
user and provides a convenient
GridComputeTaskSplitAdapter#split(int,Object) method for splitting a task into sub-jobs in homogeneous environments.
Here is an example of how you would implement your task using
GridComputeTaskSplitAdapter:
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
// Plus the GridGain imports for the types used below.

public class MyFooBarTask extends GridComputeTaskSplitAdapter<Object, String> {
    @Override
    protected Collection<? extends GridComputeJob> split(int gridSize, Object arg) throws GridException {
        List<MyFooBarJob> jobs = new ArrayList<MyFooBarJob>(gridSize);

        for (int i = 0; i < gridSize; i++) {
            jobs.add(new MyFooBarJob(arg));
        }

        // Node assignment via load balancer
        // happens automatically.
        return jobs;
    }

    // Aggregate results into one compound result.
    @Override
    public String reduce(List<GridComputeJobResult> results) throws GridException {
        // For the purpose of this example we simply
        // concatenate the string representation of every
        // job result.
        StringBuilder buf = new StringBuilder();

        for (GridComputeJobResult res : results) {
            // Append the string representation of the result
            // returned by every job.
            buf.append(res.getData().toString());
        }

        return buf.toString();
    }
}
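How a split-style adapter can hide the job-to-node mapping may be easier to see in a reduced model. SplitTask below is a hypothetical stand-in, not GridComputeTaskSplitAdapter: subclasses implement only split(), and the base class's map() assigns each job to a node. Round-robin assignment is an assumption made for this sketch and takes the place of the load balancer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in model only: SplitTask is hypothetical and not a GridGain type.
final class SplitAdapterSketch {
    /** Subclasses only split the work; map() takes care of node assignment. */
    abstract static class SplitTask {
        abstract List<String> split(int gridSize, String arg);

        final Map<String, String> map(List<String> nodes, String arg) {
            Map<String, String> assignment = new LinkedHashMap<>();
            int i = 0;
            for (String job : split(nodes.size(), arg)) {
                // Round-robin stands in for the load balancer (assumption of this sketch).
                assignment.put(job, nodes.get(i++ % nodes.size()));
            }
            return assignment;
        }
    }

    static Map<String, String> demo() {
        SplitTask task = new SplitTask() {
            @Override List<String> split(int gridSize, String arg) {
                List<String> jobs = new ArrayList<>();
                for (int i = 0; i < gridSize; i++)
                    jobs.add(arg + "-part" + i);
                return jobs;
            }
        };
        return task.map(Arrays.asList("n1", "n2", "n3"), "foo");
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "{foo-part0=n1, foo-part1=n2, foo-part2=n3}"
    }
}
```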