Defines an executable unit for
GridComputeTask.
Description
A grid job is an executable unit of
GridComputeTask. A grid task gets split into jobs
when the
GridComputeTask#map(List,Object) method is called. This method returns
all jobs for the task, mapped to their corresponding grid nodes for execution. The grid
will then serialize these jobs and send them to the requested nodes for execution.
When a node receives a request to execute a job, the following sequence of events
takes place:
-
If a collision SPI is defined, then the job gets put on a waiting list which is passed to the underlying
GridCollisionSpi SPI. Otherwise the job will be submitted to the executor
service responsible for job execution immediately upon arrival.
-
If a collision SPI is configured, then it will choose one of the following scheduling policies for the job:
-
The job will be kept on the waiting list. In this case, the job will not get a
chance to execute until the next time the collision SPI is called.
-
The job will be moved to the active list. In this case the system will proceed
with job execution.
-
The job will be rejected. In this case the
GridComputeJobResult passed into the
GridComputeTask#result(GridComputeJobResult,List) method will contain a
GridComputeExecutionRejectedException exception. If you are using any
of the task adapters shipped with GridGain, then the job will be failed
over automatically for execution on another node.
-
For activated jobs, an instance of distributed task session (see
GridComputeTaskSession)
will be injected.
-
System will execute the job by calling
GridComputeJob#execute() method.
-
If a job gets cancelled while executing, then the
GridComputeJob#cancel() method will be called. Note that, just like with the
Thread#interrupt() method, grid job cancellation serves as a hint that a job should stop
executing or exhibit some other user-defined behavior. Generally it is
up to a job to decide whether it wants to react to cancellation or
ignore it. Job cancellation can happen for several reasons:
- Collision SPI cancelled an active job.
- Parent task has completed without waiting for this job's result.
- User cancelled task by calling
GridComputeTaskFuture#cancel() method.
-
Once job execution is complete, the return value will be sent back to the parent
task and will be passed into the
GridComputeTask#result(GridComputeJobResult,List) method via a
GridComputeJobResult instance. If job execution resulted
in a checked exception, then the
GridComputeJobResult#getException() method
will return that exception. If job execution threw a runtime exception
or error, then it will be wrapped into a
GridComputeUserUndeclaredException exception.
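The cooperative nature of cancellation described above can be illustrated with a minimal sketch. It does not use the GridGain classes: SketchJob is a hypothetical stand-in that only mirrors the execute()/cancel() pair, and CancellableSumJob is an invented example job. The point is that cancel() merely records a hint, and execute() decides when, or whether, to honor it.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the job contract described above.
// It is NOT the GridGain interface; it only mirrors execute() and cancel().
interface SketchJob {
    Object execute() throws Exception;

    void cancel();
}

// Invented example job: sums 1..limit and checks the cancellation flag between steps.
class CancellableSumJob implements SketchJob {
    private final AtomicBoolean cancelled = new AtomicBoolean(false);
    private final int limit;

    CancellableSumJob(int limit) {
        this.limit = limit;
    }

    @Override
    public void cancel() {
        // Only records the hint. Nothing is stopped forcibly here.
        cancelled.set(true);
    }

    @Override
    public Object execute() {
        long sum = 0;
        for (int i = 1; i <= limit; i++) {
            if (cancelled.get()) {
                // This job chooses to stop early and return the partial sum.
                // Another job could equally ignore the flag and run to completion.
                break;
            }
            sum += i;
        }
        return sum;
    }
}
```

A job that was cancelled before it started returns 0 in this sketch, while an uncancelled job with limit 100 returns 5050. Whether a partial result, an exception, or a full result is appropriate after cancellation is a design choice left to the job.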
Resource Injection
Grid resources can be injected into a grid job implementation using IoC (dependency injection).
Both field-based and method-based injection are supported.
The following grid resources can be injected:
-
GridTaskSessionResource
-
GridJobContextResource
-
GridInstanceResource
-
GridLoggerResource
-
GridHomeResource
-
GridExecutorServiceResource
-
GridLocalNodeIdResource
-
GridMBeanServerResource
-
GridMarshallerResource
-
GridSpringApplicationContextResource
-
GridSpringResource
Refer to corresponding resource documentation for more information.
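To make the difference between field-based and method-based injection concrete, here is a small sketch built on plain Java reflection. Everything in it is hypothetical: the DemoNodeIdResource annotation, the two job classes, and DemoInjector are invented for illustration and do not reflect how GridGain performs injection internally. The node ID is also simplified to a String.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Method;

// Hypothetical marker annotation, standing in for a grid resource annotation.
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.FIELD, ElementType.METHOD})
@interface DemoNodeIdResource {}

// Field-based injection: the annotation sits on the field itself.
class FieldInjectedJob {
    @DemoNodeIdResource
    private String nodeId;

    String describe() {
        return "field:" + nodeId;
    }
}

// Method-based injection: the annotation sits on a one-argument setter.
class MethodInjectedJob {
    private String nodeId;

    @DemoNodeIdResource
    void setNodeId(String nodeId) {
        this.nodeId = nodeId;
    }

    String describe() {
        return "method:" + nodeId;
    }
}

// Hypothetical injector: assigns the resource to every annotated field
// and passes it to every annotated method of the target object.
class DemoInjector {
    static void inject(Object target, String nodeId) {
        try {
            for (Field f : target.getClass().getDeclaredFields()) {
                if (f.isAnnotationPresent(DemoNodeIdResource.class)) {
                    f.setAccessible(true);
                    f.set(target, nodeId);
                }
            }
            for (Method m : target.getClass().getDeclaredMethods()) {
                if (m.isAnnotationPresent(DemoNodeIdResource.class)) {
                    m.setAccessible(true);
                    m.invoke(target, nodeId);
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Injection failed", e);
        }
    }
}
```

Method-based injection gives the job a hook to react when the resource arrives (for example, to derive other state from it), while field-based injection is the shorter form when no such reaction is needed.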
GridComputeJobAdapter
GridGain comes with the convenience
GridComputeJobAdapter adapter, which provides a
default empty implementation for the
GridComputeJob#cancel() method and also
allows the user to set and get the job argument, if there is one.
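The adapter idea can be sketched as follows. This is an assumed shape, not the actual GridComputeJobAdapter source: DemoJob, DemoJobAdapter, the setArgument/getArgument accessor names, and WordLengthJob are all hypothetical and exist only to show why such an adapter is convenient.

```java
// Hypothetical stand-in for the job contract; not the GridGain interface.
interface DemoJob {
    Object execute() throws Exception;

    void cancel();
}

// Hypothetical adapter: supplies an empty cancel() and stores one job argument.
abstract class DemoJobAdapter<A> implements DemoJob {
    private A arg;

    protected DemoJobAdapter() {
        // No argument supplied.
    }

    protected DemoJobAdapter(A arg) {
        this.arg = arg;
    }

    public void setArgument(A arg) {
        this.arg = arg;
    }

    public A getArgument() {
        return arg;
    }

    @Override
    public void cancel() {
        // Default empty implementation: cancellation is ignored unless overridden.
    }
}

// A concrete job only has to implement execute().
class WordLengthJob extends DemoJobAdapter<String> {
    WordLengthJob(String word) {
        super(word);
    }

    @Override
    public Object execute() {
        return getArgument().length();
    }
}
```

With this shape, new WordLengthJob("grid").execute() yields 4, and the job author never has to write a cancel() method for jobs that are too short to benefit from cancellation.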
Distributed Session Attributes
Jobs can communicate with the parent task and with other job siblings from the same
task by setting session attributes (see
GridComputeTaskSession). Other jobs
can wait for an attribute to be set either synchronously or asynchronously.
Such functionality allows jobs to synchronize their execution with other jobs
at any point and can be useful when other jobs within the task need to be made aware
of a certain event or state change that occurred during job execution.
Distributed task session can be injected into
GridComputeJob implementation
using
GridTaskSessionResource annotation.
Both field-based and method-based injection are supported. Refer to the
GridComputeTaskSession documentation for more information on session functionality.
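The synchronous form of waiting on an attribute can be pictured with a single-JVM sketch. DemoSession below is a hypothetical in-memory stand-in, not GridComputeTaskSession: a real session distributes attributes across nodes, whereas this one only coordinates two threads through a shared map. The method names and the timeout parameter are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a task session shared by sibling jobs.
class DemoSession {
    private final Map<String, Object> attrs = new HashMap<>();

    // Publishes an attribute and wakes up every waiting job.
    synchronized void setAttribute(String key, Object val) {
        attrs.put(key, val);
        notifyAll();
    }

    // Blocks until the attribute is set or the timeout elapses.
    // Returns null on timeout.
    synchronized Object waitForAttribute(String key, long timeoutMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!attrs.containsKey(key)) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return null;
            }
            wait(remaining);
        }
        return attrs.get(key);
    }
}

class SessionDemo {
    // One "job" (a thread here) sets the attribute, the caller waits for it.
    static Object run() {
        DemoSession ses = new DemoSession();
        Thread producer = new Thread(() -> ses.setAttribute("phase", "ready"));
        producer.start();
        try {
            Object seen = ses.waitForAttribute("phase", 5000);
            producer.join();
            return seen;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

The waiting side does not care whether the attribute was set before or after it started waiting, which is what makes attributes usable as a synchronization point between sibling jobs.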
Saving Checkpoints
Long-running jobs may wish to save intermediate checkpoints to protect themselves
from failures. There are three checkpoint management methods:
-
GridComputeTaskSession#saveCheckpoint(String,Object,GridComputeTaskSessionScope,long)
-
GridComputeTaskSession#loadCheckpoint(String)
-
GridComputeTaskSession#removeCheckpoint(String)
Jobs that utilize checkpoint functionality should attempt to load a
checkpoint at the beginning of execution. If a
non-null value is returned,
then the job can continue from where it failed last time; otherwise it starts
from scratch. Throughout its execution, the job should periodically save its
intermediate state to avoid starting from scratch in case of a failure.
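The load-then-resume pattern described above can be sketched as follows. DemoCheckpointStore is a hypothetical in-memory stand-in for the session's checkpoint methods; its simplified signatures omit the scope and timeout parameters of the real saveCheckpoint method. The checkpoint key "sum", the save interval of 10 steps, and the failAt parameter used to simulate a crash are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for checkpoint storage.
class DemoCheckpointStore {
    private final Map<String, Object> data = new HashMap<>();

    void saveCheckpoint(String key, Object state) {
        data.put(key, state);
    }

    Object loadCheckpoint(String key) {
        return data.get(key);
    }

    boolean removeCheckpoint(String key) {
        return data.remove(key) != null;
    }
}

class CheckpointedSum {
    // Sums 1..limit. Checkpoint state is {nextIndex, partialSum}.
    // failAt simulates a crash at that index; pass -1 for no failure.
    static long run(DemoCheckpointStore store, int limit, int failAt) {
        // Step 1: try to load a checkpoint at the beginning of execution.
        long[] state = (long[]) store.loadCheckpoint("sum");
        long next = state != null ? state[0] : 1;
        long sum = state != null ? state[1] : 0;

        for (long i = next; i <= limit; i++) {
            if (i == failAt) {
                throw new IllegalStateException("simulated failure at " + i);
            }
            sum += i;
            // Step 2: periodically save intermediate state.
            if (i % 10 == 0) {
                store.saveCheckpoint("sum", new long[] {i + 1, sum});
            }
        }

        // Step 3: the job finished, so the checkpoint is no longer needed.
        store.removeCheckpoint("sum");
        return sum;
    }
}
```

If the first run fails at index 55, the last saved checkpoint is the one written at index 50, so a second run against the same store resumes at 51 and still produces 5050 for limit 100. The save interval trades checkpoint overhead against the amount of work repeated after a failure.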