IProgram execution support for the RDF DB.
The rules have potential parallelism when performing closure. Each join also
has potential parallelism across its subqueries. We could even define a PARALLEL
iterator flag and have parallelism across index partitions for a
read-historical iterator since the data service locators are immutable for
historical reads.
Rule-level parallelism (for fixed-point closure of a rule set) and join
subquery-level parallelism could be distributed to available workers in a
cluster. In a similar way, high-level queries could be distributed to workers
in a cluster for evaluation. Such distribution would increase the practical
parallelism beyond what a single machine could support, as long as the total
parallelism does not overload the cluster.
There is a pragmatic limit on the number of concurrent threads for a single
host. When those threads target a blocking queue, thread contention becomes
very high and throughput drops dramatically. We can reduce this problem by
allocating a distinct UnsynchronizedArrayBuffer to each task. The task
collects a 'chunk' in its UnsynchronizedArrayBuffer. When full, the buffer
propagates the chunk onto a thread-safe buffer of chunks, which either flushes
onto an IMutableRelation (mutation) or feeds an IAsynchronousIterator
(high-level query). It is the chunks themselves that accumulate in this
thread-safe buffer, so each add() on that buffer may cause the thread to
yield, but the return for yielding is an entire chunk in the buffer, not just
a single element.
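
The chunking pattern described above can be sketched as follows. This is a minimal illustration only, not the actual UnsynchronizedArrayBuffer implementation: the class names ChunkBufferSketch and LocalChunkBuffer, the method run(), and the use of int elements with a LinkedBlockingQueue as the thread-safe buffer of chunks are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class ChunkBufferSketch {

    /** Hypothetical per-task buffer: NOT thread-safe, owned by exactly one task. */
    static final class LocalChunkBuffer {
        private final int[] chunk;                 // holds a single chunk of elements
        private int size = 0;
        private final BlockingQueue<int[]> target; // shared, thread-safe buffer of chunks

        LocalChunkBuffer(int capacity, BlockingQueue<int[]> target) {
            this.chunk = new int[capacity];
            this.target = target;
        }

        /** Adds one element without synchronization; hands off the chunk when full. */
        void add(int e) throws InterruptedException {
            chunk[size++] = e;
            if (size == chunk.length) {
                flush();
            }
        }

        /** Copies the buffered elements into the shared queue as one chunk. */
        void flush() throws InterruptedException {
            if (size == 0) {
                return;
            }
            // One synchronized operation per chunk, not one per element.
            target.put(Arrays.copyOf(chunk, size));
            size = 0;
        }
    }

    /** Runs nTasks producer threads and returns every chunk they emitted. */
    static List<int[]> run(int nTasks, int perTask, int chunkCapacity) {
        BlockingQueue<int[]> chunks = new LinkedBlockingQueue<>();
        List<Thread> threads = new ArrayList<>();
        for (int t = 0; t < nTasks; t++) {
            final int base = t * perTask;
            Thread th = new Thread(() -> {
                // Each task gets its own distinct, unsynchronized buffer.
                LocalChunkBuffer buf = new LocalChunkBuffer(chunkCapacity, chunks);
                try {
                    for (int i = 0; i < perTask; i++) {
                        buf.add(base + i);
                    }
                    buf.flush(); // hand off any final partial chunk
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                }
            });
            threads.add(th);
            th.start();
        }
        try {
            for (Thread th : threads) {
                th.join();
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", ex);
        }
        List<int[]> out = new ArrayList<>();
        chunks.drainTo(out);
        return out;
    }

    public static void main(String[] args) {
        List<int[]> chunks = run(4, 1000, 100);
        int total = 0;
        for (int[] c : chunks) {
            total += c.length;
        }
        System.out.println(chunks.size() + " chunks, " + total + " elements");
    }
}
```

In this sketch, four tasks that each add 1000 elements with a chunk capacity of 100 touch the shared queue only 40 times in total rather than 4000 times, which is the contention reduction the paragraph above describes.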
There is one high-level buffer factory corresponding to each kind of
ActionEnum: #newQueryBuffer(), #newInsertBuffer(IMutableRelation), and
#newDeleteBuffer(IMutableRelation). In addition, there is one for
UnsynchronizedArrayBuffers. This is a buffer that is NOT thread-safe and
that is designed to store a single chunk of elements (e.g., in an array
E[N]).