This makes an isolated ClassLoader that loads classes with the "proper" priority.
This isolation applies *only* to the part of the HadoopTask that invokes runTask.
The jars for the job are the same jars as for the classloader EXCEPT those from hadoopDependencyCoordinates, which
are excluded from the job jars.
The URLs in the resulting classloader are loaded with this priority:
1. Non-Druid jars (see #IS_DRUID_URL) found in the ClassLoader for HadoopIndexTask.class. This will probably be the ApplicationClassLoader.
2. Hadoop jars found in the hadoop dependency coordinates directory, loaded in the order in which they are specified.
3. Druid jars (see #IS_DRUID_URL) found in the ClassLoader for HadoopIndexTask.class.
4. Extension URLs, maintaining the order specified in the extensions list in the extensions config.
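A minimal sketch of the four-step ordering above, assuming a hypothetical IS_DRUID_URL predicate and method names (these are illustrative, not Druid's actual code):

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class IsolatedLoaderSketch {
    // Hypothetical stand-in for the IS_DRUID_URL check described above.
    static final Predicate<URL> IS_DRUID_URL =
        url -> url.getPath().contains("druid");

    static List<URL> prioritizedUrls(
        List<URL> taskClassLoaderUrls,   // URLs from HadoopIndexTask.class's loader
        List<URL> hadoopDependencyUrls,  // jars from the hadoop dependency coordinates dir
        List<URL> extensionUrls          // extension URLs, in extensions-config order
    ) {
        List<URL> urls = new ArrayList<>();
        // 1. Non-Druid jars from the task's ClassLoader
        taskClassLoaderUrls.stream().filter(IS_DRUID_URL.negate()).forEach(urls::add);
        // 2. Hadoop jars, in the order the coordinates were specified
        urls.addAll(hadoopDependencyUrls);
        // 3. Druid jars from the task's ClassLoader
        taskClassLoaderUrls.stream().filter(IS_DRUID_URL).forEach(urls::add);
        // 4. Extension URLs, preserving extensions-config order
        urls.addAll(extensionUrls);
        return urls;
    }

    // Convenience helper so callers need not handle the checked exception.
    static URL u(String spec) {
        try {
            return new URL(spec);
        } catch (MalformedURLException e) {
            throw new RuntimeException(e);
        }
    }
}
```

The key point is that steps 1 and 3 partition the *same* set of URLs (those visible to the task's own ClassLoader) so that non-Druid jars win over Hadoop jars, which in turn win over Druid's own jars.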
At one point I tried making each of these steps its own URLClassLoader, but it is not easy to write a proper
#IS_DRUID_URL check that captures everything which references Druid classes. This led to a situation where
the class loader isolation worked great for stock Druid, but failed in many common use cases, including extension
jars on the classpath which were not listed in the extensions list.
As such, the current approach is to build a list of URLs for a URLClassLoader based on the priority above, and to use
THAT ClassLoader, with a null parent, as the isolated loader for running Hadoop or Hadoop-like driver tasks.
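The null-parent construction might look like the following sketch (class and method names are hypothetical):

```java
import java.net.URL;
import java.net.URLClassLoader;

public class IsolatedRunner {
    // A null parent means lookups never delegate up to the application
    // class loader, so only the prioritized URL list (plus the JDK's own
    // bootstrap classes) is visible to the task.
    public static URLClassLoader makeIsolatedLoader(URL[] prioritizedUrls) {
        return new URLClassLoader(prioritizedUrls, null);
    }
}
```

Note that even with a null parent, core java.* classes still resolve through the bootstrap loader, so the isolated loader sees exactly the prioritized URLs plus the JDK itself.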
This approach, combined with reasonable exclusions in org.apache.druid.cli.PullDependencies#exclusions, tries to maintain
sanity in a ClassLoader where all jars (which are isolated by extension ClassLoaders in the Druid framework) are
jumbled together into one ClassLoader for Hadoop and Hadoop-like tasks (Spark, for example).