A CleanableDataset that may have multiple VersionFinder, VersionSelectionPolicy and RetentionAction instances. Retention needs to be performed for different kinds of DatasetVersions. Each kind of DatasetVersion can have its own VersionSelectionPolicy and/or RetentionAction associated with it.
- MultiVersionCleanableDatasetBase#getVersionFindersAndPolicies() gets a list of VersionFinderAndPolicy objects.
- Each VersionFinderAndPolicy contains a VersionFinder and a VersionSelectionPolicy. It can optionally have one or more RetentionActions.
- The MultiVersionCleanableDatasetBase#clean() method finds all the FileSystemDatasetVersions using VersionFinderAndPolicy#versionFinder.
- It gets the deletable FileSystemDatasetVersions by applying VersionFinderAndPolicy#versionSelectionPolicy. These deletable versions are deleted, and then any empty parent directories are deleted.
- If additional retention actions are available at VersionFinderAndPolicy#getRetentionActions(), all versions found by the VersionFinderAndPolicy#versionFinder are passed to RetentionAction#execute(List) for each RetentionAction.
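The flow described in the list above can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: every type here (SketchVersion, SketchFinder, SketchSelectionPolicy, SketchAction, SketchFinderAndPolicy, CleanSketch) is a hypothetical, simplified stand-in for the class it is named after, and instead of deleting anything from a filesystem the sketch only returns the names of the versions that would be deleted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a dataset version; only carries a name.
final class SketchVersion {
    final String name;
    SketchVersion(String name) { this.name = name; }
}

// Hypothetical stand-ins for VersionFinder, VersionSelectionPolicy and RetentionAction.
interface SketchFinder { List<SketchVersion> findVersions(); }
interface SketchSelectionPolicy { Collection<SketchVersion> listDeletable(List<SketchVersion> all); }
interface SketchAction { void execute(List<SketchVersion> all); }

// Hypothetical stand-in for VersionFinderAndPolicy: one finder, one policy, optional actions.
final class SketchFinderAndPolicy {
    final SketchFinder finder;
    final SketchSelectionPolicy policy;
    final List<SketchAction> actions;

    SketchFinderAndPolicy(SketchFinder finder, SketchSelectionPolicy policy, List<SketchAction> actions) {
        this.finder = finder;
        this.policy = policy;
        this.actions = actions;
    }
}

final class CleanSketch {
    // For each finder/policy pair: find all versions, select the deletable ones,
    // then hand ALL found versions to every additional retention action.
    // Returns the names of the versions selected for deletion (nothing is actually deleted).
    static List<String> clean(List<SketchFinderAndPolicy> pairs) {
        List<String> deleted = new ArrayList<>();
        for (SketchFinderAndPolicy pair : pairs) {
            List<SketchVersion> all = pair.finder.findVersions();
            for (SketchVersion version : pair.policy.listDeletable(all)) {
                deleted.add(version.name);
            }
            for (SketchAction action : pair.actions) {
                action.execute(all);
            }
        }
        return deleted;
    }

    public static void main(String[] args) {
        List<SketchVersion> found =
            Arrays.asList(new SketchVersion("snapshot1"), new SketchVersion("snapshot2"));
        // Assumed example policy: everything except the last (newest) version is deletable.
        SketchSelectionPolicy keepNewest = all -> all.subList(0, all.size() - 1);
        SketchAction report = all -> System.out.println("retention action saw " + all.size() + " versions");
        SketchFinderAndPolicy pair =
            new SketchFinderAndPolicy(() -> found, keepNewest, Collections.singletonList(report));
        System.out.println(clean(Collections.singletonList(pair)));
    }
}
```

Note that the retention actions receive every version the finder returned, not only the deletable ones, which matches the last list item above.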
Concrete subclasses should implement #getVersionFindersAndPolicies().
Datasets are directories in the filesystem containing data files organized in version-like directory structures.
Example datasets:
For snapshot-based datasets, with the directory structure:
/path/to/table/
    snapshot1/
        dataFiles...
    snapshot2/
        dataFiles...
snapshot1 and snapshot2 are each a dataset version.
For tracking datasets, with the directory structure:
/path/to/tracking/data/
    2015/
        06/
            01/
                dataFiles...
            02/
                dataFiles...
2015/06/01 and 2015/06/02 are each a dataset version.
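As a rough illustration of how both layouts map to versions, the sketch below lists the directories found at a fixed depth under a dataset root: depth 1 for the snapshot layout, depth 3 for the year/month/day tracking layout. It is an assumption-laden example, not how the library's VersionFinder implementations work: VersionDirectorySketch and listVersions are hypothetical names, and it uses java.nio.file on a local filesystem.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class VersionDirectorySketch {
    // Returns the relative paths of all directories exactly `depth` levels below
    // `datasetRoot`, sorted, using '/' as the separator. Each one is treated as a version.
    static List<String> listVersions(Path datasetRoot, int depth) throws IOException {
        try (Stream<Path> paths = Files.walk(datasetRoot, depth)) {
            return paths
                .filter(Files::isDirectory)
                // The root relativizes to an empty path, which reports a name count of 1,
                // so it has to be excluded explicitly.
                .filter(p -> !p.equals(datasetRoot))
                .filter(p -> datasetRoot.relativize(p).getNameCount() == depth)
                .map(p -> datasetRoot.relativize(p).toString().replace('\\', '/'))
                .sorted()
                .collect(Collectors.toList());
        }
    }
}
```

Called with the tracking root and a depth of 3, this would return entries such as 2015/06/01 and 2015/06/02; called with the snapshot root and a depth of 1, it would return snapshot1 and snapshot2.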