Simple hashing functions that we can rely on staying the same cross-platform.
The static methods of this class (not its inner classes) use a custom algorithm
designed for speed and general-purpose usability, but not cryptographic security;
this algorithm is sometimes referred to as Hive, and several other algorithms are
available in static inner classes, some of which have different goals, such as
reduced likelihood of successfully reversing a hash, or just providing another
choice. The hashes this returns are always 0 when given null to hash. Arrays with
identical elements of identical types will hash identically. Arrays with identical
numerical values but different types will sometimes hash differently. This class
always provides 64-bit hashes via hash64() and 32-bit hashes via hash(), and some
of the algorithms here may provide a hash32() method that matches older behavior
and uses only 32-bit math. The hash64() and hash() methods, at least in Wisp and
Mist, use 64-bit math even when producing 32-bit hashes, for GWT reasons. GWT
doesn't have the same behavior as desktop and Android applications when using ints
because it compiles to JavaScript, which treats ints mostly like doubles, so int
overflow does not always wrap the way it does on the JVM. If we use mainly longs,
though, GWT emulates the longs with a more complex technique behind the scenes that
behaves the same on the web as it does on desktop or on a phone. Since CrossHash is
supposed to be stable cross-platform, this is the way we need to go, despite it
being slightly slower.
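As a rough illustration of the long-based approach described above, the following sketch does all of its mixing with long arithmetic and only narrows to an int at the very end. The class name, method bodies, and both constants are made up for this example and are not the ones CrossHash uses; only the convention of returning 0 for null is taken from the description above.

```java
/** Hypothetical sketch; this is not the Wisp, Mist, or Hive algorithm. */
final class LongMathSketch {
    // Arbitrary odd 64-bit constants, chosen for illustration only.
    private static final long SEED = 0x9E3779B97F4A7C15L;
    private static final long MULTIPLIER = 0xD1B54A32D192ED03L;

    /** 64-bit hash computed entirely with long math; null hashes to 0. */
    static long hash64(int[] data) {
        if (data == null) return 0L;
        long state = SEED;
        for (int item : data) {
            // XOR in the item, then multiply; long overflow wraps the same way on every platform.
            state = (state ^ item) * MULTIPLIER;
        }
        return state;
    }

    /** 32-bit hash that still uses long math internally, folding the upper half into the lower half. */
    static int hash(int[] data) {
        long h = hash64(data);
        return (int) (h ^ (h >>> 32));
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3};
        int[] b = {1, 2, 3};
        System.out.println(hash(a) == hash(b)); // equal contents give equal hashes
        System.out.println(hash(null));         // null hashes to 0
    }
}
```

Narrowing only once, at the end, keeps every intermediate value in long arithmetic, which is the property the paragraph above relies on for GWT.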
There are several static inner classes in CrossHash: Lightning, Falcon, and Mist,
each providing different hashing properties, as well as the inner IHasher interface and a
compatibility version of Wisp as a nested class. Older versions of SquidLib encouraged using a
nested class because the non-nested-class methods used a lousy implementation of the FNV-1a algorithm,
which was roughly 10x slower than the current methods in CrossHash and had poor correlation
properties. In the current version, you probably will be fine with the default functions in
CrossHash, which use the Hive algorithm. If you need a salt to alter the hash function,
using one of a large family of such functions instead of a single function like Wisp, then Mist
is a good choice. Lightning is mostly superseded by Wisp, but it can have better behavior on some
collections regarding collisions; Falcon is meant to be a faster version of Lightning.
IHasher values are provided as static fields, and use Wisp to hash a specific type or fall
back to Object.hashCode if given an object with the wrong type. IHasher values are optional
parts of OrderedMap, OrderedSet, Arrangement, and the various classes that use Arrangement
like K2 and K2V1, and allow arrays to be used as keys in those collections while keeping
hashing by value instead of the normal hashing by reference for arrays. You probably won't
ever need to make a class that implements IHasher yourself; for some cases you may want to
look at the Hashers class for additional functions.
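To make the difference between hashing by value and hashing by reference concrete, here is a small sketch that uses only java.util. The ValueHasher interface and its two methods are hypothetical stand-ins written for this example, not SquidLib's IHasher, and Arrays.hashCode stands in for Wisp; the fallback to Object.hashCode for other types mirrors the behavior described above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class ArrayKeySketch {
    /** Hypothetical strategy interface for this example; not SquidLib's IHasher. */
    interface ValueHasher {
        int hash(Object data);
        boolean areEqual(Object left, Object right);
    }

    /** Hashes int arrays by their contents and falls back to Object.hashCode for anything else. */
    static final ValueHasher INT_ARRAY = new ValueHasher() {
        @Override
        public int hash(Object data) {
            if (data instanceof int[]) return Arrays.hashCode((int[]) data);
            return data == null ? 0 : data.hashCode();
        }

        @Override
        public boolean areEqual(Object left, Object right) {
            if (left instanceof int[] && right instanceof int[]) {
                return Arrays.equals((int[]) left, (int[]) right);
            }
            return Objects.equals(left, right);
        }
    };

    /** A plain HashMap compares array keys by reference, so an equal-looking array is not found. */
    static boolean foundByReferenceHashing() {
        Map<int[], String> map = new HashMap<>();
        map.put(new int[]{1, 2, 3}, "treasure");
        return map.containsKey(new int[]{1, 2, 3});
    }

    public static void main(String[] args) {
        int[] first = {1, 2, 3};
        int[] second = {1, 2, 3};
        System.out.println(foundByReferenceHashing());                            // false
        System.out.println(INT_ARRAY.hash(first) == INT_ARRAY.hash(second));      // true
        System.out.println(INT_ARRAY.areEqual(first, second));                    // true
    }
}
```

A collection that accepts such a strategy object can call it in place of the key's own hashCode() and equals(), which is what lets arrays work as keys by value.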
The inner classes provide alternate, faster hashing algorithms. Lightning, Wisp, and Falcon
have no theoretical basis or grounds in any reason other than empirical testing for why they
do what they do, and this seems to be in line with many widely-used hashes (see: The Art of
Hashing, http://eternallyconfuzzled.com/tuts/algorithms/jsw_tut_hashing.aspx ). That said, Wisp
performs very well, ahead of Arrays.hashCode (10.5 ms instead of 15 ms) for over a million
hashes of 16-element long arrays, not including overhead for generating them, while SipHash and
FNV-1a take approximately 80 ms and 135-155 ms, respectively, for the same data. Lightning and
Falcon perform less well, with Lightning taking 17 ms instead of 15 ms for Arrays.hashCode, and
Falcon taking about 12.3 ms but slowing down somewhat if a 32-bit hash is requested from long
data. All of these have good, low collision rates on Strings and long arrays. Sketch is only
slightly slower than Wisp, but offers little to no advantage over it yet.
Mist is a variant on Wisp with 128 bits for a salt-like modifier as a member variable, which can
make 2 to the 128 individual hashing functions from one set of code, and uses 64 bits for some other
hashes (only calls to hash() with data that doesn't involve long or double arrays). Mist has some
minor resemblance to a cryptographic hash, but is not recommended for that usage. It is,
however, ideal for situations that show up often in game development where end users may be able
to see and possibly alter some information that you don't want changed (e.g. save data stored on
a device or in the browser's LocalStorage). If you want a way to verify the data is what you
think it is, you can store a hash, using one of the many-possible hash functions this can
produce, somewhere else and verify that the saved data has the hash it did last time; if the
exact hashing function isn't known (or exact functions aren't known) by a tampering user,
then it is unlikely they can make the hash match even if they can edit it. Mist is slightly slower
than Wisp, at about 18 ms for Mist for the same data instead of Wisp's 10.5, but should never be
worse than twice as slow as Arrays.hashCode, and is still about three times faster than the similar
SipHash that SquidLib previously had here.
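The save-data check described above might look roughly like the sketch below. It does not use Mist: a 64-bit FNV-1a loop with a single long salt mixed into its starting state is swapped in as a stand-in, and the class and method names are invented for the example. Like Mist, the stand-in is not cryptographically secure.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of salted verification; the hash is salted FNV-1a, not Mist. */
final class SaltedCheckSketch {
    private static final long FNV_OFFSET = 0xCBF29CE484222325L; // standard 64-bit FNV offset basis
    private static final long FNV_PRIME = 0x100000001B3L;       // standard 64-bit FNV prime

    /** Each distinct salt selects a different hash function from the same code; null hashes to 0. */
    static long saltedHash(String data, long salt) {
        if (data == null) return 0L;
        long h = FNV_OFFSET ^ salt;
        for (byte b : data.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xFF);
            h *= FNV_PRIME;
        }
        return h;
    }

    /** Returns true if the data still produces the hash that was stored earlier. */
    static boolean verify(String data, long salt, long storedHash) {
        return saltedHash(data, salt) == storedHash;
    }

    public static void main(String[] args) {
        long secretSalt = 0x1234ABCDL;                      // kept away from the save file
        long stored = saltedHash("gold=100", secretSalt);   // stored somewhere else
        System.out.println(verify("gold=100", secretSalt, stored)); // true, data is untouched
        System.out.println(verify("gold=101", secretSalt, stored)); // false, data was edited
    }
}
```

The check only helps while the salt stays unknown to the person editing the data, so the salt should not be stored alongside the save file.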
All of the hashes used here have about the same rate of collisions on identical data
(testing used Arrays.hashCode, all the hashes in here now, and the now-removed SipHash), with
any fluctuation within some small margin of error. Hive (via the non-nested methods
in CrossHash) and Mist are the two algorithms you are most likely to use here.
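One simple way to compare collision rates on identical data is to count how many hashes repeat over a list of distinct inputs. The helper below is a sketch written for this explanation, with hypothetical names, and is not the test harness that produced the results above; any 32-bit hash can be passed in as the hasher argument.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.ToIntFunction;

/** Hypothetical helper for comparing 32-bit hashes; assumes the inputs are all distinct. */
final class CollisionCountSketch {
    /** Counts inputs whose hash was already produced by an earlier input. */
    static int countCollisions(List<int[]> inputs, ToIntFunction<int[]> hasher) {
        Set<Integer> seen = new HashSet<>();
        int collisions = 0;
        for (int[] input : inputs) {
            if (!seen.add(hasher.applyAsInt(input))) collisions++;
        }
        return collisions;
    }

    public static void main(String[] args) {
        // {0, 31} and {1, 0} both hash to 992 with Arrays.hashCode, so this prints 1.
        List<int[]> inputs = Arrays.asList(new int[]{0, 31}, new int[]{1, 0});
        System.out.println(countCollisions(inputs, Arrays::hashCode));
    }
}
```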
To help find patterns in hash output in a visual way, you can hash an x,y point, take the bottom 24 bits,
and use that as an RGB color for the pixel at that x,y point. On a 512x512 grid of points, the patterns
in Arrays.hashCode and the former default CrossHash algorithm (FNV-1a) are evident, and Sip (implementing
SipHash) did approximately as well as Lightning, with no clear patterns visible (Sip has been removed from
SquidLib because it needs a lot of code and is slower than all of the current hashes). The idea is from
a technical report on visual uses for hashing (PDF).
- java.util.Arrays#hashCode(int[]): http://i.imgur.com/S4Gh1sX.png
- CrossHash#hash(int[]): http://i.imgur.com/x8SDqvL.png
- (Former) CrossHash.Sip.hash(int[]): http://i.imgur.com/keSpIwm.png
- CrossHash.Lightning#hash(int[]): http://i.imgur.com/afGJ9cA.png
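The visual test described above can be sketched with the standard java.awt.image classes. This is an assumed reconstruction of the idea, not the code that produced the linked images: the class name, method name, and output file name are made up, and Arrays.hashCode is used as the hash under test because it needs no extra library.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.function.ToIntFunction;
import javax.imageio.ImageIO;

/** Hypothetical sketch: colors each pixel with the bottom 24 bits of the hash of its x,y point. */
final class HashImageSketch {
    static BufferedImage render(int size, ToIntFunction<int[]> hasher) {
        BufferedImage image = new BufferedImage(size, size, BufferedImage.TYPE_INT_RGB);
        for (int x = 0; x < size; x++) {
            for (int y = 0; y < size; y++) {
                // Keep only the low 24 bits so the hash maps directly onto an RGB color.
                image.setRGB(x, y, hasher.applyAsInt(new int[]{x, y}) & 0xFFFFFF);
            }
        }
        return image;
    }

    public static void main(String[] args) throws IOException {
        BufferedImage image = render(512, Arrays::hashCode);
        ImageIO.write(image, "png", new File("arrays_hashcode.png"));
    }
}
```

Passing a different ToIntFunction to render() makes it possible to compare several hashes side by side; visible stripes or gradients in the output suggest the hash is not mixing its input well.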
Note: This class was formerly called StableHash, but since that refers to a specific
category of hashing algorithm that this is not, and since the goal is to be
cross-platform, the name was changed to CrossHash.
Note 2: FNV-1a was removed from SquidLib on July 25, 2017, and replaced as default with Wisp; Wisp
was later replaced as default by Hive. Wisp was used because at the time SquidLib preferred 64-bit
math when math needed to be the same across platforms; math on longs behaves the same on GWT as on
desktop, despite being slower. Hive passes SMHasher, a testing suite for hashes, where Wisp does
not (it fails just like Arrays.hashCode() does). Hive now uses a cross-platform subset of the
possible 32-bit math operations when producing 32-bit hashes of data that doesn't involve longs or
doubles, and this should speed up the default CrossHash.hash() methods a lot on GWT.
Created by Tommy Ettinger on 1/16/2016.