Thursday, 10 May 2012

java hash Code


While the Java language does not provide direct support for associative arrays -- arrays that can take any object as an index -- the presence of the hashCode() method in the root Object class clearly anticipates the ubiquitous use of HashMap (and its predecessor, Hashtable). Under ideal conditions, hash-based containers offer both efficient insertion and efficient retrieval; supporting hashing directly in the object model facilitates the development and use of hash-based containers.

Defining equality

The Object class has two methods for making inferences about an object's identity: equals() and hashCode(). In general, if you override one of these methods, you must override both, as there are important relationships between them that must be maintained. In particular, if two objects are equal according to the equals() method, they must have the same hashCode() value (although the reverse is not generally true).

The semantics of equals() for a given class are left to the implementer; defining what equals() means for a given class is part of the design work for that class. The default implementation, provided by Object, is simply reference equality:

  public boolean equals(Object obj) {
    return (this == obj);
  }


Under this default implementation, two references are equal only if they refer to the exact same object. Similarly, the default implementation of hashCode() provided by Object is derived by mapping the memory address of the object to an integer value. Because on some architectures the address space is larger than the range of values for int, it is possible that two distinct objects could have the same hashCode(). If you override hashCode(), you can still use the System.identityHashCode() method to access this default value.

Overriding equals() -- a simple example

An identity-based implementation for equals() and hashCode() is a sensible default, but for some classes, it is desirable to relax the definition of equality somewhat. For example, the Integer class defines equals() similarly to this:

  public boolean equals(Object obj) {
    return (obj instanceof Integer
            && intValue() == ((Integer) obj).intValue());
  }


Under this definition, two Integer objects are equal only if they contain the same integer value. This, along with Integer being immutable, makes it practical to use an Integer as a key in a HashMap. This value-based approach to equality is used by all the primitive wrapper classes in the Java class library, such as Integer, Float, Character, and Boolean, as well as String (two String objects are equal if they contain the same sequence of characters). Because these classes are immutable and implement hashCode() and equals() sensibly, they all make good hash keys.

Why override equals() and hashCode()?

What would happen if Integer did not override equals() and hashCode()? Nothing, if we never used an Integer as a key in a HashMap or other hash-based collection. However, if we were to use such an Integer object for a key in a HashMap, we would not be able to reliably retrieve the associated value, unless we used the exact same Integer instance in the get() call as we did in the put() call. This would require ensuring that we only use a single instance of the Integer object corresponding to a particular integer value throughout our program. Needless to say, this approach would be inconvenient and error prone.

The interface contract for Object requires that if two objects are equal according to equals(), then they must have the same hashCode() value. Why does our root object class need hashCode(), when its discriminating ability is entirely subsumed by that of equals()? The hashCode() method exists purely for efficiency. The Java platform architects anticipated the importance of hash-based collection classes -- such as Hashtable, HashMap, and HashSet -- in typical Java applications, and comparing against many objects with equals() can be computationally expensive. Having every Java object support hashCode() allows for efficient storage and retrieval using hash-based collections

Requirements for implementing equals() and hashCode()

There are some restrictions placed on the behavior of equals() and hashCode(), which are enumerated in the documentation for Object. In particular, the equals() method must exhibit the following properties:

    Symmetry: For two references, a and b, a.equals(b) if and only if b.equals(a)
    Reflexivity: For all non-null references, a.equals(a)
    Transitivity: If a.equals(b) and b.equals(c), then a.equals(c)
    Consistency with hashCode(): Two equal objects must have the same hashCode() value

The specification for Object offers a vague guideline that equals() and hashCode() be consistent -- that their results will be the same for subsequent invocations, provided that "no information used in equals comparison on the object is modified." This sounds sort of like "the result of the calculation shouldn't change, unless it does." This vague statement is generally interpreted to mean that equality and hash value calculations should be a deterministic function of an object's state and nothing else.

No comments:

Post a Comment