Introduction
Although implementing the hashCode and equals methods can seem like “no-glory” work, they are extremely important for correct data manipulation. Joshua Bloch's Effective Java has an excellent overview of the concerns involved when overriding these two methods. For convenience, a copy of the relevant chapter can be found at http://java.sun.com/developer/Books/effectivejava/Chapter3.pdf.
For a quick summary of what is involved in implementing these methods, please refer to the Javadoc for java.lang.Object (http://download.oracle.com/javase/6/docs/api/java/lang/Object.html#hashCode()). A good approach to coding these methods while maintaining the general contracts described in those references is to compute the hash code from the same fields that the equals method compares.
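As a minimal sketch of that approach, the hypothetical ProductCode class below (the class and its fields are my own illustration, not from our code base) compares two fields in equals and derives the hash code from exactly those two fields. It assumes Java 7 or later for java.util.Objects.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class SameFieldsSketch {

    // Hypothetical value class; the name and fields are illustrative only.
    static final class ProductCode {
        private final String catalog;
        private final String sku;

        ProductCode(final String catalog, final String sku) {
            this.catalog = catalog;
            this.sku = sku;
        }

        @Override
        public boolean equals(final Object obj) {
            if (this == obj) {
                return true;
            }
            if (!(obj instanceof ProductCode)) {
                return false;
            }
            final ProductCode other = (ProductCode) obj;
            // equality is decided by catalog and sku ...
            return Objects.equals(catalog, other.catalog) && Objects.equals(sku, other.sku);
        }

        @Override
        public int hashCode() {
            // ... so the hash code is derived from catalog and sku as well
            return Objects.hash(catalog, sku);
        }
    }

    public static void main(final String[] args) {
        final Set<ProductCode> codes = new HashSet<>();
        codes.add(new ProductCode("books", "SKU-1"));
        // a distinct but equal instance is found because the hash codes agree
        System.out.println(codes.contains(new ProductCode("books", "SKU-1"))); // prints true
    }
}
```

If hashCode used a field that equals ignores, two equal objects could hash differently and the lookup above could fail.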
Issues with Object Persistence
A few times I have encountered code where these methods were not properly implemented, and testing revealed strange results, such as the wrong object being deleted or updated. The cause usually came to light only when an operation modified something it was not expected to touch. One area where I have seen this occur is database object mapping.
In one particular case we had this code in place:
...
public boolean equals(final Object obj) {
    if (!(obj instanceof AttributeGroupAttribute)) {
        return false;
    }
    return this.getAttribute().getUidPk() == ((AttributeGroupAttribute) obj).getAttribute().getUidPk();
}
public int hashCode() {
    return ((Long) this.getAttribute().getUidPk()).hashCode();
}
...
There were debates over finding a simple equality check that did not get bogged down in which fields carry business value. Originally the uidPk, the unique database identifier assigned when an object is persisted, was used, as in the case above. This approach broke down when we compared items across databases during synchronization, and when we compared items that had not yet been persisted. To solve this we introduced GUIDs as identifiers. This let us test equality between non-persisted objects and stopped the spurious results when synchronizing across databases. There is still debate about what object equivalence should compare. Sometimes GUIDs make sense, but sometimes business-field comparisons make sense where GUIDs do not, as in the import and export of data.
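To illustrate the difference, here is a small sketch with a hypothetical Item entity. Only the uidPk name comes from our code; the class, the hasSameUidPk helper, the use of java.util.UUID to generate the GUID, and the assumption that uidPk stays 0 until the database assigns a key are all mine.

```java
import java.util.UUID;

class GuidIdentitySketch {

    // Hypothetical entity; everything except the uidPk name is assumed.
    static class Item {
        private long uidPk; // assumed to stay 0 until the database assigns a key
        private final String guid;

        Item() {
            this(UUID.randomUUID().toString());
        }

        Item(final String guid) {
            this.guid = guid;
        }

        void setUidPk(final long uidPk) {
            this.uidPk = uidPk;
        }

        String getGuid() {
            return guid;
        }

        // the older style of check: compares database keys only
        boolean hasSameUidPk(final Item other) {
            return uidPk == other.uidPk;
        }

        @Override
        public boolean equals(final Object obj) {
            if (this == obj) {
                return true;
            }
            if (!(obj instanceof Item)) {
                return false;
            }
            return guid.equals(((Item) obj).guid);
        }

        @Override
        public int hashCode() {
            return guid.hashCode();
        }
    }

    public static void main(final String[] args) {
        final Item first = new Item();
        final Item second = new Item();
        // two unsaved items share uidPk 0, so a key comparison confuses them
        System.out.println(first.hasSameUidPk(second)); // prints true
        System.out.println(first.equals(second));       // prints false

        // the same logical item stored in another database under a different key
        final Item remoteCopy = new Item(first.getGuid());
        remoteCopy.setUidPk(42L);
        System.out.println(first.equals(remoteCopy));   // prints true
    }
}
```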
As another example, AttributeImpl previously had no customized hashCode or equals methods. Looking to the parent class AbstractEntityImpl for something reasonable, we found none defined there either, and the same was true along the rest of the inheritance chain from AbstractEntityImpl. This meant the default implementations from Object were used, which compare only object references, and that was not what we needed. The result was a database deadlock when trying to view attributes on the catalog view pages.
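The effect of the inherited defaults is easy to reproduce in isolation. The sketch below uses a hypothetical PlainAttribute class (not our AttributeImpl) that overrides neither method, so two instances carrying the same data are never treated as equal.

```java
import java.util.HashSet;
import java.util.Set;

class DefaultIdentitySketch {

    // Hypothetical class that inherits equals and hashCode from Object.
    static class PlainAttribute {
        final String key;

        PlainAttribute(final String key) {
            this.key = key;
        }
    }

    public static void main(final String[] args) {
        final PlainAttribute loaded = new PlainAttribute("colour");
        final PlainAttribute reloaded = new PlainAttribute("colour");

        // Object.equals compares references, so the same data is "different"
        System.out.println(loaded.equals(reloaded)); // prints false

        final Set<PlainAttribute> seen = new HashSet<>();
        seen.add(loaded);
        System.out.println(seen.contains(reloaded)); // prints false
    }
}
```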
We also had issues with customer addresses, which used the AbstractAddressImpl implementation of hashCode and equals. It caused a different customer address to be updated than the one expected. At run time, the first customer address in the list was always the one updated; on persisting, the modified address could not be told apart from the others.
...
public boolean equals(final Object obj) {
    if (obj instanceof AbstractAddressImpl) {
        AbstractAddressImpl addressEntity = (AbstractAddressImpl) obj;
        if (checkIdentityStrings(this.lastName, addressEntity.lastName) && checkIdentityStrings(this.firstName, addressEntity.firstName)
                && checkIdentityStrings(this.street1, addressEntity.street1) && checkIdentityStrings(this.street2, addressEntity.street2)
                && checkIdentityStrings(this.city, addressEntity.city) && checkIdentityStrings(this.country, addressEntity.country)
                && checkIdentityStrings(this.faxNumber, addressEntity.faxNumber)
                && checkIdentityStrings(this.phoneNumber, addressEntity.phoneNumber)
                && checkIdentityStrings(this.subCountry, addressEntity.subCountry)
                && checkIdentityStrings(this.zipOrPostalCode, addressEntity.zipOrPostalCode)) {
            return true;
        }
    }
    return false;
}
private boolean checkIdentityStrings(final String string1, final String string2) {
    boolean identity = false;
    if (string1 == null && string2 == null) {
        return true;
    } else if ((string1 != null && string2 != null) && string1.equals(string2)) {
        identity = true;
    }
    return identity;
}
public int hashCode() {
    return 0;
}
...
On inspection, the equals method looks okay, but the underlying issue was that it could not tell instances apart when all of the fields compared in equals held the same values. We introduced a GUID to resolve that. The hashCode method is legal, since equal objects always return the same value, but it is very inefficient because every instance lands in the same hash “bucket”.
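The sketch below shows one way such an equals method can lead to the first entry always being picked. It is only an illustration of the mechanism: the Address class is hypothetical and cut down to two fields, and I am assuming a list lookup by equality, which is not necessarily how the persistence layer located the address.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

class AddressLookupSketch {

    // Hypothetical, cut-down address compared on its field values only.
    static class Address {
        final String street;
        final String city;

        Address(final String street, final String city) {
            this.street = street;
            this.city = city;
        }

        @Override
        public boolean equals(final Object obj) {
            if (!(obj instanceof Address)) {
                return false;
            }
            final Address other = (Address) obj;
            return Objects.equals(street, other.street) && Objects.equals(city, other.city);
        }

        @Override
        public int hashCode() {
            // spreads instances across buckets, unlike a constant 0
            return Objects.hash(street, city);
        }
    }

    public static void main(final String[] args) {
        final List<Address> addresses = new ArrayList<>();
        final Address home = new Address("1 Main St", "Vancouver");
        final Address work = new Address("1 Main St", "Vancouver"); // same values, separate record
        addresses.add(home);
        addresses.add(work);

        // looking up the second address finds the first, because equals cannot tell them apart
        System.out.println(addresses.indexOf(work)); // prints 0
    }
}
```

Adding an identifier such as a GUID to the comparison is what lets the lookup land on the intended entry.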
With OpenJPA the hashCode and equals methods are equally important. When persistence operations are not working as expected, an erroneous hashCode or equals method is a potential culprit. These cases are harder to track down because our classes are enhanced, which adds methods to each class, among other things, for persistence management. The added code does not appear when stepping through in debug mode. When an instance is populated with its persisted properties, you might expect the class's setXXX method to be called. Enhanced OpenJPA classes instead use a combination of the getXXX method name and the enhanced class's getPCXXX / setPCXXX methods to populate the instance variable, which is easily missed when stepping through the stack. Another issue we experienced with OpenJPA arose when hashCode and equals used getters to read the fields they compare: the enhanced methods also call the getters while working out the state of the object, so the calls looped until the stack overflowed. That is why, in particular places in the code, we access instance variables directly rather than through getters.
When implementing these methods in your own classes, Apache Commons Lang has a few helper utilities that have proved useful in satisfying the hashCode and equals contracts. The first is ObjectUtils, which has convenient null-safe methods and makes things much more readable than the code Eclipse generates. The second is the pair EqualsBuilder and HashCodeBuilder, which is more readable and simpler still: you do not need to specify a prime or a running result for the hash code, and every field in equals is compared with the same append call. Another nicety of the builders is that the append calls can be chained. Both approaches are used within our code base. Examples of each follow.
In the AbstractAttributeValueImpl class:
import org.apache.commons.lang.ObjectUtils;
import org.apache.commons.lang.StringUtils;
...
public boolean equals(final Object obj) {
    if (this == obj) {
        return true;
    }
    if (!(obj instanceof AbstractAttributeValueImpl)) {
        return false;
    }
    AbstractAttributeValueImpl other = (AbstractAttributeValueImpl) obj;
    return (ObjectUtils.equals(this.integerValue, other.integerValue)
            && ObjectUtils.equals(this.decimalValue, other.decimalValue)
            && ObjectUtils.equals(this.booleanValue, other.booleanValue)
            && ObjectUtils.equals(this.dateValue, other.dateValue)
            && ObjectUtils.equals(this.attribute, other.attribute)
            && StringUtils.equals(this.shortTextValue, other.shortTextValue)
            && StringUtils.equals(this.longTextValue, other.longTextValue)
            && StringUtils.equals(this.localizedAttributeKey, other.localizedAttributeKey)
            && this.attributeTypeId == other.attributeTypeId);
}
public int hashCode() {
    final int prime = 31;
    int result = 1;
    result = prime * result + ObjectUtils.hashCode(this.integerValue);
    result = prime * result + ObjectUtils.hashCode(this.decimalValue);
    result = prime * result + ObjectUtils.hashCode(this.booleanValue);
    result = prime * result + ObjectUtils.hashCode(this.dateValue);
    result = prime * result + ObjectUtils.hashCode(this.attribute);
    result = prime * result + ObjectUtils.hashCode(this.shortTextValue);
    result = prime * result + ObjectUtils.hashCode(this.longTextValue);
    result = prime * result + ObjectUtils.hashCode(this.localizedAttributeKey);
    result = prime * result + this.attributeTypeId;
    return result;
}
And in the CampaignImpl Class:
import org.apache.commons.lang.builder.EqualsBuilder;
import org.apache.commons.lang.builder.HashCodeBuilder;
...
public int hashCode() {
    return new HashCodeBuilder().append(thirdPartyId).toHashCode();
}
public boolean equals(final Object obj) {
    if (this == obj) {
        return true;
    }
    if (!(obj instanceof CampaignImpl)) {
        return false;
    }
    CampaignImpl other = (CampaignImpl) obj;
    return new EqualsBuilder().append(thirdPartyId, other.thirdPartyId).isEquals();
}
Recursive Approach in Finding Potential Missing Hash Code and Equals Methods
In one project, I had the opportunity to implement change set object auditing within the data synchronization tool, and was asked to reconcile the existing audit list with the one we were initially given. I started going through each class, following each class it extends as well as each OpenJPA annotation, to work out whether we had covered every class needed to recreate the modified data. By the second or third class I was overwhelmed: doing this by hand was crazy, and I would not be able to keep things straight. So I created an application that walks through a number of target classes and creates a list, with comments, in the same form as the list of expected auditable classes in the Spring configuration. Its output was extremely helpful and thorough, and we were able to isolate which items were missing from the list. We could also feed back the classes we did not want in the list to generate a new list with more exact results.
A side benefit of the application was finding which classes did not implement hashCode and equals, which mattered because we were getting strange results when syncing to the remote database (domain objects were added when they already existed, instead of being updated). Given the same set of target classes as the auditable list, we narrowed an unmanageable list of all classes down to a few candidates that needed review. This proved quite useful and was run a few times during the project to verify that these methods were implemented.
In summary, the project uses recursion to walk the target classes, checking the parent classes as well as any classes referenced in OpenJPA annotations.
...
for (final Class< ? > clazz : targetClasses) {
    recordMessage("", createComment("*** Class under change set policy " + clazz.getName() + " ***"));
    recurse(clazz, null, DEFAULT_TAB_SPACE);
    recordMessage("", LINE_SEPARATOR);
}
...
Each class traversed is checked for customized hashCode and equals methods. A log of the classes recursed through is also kept (for the audit table).
public static void recurse(final Class< ? > clazz, final Class< ? > subclass, final String space) {
    // getSuperclass() returns null for interfaces, so guard against null as well as Object
    if (clazz == null || clazz == Object.class) {
        return;
    }
    if (processedClasses.contains(clazz)) {
        recordMessage(space, createAlreadyCoveredComment(clazz, subclass));
        return;
    }
    // log against the class itself
    if (subclass != null) {
        recordMessage(space, createComment("Parent class of " + subclass.getName()));
    }
    if (ignoredClasses.contains(clazz)) {
        recordMessage(space, createComment("Ignoring class element " + createValueElement(clazz.getName())
                + " but still traversing its candidates"));
    } else {
        recordMessage(space, createValueElement(clazz.getName()));
    }
    // track this as a processed class to trim the recursive tree
    processedClasses.add(clazz);
    // recurse on parent
    recurse(clazz.getSuperclass(), clazz, space + DEFAULT_TAB_SPACE);
    // get object classes used in annotations
    final List<Class< ? >> annotatedClasses = getAnnotatedClasses(clazz);
    // track object classes with no customised hashCode/equals methods
    trackNoCustomHashCodeEqualsMethods(clazz);
    // recurse on annotated classes
    if (annotatedClasses.isEmpty()) {
        recordMessage(space, createComment("No annotated classes under " + clazz.getName()));
    } else {
        recordMessage(space, createComment("Start annotated classes under " + clazz.getName()));
        for (final Class< ? > annotatedClass : annotatedClasses) {
            recurse(annotatedClass, null, space + DEFAULT_TAB_SPACE);
        }
        recordMessage(space, createComment("End annotated classes under " + clazz.getName()));
    }
}
All recursed classes are added to a processed classes list to trim the recursion path and potentially weed out any circular dependencies. A full listing of the project is included as an attachment with this article.
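The body of trackNoCustomHashCodeEqualsMethods is not shown above. As a rough sketch of how such a check can be done with reflection (this is my own illustration, and the class and method names below are hypothetical rather than the ones in the attachment), a class can be asked whether it declares equals(Object) and hashCode() itself:

```java
class MissingEqualsHashCodeSketch {

    // true if the class itself (not a parent) declares the named method
    static boolean declares(final Class<?> clazz, final String name, final Class<?>... parameterTypes) {
        try {
            clazz.getDeclaredMethod(name, parameterTypes);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    static boolean hasCustomEqualsAndHashCode(final Class<?> clazz) {
        return declares(clazz, "equals", Object.class) && declares(clazz, "hashCode");
    }

    // Hypothetical sample classes used only to exercise the check.
    static class WithBoth {
        @Override
        public boolean equals(final Object obj) {
            return obj instanceof WithBoth;
        }

        @Override
        public int hashCode() {
            return 1;
        }
    }

    static class WithNeither {
    }

    public static void main(final String[] args) {
        System.out.println(hasCustomEqualsAndHashCode(WithBoth.class));    // prints true
        System.out.println(hasCustomEqualsAndHashCode(WithNeither.class)); // prints false
    }
}
```

getDeclaredMethod looks only at the class it is called on, which suits a tool that visits each parent class separately. To ask instead whether any ancestor overrides the method, one option is to compare clazz.getMethod("equals", Object.class).getDeclaringClass() with Object.class.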
As a postscript, the initial target classes could perhaps be populated from the list of persistent classes in the persistence-renamed.xml file instead of being hard-coded, but the tool did the trick in identifying areas of concern. We were able to modify the comments for the audit table so that a new list could be generated on demand, rather than going through things by hand again. You can try it out and tailor it to suit your own purposes. For convenience I placed it in the com.elasticpath.tools/com.elasticpath.tools.sync project under com.elasticpath.tools.sync.utils.impl. To run it within Eclipse, right-click on it and choose either Run or Debug. In the debug configurations, the arguments section lets you set a different location or file name for the reports.
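For anyone who wants to try reading the target classes from the persistence file, here is a minimal sketch. It assumes the file follows the standard JPA descriptor layout, with one class element per persistent class. I have not verified that layout against persistence-renamed.xml, and the class names in the sample are made up.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class PersistentClassNamesSketch {

    // collects the text of every <class> element in the descriptor
    static List<String> readClassNames(final InputStream xml) {
        try {
            final Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
            final NodeList classElements = document.getElementsByTagName("class");
            final List<String> names = new ArrayList<>();
            for (int i = 0; i < classElements.getLength(); i++) {
                names.add(classElements.item(i).getTextContent().trim());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read persistence descriptor", e);
        }
    }

    public static void main(final String[] args) {
        // made-up descriptor content, standing in for the real file
        final String sample = "<persistence><persistence-unit name=\"sample\">"
                + "<class>com.example.FirstImpl</class>"
                + "<class>com.example.SecondImpl</class>"
                + "</persistence-unit></persistence>";
        final InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        System.out.println(readClassNames(in));
    }
}
```

Each name could then be turned into a Class with Class.forName and passed to the recursion as a target class.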