Massive Technical Interviews Tips: Java IdentityHashMap

Saturday, October 24, 2015

Java IdentityHashMap

http://www.geeksforgeeks.org/identityhashmap-class-java/

This class is not a general-purpose Map implementation. While this class implements the Map interface, it intentionally violates Map’s general contract, which mandates the use of the equals method when comparing objects.

IdentityHashMap vs HashMap

IdentityHashMap uses equality operator “==” for comparing keys and values while HashMap uses equals method for comparing keys and values inside Map.
Since IdentityHashMap doesn’t use equals() its comparatively faster than HashMap for object with expensive equals().
IdentityHashMap doesn’t require keys to be immutable as it is not relied on equals()

http://blog.csdn.net/tiwerbao/article/details/8936459

java.util.IdentityHashMap 类也是一种哈希表实现的Map接口，该类不是通用 Map 实现。此类实现 Map 接口时，它有意违反 Map的常规协定，比较键/值时使用引用相等性代替对象相等性，不使用hashcode，而只使用Object.equals()。

这里相等性比较的是地址：key1==key2，所以只有两个对象的地址相等才认为相等，是最严格的相等条件。

在JVM中每个对象的地址都是唯一的，IdentityHashMap就是利用这个唯一地址判断相等的，即使对象深度拷贝，copy了所有属性，但是地址还是不同的。

这里的hash实现也和散列链表HashMap的不同，IdentityHashMap 只是散列而没有链表。这里不用Entry来存放元素，key和value全部放在table中，挨着放：talbe[i]= key; table[i + 1] = value; 所以遍历查找和读取的时候，下标都需要+ 2递增。

private transient Object[] table;

这里hash只是负责简单定位，定位后还要遍历table，插入的时候变得不准确，所以查找效率要比HashMap低一些。

IdentityHashMap 这种特殊实现，会在一些特别的场合用到，比如说区分地址不同的对象，避免重复执行，如JVM中的关闭钩子，里面就用到了这个IdentityHashMap 。见关闭钩子源码《JVM关闭钩子（2） —— 源码浅析》

put / get

添加元素的时候，会先hash定位，然后遍历table，检查是否有重复元素(校验地址相等)。如果有重复，则覆盖值；没有重复，则找到一个空位插入。

查找元素，也是同样一个逻辑，会先hash定位，然后遍历table，找到==的key，然后返回。

当然，此处遍历的时候，下标是+2递增，因为key和value都挨着存放在table中。

public V put(K key, V value) {
Object k = maskNull(key);
Object[] tab = table;
int len = tab.length;
int i = hash(k, len);
Object item;
while ( (item = tab[i]) != null) {
if (item == k) { // 使用==来判断，严格的相等。所以可以出现相同的key
V oldValue = (V) tab[i + 1];
tab[i + 1] = value;
return oldValue;
}
i = nextKeyIndex(i, len); // i + 2
}
modCount++;
tab[i] = k; // 没有使用Entry来存放元素，而是挨着放在table[]中
tab[i + 1] = value;
if (++size >= threshold)
resize(len); // len == 2 * current capacity.
return null;
}
private static int nextKeyIndex(int i, int len) {
return (i + 2 < len ? i + 2 : 0);
}

public V get(Object key) {
Object k = maskNull(key);
Object[] tab = table;
int len = tab.length;
int i = hash(k, len);
while (true) {
Object item = tab[i];
if (item == k)
return (V) tab[i + 1];
if (item == null)
return null;
i = nextKeyIndex(i, len);
}
}

另外要注意的是，由于比较的地址，所以可能会出现我们平时觉得相等的Key或Value，但是IdentityHashMap 却认为是不相等的，而没有定位到，或者进行过滤。
http://brokendreams.iteye.com/blog/1929199
IdentityHashMap中会调用System.identityHashCode(x)来获得对象的hashCode(也就是对象的hashCode方法没有被覆盖情况下的返回值)，仅用“==”来进行后面key的比较。

IdentityHashMap和HashMap内部都实现了散列表，但有区别，体现在对散列冲突的处理上。HashMap中以分离链表的方式来解决散列冲突，也就是将散列在同一个桶内的数据组织成一个链表结构；IdentityHashMap中则以开放寻址的方式来解决散列冲突，当发生散列冲突时，数据被放入下一个空闲地址(也叫线程探测法)。

添加和获取的方法实现起来比较容易，但删除一个数据就蛋疼了。需要考虑冲突的问题，如果删除的位置之前发生过n次冲突，那么删除动作不仅要删除当前位置的数据，还要把之前发生过冲突的n个元素的位置往前调整(不严格的说)，否则线性探测链就会断掉，会影响其他操作(如get)的正确性。来看一下具体实现：

public V remove(Object key) {
Object k = maskNull(key);
Object[] tab = table;
int len = tab.length;
int i = hash(k, len);
while (true) {
Object item = tab[i];
if (item == k) {
modCount++;
size--;
V oldValue = (V) tab[i + 1];
tab[i + 1] = null;
tab[i] = null;
closeDeletion(i);
return oldValue;
}
if (item == null)
return null;
i = nextKeyIndex(i, len);
}
}

前面的代码和put类似，只是删除东西完成之后还要调用一个closeDeletion方法。
大概过程是从删除的位置开始往下找(通过nextKeyIndex方法)，当存在下一个元素的时候，通过重新计算元素hash值的方式，判断下一个元素放到本容器中时是否产生过冲突。如果产生过冲突，那么将该元素放到当前位置(d、d+1)，将该元素之前位置(i、i+1)清空。然后把i赋给d，继续往下找。知道找到null位置为止。

private void closeDeletion(int d) {
// Adapted from Knuth Section 6.4 Algorithm R
Object[] tab = table;
int len = tab.length;
// Look for items to swap into newly vacated slot
// starting at index immediately following deletion,
// and continuing until a null slot is seen, indicating
// the end of a run of possibly-colliding keys.
Object item;
for (int i = nextKeyIndex(d, len); (item = tab[i]) != null;
i = nextKeyIndex(i, len) ) {
// The following test triggers if the item at slot i (which
// hashes to be at slot r) should take the spot vacated by d.
// If so, we swap it in, and then continue with d now at the
// newly vacated i. This process will terminate when we hit
// the null slot at the end of this run.
// The test is messy because we are using a circular table.
int r = hash(item, len);
if ((i < r && (r <= d || d <= i)) || (r <= d && d <= i)) {
tab[d] = item;
tab[d + 1] = tab[i + 1];
tab[i] = null;
tab[i + 1] = null;
d = i;
}
}
}

transient Object[] table; // non-private to simplify nested class access

/**

* Value representing null keys inside tables.

static final Object NULL_KEY = new Object();

private static Object maskNull(Object key) {

return (key == null ? NULL_KEY : key);

}

/**

* Returns internal representation of null key back to caller as null.

static final Object unmaskNull(Object key) {

return (key == NULL_KEY ? null : key);

}

private static int hash(Object x, int length) {

int h = System.identityHashCode(x);

// Multiply by -127, and left-shift to use least bit as part of hash

return ((h << 1) - (h << 8)) & (length - 1);

}

System

public static native int identityHashCode(Object x);

Saturday, October 24, 2015

Java IdentityHashMap

Labels

Popular Posts