Sep 10, 2011

What is Hashing , HashTable, Hash Function and its collision resolution strategies

Hashing is the technique used for performing almost constant time search in case of insertion, deletion and find operation. Taking a very simple example of it, an array with its index as key is the example of hash table.
So each index (key) can be used for accessing the value in a constant search time. This mapping key must be simple to compute and must helping in identifying the associated value. Function which helps us in generating such kind of key-value mapping is known as Hash Function.

Hash Table a.k.a Hash Map is a data structure which uses hash function to generate key corresponding to the associated value.

lets look at some sample hash function for strings

Folding Method:-
int h(String x, int D)
     int i, sum;
     for (sum=0, i=0; i<x.length(); i++)
         sum+= (int)x.charAt(i);
     return (sum%D);

Cyclic Shift :-
static long hashCode(String key, int D)
  int h=0;
  for (int i=0, i<key.length(); i++)
        h = (h << 4) | ( h >> 27);
        h += (int) key.charAt(i);
  return h%D;

good link for hash function on string : click here

Coming to very important part of hashing , which is collision resolution. Since its always not possible to design perfect hash function with minimal overhead which would generate unique key. To address this problem following are the two main collision resolving techniques :-
1) Open Hashing also known as separate chaining 
2) Closed Hashing also known as open addressing

Lets understand the difference between them
1) Open Hashing :- In this strategy collision is resolved by keeping the conflicting element in a list. That is to keep all element in a list which generate same hash.
Open Hashing

From above figure its clear that how collision get resolved by keeping a linked list.

2) Closed Hashing :- In this strategy collision is resolved by placing the conflicting element near to the slot generated by the hash function. Associated with closed hashing is a rehash strategy:
     “If we try to place x in bucket h(x) and find it occupied, find alternative location h1(x), h2(x), etc. Try each in order, if none empty table is full,”
Lets take an example to understand it

Input data :- a,b,c,d   Hash for them H(a) = 0, H(b) = 3, H(c) = 7 and  H(d) = 3

Now as 'c' and 'd' has same hash, where to insert 'd' then ?
Finding position using linear hashing :
h1(d) = (h(d)+1)%8 = 4%8 = 4

Adding 1 to hash function of h(d) we get new position 4, and slot 4 is currently non occupied. So entering d at position 4. In this way Closed hashing works.

Disadvantage of closed hashing is that it consumes more space as compared to open hashing 
also it has less flexibility in accommodating for duplicate hash element.
Major advantage of closed hashing is that it reduces the overhead of introducing new data structure and reduces cost of new memory allocation per new element insertion.


its good with precise informtion

thanks dear
you give very simple language to understand
once again thnx.......

This is the most difficult and confusing portion which I am trying from several days to learn. A big thanks to you for simplifying all the things in such a simple and easy way.
e signatures

in example for closed hashing above 'c' is taken instead of 'd'.

Good can u give more explanation on other methods of hashing such as mid-square, multiplication and division....

very helpful information thanks

Very useful information but it will be even better if you provide questions to workout like Geeks for geeks.Try to update..

Post a Comment