20-ECES-228-001 Data Structures Fall 2007

Homework Assignment Number 5A

Hash Tables and Functions

Due: November 2, 2007
Submit source code implementing the HashTable class to xxxxx@email.uc.edu

Rationale:

    This is the first of a series of three homeworks, the last two building on the previous one(s). You will develop a general hash table class in this homework. Hash tables are very handy structures that provide the benefits of arrays, namely fast access anywhere in the structure, and the benefits of linked lists, namely dynamic changes in the structure are easily accomodated. This exercise will provide experience in understanding and implementing such low level structures. You will build hash tables from another low level structure: the List class.

Review of Hash Tables:

    There are several types of hash tables and hash functions. In this assignment we focus on one of the most successful variants of hash tables known as bucket hash. The hash function used by the bucket hash table class you develop will be user defined.

There are three parts to a bucket hash table: an array of pointers to lists, the lists themselves, and a hash function. The hash function can be applied to any object and the output of the hash function is an index into the array of list pointers. In other words, every number output by the hash function is a valid index into the array and every array index will be output by a "good" hash function for some input object. The best hash functions appear to map input objects randomly to array indices. Observe that the size of the array is limited and more than one object may map to the same array index. Thus, a list of such objects is maintained for each array index. Each list holds all objects that map to the same array index.

To find an object in a hash table, apply the hash function to the object to get the corresponding array index, then search the list for the object. This means some value function, for the sake of making comparisons, must also be supplied by the user of a hash table.

The performance of a hash table depends on the hash function and the ratio of number of stored objects to the size of the array of list pointers. Typically, a ratio of 2-3 is optimal (good compromise of space and time).

Homework:

    Implement a hash table class with the services listed in the following hashtable.h file
    #ifndef _HASHTABLE
    #define _HASHTABLE
    #include <iostream>
    #include "list.h"
    using namespace std;

    class HashTable {
       List **table;           // The array of List pointers
       int size;               // The size of the array
       int (*key)(void*);      // The hash function

    public:
       HashTable(int);         // Constructor takes array size as argument
       void *add(void*);       // Add object to table: return NULL if there already
       void *lookup(void*);    // Find object in table matching input, return pointer to it
       void *pop(void*);       // Remove object from table, return pointer to it
       void display(ostream&); // Output contents of table in meaningful way
       ~HashTable();           // Destructor
    };

    ostream &operator<<(ostream &, HashTable *);  // overload <<
    #endif
The hash function and value function are provided externally. The following hash_funcs.h file supplies the prototypes:
    #ifndef _HASH_FUNCS
    #define _HASH_FUNCS

    // These are declarations of functions needed by the HashTable class.
    // Individualized implementations of these functions should be defined 
    // in hash_funcs_xxx.cc.
    int hashfunction(void*);
    int valuefunction(void*);
    #endif
A sample use of the HashTable class is the following:
    #include <iostream>
    #include "hashtable.h"
    using namespace std;

    class A {
       int number;

    public:
       A(int n) { number = n; }
       int valueOf() {  return number;  }
    };

    int valuefunction (void *obj) {  return ((A*)obj)->valueOf();  }

    int hashfunction(void *object) { 
       long long n = (long long)valuefunction(object);
       return (int)(n*357 % 100);  
    }

    int main() {
       HashTable *h = new HashTable(100);
       cout << h;
       cout << "Add: " 
            << ((A*)h->add(new A(12)))->valueOf() << "\n";
       cout << "Add: " 
            << ((A*)h->add(new A(264)))->valueOf() << "\n";
       cout << h;
       cout << "Add: " 
            << ((A*)h->add(new A(112)))->valueOf() << "\n";
       cout << h;
       cout << "Add: " 
            << ((A*)h->add(new A(212)))->valueOf() << "\n";
       cout << "Add: " 
            << ((A*)h->add(new A(312)))->valueOf() << "\n";
       cout << "Add: " 
            << ((A*)h->add(new A(835)))->valueOf() << "\n";
       cout << h;
       cout << "Pop: " 
            << ((A*)(h->pop(new A(212))))->valueOf() << "\n";
       cout << h;
       cout << "Pop: " 
            << ((A*)(h->pop(new A(12))))->valueOf() << "\n";
       cout << h;
       cout << "Lup: " 
            << ((A*)(h->lookup(new A(835))))->valueOf() << "\n";
       cout << h;
    }
In this case class A objects are being stored, the value function returns an int value which depends on some class A state (namely the variable number), and the hash function uses an object's value directly by multiplying by a large number and taking the result modulo the array size (the function seems somewhat random and all array indices can be hit and no others can be hit).

For convenience, you may use the List class supplied in the files list.cc, list.h, cell.cc, cell.h, and list_funcs.h. The functions required by the List class are the valuefunction described above, a displayfunction, and a comparefunction. Such functions suitable for class A objects are

    void dispfunction(void *obj) {  cout << valuefunction(obj) << " ";  }

    int cmpfunction(void *obj1, void *obj2) {
       return (valuefunction(obj1) == valuefunction(obj2)) ? 1 : 0;
    }
Test your code on storing and retrieving int objects (created using new int(10), for example).