Machine Problem 2: Building a Hashtable

Objectives

  1. Define and use data types and linked structures in C
  2. Use C pointers
  3. Use the C standard library dynamic allocation API (malloc and free)
  4. Use C standard library string manipulation functions
  5. Write a non-trivial, robust, leak-free C program

Overview

For this machine problem you will complete the implementation of the hashtable data structure found in the mps/02 subdirectory of your lab code repository.

A hashtable is a mapping data structure used to associate keys with values --- in our implementation, keys are always going to be strings, while values will be arbitrary objects (though in our tests they'll also be strings). Your implementation will be built around an array of linked-lists of "buckets" (each of which contains a key/value pair). These structures will all be dynamically allocated, and you will assume that all keys and values passed to the put operation are also themselves dynamically allocated and therefore need to be free'd when unneeded (e.g., when values are updated, keys removed, or the hashtable itself deallocated).

When a key/value pair is inserted into the hashtable, its position in the array will be determined by the hash value of the key string, computed using the provided hash function (unsigned long hash(char *str);). To accommodate "collisions" (i.e., one or more key/value pairs mapping to the same array slot), a node for the key/value pair will be added to the linked-list whose head is pointed to by the corresponding array entry. If the provided key already exists in said list, however, its value will simply be updated (and the old key & value free'd).

The following functions make up the API for the hashtable you are to implement:

You should only change the hashtable.c file, and you should leave the hash function untouched, so that the bucket location of each inserted key and the output of your program can be checked against our reference implementation.

Reminder: Git usage

Remember to (git fetch upstream ; git merge upstream/master) before starting on this machine problem in order to grab my additions to the public repository. Also: commit often, and remember to push when you hit milestones.

Testing and Evaluation

Build the hashtable executable using the default Makefile target --- i.e., by just typing "make".

The main.c file contains code that will run your implementation through its paces based on commands found in one of six tracefiles (trace[01-06].txt). Each tracefile starts with the initial size of the hashtable to be created, and each line takes one of the following forms:

The following are the contents of trace01.txt:

10
p one two
p three four
p five six
p seven eight
p nine ten
info
g one
g three
g five
g seven
g nine
g two
g ten

To run your hashtable through on a given trace, simply use the command "make test[01-06]". E.g.,

$ make test01
Creating hashtable of size 10
Inserting one => two
Inserting three => four
Inserting five => six
Inserting seven => eight
Inserting nine => ten
Printing hashtable info
Num buckets = 5
Max chain length = 3
Avg chain length = 1.67
Looking up key one
Found value two
Looking up key three
Found value four
Looking up key five
Found value six
Looking up key seven
Found value eight
Looking up key nine
Found value ten
Looking up key two
Key not found
Looking up key ten
Key not found

To compare your output against the reference output, use the command "make diff[01-06]". If there is no output, you've successfully passed the trace file. Any output will indicate differences between your output (prefaced with < symbols) and the reference output (prefaced with > symbols). E.g.,

make diff02
14,16c14,16
< Num buckets = 7
< Max chain length = 4
< Avg chain length = 2.33
---
> Num buckets = 5
> Max chain length = 3
> Avg chain length = 1.67
make: *** [diff02] Error 1

To check for memory leaks, you can use the command "make leakcheck". If your program has no leaks, then you should see output that looks something like this:

==22999== HEAP SUMMARY:
==22999==     in use at exit: 0 bytes in 0 blocks
==22999==   total heap usage: 40,045 allocs, 40,045 frees, 1,171,860 bytes allocated
==22999==
==22999== All heap blocks were freed -- no leaks are possible
==22999==
==22999== For counts of detected and suppressed errors, rerun with: -v
==22999== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Otherwise it will tell you how many leaks occurred, and potentially what function(s) performed the original allocation(s) that were never freed.

This machine problem is worth a total of 30 points, per the following rubric: