20-ECES-228-001 Data Structures Fall 2007

Homework Assignment Number 3

Linked Lists and MergeSort

Due: October 12, 2007
Submit source code implementing linked list version of the MergeSort class to xxxxx@email.uc.edu

Rationale:

    A drawback to an array-based implementation of mergesort is that the "merge" of two increasing lists (arrays) of integers requires the movement of objects from the lists being merged to a third list representing the merged list. In this exercise you will eliminate the overhead of this sort of movement without incurring a large penalty for finding elements of a list by implementing mergesort using linked lists.

MergeSort:

    Given a list L of objects for which the operator < is defined (less than), the mergesort operation on L is given recursively by the following pseudocode:
    L mergeSort(L) {
        if (L has 1 element) return L;
	Let L1 be the first half of L;
        Let L2 be the other half of L;
	Let S1 = mergeSort(L1);
        Let S2 = mergeSort(L2);
        return merge(L1, L2);
    }
where merge(L1, L2) outputs a list of all objects in both L1 and L2 in increasing order.

Homework Problem:

    Write a MergeSort class implementing a mergesort on a given list of objects of any kind. The objects are constructed from data given in a file. The constructor of this class should have two arguments. The first is a pointer to a value function of the following prototype:
    long value(void *object); 
which is used to determine the order of objects in the input list. The input to value is a pointer to a list object and the output is its value. The second constructor argument is an object creation function prototyped as follows:
    void *createObject(fstream &);
which is used to create objects from data contained in a file. For example, for the simple class:
    class A {
       int number;
    public:
       A (int number) { this->number = number; }
       int valueOf() { return number; }
    };
assuming the data file contains a list of integers that are to be used as arguments for A constructors, the needed functions are:
    long value (void *object) { return ((A*)object)->valueOf(); }

    void *createObject(fstream &fin) {
       int number;
       fin >> number;
       return (void *)new A(number);
    }
The creation of a MergeSort object for a list of objects of class A would proceed as follows:
    MergeSort *ms = new MergeSort(value, createObject);

You will demonstrate the use of the MergeSort class on a list of ints. That is, write a program to sort a file of integers using your MergeSort class. Examples of input files are merge.3.dat (7 numbers), merge.10.dat (1024 numbers), and merge.16.dat (65536 numbers). This program will contain the value and object creation functions as well as the declaration of a MergeSort object for sorting the integers, the invocation of a method which reads the data, and the invocation of a sort method. Call is mergesort.cc.

Implement the MergeSort class in a file called mrgsrt.cc. The methods and data of this class are given by the following include file, called mrgsrt.h:

    #ifndef _MERGESORT
    #define _MERGESORT
    #include <fstream>
    #include "cell.h"
    using namespace std;

    class MergeSort {
       // Private Data
       long (*value) (void*);         // Value function
       void *(*create) (fstream&);    // Object creation function
       Cell *data;                    // Pointer to unsorted/sorted structure
       bool sorted;                   // true iff list is sorted
       // Private methods
       Cell *installData(fstream&, int);  // Creates input list from file data
       Cell *merge (Cell*, Cell*);        // Merges two increasing lists
       Cell *mergeSort(Cell*);            // Sorts list pointed to by Cell object
       void showTree(Cell*);              // Display the input list

    public: MergeSort (long (*)(void*), void *(*)(fstream&)); 
       void sort();          // Sort the input list using method mergeSort
       int readData(char*);  // Read data from file and build list using installData
       void showOutput();    // Display sorted list
       void showInput();     // Display unsorted list
    }; 
    #endif 
Private data consists of pointers to the value and object creation functions that are passed through the constructor, the data pointer serving the dual role of pointing to the unsorted list before a sort and the sorted list after the sort, and the sorted variable of type bool which has value true if and only if the list is sorted. There are private methods which operate on Cell objects (described below). The Cell objects are internal "carriers" of objects and are not available to users. Private methods include installData which builds all list objects from the data file and returns a pointer to the resulting list structure (which is more complex than a singly linked list - see below), merge which returns a singly linked list that is the merge of two input, ordered, singly linked lists, mergeSort which uses merge to sort a given input list, and showTree which is intended to display an unsorted input list. Methods available to a user, besides the constructor, are readData which creates an internal structure from a given data file name, pointed to by data, and holds a list of objects to be sorted (uses the private installData method and unsets sorted), sort which replaces the unsorted list pointed to by data with a sorted list (sets sorted and uses the private method mergeSort), and showInput and showOutput which display the unsorted and sorted lists, respectively.

The Cell class is protoyped as follows in mrgsrt.h:

    #ifndef _CELL
    #define _CELL
    #include <iostream>
    using namespace std;

    class Cell {
       friend class MergeSort;
       void *object;         // Pointer to an object, if terminal
       Cell *left, *right;   // Pointers to left and right sublists

    public: 
       Cell (Cell*, Cell*); // Called when building intermediate nodes 
       Cell (void*); // Called when building terminal nodes 
    };
    #endif 
The reason for the left and right pointers is to separate the left half of a list from its right half. The pointers allow the separation to take place in one step: the merge proceeds on the two sublists pointed to by these pointers. Details are given in the next section.

Linked list implementation of Mergesort:

    The stucture of data to be sorted will be based on linked lists. The structure needs to be more complex than a singly linked list: although merging two singly linked lists is straightforward and efficient, finding the middle of a singly linked list is highly inefficient as it requires visiting half of the elements in the list.

A solution is to use Cell objects that have two pointers, denoted left and right. The idea of using Cell objects to "carry" objects in a singly linked list was introduced in CSII: there, each Cell object had a single next pointer to the next Cell object in the list. We expand on this idea by using Cell objects in two different ways: to carry objects and to point to list middles.

There is a root Cell whose left pointer references a structure containing the first half of the given list and whose right pointer references the other half. Specifically, these pointers reference Cell objects that are intermediate roots for the first quarter, second quarter, third quarter, and fourth quarter of the list. This nesting continues until reaching root Cells referencing single objects. The following diagram illustrates the structure.

Mergesort on this structure begins on the bottom level. All pairs are merged into singly linked lists. Observe the merged lists are all sorted and that the left pointer of Cell objects on the level above point to the head of each merged list. Then, on the new bottom level these lists are merged. The process continues until the entire list is sorted. The following pictures show this in stages

After first stage - bottom row pairs combined to sorted lists

After second stage - bottom pairs of sorted lists merged

After third stage - bottom pairs of sorted lists merged

When the mergesort is completed, the left pointer of the root references the head of the sorted list.

Code hints:

    Some of the methods of the MergeSort class are given below. By studying these examples you should be able to build the entire class.
    /***--------------------------------------------------------------
       readData:
          Input:  A filename
          Output: Variable data is the root of a linked structure
                  holding all objects created from the data file
    ------------------------------------------------------------------***/
    int MergeSort::readData(char *filename) {
       fstream fin;
       fin.open(filename, ios::in);
       if (fin.fail()) {
          cout << "Cannot open " << filename << "\n";
          exit(0);
       }
       sorted = false;
       int count = 0;
       long number;
       while (fin >> number) count++;
       fin.close();
       fin.clear();
   
       fin.open(filename, ios::in);
       data = installData(fin, count);
       fin.close();
       fin.clear();
       return count;
    }

    /***--------------------------------------------------------------
       installData:
          Input:  A file input stream and the number of data objects
                  in that file
          Output: A pointer to the root of a tree structure containing
                  objects created from the file data
    ------------------------------------------------------------------***/
    Cell *MergeSort::installData(fstream &fin, int n) {
       int dleft;

       if (n == 1) {
          void *object = create(fin);
          return new Cell(object);
       }

       for (dleft=0 ; ((1 << dleft) < n/2) ; dleft++);
       Cell *q1 = installData(fin, (1 << dleft));
       Cell *q2 = installData(fin, (n - (1 << dleft)));
       return new Cell(q1,q2);
    }

    /***--------------------------------------------------------------
       mergeSort:
          Input:  A tree linked on Cell left and right, value cells
                  are on the bottom level
          Output: A linked list, in increasing order, of objects in
                  the input tree.  List is linked on Cell left
    ------------------------------------------------------------------***/
    Cell *MergeSort::mergeSort(Cell *c) {
       if (c == NULL) return NULL;
       if (c->left == NULL) return c;
       return merge(mergeSort(c->left), mergeSort(c->right));
    }