20-CS-122-001 Computer Science II Spring 2012
Homework Assignment 5

Virtual functions, classes, inheritance, lists, queues, stacks, applications

Due: May 10, 2012 (submit instructions: here)

Rationale:

Similar to the Minimum Cost Network Problem, the solution is another simple application of the Greedy Method. However, there is a slight complication.

Purpose:

Use classes in a meaningful way, to represent objects. Read input data from a file; make multiple passes through the data file to avoid using constants to establish memory for data structures. Think about solving a problem of complexity greater than those of the first two homeworks.

Homework Problem:

Write a C++ program for solving instances of the following problem known as the Integer Deadline Scheduling Problem:

Given: A collection of jobs. Each job has unit processing time, a deadline d, and a payoff p. A single processor capable of processing one job, from start to finish, at a time.

Find: A schedule of jobs to be processed by the single processor such that the total payoff of all jobs completing before their deadlines is maximized.

Example:

 Job d p 1 5 1328 2 4 134 3 3 1823 4 2 783 5 1 993 6 5 111 7 4 256 8 3 882 9 2 392 10 1 444 11 1 1200 12 6 411 13 5 1200 14 4 800 15 5 1000

A schedule of jobs 11,3,15,13,1,12, in order, has a payoff of 6962.

Method:

input: A collection of jobs with deadline and payoff
output: An optimal schedule for maximum total payoff of early jobs

```    1. Order the jobs by decreasing payoff.
2. Let S represent the solution schedule and make it empty.
3. Repeat the following until all the jobs are considered.
3a.   Choose the next job j in the ordered list.
3b.   Check whether all jobs in S plus j can be permuted
so that all jobs can finish before their deadlines.
3c.   If not, forget j and continue the loop.
3d.   Otherwise, include j in S.
4. Return S.
```

Analysis:

1. The analysis is quite simple: just show this is a Maximum Cardinality Matroid Maximization problem; the Greedy method then gives the optimal solution. The only problem is you probably do not know what a Matroid is. If you are curious see this link or wait a couple of years to see it in Design and Analysis of algorithms. If you are wondering how to top Matroids, check out this link to Greedoids. Then there are also PolyMatroids. Aren't you lucky to be in a field with so much to learn about?

2. The hardest part of the design task is to come up with an efficient approach to finding the permutation mentioned in 3b. We want you to use ideas illustrated in this presentation and described as follows.

Observe that it is not necessary to maintain all currently scheduled jobs in increasing order of deadline. Any placement of pointers to jobs in schedule is acceptable as long as, for all i such that schedule[i] references a job, the job of schedule[i] has a deadline that is no greater than i+1. Hence, (first try - to be modified later) we can do the following. Before instantiating the Schedule object, determine the maximum deadline of all jobs in the specified file. Create array schedule of class Schedule with size equal to this number (this size is passed through the constructor of the Schedule class). Initialize all elements of schedule to NULL. Let insert operate as follows where job is a reference to an object of class Job and is the input to insert. Suppose the deadline of the object referenced by job is d. Look at schedule[d-1]. If this element is NULL, set schedule[d-1] = job and return true. Otherwise, scan the elements schedule[i], for decreasing i, starting at i = d-2, until either schedule[i] is NULL or i = -1. In the later case return false. Otherwise, set schedule[i] = job and return true.

The above approach has two inefficiencies that are not acceptable. First, what if the maximum deadline of some job is some unrealistically large number? This may happen if some Job objects always pay off (that is, have no deadline) in which case some number representing infinity might be used for deadline. Fortunately, this can be handled easily: initially set the size of the array schedule to the number of input jobs and, during insert, for any input that references a Job object whose deadline is greater than the number of jobs, begin the search for a NULL value at the array element of maximum index.

The second inefficiency is that the search for a NULL during the invocation of insert will repeatedly and unnecessarily check schedule array elements that, on previous invocations of insert, have already been determined to reference Job objects. One fix is to add to class Schedule an array, called deadline_idx, and of size equal to that of array schedule. Then deadline_idx[i] specifies the greatest j<i such that schedule[j] is NULL. Unfortunately, the overhead in maintaining deadline_idx as described above is too great. We choose a "lazy" implementation which relaxes the meaning of deadline_idx[i] to specify some j'>j to try but does not guarantee that the value of schedule[j'] is NULL. When it is convenient, deadline_idx[i] is changed, each time advancing toward the current maximum j such that schedule[j] is NULL.

With this relaxation, an invocation of insert works as follows. First deadline_idx[d] is checked. Say this value is j1. Then schedule[j1] is checked for NULL. Suppose not. Then deadline_idx[j1] is checked. Say this value is j2. Then schedule[j2] is checked for NULL. If not, deadline_idx[j2] is checked and so on until, for some jx that is encountered along the way, schedule[jx] is NULL or the value of deadline[jx] is negative . During the hunt for NULL, each index of deadline_idx that is visited is stored in some list, say L. Upon completion of the hunt, the value of deadline_idx[i] is changed to either -1 (if no NULL was found) or to deadline_idx[schedule[jx]] (since an object is now referenced by schedule[jx]) for each i in L.

Requirements:

1. Functional requirements:
1. The program shall read a text file and display the results to the console.
2. The input file shall be identified on the command line (that is, use int main(int argc, char **argv)).
3. The input file shall contain printable ascii characters only plus newline characters for terminating a line.
4. Each line of the input file shall contain a sequence of three positive integers, separated by blanks, corresponding to data applicable to a single job. The first of three numbers is the job identity, the second is the deadline (given as a positive integer), the third is the payoff.
5. See dead.10.dat (output: results.10) for an example input file.
6. The output will consist of a list of identities of jobs in the order they should be processed from left to right, plus the total profit attained.

2. Performance requirements:
1. Time: Less than 1 second to read a 10000 object file such as dead.10000.dat (output: results.10000) on a hard drive of 20 MB/s throughput. Less than 1 second to solve a 10000 job input on a 1 GHz or faster i386 or compatible architecture.
2. Space: Less than 2MB of RAM for 10000 objects (my results: results.txt).

3. Implementation requirements:
1. Use a Job class to organize all job information as follows:
```class Job {
friend class Schedule;
friend int profitCompare (const void *, const void *);
friend ostream & operator<<(ostream & out, Schedule & sched);

public:
Job (int id, int p, int d) { identity = id; deadline = d; profit = p; }
};
```

where profitCompare is given as:

```int profitCompare (const void *x, const void *y) {
if ((**(Job **)x).profit < (**(Job **)y).profit) return 1;
if ((**(Job **)x).profit > (**(Job **)y).profit) return -1;
return 0;
}
```

2. Use a Schedule class to maintain an incrementally changing schedule of jobs:

```class Schedule {
friend ostream & operator<<(ostream & out, Schedule & sched);
Job **schedule;
int index;         // identifies the number of jobs in the schedule

public:
Schedule (int njobs) { schedule = new Job*[njobs]; index = 0; }

// The "insert" procedure checks jobs in the existing schedule,
// which are assumed to all complete before their deadlines.
// If there is no way for "job" to be added to the schedule
// so that all jobs can be compleleted before their deadlines,
// or if there is no array space left for referencing another
// job, then "false" is returned and the schedule remains untouched.
// Otherwise, "job" is added to the schedule and "true" is returned.
bool insert (Job *job);
};
```

The following is the overloaded "<<" operator.

```ostream & operator<<(ostream & out, Schedule & sched) {
long long total=0;
for (int k=0; k < sched.njobs ; k++) {
if (sched.schedule[k] != NULL) {
out << "[" << sched.schedule[k]->identity
<< ":" << sched.schedule[k]->profit
<< "," << sched.schedule[k]->deadline << "] ";
total += sched.schedule[k]->profit;
}
}
out << "\nTotal Profit: " << total << "\n";
return out;
}
```

3. Use the ideas in this presentation to implement the insert method of class Schedule..

4. Use a function called readData to read the data from a file:

```// Read in data from file.  Assume file contains all distinct jobs
// (that is, it is possible that two jobs have the same deadline
// and profit but they are still considered distinct).  You may use
// a heap to sort the jobs by profit "on the fly".
```

5. The following main goes with the above:

```int main(int argc, char **argv) {
int njobs=0;
Job **jobs;

if (argc != 2) {
cout << "\nUsage: " << argv[0] << " \n";
exit(0);
}

// Read all jobs from file and sort. Variable njobs will get the number of
// lines in the input file.  Returns a list of Jobs.