20-CS-122-001 Computer Science II Spring 2012
Homework Assignment 2

Virtual functions, classes, inheritance, lists, queues, stacks, applications

Minimum Cost Network

Due: April 14, 2012 (submit instructions: here)

Rationale:

It may seem to some of you that this problem is asking too much given your limited programming skills at this point. Please try it anyway. You should be able to produce some kind of solution, even if it is not efficient, based on a simple use of arrays. Later in the course we will revisit this problem to show how many different ways it can be solved and to investigate the efficiency of each approach. You will be amazed how much faster one version is than another. We will probably do a fair amount of programming this quarter and I thought this would be a great problem to start out with.

If you feel over-challenged by the problem below, read In Case of Trouble at the end of this page.

Purpose:

Use class objects in a meaningful way. Use arrays in a non-trivial way to represent data elements that can be used to make tests on the structure of a dynamic data base. Read input data from a file; make multiple passes through the data file to avoid using constants to establish memory for data structures. Use qsort; implement a function required as input to qsort for comparison.

Homework Problem (Objective):

Write a program to solve the Minimum Cost Network Problem. This problem and an outline of the code you can write are as follows:

Minimum Cost Network Problem:

Given a collection of n cities and the cost of routing cable between every pair of cities, find the minimum cost network of cables that allows a person in one city to connect with a person in any other city. Network cost is the sum of the costs of all cables used. Cables must be laid between cities. Cables connect only at cities (splicing of a crossing pair of cables between cities is not allowed). A person in city x can connect with a person in city y only if there is a sequence of distinct cities c1 , c2 , ..., cj , none of which are either x or y, such that cables are laid between x and c1 , between c1 and c2 , ... , between cj and y. Such a sequence is called a path between x and y.

Example:

The following is a table showing fictitious costs of laying cable between pairs of four familiar cities.
 Cleveland Columbus Cincinnati Chicago Cleveland - 120 96 300 Columbus 120 - 43 178 Cincinnati 96 43 - 200 Chicago 300 178 200 -
Note that the costs do not correspond with distances between cities: this mimicks some, but not all, real-life situations (reflecting a natural barrier such as a mountain to cross - thereby increasing cost).

For the above example, the minimum cost network consists of three cables: one connecting Cleveland to Cincinnati, one connecting Cincinnati to Columbus and one connecting Columbus to Chicago, for a total cost of 317. In general, there are n-1 cables in a minimum cost network connecting n cities.

Method:

input: A collection of cable costs between pairs of cities
output: A network of cables of minimum cost

```    1. Order the cables by increasing cost.
2. Let N represent the network of cables to be output and make it empty.
3. Repeat the following until all the cables are considered.
3a.   Choose the next cable c in the ordered list.
3b.   Check whether including c in the solution N allows you to
trace more than one path between some pair of cities using only
3c.   If it does, forget c and continue the loop.
3d.   Otherwise, include c in N.
4. Return N.
```

Requirements:

1. Functional requirements:
1. The program shall read a text file and display the results to the console.
2. The input file shall be identified on the command line (that is, use int main(int argc, char **argv)).
3. The input file shall contain printable ascii characters only plus newline characters for terminating a line.
4. Each line of the input file shall contain three positive integers (as printable characters): the first two identify cities, the third one gives a cost required to connect those two cities. Each city has a unique identity.
5. The output will contain those lines of the input file which represent a solution.

2. Performance requirements:
1. Time: Less than 5 seconds to read a 1000 city file such as net.1000.dat on a hard drive of 20 Mb/s throughput. Less than 0.3 seconds to sort all entered cable objects by increasing cost on a 1 GHz or faster i386 or compatible architecture. Less than 0.1 seconds to find a solution from the sorted list of cables on a 1 GHz or faster i386 or compatible architecture.
2. Space: Less than 3MB of RAM.

3. Implementation requirements:
1. Use a class to organize all cable information as follows:
```   class Cable { public: int city1, city2, cost; };
```
For purposes of this homework the Cable class will have the same effect as
```   typedef struct { int city1; int city2; int cost; } Cable;
```
2. Create an array of type Cable to hold all data read from the input file as was done in the first homework.
3. Read file data more than once, if necessary, to avoid "hard wiring" data structure size boundaries into the code. For example, do not use something like this:
```   class Cable { public: int city1, city2, cost; };
...
const int MAX_CABLES = 1000;
...
Cable cables[MAX_CABLES];
```
to make a Cable array. Instead use something like this:
```   class Cable { public: int city1, city2, cost; };
...
fstream fin;
fin.open(argv[1],ios::in);
...
int ncables = findNumberCablesInFile(fin);
Cable *cables = new Cable[ncables];
```
4. Use qsort to sort the array of cables by increasing cost.
5. Introduce a group array of type int, indexed on cities, to help determine whether two cities are connected by cables existing in the current partial solution. Interpret group[i] as follows: group[i] == group[j] if and only if city i is connected to city j by cables existing in the current partial solution. Initially, set group[i] = i. As new cables are added to the partial solution change all group[i] with value equal to the second of the two cities connected by the cable to the value of the group number of the first of the two cities.

Here is more detail. Suppose the first chosen cable connects cities ci and cj and these cities are assigned numbers i and j, respectively, and i < j. Then, as part of step 3a., change the number assigned to cj to be i. The fact that cities ci and cj have the same number means they are connected by a path (namely the cable). The fact that they have a number different from all other cities means there is no path from either of these two to any of the other cities.

For each cable remaining in the list, as it is chosen in step 3a., determine the numbers assigned to each of the cities the cable connects. If the numbers are the same, the cities are already connected by at least one path through cables in N. Hence, in this case, in step 3c. this cable is not added to N. However, if the numbers are different, then the two cities are not connected by the cables in N so, in step 3c. this cable is added to N. This fact can be recorded as follows. Some cities are assigned the same number as that of the highest numbered city the cable connects. Reassign the numbers of those cities to equal the number of the lowest numbered city the cable connects. Thus, at the beginning of step 3a., each time through the loop, all same numbered cities are connected and different numbered cities are not connected by paths through cables in N.

In Case Of Trouble:

If the above idea is too much for you at the moment, use some brute force search that considers all possible networks (there are n! such networks - that is, n factorial, not n!), keeps track of the total costs, then outputs the lowest cost network considered. If you run such a program on all the cities in the USA, go for coffee, rather to lunch...do not wait for a result - you will die first.