20-CS-110-001 Introduction to Computer Science Fall 2010

Lab Number 2

Algorithms

Due: October 12, 2010
Submit three English language descriptions of algorithms in one file via blackboard. See submit page for instructions (that page talks about matlab files but the procedure works for any files).

Rationale:
    In theory, if you understand how the computer performs actions in its rather limited environment, then you will be better able to develop algorithms and be better able to translate those algorithms to computer programs. In this lab you will attempt to solve some fairly fundamental, yet relatively simple problems using some set of actions which is quite limited and mimicks some of the actions that are available to a computer program.

Submission:
    Please submit three English language solutions to the problems presented below in a single file via blackboard. See the submit page for details. The solutions should use the operations stated for each problem. If you need additional operations, please make clear what they are.

Problem 1: Shortest Path
    Background: As you know, there are many cities in the United States of America. Some cities are directly connected to other cities by fiber optic cables (FO will mean fiber optic below). By directly connected we mean if one end of a single FO cable is in one city (say, Cincinnati) and the other end is in another city (say, Columbus) then Cincinnati and Columbus are directly connected. One city may be directly connected to several other cities but, for the purposes of this exercise, we will say not more than ten others.

Electronic messages can be sent between two cities that are directly connected. Messages can also be sent between two cities, call them Corigin and Cdest, that are not directly connected provided there is a sequence of cities {C1,C2,...,Cn} such that, for all i={1,2,..n-1}, Ci and Ci+1 are directly connected, Corigin is directly connected to C1, and Cn is directly connected to Cdest. Call such a connecting sequence of cities a path between Corigin and Cdest. Call each of the cities C1,...,Cn on a path an intermediate city. A message may travel along a path from Corigin to intermediate city C1 and so on to intermediate city Cn and finally to Cdest. There may be many possible paths between a given pair of cities but every message uses exactly one path in order to get from its city of origin to its destination city.

Although messages move rapidly from one end of a FO cable to the other, they encounter a significant delay when they are switched from one FO cable to another in an intermediate city. Hence, a message taking a path with fewest intermediate cities will arrive in less time than a message taking any other path (between the same two cities) with more intermediate cities even if such a path covers fewer miles.

Now, suppose a communications company wants to choose the fastest communications route between a specified pair of cities for a given customer. That is, the communications company wants to select a path connecting specified cities Corigin and Cdest over which messages reach their destination in the shortest time (minimum number of intermediate cities encountered) compared to all other possible paths between those cities. Our job is to provide a program that can choose the best path given a list of FO cables (including their connecting cities). Hypothetically, the company will run our code and construct routing tables for their customers from the result.

The problem: Given a collection of cities, a listing of pairs of cities that are directly connected, and two distinguished cities Corigin and Cdest, find the minimum distance (shortest) path between Corigin and Cdest where distance of a path connecting two cities is defined as the number of intermediate cities encountered on that path.

An algorithm: The applet below illustrates a simple way to solve this problem. The algorithm works as follows:

  1. Start out with all cities unassigned and set i to 0
  2. Assign value 0 to Corigin
  3. Repeat the following until Cdest is visited:
        Visit those unassigned cities that are connected directly to cities assigned the number which is the value of i and assign them the value of 1 plus the value of the number i. For each newly visited city record the neighbor from which it was reached in a variable called back. Increment i by 1.
  4. Trace a path from Cdest to Corigin using the path information saved in back of each city - that is the solution.
Observe that the number assigned to each city is the distance of the shortest path between Corigin and that city.

In this example Corigin is city 'A' and Cdest is city 'J'. Hit "Next..." and all the neighbors of city 'A' are visited. Shown beside those cities in parentheses are the values of the back variables of each city. Hit "Next..." again for further progress. After four hits 'J' is reached. The path back to 'A' is from 'J' to 'I', from 'I' to 'F', from 'F' to 'B', and from 'B' to 'A'.

Limitations of the computer environment do not allow this algorithm because the computer cannot perform several tasks simultaneously. However, we can use a Queue to simulate this effect. A Queue is a list where objects are added only on the right and removed only on the left. The operations available to a computer for this example are:

  1. Add a neighbor of the current city to the Queue
  2. Remove a city from the Queue and make it the current city
Use the applet below to help develop an algorithm that depends on using the Queue. Operation 1. above is executed by clicking one of the buttons labeled from 'A' to 'K'. Operation 2. above is executed by clicking the button 'Queue to Current'. The current city is shown in the textfield of that label. Write an English language description of the algorithm.

Problem 2: Topological Sort
    Background: A factory which produces complex machinery on a single assembly line may need to take some care in planning the order of subassembly or component completion on the line: if a component type is scheduled for completion after it is supposed to be installed then a problem will certainly exist. The problem is compounded by the fact that duplicates of the same component type may be used many times in assembling other components which themselves may be used many times and so on.

We need to find an order of assembly of components on a single production line so that no component requires a subassembly that has not been built earlier in the line. A visual depiction of the problem is shown in the figure below

Component types are depicted as circles and 10 components, labeled alphabetically from A to H, are shown in the figure. In the figure, observe a line is drawn the two circles representing A and D with an arrowhead pointing in the direction of D. That line tells us that we cannot build A components before we build D components. The figure shows a total of 18 such dependencies. If the component assemblies are scheduled in the following order

     E, G, H, B, F, D, A, C, I, J
then all the components can be assembled successfully. An ordering such as this, where all dependencies are satisfied, is called a topological sort (in this case, of components A through J). If the figure were such that for some circle you could visit one or more other vertices by moving in the direction of the arrows, and then return to the vertex at which you started, then a topological sort is impossible.

The problem: Given a collection of component types, and a collection of dependencies, each between pairs of component types, produce a (horizontal) list of types so that, from left to right, no type depends on a type that is to its right in the list.

An algorithm: One way to solve this problem is depicted here. Study that description then use the applet below to help develop an algorithm that depends on using a Stack.

A Stack is a list where objects are added only on the left and removed only on the left. The operations available to a computer for this example are:

  1. Add a dependent component type of the current component type (called a neighbor of the current component) to the Stack
  2. Add the current component type to the Stack
  3. Remove a component type from the Stack and make it the current component type
Operation 1. above is executed by clicking one of the buttons labeled from 'A' to 'K'. Operation 2. above is executed by clicking the button "Current to Stack". Operation 3. above is executed by clicking the button "Stack to Current". The current component type is shown in the textfield of that label. Write an English language description of the algorithm.

Problem 3: Minimum Cost Network
    Background: A power company needs to connect a power plant in one city to a number of other cities and wants to erect the cheapest power distribution grid that will do the job. The company knows the cost of erecting power lines between all pairs of cities. It needs to select those routes that minimize total cost while connecting all the cities.

For example, the following is a table showing fictitious costs of ejecting power cables between pairs of four familiar cities.
ClevelandColumbusCincinnatiChicago
Cleveland-12096300
Columbus120-43178
Cincinnati9643-200
Chicago300178200-
Note that the costs do not correspond with distances between cities: this mimicks some, but not all, real-life situations (reflecting a natural barrier such as a mountain to cross - thereby increasing cost).

For the above example, the minimum cost distribution grid consists of three power lines: one connecting Cleveland to Cincinnati, one connecting Cincinnati to Columbus and one connecting Columbus to Chicago, for a total cost of 317. In general, there are n-1 power lines in a minimum cost distribution grid connecting n cities.

The problem: Given a collection of cities and the cost of routing power cables between every pair of cities, find the minimum cost network of cables that connects every pair of cities (as in problem 1 above). Network cost is the sum of the costs of all power cables used. Cables must be erected between cities. Cable junctions exist only at cities (splicing of a crossing pair of cables between cities is not allowed).

An algorithm: Use the applet below to come up with an algorithm for solving this problem. The solution is a collection of power cables, taken from the given collection. Initially, there are no cables in the solution. Your algorithm will add cables to the solution. There are at least the following two operations that are available:

  1. Place a power line in the solution
  2. Test whether the cities at opposite ends of a power line are already connected by power lines that have been added to the solution.
We do not want to give you all available operations as that gives the solution away.

Click on a number to put that power line into the solution. The test of operation 2. is made by the computer and shows up as an error at the bottom of the applet if there is a mistake about putting a power line into the solution.

Alternatively, consider the following approach to solving the problem:

  1. Create solution matrix S, initially empty, that will contain cables;
  2. Create City vector C, initially empty, that will contain cities;
  3. Create a reference table R where the ith row will contain the
     minimum cost of connecting city i to a city in C (R(i,2)) and the
     city in C that city i would connect to (R(i,1)) to get that cost;
     initially, R(i,2) = ∞ for all i.
  4. Assume a cost matrix Cost where Cost(i,j) is the cost of
     connecting city i to city j;
  5. Find the least cost cable q (i,j such that Cost(i,j) is minimum)
     and do the following:
  6.   choose an endpoint city c in q and add it to C;
  7. Repeat the following until all cities are in C:
  8.   for all cities r that are not in C do the following:
  9.      if Cost(r,c) < R(r,2) then
 10.         R(r,2) = Cost(r,c);
 11.         R(r,1) = c;
 12.   find c not in C such that R(c,2) is least over all rows of R
 13.   add c to C
 14.   add cable <c,R(c,1)> to S
 15. Print the solution

Why would this work?

Claim: Just after line 11. is executed, if city i is not in C then R(i,2) is the least cost connection that can be made from city i to a city in C and city R(i,1) identifies the city in C that makes that connection.

Justification: After the first time lines 8. to 11. are executed, R(i,2) contains the only cost of connecting city i to city c, which is the only city in C, and R(i,1) is c (if there is no connection from the ith city then R(i,1)=∞). So, the claim holds after the 1st execution of lines 8. to 11. Suppose that the claim holds after the kth execution of lines 8. to 11, k≥1. Since these are the only lines which change R(i,2), the claim holds prior to the k+1st execution of those lines. But R(i,2) and R(i,1) are changed only if there is a lower cost connection to the last city c that was placed in C (either from line 13. or line 6.). There cannot be a lower cost connection to a city in C because it would have been the value of R(i,2) on the kth iteration, according to the hypothesis above, and the only new connection to a city in C is the one to R(i,1). Hence, the claim holds on the k+1st execution as well.

Claim: Just after line 14. is executed, regardless of the iteration of the Repeat loop, the cables in S are a minimum cost network over the cities in C.

Justification: A cable is added to S only in line 14. Just after executing line 14. for the first time, the two endpoints of the least cost cable are in C and the cable connecting them, the obvious minimum cost network for the two endpoints, is in S. Hence the claim holds after the first execution of line 14. Now, suppose the claim holds just after the kth execution of line 14, k≥1. By the first claim, R(i,2) is the least cost connection from city i to a city in C and R(i,1) is the city in C that city i connects to at minimum cost just prior to the k+1st execution of line 12. Line 12. finds the city c that connects to some city, namely R(c,1) in C, at lowest cost. Consider cables in A=S∪{<c,R(c,1)>}. If there exists a least cost network B connecting cities in C∪{c} which has lower cost than network A, it would not contain cable <c,R(c,1)> since that would violate the hypothesis that S is a least cost network connecting cities in C after the kth execution of line 14. But, then, in network B, replace the connection from c to another city in C with <c,R(c,1)> and the result will be a network of less cost B, violating the hypothesis that B is the least cost network. Hence, S∪{<c,R(c,1)>} is the lowest cost network connecting cities in C∪{c}. Since c is added to C in line 13. and <c,R(c,1)> is added to S in line 14., the claim (on the updated C and S) holds after the k+1st execution of line 14.