20-CS-122-001 Computer Science II Spring 2012
Homework Assignment 1

Virtual functions, classes, inheritance, lists, queues, stacks, applications

Read Objects from a File

Due: April 5, 2012 (submit instructions: here)


Objectives:

The code you write in this homework will be used in most of the remaining homeworks this quarter. The primary objective is to read data from a file and store it in a usable form within a C++ program. We want to do this because most problems we will consider, and most real-world problems for that matter, take megabytes of data as input. Entering that much data by hand would be impractical.

Because we would like our program to find any file we name, and not have to recompile to do it, we will input the name of the file on the command line. Thus, a second objective is to learn how to input command line information.

Homework Problem:

Write a C++ program that reads a file specified on the command line. For example, if the program is named readit then executing

    readit net.1000.dat

will read data from the file net.1000.dat, which is assumed to exist in the current directory. If the file does not exist, the program should report an error message.

The file you should design your code to read is net.1000.dat. This file contains about half a million lines. Each line contains three numbers and represents a possible cable that might be laid to connect two cities: the first two numbers identify the pair of cities the cable would connect, and the third number is the cost of laying a cable between those two cities. For example, the line

    10 23 7883

means the cost of laying a cable between cities identified as number 10 and number 23 is 7883 million dollars.

Your program will create an array called cables that stores the file information so that it can be used in some application. Information for a single cable will be held in an object of the Cable class which will be defined as follows:

   class Cable { public: int city1, city2, cost; };

Your program will open the given file and read all lines, keeping count in a variable that will be defined and initialized as

   int ncables = 0;

It will then create an array of Cable objects like this:

   Cable *cables = new Cable[ncables];

It will then close, reopen, and read the file data into cables like this:

   for (int i=0 ; i < ncables ; i++)
      fin >> cables[i].city1 >> cables[i].city2 >> cables[i].cost;

On files with fewer than 30 cables, and only on such small files, the cable information will be printed out while the file is read. On all files the costs of all cables will be summed as the file is read. You can use a variable that is defined and initialized as follows for this:

   int sum = 0;

After reading the file data your program will print the sum and the number of cables. For example, if the input file is

   0 2 45
   0 4 22
   1 3 88
   0 1 23
   1 4 12
   2 3 15
   2 4 33
   3 4 44

the sum of the costs will be 282 and the number of cables will be 8. Try running this to see what your code should be doing on such a file. For net.1000.dat the number of cables is 499500 and the sum is 1405676853.

Entering data from the command line:

Use the following main prototype in your C++ code to input information from the command line:

  int main (int argc, char **argv) {
     ...
  }

The variable argc holds the number of strings on the command line, including the command itself. For example, for the command line command 1 2 3 the value of argc is 4. The variable argv is an array of strings: each element points to one command line token, starting with the command itself. For example, given command line

   find eat food.bananas

argv[0] is find, argv[1] is eat, and argv[2] is food.bananas.

To transform one of the argv strings to a number use atoi, atol, or atof. For example:

   int main (int argc, char **argv) {
      ...
      char command[128];
      strcpy (command,argv[0]);
      int x = atoi(argv[1]);
      long y = atol(argv[2]);
      double z = atof(argv[3]); 
      ...
   }

in which case executing find 1 200000 3.4 results in

    command same as find
    x with value 1
    y with value 200000
    z with value 3.4

For the homework problem use

   int main (int argc, char **argv) {
      ...
      fstream fin;
      fin.open (argv[1], ios::in);
      if (fin.fail()) {
         ...
      }
      ...
   }

to open a connection to the file that is to be read.

Requirements:

  1. Functional requirements:
    1. The program shall read a text file and display the results to the console.
    2. The input file shall be identified on the command line (that is, use int main(int argc, char **argv)).
    3. The input file shall contain printable ASCII characters only, plus newline characters for terminating a line.
    4. Each line of the input file shall contain three positive integers (as printable characters): the first two identify cities, the third one gives a cost required to connect those two cities. Each city has a unique identity.
    5. The data contained in the file will be processed and the results stored in an array called cables, as described above. The output will be a sequence of numbers, each indicating the sum of costs involving each cable.

  2. Performance requirements:
    1. Time: Less than 5 seconds to read a 1000 city file such as net.1000.dat on a hard drive of 20 Mb/s throughput.
    2. Space: Less than 3MB of RAM.

  3. Implementation requirements:
    1. Define a class Cable as above. Create the array cables as above.
    2. Read file data more than once, if necessary, to avoid "hard wiring" the array size into the code. For example, do not use something like this:

         const int MAX_CABLES = 1000;
         ...
         Cable cables[MAX_CABLES];
      

      to make the cables array. Instead use something like this:

         ...
         fstream fin;
         fin.open(argv[1],ios::in);
         ...
         int ncables = findNumberCablesInFile(fin);
         Cable *cables = new Cable[ncables];
      

      where findNumberCablesInFile is a procedure you write which returns the number of lines in the file referenced by fin.