References are to chapter 12 of Silberschatz "Mass Storage Systems"
-------------------------------------------------------------------
File system has three parts:
----------------------------
  - Programmer interface to the file system
  - Internal data structures and algorithms to implement the 
    interface
  - The actual hardware or secondary storage structures


Outline:
--------
  - Physical structure of mass storage devices
  - Disk scheduling algorithms for performance improvement
  - Disk formatting
  - Management of Boot blocks
  - Management of damaged blocks
  - Management of swap space
  - Disk reliability (RAID)
  - Maintaining stable storage
  - Tertiary storage


Magnetic Tape
-------------
  - Was early secondary-storage medium
  - Relatively permanent and holds large quantities of data
  - But access time is slow: about 1000 times slower than disk
  - Mainly used for backup, storage of infrequently used data
  - Density is 29.5 billion bits per square inch - 
    40x better than disk
  - Cartridges capable of 35 TB capacity to be developed
    http://www.technologyreview.com/news/417218/
           new-life-for-magnetic-tape/
  - Cost: < 1 penny per GB
  - Kept in spool and wound or rewound past read-write head
  - Once data is under the head, transfer rates are comparable 
    to disk
  - 10s of seconds to get the head in position for a read or write
  - You do not want to do random access directly with mag tape
    (schedule a run to pick up blocks, cache the read data)
  - Big plus: cartridges can be stored and loaded onto reader 
    as needed
    EB size archives can be managed inexpensively with mag tape

      KiloByte   (KB)  2^10    1,024 B
      MegaByte   (MB)  2^20    1,024 KB
      GigaByte   (GB)  2^30    1,024 MB
      TeraByte   (TB)  2^40    1,024 GB
      PetaByte   (PB)  2^50    1,024 TB
      ExaByte    (EB)  2^60    1,024 PB
      ZettaByte  (ZB)  2^70    1,024 EB
      YottaByte  (YB)  2^80    1,024 ZB
      Brontobyte (BB)  2^90    1,024 YB
      Geopbyte   (Ge?B)


Magnetic Disk  (Page 6)
-------------
  - Bulk of R/W secondary-storage medium in today's general 
    purpose devices
  - Rotate at 7200rpm typically
  - Data organized in tracks
  - Seek time (head moves to track): 3ms (high end), 15ms (mobile)
  - Rotation latency (data under head) - 4ms @ 7200 rpm
  - Data transfer rate - 1000 Mbits/s
  - Cost: 8 cents per GB
  - Big minus: platters are fixed hence reader cost is added 
    to storage cost
  - Another minus: vulnerable to head crash (head scrapes platter)
  - Another minus: due to high seek/latency, fragmentation is
    something to worry about
  - May be affected by magnetic fields - although they are 
    typically shielded


Solid State Drive (Demo)
-----------------
  - Big plus: no moving parts
    hence: quieter, less vulnerable to shock better access/latency
    but failures do happen and are usually completely catastrophic
  - Data is constantly being rearranged (unlike disk) so as to 
    even out the rate of erasures/rewrites over the entire medium
  - Unfortunately, most file systems have frequent rewrites in 
    small sections such as directory entries
  - Seek time (time to access) - .1ms
  - Data transfer rate - 5000 Mbits/s
  - Cost: about 60 cents per GB
  - Developed with disk-like interfaces as a drop in replacement
  - Not affected by magnetic fields
  - Much less power consumption than disk
  - coming on strong
  - favorite of young people


Organization of Magnetic Disk (Page 6)
-----------------------------
  - Several platters - two heads per platter
  - Track - circle of data on a platter - 2 tracks per platter
    location specified by a radius
  - Cyclinder - collection of tracks at same radius - 
    may be 1000s of these
  - Sector - portion of a track, 512 bytes in size - 
    100s of these/track
  - Addressing the sectors: sector/track/cylinder
      sec track cylinder
      -------------------
       0    0      0
       1    0      0
       ...
       N    0      0
       0    1      0
       ...
       N    1      0
       ...
       N    M      0
       0    0      1
       ...


Access to Disk (hdparm)
--------------
  - Through IO channel using some protocol
    * Integrated Drive Electronics (IDE) - parallel bus, 
      40/50 pin connector
      Devices are daisy chained - one is a master and the 
      other a slave
      Max speed: 133 MB/s 
      Several upgrades to Ultra-ATA/133
         16 bit data width 
         Uses PIO Modes 0, 1, 2, 3, 4
              Multiword DMA modes 0, 1, 2
              Ultra DMA modes 0, 1, 2, 3, 4, 5, 6
         
      where 
        PIO = Programmed input/output 
          CPU uses instructions that access the I/O address space 
          directly to perform data transfers to or from an I/O 
          device

                     Max Transfer  Time between transactions
            Mode 0     3.3 MB/s         600 ns
            Mode 1     5.2 MB/s         383 ns
            Mode 2     8.3 MB/s         240 ns
            Mode 3    11.1 MB/s         180 ns
            Mode 4    16.7 MB/s         120 ns
            Mode 5      20 MB/s         100 ns
            Mode 6      25 MB/s          80 ns

        Multiword DMA
          Direct Memory Access, all words are transfered before
          interrupt is raised for CPU to process

                     Max Transfer  Time between transactions
            Mode 0     4.2 MB/s         480 ns
            Mode 1    13.3 MB/s         150 ns
            Mode 2    16.7 MB/s         120 ns
            Mode 3      20 MB/s         100 ns
            Mode 4      25 MB/s          80 ns

        Ultra DMA
                     Max Transfer
            Mode 0    16.7 MB/s 
            Mode 1      25 MB/s
            Mode 2    33.3 MB/s
            Mode 3    44.4 MB/s
            Mode 4    66.7 MB/s
            Mode 5     100 MB/s
            Mode 6     133 MB/s
            Mode 7     167 MB/s

    * Serial Advanced Technology Attachment (SATA) - 
      serial bus, 4 pins
         Max transfer up to 600 MB/s

    * Small Computer System Interface (SCSI) - Parallel
        intelligent, buffered, peer-to-peer interface
        up to 16 devices can be attached
        NCR developed the first SCSI chip

                              Width  Max Transfer
            SCSI-1    (1986)    8      5 MB/s
            Fast SCSI (1994)    8     10 MB/s
            Fast-Wide SCSI     16     20 MB/s
            Ultra SCSI          8     20 MB/s
            Ultra Wide SCSI    16     40 MB/s
            Ultra2 SCSI         8     40 MB/s
            Ultra2 Wide SCSI   16     80 MB/s
            Ultra3 SCSI        16    160 MB/s
            Ultra-320 SCSI     16    320 MB/s
            Ultra-640 SCSI     16    640 MB/s

    * Fiber Channel (FC) - 
        Serial technology
        FCP - fiber channel protocol - serial SCSI (SCSI commands)
        For use in Storage Area Networks

            1GFC (1997)   200 MB/s
            2GFC (2001)   400 MB/s
            4GFC (2004)   800 MB/s
            8GFC (2005)  1600 MB/s
           10GFC (2008)  2550 MB/s
           16GFC (2011)  3200 MB/s
           32GFC (2014)  6400 MB/s

  - Via distributed file system (Network Attached Storage, Page 8)
    * Computers on a LAN share a pool of storage as though it 
      is local
    * But less efficient and has worse performance than 
      local storage options
    * Clients access network-attached storage via a RPCs over 
      NFS or CIFS
    * Usually implemented as a RAID array
    * ISCSI is the latest network-attached storage protocol
        uses the IP network protocol to carry the SCSI protocol
        allows hosts to treat storage as if it were directly 
        attached (Page 8)

  - Storage Area Network (SAN) - super version of NAS (Page 9)
    * Private network connecting servers and storage units
    * Multiple hosts and multiple storage arrays can attach to 
      the same SAN
    * Uses storage protocols rather than networking protocols
    * Separates communication between server and clients from 
      communication between servers and storage devices
      (otherwise these compete in a NAS organization)
    * Storage can be dynamically allocated to hosts - a host 
      running low on space might get a higher allocation
    * Storage access from hosts can be prohibited
    * Clusters of servers can share the same storage arrays
    * Fiber Channel is commonly used for the connection

  - InfiniBand (Serial)
    * Point-to-point bidirectional serial links intended for the 
      connection of processors with high-speed peripherals such 
      as disks
    * Switched fabric communications link
    * Used in high-performance computing and 
      enterprise data centers
    * Defines a connection between processor nodes and high
      performance storage devices (and other I/O devices)

           SDR     DDR     QDR     FDR-10        FDR        EDR
      1X  2 Gb/s  4 Gb/s  8 Gb/s 10.3125 Gb/s  13.64 Gb/s  25 Gb/s
      4X  8 Gb/s 16 Gb/s 32 Gb/s   41.25 Gb/s  54.54 Gb/s 100 Gb/s
     12X 24 Gb/s 48 Gb/s 96 Gb/s  123.75 Gb/s 163.64 Gb/s 300 Gb/s
 
     where SDR = Serial data rate
           DDR = Double data rate
           QDR = Quad data rate
           FDR = Fourteen data rate
           EDR = Enhanced data rate


Disk Scheduling
---------------
  Goal of OS/controller: minimize service time 
     (quickest response time) by using hardware most efficiently

  Time factors in reading (or writing) a disk sector are:
  - Seek time: time for heads to move to the cylinder 
    containing the desired sector
  - Rotational latency: time waiting for disk to rotate the 
    desired sector
  - Transfer time: time to read data and move it to the system
  - Bandwidth: total bytes transferred/(time completed-time started)

  Cannot do much about transfer and latency time 
     since these are hardware related
  CAN try to schedule reads/writes to minimize seek time and 
     maximize bandwidth
  - Do this by minimizing seek distance
  - If system/disk is reasonably busy, can have significant impact
  - Scheduling can be done by system, disk controller, or both

  Disk servicing algorithms:
    Based on a request queue 
    - requests enter a disk controller
    - request includes:
      * Whether this operation is input or output
      * What the disk address for the transfer is
      * What the memory address for the transfer is
      * What the number of sectors to be transferred is
    - there may be many enqueued requests in a multi-programmed 
      system
    - the disk controller needs to select one when it is ready 
      to perform I/O

    Evaluation criteria:
    - response time - time it takes before transfer begins
    - fairness - a totally fair system ensures that the mean 
      response time of the disk requests is the same for all 
      processes
    - complexity - 

    First Come First Serve (FCFS):
      Fair :-)
      Complexity :-)
      Response time :-< (not just bad - really bad)
      Example:
         Cylinder requests (rotation will not affect ave results)
         Entered as shown: (assume disk seeks begin at cylinder 53)
           98,  183,   37,  122,   14,   124,  65,  67

         Cylinder traversals:
       45 85 146 85 108 110 59 2 = 640 cyl traversed

    Shortest Seek Time First (SSTF):
      service all the requests close to the current head position
      before moving the head far away to service other requests.

      Fair :-(  - starvation is possible! 
                  (stream of requests near each other)
      Complexity :-|
      Response Time: :-)  but not optimal
      Example:

         Cylinder requests (rotation will not affect ave results)
         Entered as shown: (assume disk seeks begin at cylinder 53)
           98,  183,   37,  122,   14,   124,   65,   67

         Cyclinder visits:
           65, 67, 37, 14, 98, 122, 124, 183

         Cylinder traversals:
        12  2  30  19  84  24  2  59 = 232 cyl traversed

      But!!! Not optimal - 
        37-14-65-67-98-122-124-183 -> 208 cyl traversed

    SCAN (the elevator algorithm)
      Disk arm starts at one end of the platter, moves toward the
      other end taking care of requests as they match current
      cyclinder position - reverses when it gets to the end
      Fair: :-|     service time can be uneven - no starvation
      Complexity: :-)
      Response time: :-|

         Cylinder requests (rotation will not affect ave results)
         Entered as shown: (assume disk seeks begin at cylinder 53)
           98, 183, 37, 122, 14, 124, 65, 67

         Suppose the heads are heading for 0.  
         Then requests are serviced as follows:

           53 37 14   65 67 98 122 124 183  -> 183 + 53 
           ^        ^                        = 236 cyl traversed
           |        |
         starts   hits 0

         Suppose the heads are heading for a max of 200.  
         Then requests are serviced as follows:
         
           53 65 67 98 122 124 183   37 14  -> 147 + 186
           ^                       ^         = 333 cyl traversed 
           |                       |
         starts                 hits 200

    Circular SCAN (C-SCAN)
      Scans from one end to the other, but when it reaches the 
      end it pops back to the beginning and starts over 
      - works because there are few requests left that are close
        to the end of the scan

      Fairness :-)
      Complexity :-)
      Response time :-|

         Cylinder requests (rotation will not affect ave results)
         Entered as shown: (assume disk seeks begin at cylinder 53)
           98, 183, 37, 122, 14, 124, 65, 67

         Suppose the heads are heading for 0.  
         Then requests are serviced as follows:

           53 37 14   183 124 122 98 67 65  -> 400 - 12 
           ^        ^                        = 388 cyl traversed
           |        |
         starts   hits 0

         Suppose the heads are heading for 200.  
         Then requests are serviced as follows:

           53 65 67 98 122 124 183   14  -> 400 - 19
           ^                       ^      = 381 cyl traversed
           |                       |
         starts                hits 200

    LOOK
      Like scan but arm reverses on the last request in the 
      current direction

         Cylinder requests (rotation will not affect ave results)
         Entered as shown: (assume disk seeks begin at cylinder 53)
           98, 183, 37, 122, 14, 124, 65, 67

         Suppose the heads are heading for 0.  
         Then requests are serviced as follows:

           53 37 14 65 67 98 122 124 183  -> 169 + 39
           ^      ^                        = 208 cyl traversed
           |      |
         starts reverses

         Suppose the heads are heading for 200.  
         Then requests are serviced as follows:

           53 65 67 98 122 124 183 14  -> 169 + 130 
           ^                    ^       = 299 cyl traversed
           |                    |
         starts              reverses

   Circular LOOK (C-LOOK)
      Circular version of LOOK
	 
         Cylinder requests (rotation will not affect ave results)
         Entered as shown: (assume disk seeks begin at cylinder 53)
           98,  183,   37,  122,   14,   124,   65,   67

         Suppose the heads are heading for 0.  
         Then requests are serviced as follows:

           53 37 14 183 124 122 98 67 65  -> 39+172+135 
           ^      ^                        = 346 cyl traversed
           |      |
         starts  rescans

         Suppose the heads are heading for 200.  
         Then requests are serviced as follows:

           53 65 67 98 122 124 183 14  -> 130+172 
           ^                    ^       = 302 cyl traversed
           |                    |
         starts              rescans