EECE-4029 Operating Systems Fall 2016
Lab 11

processes, mutex, semaphores, memory management, producer-consumer, files, deadlock, more..

File System: ext3 and ext4 disk layout
Due: Dec 2 (submit instructions: here)

Rationale:
    Explore the layout of a file system on disk. Consider access structures from the perspective of maximizing efficiency for large files.
 
Documentation:
    Documentation for ext3/4 disk layout is here. Assistance will come from the ext2 file system library, libext2fs, whose documentation is here.
 
Preparation:
  1. Install e2fsprogs, sleuthkit, e2fslibs, e2fslibs-dev, if they are not installed already. In Ubuntu linux you can do this (all packages are listed for your information even though, in both cases, installing one will install the others):
      sudo apt-get install e2fsprogs sleuthkit 
      sudo apt-get install e2fslibs e2fslibs-dev
    
    Utilities that will be used are: debugfs from e2fsprogs, and istat, ifind, fsstat, and others from sleuthkit. The packages e2fslibs-dev provide include/ext2fs/* and some libext2fs functions that will be useful in completing this lab.
  2. Get two old USB flash drives of any size. Format one to have a ext2 file system and the other to have a ext4 file system. This can be done as follows.

    Insert a drive into a USB port. Run dmesg and look for something like this:

     
      [1839.819221] sd 8:0:0:0: [sdb] No Caching mode page found 
      [1839.819230] sd 8:0:0:0: [sdb] Assuming drive cache: write through 
      [1839.819237] sd 8:0:0:0: [sdb] Attached SCSI removable disk 
    
    In this case notice [sdb] shows up in all the lines. That will be the name of the block device corresponding to the USB drive. Note that although sdb usually is the device name, others, for example sdc, show up on occasion so this step is necessary. Now format:
      sudo mkfs.ext2 /dev/sdb 
    
    You will be asked whether you want to format the entire drive. Answer yes. Example output:
     
      /dev/sdb is entire device, not just one partition!  
      Proceed anyway? (y,n) y 
      Filesystem label= OS type: Linux Block size=1024 (log=0) 
      Fragment size=1024 (log=0) Stride=0 blocks, Stripe width=0
      blocks 64000 inodes, 256000 blocks 12800 blocks (5.00%) 
      reserved for the super user First data block=1 
      Maximum filesystem blocks=67371008 32 block groups 8192
      blocks per group, 8192 fragments per group 2000 inodes per group 
      Superblock backups stored on blocks: 8193, 24577, 40961, 57345, 73729, 204801, 221185
    
       Allocating group tables: done
       Writing inode tables: done
       Writing superblocks and filesystem accounting information: done
    
    which yields some information without using any tools - for example, blocksize.
  3. Mount each drive separately, for example like this:
       sudo mkdir /f
       sudo mount /dev/sdb /f
    
    (the first of these commands is executed only one time). Copy several text files to each drive - some should be short, some long. For example:
       sudo chown -R franco.franco /f
       cp ~franco/*.txt /f
       cd /f
    
    where the first command is done so you do not have to be superuser to make all the following manipulations of the file system contents (use your own login name - franco is just an example). Delete some of the shorter ones and copy more longer ones (to attempt to fragment the file system).
       rm short.txt
       cp ~franco/Dir/long.txt .
    
    Create directories and add files to those directories. Create symbolic links.
       mkdir A
       cp ~franco/*txt A/
       ln -s afile.txt symlnk.txt
       ln -s A/anotherfile.txt anotherlnk.txt
    
 
Lab:
  1. The superblock contains information about the entire file system such as blocksize, file system type, inode count, block count and so on. To find this information use tune2fs as follows (assume sdb is the USB device name):
       sudo tune2fs -l /dev/sdb
    
    The first block number and the first inode number will be part of this information. Usually the first inode number is 11 and the first block number is 1.
  2. Information on each entity is located in an inode. Inodes are identified by number - these are not block numbers. To find the inode numbers of each entity do this, for example:
       [franco@franco Filesystem]$ ls -i /f
       60001 A/            12 challenges.txt    13 cites.txt    17 luks02.txt
        2001 B/            14 challnk.txt@      15 fixes.txt    18 notes.txt
          11 lost+found/   19 chalsublnk.txt@   16 hitset.txt
    
  3. Use istat to find information at an inode. For example:
       [franco@franco Filesystem]$ sudo istat /dev/sdb 17
       inode: 17
       Allocated
       Group: 0
       Generation Id: 3288690782
       uid / gid: 1000 / 1000
       mode: rrw-------
       size: 66971
       num of links: 1
    
       Inode Times:
       Accessed:       2013-08-09 07:15:52 (EDT)
       File Modified:  2013-08-09 07:15:52 (EDT)
       Inode Modified: 2013-08-09 07:23:04 (EDT)
    
       Direct Blocks:
       1043 1044 1045 1046 1047 1048 1049 1050
       1051 1052 1053 1054 1055 1056 1057 1058
       561 562 563 564 565 566 567 568
       569 570 571 572 573 574 575 576
       577 578 579 580 581 582 583 584
       585 586 587 588 589 590 591 592
       593 594 595 596 597 598 599 600
       601 602 603 604 605 606 607 608
       641 642
    
       Indirect Blocks:
       525
    
    The Direct Blocks section shows the numbers of the blocks that contain data for the file named luks02.txt. Notice the blocks are not contiguous, although groups of blocks are.
  4. Use debugfs to examine the file system entities in great detail. Start debugfs like this:
       [franco@franco Filesystem]$ sudo debugfs /dev/sdb
       debugfs:  
       debugfs 1.42.7 (21-Jan-2013)
       debugfs:
    
    To examine luks02.txt do this:
       debugfs: cat <17>
    
    To find out where inode 17 is do this:
       debugfs:  imap <17>
       Inode 17 is part of block group 0
       located at block 263, offset 0x0000
    
    To find out information about inode 17, assuming an ext2 file system, do this:
       debugfs:  stat <17>
       Inode: 17   Type: regular    Mode:  0600   Flags: 0x0
       Generation: 3288690782    Version: 0x00000001
       User:  1000   Group:  1000   Size: 66971
       File ACL: 0    Directory ACL: 0
       Links: 1   Blockcount: 134
       Fragment:  Address: 0    Number: 0    Size: 0
       ctime: 0x5204d118 -- Fri Aug  9 07:23:04 2013
       atime: 0x5204cf68 -- Fri Aug  9 07:15:52 2013
       mtime: 0x5204cf68 -- Fri Aug  9 07:15:52 2013
       BLOCKS:
          (0-11):1043-1054, (IND):525, (12-15):1055-1058, (16-63):561-608,
          (64-65):641-642
       TOTAL: 67
    
    But when you do this on an ext4 file system you get something like this:
       debugfs:  stat <17>
       Inode: 17   Type: regular    Mode:  0600   Flags: 0x80000
       Generation: 767914364    Version: 0x00000001
       User:  1000   Group:  1000   Size: 66971
       File ACL: 0    Directory ACL: 0
       Links: 1   Blockcount: 132
       Fragment:  Address: 0    Number: 0    Size: 0
       ctime: 0x5204df91 -- Fri Aug  9 08:24:49 2013
       atime: 0x5204df66 -- Fri Aug  9 08:24:06 2013
       mtime: 0x5204df66 -- Fri Aug  9 08:24:06 2013
       EXTENTS:
         (0-65):8473-8538
    
    this comparison is interesting because: 1) the blocks (called extents) are contiguous with ext4, but not with ext2; 2) the flags field is 0x80000 in the case of ext4 (indicates extents are used) but 0x0 in ext2; 3) there is an extra (indirect) block in ext2.

    To get a mapping between logical and physical blocks do this (ext4):

       debugfs:  ex <17>
       Level Entries       Logical        Physical Length Flags
        0/ 0   1/  1     0 -    65   8473 -   8538     66
    
    This cannot be done in ext2 because there are no extents there. Look at a block (ext2 example):
       debugfs:  bd 1043
       0000  0a0a 5468 6520 436f 6d70 6c65 7869 7479  ..The Complexity
       0020  206f 6620 5379 6d6d 6574 7279 2d42 7265   of Symmetry-Bre
       0040  616b 696e 6720 466f 726d 756c 6173 0a0a  aking Formulas..
       ...
    
    To see other commands that debugfs understands, use:
       debugfs: help
    
  5. Write a program that does the following:
    • Opens a file system for examination.
    • Reads a super block on an open file system and prints the following information: number of inodes, number of blocks, number of free inodes, number of free blocks, block size, cluster size, number of blocks in a group, mount time, write time, time of last check, feature incompatibility flag, first inode number.
    • Reads a specified inode, determines if it belongs to a directory, symbolic link, or a regular file, determines whether the inode uses extents, and prints the contents as follows (ext2) for a directory:
         Inode information for inode 2:
           links:5  blocks:2  blocks array size:15  mode:41ed
           A directory? yes
           uid:1000  size:1024  flags:0
           blocks:
             FF 01 00 00 00 00 00 00 00 00 00 00
             00 00 00 00 00 00 00 00 00 00 00 00
             00 00 00 00 00 00 00 00 00 00 00 00
             00 00 00 00 00 00 00 00 00 00 00 00
             00 00 00 00 00 00 00 00 00 00 00 00
      
             511 0 0 0 0 0 0 0 0 0 0 0 0 0 0
             1ff 0 0 0 0 0 0 0 0 0 0 0 0 0 0
      
      and for a regular file (ext2):
         Inode information for inode 17:
           links:1  blocks:134  blocks array size:15  mode:8180
           A directory? no
           uid:1000  size:66971  flags:0
           blocks:
             13 04 00 00 14 04 00 00 15 04 00 00
             16 04 00 00 17 04 00 00 18 04 00 00
             19 04 00 00 1A 04 00 00 1B 04 00 00
             1C 04 00 00 1D 04 00 00 1E 04 00 00
             0D 02 00 00 00 00 00 00 00 00 00 00
      
        dec: 1043* 1044* 1045 1046 1047 1048 1049 1050 1051 1052 1053 1054 525 0 0
        hex:  413   414   415  416  417  418  419  41a  41b  41c  41d  41e 20d 0 0
      
      and then the contents of the indirect block (in this case 525):
          1055 1056 1057 1058 561 562 563 564 565 566 567 568 569 570 571 572
          573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589
          590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606
          607 608 641 642
      
      In case of a directory, the program determines the block number containing the directory information and prints it like this (ext2 or ext4):
         Contents of directory block 511:
          inode:2 rec_len:12 file_len:1 type:Directory name:.
          inode:2 rec_len:12 file_len:2 type:Directory name:..
          inode:11 rec_len:20 file_len:10 type:Directory name:lost+found
          inode:12 rec_len:24 file_len:14 type:File name:challenges.txt
          inode:13 rec_len:20 file_len:9 type:File name:cites.txt
          inode:14 rec_len:36 file_len:11 type:Symbolic link name:challnk.txt
          inode:15 rec_len:20 file_len:9 type:File name:fixes.txt
          inode:16 rec_len:20 file_len:10 type:File name:hitset.txt
          inode:17 rec_len:20 file_len:10 type:File name:luks02.txt
          inode:18 rec_len:40 file_len:9 type:File name:notes.txt
          inode:60001 rec_len:12 file_len:1 type:Directory name:A
          inode:2001 rec_len:12 file_len:1 type:Directory name:B
          inode:19 rec_len:776 file_len:14 type:Symbolic link name:chalsublnk.txt
      
      In case of a file, it prints the contents of the file's first 10 blocks, each like this:
         Contents of block 1044:
             0 | 75 73 69 6e 67 20 74 65 63 68 6e 69 71 75 65 73 | using techniques
            10 | 20 6f 66 20 63 6f 6d 70 75 74 61 74 69 6f 6e 61 |  of computationa
            20 | 6c 20 67 72 6f 75 70 20 74 68 65 6f 72 79 2c 20 | l group theory,
            30 | 77 65 20 64 65 73 63 72 69 62 65 20 61 20 72 65 | we describe a re
            40 | 6f 72 64 65 72 69 6e 67 20 72 65 6c 61 74 69 76 | ordering relativ
            50 | 65 20 74 6f 20 77 68 69 63 68 20 77 65 20 63 6f | e to which we co
            ...
           3d0 | 68 6e 69 71 75 65 20 77 61 73 20 65 78 74 65 6e | hnique was exten
           3e0 | 64 65 64 20 61 6e 64 20 73 75 63 63 65 73 73 66 | ded and successf
           3f0 | 75 6c 6c 79 20 61 70 70 6c 69 65 64 20 74 6f 20 | ully applied to
      

      For ext4, a directory inode looks like this:

         Inode information for inode 2:
           links:5  blocks:2  blocks array size:15  mode:41ed
           A directory? yes
           uid:1000  size:1024  flags:80000 (extents!)
           blocks:
             0A F3 01 00 04 00 00 00 00 00 00 00
             00 00 00 00 01 00 00 00 04 11 00 00
             00 00 00 00 00 00 00 00 00 00 00 00
             00 00 00 00 00 00 00 00 00 00 00 00
             00 00 00 00 00 00 00 00 00 00 00 00
      
        dec: 127754 4 0 0 1 4356+ 0 0 0 0 0 0 0 0 0
        hex:  1f30a 4 0 0 1 1104  0 0 0 0 0 0 0 0 0
      
      and points to block 4356 which looks like this:
         Contents of directory block 4356:
           inode:2 rec_len:12 file_len:1 type:Directory name:.
           inode:2 rec_len:12 file_len:2 type:Directory name:..
           inode:11 rec_len:20 file_len:10 type:Directory name:lost+found
           inode:12 rec_len:24 file_len:14 type:File name:challenges.txt
           inode:65025 rec_len:12 file_len:1 type:Directory name:A
           inode:32513 rec_len:12 file_len:1 type:Directory name:B
           inode:13 rec_len:32 file_len:11 type:Symbolic link name:challnk.txt
           inode:15 rec_len:20 file_len:9 type:File name:fixes.txt
           inode:16 rec_len:20 file_len:10 type:File name:hitset.txt
           inode:17 rec_len:20 file_len:10 type:File name:luks02.txt
           inode:18 rec_len:20 file_len:9 type:File name:notes.txt
           inode:14 rec_len:820 file_len:14 type:Symbolic link
           name:chalsublnk.txt
      
      an inode for a regular file in ext4 looks like this:
         Inode information for inode 17:
           links:1  blocks:132  blocks array size:15  mode:8180
           A directory? no
           uid:1000  size:66971  flags:80000 (extents!)
           blocks:
             0A F3 01 00 04 00 00 00 00 00 00 00
             00 00 00 00 42 00 00 00 19 21 00 00
             00 00 00 00 00 00 00 00 00 00 00 00
             00 00 00 00 00 00 00 00 00 00 00 00
             00 00 00 00 00 00 00 00 00 00 00 00
      
        dec: 127754 4 0 0 66 8473# 0 0 0 0 0 0 0 0 0
        hex:  1f30a 4 0 0 42 2119  0 0 0 0 0 0 0 0 0
      
      which means 66 consecutive blocks starting at block 8473.
    • Investigate what happens if a file, say the one at inode 17, is deleted using rm luks02.txt (but use a file that really exists on your system). What information remains? might it be possible to undelete the file? For extra credit, add such a feature to your code - only consider the above case.
 
Assistance:
You will have to compare the id of the user that is mounting the filesystem with the id that is given on the command line. To get the user id you can use the following:
   #include <unistd.h>
   #include <sys/types.h>
   uid_t uid = getuid();
where uid_t is just an unsigned int.

The kernel's USB driver registers a flash drive that has been inserted and assigns a device through which communication is made possible. The device is logged in /var/log/messages and is usually /dev/sdb but may be /dev/sdc or /dev/sdd. The device may be opened in a user space program using

   errcode_t errcode;
   char *fsname = "/dev/sdb";
   struct struct_ext2_filsys filsys;
   ext2_filsys fs = &filsys;
   errcode = ext2fs_open(fsname, EXT2_FLAG_RW, 0, 0, unix_io_manager, &fs);
The function ext2fs_open is provided by libext2fs with prototype declared in #include <ext2fs/ext2fs.h>. Thus code that includes the above lines should be compiled like this:
   gcc user_code.c -lext2fs -o user_code
The above mentioned include file also defines types struct_ext2_filsys and ext2_filsys. To see the fields of the first of these visit structures.h. The type struct io_manager is declared in #include <ext2fs/ext2_io.h> and shown in structures.h. Successful execution of ext2fs_open sets the io_manager field of fs to point to the unix_io_manager which is defined in libext2fs. Since the unix_io_manager is already implemented with methods to open and close channels, read and write blocks and so on, you do not have to worry about implementing your own io_manager. If the return value of ext2fs_open is 0 then the connection to the device is successfully opened and fs fields are filled.

The super block for the file system may be obtained from fs and used to get super block information like this:

   struct ext2_super_block* super = fs->super;
   int icnt = super->s_inodes_count;
   int bcnt = super->s_block_count;
   int fst_inode = super->s_first_ino;  /* number of first inode */

To read the first inode, you can do this:

   ext2_ino_t ino = super->s_first_ino;
   struct ext2_inode inode;
   errcode = ext2fs_read_inode (fs, ino, &inode);
but generally any inode can be read the same way by setting the value of ino to the number of the desired inode. The type ext2_inode is shown in structures.h and includes all the meta data concerning an entity that an inode refers to, including the blocks containing data asssociated with that entity.

The locations of those blocks, however, are determined differently depending on whether direct/indirect block addressing or extents are used: if inode.i_flags & 0x80000 is zero it's the former, otherwise extents are used. In the case of direct/indirect addressing, the first 12 entries of the i_blocks array, up to the first that is non-zero in case any is, each contain the number of a block that holds entity data; the data blocks are logically (not necessarily physically) in the same order as they are indexed in the array. The 13th entry, if non-zero, is a reference to a block that contains references to the next blocks of the entity. The number of such blocks referenced is the block size (usually 1024) divided by 4 (the number of bytes representing a block number). The 14th entry, if non-zero, references blocks that reference other blocks and so on. Thus, the i_blocks array for a directory inode might look like this:

   FF 01 00 00 00 00 00 00 00 00 00 00
   00 00 00 00 00 00 00 00 00 00 00 00
   00 00 00 00 00 00 00 00 00 00 00 00
   00 00 00 00 00 00 00 00 00 00 00 00
   00 00 00 00 00 00 00 00 00 00 00 00
and references a single block, namely block 511 (0x01FF), which contains the directory data. The i_blocks array of the inode of a regular file might look like this:
   13 04 00 00 14 04 00 00 15 04 00 00
   16 04 00 00 17 04 00 00 18 04 00 00
   19 04 00 00 1A 04 00 00 1B 04 00 00
   1C 04 00 00 1D 04 00 00 1E 04 00 00
   0D 02 00 00 00 00 00 00 00 00 00 00
and contains non-zero values from 1043-1054* (direct blocks) for the first 12 array entries and the value 525 (indirect block) for the 13th entry (all others are 0).

In the case of extents, the i_blocks array is partitioned into 12 byte records. The first of such records is a header of type ext3_extent_header or ext4_extent_header which, for the purposes of this lab, are essentially the same. The ext4_extent_header struct is shown in structures.h, where __le means "little endian" as an unsigned integer (for ext3 it's __u). From this record one checks the magic number (0xF30A), the number of extent or index records that follow, and whether this particular node is a leaf in the extents tree. The records that follow could be index or extents records. Both structures are shown in structures.h. In case they are extent records, the start block and the run are given in ee_start_lo and ee_len, respectively. Thus, the i_blocks array of a directory inode might look like this:

   0A F3 01 00 04 00 00 00 00 00 00 00
   00 00 00 00 01 00 00 00 04 11 00 00
   00 00 00 00 00 00 00 00 00 00 00 00
   00 00 00 00 00 00 00 00 00 00 00 00
   00 00 00 00 00 00 00 00 00 00 00 00
where, according to the second record, there is a run of 1 block starting at block 4356 (0x1104) and, according to the header, the second record is the only valid one and this inode is a leaf (hence the second record is of type ext4_extent). The i_blocks array of a regular file's inode might look like this:
   0A F3 01 00 04 00 00 00 00 00 00 00
   00 00 00 00 42 00 00 00 19 21 00 00
   00 00 00 00 00 00 00 00 00 00 00 00
   00 00 00 00 00 00 00 00 00 00 00 00
   00 00 00 00 00 00 00 00 00 00 00 00
which is different from above in the second record: now there is a run of 66 starting at block 8473.

If a block is a directory block it has a structured interpretation involving variable length records. A sketch is presented here and the student is referred to the ext4 file layout wiki for details. Structures used are ext4_dir_entry or ext4_dir_entry_2, with the latter as the default. These are shown in structures.h. Thus, consider the raw contents of the following directory block:

   02 00 00 00 0c 00 01 02 2e 00 00 00 02 00 00 00
   0c 00 02 02 2e 2e 00 00 0b 00 00 00 14 00 0a 02 
   6c 6f 73 74 2b 66 6f 75 6e 64 00 00 0c 00 00 00
   18 00 0e 01 63 68 61 6c 6c 65 6e 67 65 73 2e 74
   78 74 00 00 01 fe 00 00 0c 00 01 02 41 69 74 65
   01 7f 00 00 0c 00 01 02 42 00 00 00 0d 00 00 00
   20 00 0b 07 63 68 61 6c 6c 6e 6b 2e 74 78 74 65
   62 75 67 6c 6f 67 2e 74 78 74 00 00 0f 00 00 00
   14 00 09 01 66 69 78 65 73 2e 74 78 74 00 00 00
   10 00 00 00 14 00 0a 01 68 69 74 73 65 74 2e 74
   78 74 00 00 11 00 00 00 14 00 0a 01 6c 75 6b 73
   30 32 2e 74 78 74 00 00 12 00 00 00 14 00 09 01
   6e 6f 74 65 73 2e 74 78 74 00 00 00 0e 00 00 00 
   18 00 0e 07 63 68 61 6c 73 75 62 6c 6e 6b 2e 74 
   78 74 00 00 13 00 00 00 1c 03 06 01 74 65 73 74 
   65 72 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
   ...
where all remaining values in the block are 0. Colors are used to distinguished the records. The first blue record is 12 bytes long from the 0c. It belongs to inode 2 from the first 02. The length of the name is 1 from the 01 and file type is directory from the 02 following the 01. The name is . from the single 2e that represents that character. The first red record is 12 bytes long corresponding to the directory named .. (note two trailing 00s where the first record had three - the name length is 2 but the record length is still 12). The second blue record corresponds to inode 12 (0c), is 24 bytes long (18), has a name that is 14 bytes long (0e), is a regular file (01), and has the name challenges.txt. The third blue record is a symbolic link (07) named challnk.txt with inode 14 (0d). The record is 32 bytes long which is a lot longer than the necessary 20 bytes because it uses some space from a record that had been deleted. Part of the former record remains as the partial name ebug.txt right after the end of challnk.txt.

To read a block into a buffer do this:

   errcode = io_channel_read_blk(fs->io, blocknr, 1, buffer);
where fs and errcode are as above, buffer is an unsigned char array that is big enough to hold all the bytes of a block, and blocknr is of type blk_t whose value will be a block number. The 1 in the argument list of this function means one block is to be read into buffer. Once in the buffer, the bytes of the block may be printed to console using some format that makes the output content clear to the user (see, for example, above blocks 525, 4356 in addition to other formats shown).

* 0x0413 is 1043 decimal, 0x0414 is 1044 decimal and so on
+ 0x1104 is 4356 decimal
# 0x2119 is 8473 decimal

Note: here is fs_start.c
Note: here is lab11_help.c
Note: here is a sample filesystem
Note: mount using sudo mount lab11.fs /mnt
Note: hints a.c
Note: hints structures.h
Note: hints b.c