EECE-4029 Operating Systems Fall 2016
Lab 9

processes, mutex, semaphores, memory management, producer-consumer, files, deadlock, more..

Memory Management: TLB and Page Replacement

Due: Nov 14 (submit instructions: here)

Rationale:
    To get some impression of how a memory management unit works and how it is implemented in a demand-paged system. This is accomplished via a simulation involving a translation lookaside buffer, a number of running processes, a number of pages that each process needs to access, and a page-replacement algorithm.
 
Setup:
The action of a memory management unit on up to 10 running processes over up to 1000000 memory accesses, conducted sequentially, may be simulated by a package that is compiled as a shared library called libpager.so (64-bit version only). Source and Makefile for this library are here. If you have a 32-bit OS, compile the library using these sources (and Makefile). The README.md file may help you to understand how to use the library and how it works.

The simulation is controlled by two files: a parameter file and a file that contains a list of memory accesses. An example parameter file is params.txt with the following contents (parameter, value, explanation):

  numFrames          1024  // Number of Frames (4 MB Main Mem)
  TLBEntries         1024  // Number of TLB entries
  MMtime               60  // Read/write to Main Mem(nsec)
  TLBtime               2  // Read/write a TLB entry(nsec)
  DISKtime             20  // Read/write from/to disk(msec)
  pageReplAlgo          0  // 0 - FIFO; 1 - LRU; 2 - MFU
  cacheReplAlgo         2  // 0 - FIFO; 1 - LRU; 2 - MFU
  pageTablePageReplAlgo 1  // 0 - FIFO; 1 - LRU; 2 - MFU
  pageTableType         1  // 0 - single; 1 - double; 2 - inverted
  WSW                 100  // Working Set Window, number of instructions
  initWS               10  // initial working set
  minPageFault          2  // minimum number of page faults within the WSW
  maxPageFault         40  // maximum number of page faults within the WSW
  numPageTablePages     1  // Number of page table pages for multilevel
  singleLevelPercentage 1  // pct between 0 and 100 of index into
                           // page table to multiply MMtime by (quality
                           // factor) for single level
  collisionPercentage   1  // pct between 0 and 100 of load factor
                           // for inverted (i.e. 50 for hashed inverted,
                           // 100 for linear inverted) in a bucket for
                           // inverted
  modNum               50  // Number of buckets in the inverted page table
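
For concreteness, a file in this format could be read as name/value pairs, with the // comments (including the wrapped continuation lines) ignored. The sketch below is only an illustration of the format; the actual parsing is done inside libpager.so, and the struct and function names here are made up for this example.

  #include <stdio.h>
  #include <string.h>

  /* Hypothetical holder for a few of the parameters above; the real
     simulator keeps its own internal representation inside libpager.so. */
  struct params {
      int numFrames, TLBEntries, MMtime, TLBtime, DISKtime;
  };

  /* Read "name value  // comment" lines into p.  Comment-only lines and
     names this sketch does not recognize are simply skipped. */
  void read_params(FILE *fp, struct params *p) {
      char line[256], name[64];
      int value;
      while (fgets(line, sizeof line, fp) != NULL) {
          if (sscanf(line, "%63s %d", name, &value) != 2)
              continue;                                  /* comment or blank */
          if      (strcmp(name, "numFrames")  == 0) p->numFrames  = value;
          else if (strcmp(name, "TLBEntries") == 0) p->TLBEntries = value;
          else if (strcmp(name, "MMtime")     == 0) p->MMtime     = value;
          else if (strcmp(name, "TLBtime")    == 0) p->TLBtime    = value;
          else if (strcmp(name, "DISKtime")   == 0) p->DISKtime   = value;
      }
  }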

An example file that contains a sequence of memory accesses is trace.out. Each line of the file contains three objects. The first object, an integer, is the identity of the process. The second object, a character, is either R, W, or I for read data, write data, or instruction fetch, respectively. The third object, an unsigned integer written in hexadecimal notation, is the virtual address being accessed. An example snippet from the file looks like this:

  ...
  1 W 0x9c3800
  1 R 0x9c3c00
  1 R 0x9c4000
  2 R 0x400
  2 R 0x800
  ...
According to the data in this file, processes are scheduled in round-robin fashion with a time-slice of 10000 accesses per process. Thus, process 1 makes 10000 accesses, then process 2, then processes 3 and 4, then process 0, and the cycle repeats.

Actions are R for read, W for write, and I for instruction fetch. Assume all actions involve 4 bytes, which is the width of the memory bus. For this lab, I is treated as R, but in future labs (years from now) there will be a separate instruction cache for instruction fetches.
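
Since pages are 4096 bytes (see below), the virtual page number of an access is just the address with its low 12 bits dropped. Here is a minimal, self-contained sketch of reading trace lines in this format; the file name and the printout are only for illustration, not part of the simulator.

  #include <stdio.h>

  #define PAGE_SIZE 4096UL   /* page = frame = disk block size (see below) */

  int main(void) {
      FILE *fp = fopen("trace.out", "r");
      if (fp == NULL) return 1;

      int pid;
      char op;                 /* 'R', 'W' or 'I' (I is treated as R) */
      unsigned long vaddr;

      /* Each line: process id, action, virtual address in hex. */
      while (fscanf(fp, "%d %c %lx", &pid, &op, &vaddr) == 3) {
          unsigned long vpn    = vaddr / PAGE_SIZE;  /* virtual page number  */
          unsigned long offset = vaddr % PAGE_SIZE;  /* byte within the page */
          printf("pid=%d op=%c page=0x%lx offset=0x%lx\n", pid, op, vpn, offset);
      }
      fclose(fp);
      return 0;
  }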

To complete a simulation the module student_func.c, which has companion include files student_func.h and structs.h, is compiled and linked against the library like this:

 gcc student_func.c -L. -lpager -o pager 
where it is assumed the OS is 64 bits, student_func.h, student_func.c, and libpager.so are all in the same directory, and gcc is executed in that directory. The script run starts the simulation by setting LD_LIBRARY_PATH to the current directory and invoking pager. Two arguments can be given to run: the name of the memory access file and the name of the parameter file. If no parameter file is specified, default values are used. If no memory access file is specified, the simulator looks for a file named trace.out in the current directory and uses it for input if it exists; otherwise it aborts. So run can be invoked four ways (suppose the parameter file is named params.txt and the memory access file is named input.txt):
  #> run -> default params, trace.out input
  #> run file=input.txt -> default params, input.txt input
  #> run paramFile=params.txt -> params.txt params, trace.out input
  #> run paramFile=params.txt file=input.txt -> params.txt params, input.txt input

The file student_func.c contains stubs that should be implemented by you. These define eviction policies for the TLB, the frames, and the page tables. FIFO is currently implemented; you are to implement LRU and MFU, using the FIFO code as a template.
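
Because student_func.h and structs.h are not reproduced here, the following is only a schematic of the two policies over a hypothetical table entry; the field and function names are invented for this sketch, so adapt the idea to whatever fields the FIFO template actually manipulates.

  /* Hypothetical table entry -- the real fields are defined in structs.h. */
  struct entry {
      int           valid;      /* slot in use?                           */
      unsigned long last_used;  /* "time" of the most recent access (LRU) */
      unsigned long use_count;  /* number of accesses so far (MFU)        */
  };

  /* LRU victim: the valid entry whose last access is oldest. */
  int lru_victim(const struct entry *t, int n) {
      int victim = -1;
      for (int i = 0; i < n; i++) {
          if (!t[i].valid) continue;
          if (victim < 0 || t[i].last_used < t[victim].last_used)
              victim = i;
      }
      return victim;                       /* -1 if nothing is evictable */
  }

  /* MFU victim: the valid entry with the highest reference count. */
  int mfu_victim(const struct entry *t, int n) {
      int victim = -1;
      for (int i = 0; i < n; i++) {
          if (!t[i].valid) continue;
          if (victim < 0 || t[i].use_count > t[victim].use_count)
              victim = i;
      }
      return victim;
  }

On every access the bookkeeping would stamp last_used with a global counter and increment use_count; the same selection logic applies whether the table being scanned is the TLB, the frame table, or the page-table pages.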

The simulator assumes the page size, the disk block size, and the frame size are all 4096 bytes. The page table can be implemented as inverted, 2-level, or single-level (see the parameter file above), but the memory for the table(s) comes from main memory and is subject to being swapped out.
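
How a virtual page number is turned into a page-table index depends on pageTableType. The split below is only an illustration: the constants and the hash are assumptions for this sketch, not the values used inside libpager.so.

  #define PAGE_SHIFT 12                 /* 4096-byte pages                    */
  #define L2_BITS    10                 /* assumed split for the 2-level case */

  unsigned long vpn(unsigned long vaddr) { return vaddr >> PAGE_SHIFT; }

  /* Single level: the VPN indexes one large table directly. */
  unsigned long single_index(unsigned long vaddr) { return vpn(vaddr); }

  /* Two level: high VPN bits pick a page-table page, low bits pick the entry. */
  unsigned long outer_index(unsigned long vaddr) { return vpn(vaddr) >> L2_BITS; }
  unsigned long inner_index(unsigned long vaddr) {
      return vpn(vaddr) & ((1UL << L2_BITS) - 1);
  }

  /* Inverted: hash (pid, VPN) into one of modNum buckets and search the
     bucket; collisionPercentage models how expensive that search is.        */
  unsigned long inverted_bucket(int pid, unsigned long vaddr, int modNum) {
      return (vpn(vaddr) ^ (unsigned long)pid) % (unsigned long)modNum;
  }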

Output of the simulator looks like this:

  [franco@franco Lab9]$ run
  Can't open parameter file: (null)
  Loading default parameter file: params.txt
  Max number of TLB entries set to 1024.
  Time needed to read/write a main memorypage set to 60 (nsec).
  Time needed to read/write a page table entry in the TLB set to 2 (nsec).
  Time needed to read/write a page to/from disk set to 20 (msec).
  Page Replacement set to 0.
  Cache Replacement set to 0.
  Page Table Page Replacement set to 0.
  Page Table type set to 1.
  Working Set Window set to 100.
  Frame number set to 1024.
  Initial Working Set size set to 10.
  Minimum number of page faults in WSW set to 2.
  Maximum number of page faults is WSW set to 40.
  Number of Page Table Pages set to 1.
  Percentage of index to be added to time set to 1%(0.01).
  Percentage of load factor to add to time set to 1%(0.01).
  Trying default input file trace.out...success validating...success

  0%|-----------------------__-----------------------|100%
    ##################################################
  Average Access Time:  417.934387 ns
  Total Access Time:  0.417935 s
  Total Page Faults: 24128
  File closed.

  Cleaned up!

Documentation for the simulator is here.

The simulation proceeds according to the flowchart below.

The box labeled "Page fault" expands to the following flowchart:

The working set is updated as shown in the following flowchart:

 
Lab:
  1. Implement the LRU and MFU stubs in student_func.c
  2. Run the simulator on trace.out with various settings for the following parameters: pageReplAlgo, cacheReplAlgo, pageTablePageReplAlgo, pageTableType, WSW, minPageFault, maxPageFault. Find a combination that results in a "best" total access time. Turn in those settings along with your implemented student_func.c.

Assistance:
The following happens on a page fault (the referenced page is not in main memory); a code-level sketch of this flow, in simulator terms, follows the list:

  • an exception is raised - the OS handles it
  • the state of the process causing it is saved
  • it is determined that the exception was a page fault
  • it is determined that the address reference is valid
  • the location of the page on disk is found
  • if there is not a free page in memory
    • a "victim" page frame is found and written to disk (if it is modified)
    • the free-frame table is updated
  • the page is read from disk - this requires waiting for a seek plus some rotational latency
  • while the disk is busy, the CPU is given to another process, so the faulting process will have to be restarted later
  • when the disk I/O completes, the state of the currently running process is saved
  • the page table is updated
  • the state of process causing the fault is restored
  • the process is resumed
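
In simulator terms, the core of that sequence reduces to the sketch below. Everything here is illustrative: the frame-table layout, the FIFO choice, and the comments standing in for the disk and page-table operations are assumptions, not the library's internals.

  #include <stdbool.h>

  #define NUM_FRAMES 1024               /* numFrames from the parameter file */

  /* Hypothetical, simplified frame table -- illustrative only. */
  struct frame {
      bool          in_use;
      bool          dirty;
      int           owner_pid;
      unsigned long vpn;
  };

  static struct frame frames[NUM_FRAMES];
  static int fifo_next;                 /* next FIFO victim candidate */

  /* Resolve a fault on (pid, vpn): find or free a frame, "load" the page
     into it, and return the frame number.  Disk traffic and the page-table
     update are reduced to comments; the library does the real accounting. */
  int handle_page_fault(int pid, unsigned long vpn) {
      int frame = -1;
      for (int i = 0; i < NUM_FRAMES; i++)        /* look for a free frame   */
          if (!frames[i].in_use) { frame = i; break; }

      if (frame < 0) {                            /* none free: pick a victim */
          frame = fifo_next;                      /* FIFO here; LRU/MFU in the lab */
          fifo_next = (fifo_next + 1) % NUM_FRAMES;
          if (frames[frame].dirty) {
              /* write the victim page back to disk: costs DISKtime (msec) */
          }
          /* invalidate the victim owner's PTE and any TLB entry for it */
      }

      /* read the faulting page from disk into the frame: DISKtime again */
      frames[frame] = (struct frame){ .in_use = true, .dirty = false,
                                      .owner_pid = pid, .vpn = vpn };
      /* update the faulting process's page table to map vpn -> frame */
      return frame;                               /* the process can resume */
  }
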
If a page must be swapped out to make room for a new page, a victim frame must be chosen and its contents swapped out. The following are strategies for choosing a victim:

  • FIFO: remove the frame that has been resident the longest. This is subject to Belady's anomaly: increasing the number of frames may cause more page faults!
  • LRU: remove the frame that has not been used for the longest period. LRU may be implemented with a stack: when a frame is accessed it is pulled from the stack and placed on top, and the victim is the frame at the bottom. This is hard to implement efficiently, so approximations are used. Examples: 1) use a bit in the page table entry to indicate recent use and clear the bit periodically with a timer; 2) keep all pages in a circular queue with a cursor pointing to the page considered for replacement - if the cursor points to a page whose reference bit (in the PTE) is set, the bit is cleared (giving the page a second chance), the cursor advances to the next page, and the test is repeated; if all reference bits were set, the page the cursor pointed to originally is the victim, otherwise the first page found with a clear reference bit is the victim (see the sketch after this list).
  • MFU: remove the frame that has been used most frequently over a recent timespan. This requires keeping a reference count over that period; approximations similar to those for LRU may be suitable. The motivation for this strategy is that a page with a low count was probably just brought in and has yet to be used, so it should be kept, while a page with a high count has likely already received its heavy use, so removing it will probably cause fewer problems in the near term. MFU works well if there are a small number of pages that are referenced very frequently and a large number of pages that are referenced infrequently - in the latter case the associated pages can be held in cache so their processes start quickly.
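
Here is a compact version of the second-chance (clock) approximation described under LRU; ref[] and the cursor are stand-ins for whatever per-frame bookkeeping the simulator keeps.

  /* Second-chance ("clock") victim selection over n frames.
     ref[i] is the reference bit of frame i; *cursor is the clock hand.
     A frame whose bit is set gets its bit cleared (a second chance) and
     the hand moves on; the first frame found with a clear bit is evicted.
     If every bit was set, the hand comes back around to its starting
     point, whose bit is now clear, so that frame becomes the victim.    */
  int clock_victim(unsigned char ref[], int n, int *cursor) {
      for (;;) {
          int i = *cursor;
          *cursor = (*cursor + 1) % n;   /* advance the hand              */
          if (ref[i] == 0)
              return i;                  /* not recently used: evict it   */
          ref[i] = 0;                    /* recently used: second chance  */
      }
  }
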
The following steps are taken on a virtual memory access (a sketch tying them to the timing parameters follows the list):

  • The hardware looks at the TLB to see if the corresponding PTE is there
  • If it is, the PTE is fetched
  • If it is not, the OS checks the page table for the PTE
  • If it is there, the PTE is fetched
  • If it is not there, the page containing it is swapped in and the PTE is fetched
  • The PTE is checked to see whether the page is already loaded in main memory. The next step is point 2 of the above-mentioned simulation.
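
The per-access cost might compose roughly as follows from the timing parameters in params.txt. This is only a guess at the shape of the accounting, not the library's actual formula (which also applies the singleLevelPercentage and collisionPercentage adjustments).

  /* Rough cost of one access, in nanoseconds.  tlb_hit, pte_in_memory and
     page_in_memory would come from the simulator's own lookups; TLBtime
     and MMtime are in nsec, DISKtime in msec, as in the parameter file.  */
  double access_time_ns(int tlb_hit, int pte_in_memory, int page_in_memory,
                        double TLBtime, double MMtime, double DISKtime_ms) {
      double t = TLBtime;                    /* every access probes the TLB   */
      if (!tlb_hit) {
          t += MMtime;                       /* walk the page table in memory */
          if (!pte_in_memory)
              t += DISKtime_ms * 1e6;        /* swap in the page-table page   */
      }
      if (!page_in_memory)
          t += DISKtime_ms * 1e6;            /* page fault: read page from disk */
      t += MMtime;                           /* finally touch the data itself   */
      return t;
  }
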
To prevent thrashing, working sets of pages are maintained for each process (a sketch of the fault-rate control appears after the list):

  • Define the working-set window (WSW) as some number of instructions and define a working set (WS) as the set of all pages accessed within the last WSW instructions.
  • Require that the sum of all WS sizes is no greater than the number of frames available (otherwise thrashing results). In a real OS, one or more jobs would be suspended to keep the WS sum below the number of available frames.
  • A crude way to keep up with changes is to keep a few reference bits per page that are shifted at regular intervals and to remove from the working set any page whose bits have all fallen off the table.
  • Decide on minimum and maximum allowable page-fault rates per process. If a process's rate falls below the minimum, remove allocated pages from it; if the rate rises above the maximum, add free pages to its working set.
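
A sketch of the fault-rate control described in the last bullet, phrased in terms of the WSW, minPageFault and maxPageFault parameters; the enum and function are invented here, and the real simulator applies its own adjustment policy.

  /* Page-fault-frequency check, run once per Working Set Window.
     faults_in_window is the number of faults the process took during the
     last WSW instructions; the return value says how its allocation changes. */
  enum ws_action { WS_SHRINK, WS_KEEP, WS_GROW };

  enum ws_action adjust_working_set(int faults_in_window,
                                    int minPageFault, int maxPageFault) {
      if (faults_in_window < minPageFault)
          return WS_SHRINK;      /* too few faults: release some pages      */
      if (faults_in_window > maxPageFault)
          return WS_GROW;        /* too many faults: add free pages         */
      return WS_KEEP;            /* fault rate in range: leave the WS alone */
  }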