EECE-4029 Operating Systems Fall 2016
Lab 7


Network Device Driver

Due: Oct 24 (submit instructions: here)

Rationale:
    There are three device categories: character, block, and network. Drivers for each class have different requirements. We have worked a great deal in class on character devices. This is an opportunity to find out about network devices. I consider writing a device driver to be an important part of this class but in order for this lab to be done by all students, the target (in this case pseudo) device must be common to all computers. The assigned driver will work on any computer.
 
Lab:
Write a network device driver that has the following properties:
  • Two network interfaces named os0 and os1 are created.
  • These interfaces can be brought up and shut down by ifconfig in the usual way.
  • Addresses will be hard wired: 192.168.0.1 for os0 and 192.168.1.1 for os1.
  • Transmission of packets from one to the other will be allowed and observable using tcpdump or ethereal (a.k.a. wireshark).
The driver will be loaded in the usual manner:
   sudo insmod netdriver.ko
The interfaces will be brought up in the usual manner:
   sudo ifconfig os0 192.168.0.1
   sudo ifconfig os1 192.168.1.1
Invoking ifconfig should give something like:
   os0: flags=4291<UP,BROADCAST,RUNNING,NOARP,MULTICAST>  mtu 1500
        inet 192.168.0.1  netmask 255.255.255.0  broadcast 192.168.0.255
        inet6 fe80::253:4eff:fe55:4c30  prefixlen 64  scopeid 0x20
        ether 00:01:02:03:04:05  txqueuelen 1000  (Ethernet)
        RX packets 13  bytes 2827 (2.7 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 19  bytes 3823 (3.7 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

   os1: flags=4291<UP,BROADCAST,RUNNING,NOARP,MULTICAST>  mtu 1500
        inet 192.168.1.1  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 fe80::253:4eff:fe55:4c31  prefixlen 64  scopeid 0x20
        ether 00:01:02:03:04:06  txqueuelen 1000  (Ethernet)
        RX packets 19  bytes 3823 (3.7 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 13  bytes 2827 (2.7 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
where 00:01:02:03:04:05 and 00:01:02:03:04:06 are MAC addresses that are made up. It would be great to build a driver for a specific card but the details are lengthy and specific to hardware so such a lab is impractical for the entire class.

Add the following to /etc/hosts:

   192.168.0.1     butter-near
   192.168.0.2     butter-far
   192.168.1.1     jelly-near
   192.168.1.2     jelly-far
and add the following to /etc/networks:
   peanut 192.168.0.0
   grape 192.168.1.0
The result is that two pseudo networks are created; they differ only in the third octet.
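As a rough illustration of why the two address ranges count as separate networks, the sketch below applies the 255.255.255.0 netmask to a few of the addresses listed above. The helper names (toy_ipv4, toy_network_of, toy_same_network) are made up for this example and are not part of the lab code.

```c
#include <assert.h>
#include <stdint.h>

/* Pack four dotted-quad octets into one 32-bit value (hypothetical helper). */
uint32_t toy_ipv4(int a, int b, int c, int d)
{
    assert(a >= 0 && a <= 255 && b >= 0 && b <= 255);
    assert(c >= 0 && c <= 255 && d >= 0 && d <= 255);
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8)  |  (uint32_t)d;
}

/* The network number is the address ANDed with the netmask. */
uint32_t toy_network_of(uint32_t addr, uint32_t mask)
{
    return addr & mask;
}

/* Nonzero when both addresses fall on the same network under mask. */
int toy_same_network(uint32_t x, uint32_t y, uint32_t mask)
{
    return toy_network_of(x, mask) == toy_network_of(y, mask);
}
```

With the mask 255.255.255.0, butter-near and butter-far land on network 192.168.0.0 (peanut), while jelly-near lands on 192.168.1.0 (grape).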

Invoking route should give something like:

   Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
   default         10.52.240.1     0.0.0.0         UG    0      0        0 eth1
   10.52.240.0     *               255.255.240.0   U     0      0        0 eth1
   peanut          *               255.255.255.0   U     0      0        0 os0
   grape           *               255.255.255.0   U     0      0        0 os1
Test the driver like this:
   prompt> ping 192.168.0.1
   PING 192.168.0.1 (192.168.0.1) 56(84) bytes of data.
   64 bytes from 192.168.0.1: icmp_req=1 ttl=64 time=0.042 ms
   64 bytes from 192.168.0.1: icmp_req=2 ttl=64 time=0.039 ms
   ^C
   --- 192.168.0.1 ping statistics ---
   2 packets transmitted, 2 received, 0% packet loss, time 999ms
   rtt min/avg/max/mdev = 0.039/0.040/0.042/0.006 ms

   prompt> ssh 192.168.1.1
   Last login: Fri Jan 25 05:49:49 2013

   You have no mail.
   --------------------------------------------------------------------------
    07:36:59 up  1:47,  8 users,  load average: 0.37, 0.22, 0.17
   --------------------------------------------------------------------------
   Note: if you get something like this:
   ssh: connect to host 192.168.1.1 port 22: Connection refused
then you need to start the sshd daemon like this:
   sudo sshd
or this
   sudo service sshd start
  

Then open two shell windows. In one do this:

   sudo tcpdump -i os0
In the other do this:
   prompt> ping butter-far
   PING butter-far (192.168.0.2) 56(84) bytes of data.
and observe the following in the other window:
   07:42:19.403534 IP butter-near > butter-far: ICMP echo request, id 9599, seq 1, length 64
   07:42:20.402734 IP butter-near > butter-far: ICMP echo request, id 9599, seq 2, length 64
 
Assistance:
Documentation: netdocs.pdf
Data Structures: The fundamental data structures for network drivers are struct net_device and struct net_device_stats, which are defined in linux/netdevice.h, and struct sk_buff, which is defined in linux/skbuff.h and considered in more detail when skbuff operations are described below. Important fields of struct net_device are shown here:
   struct net_device {
      char name[IFNAMSIZ];
      unsigned long base_addr;
      unsigned char addr_len;
      unsigned char dev_addr[MAX_ADDR_LEN];
      unsigned char broadcast[MAX_ADDR_LEN];
      unsigned short hard_header_len;
      unsigned char irq;
      const struct net_device_ops *netdev_ops;
      const struct header_ops *header_ops;
      ...
   };
which depends on this (important fields only):
   struct net_device_ops {
      int (*ndo_open)(struct net_device *dev);
      int (*ndo_stop)(struct net_device *dev);
      netdev_tx_t (*ndo_start_xmit) (struct sk_buff *skb, struct net_device *dev);
      struct net_device_stats* (*ndo_get_stats)(struct net_device *dev);
      ...
   };
	
   struct header_ops {
      int(*create) (struct sk_buff *skb, struct net_device *dev,
                    unsigned short type, const void *daddr,
                    const void *saddr, unsigned int len);
      ...
   };
Notice in struct net_device the MAC address field dev_addr and broadcast field broadcast. The irq field stores the assigned interrupt line. The ndo_open, ndo_stop, and ndo_start_xmit fields of struct net_device_ops point to the most basic functions that are used to control the NIC. A private area attached to each net_device object (not shown above) holds data that is used locally by the driver as it sees fit. There is no field for a receive function because reception is handled via interrupt. The purpose of create in struct header_ops is to build the data-link header of an outgoing packet.

There is room in a net_device object for private data but access is restricted to be through the function netdev_priv only, like this:

   struct os_priv *priv;
   priv = netdev_priv(dev);
The private data holds the device specific information which, for this lab, is as follows:
   struct os_priv {
      struct net_device_stats stats;
      struct sk_buff *skb;
      struct os_packet *pkt;
      struct net_device *dev;
   };        
The fields are self explanatory. Private information may be updated like this:
   struct net_device *os1;
   ...             /* allocate space for os1 and fill fields */
   priv = netdev_priv(dev);
   priv->dev = os1;
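To make the idea of a private area concrete, here is a small user-space model of how a device object and its private data can share one allocation, with an accessor that returns the address just past the (aligned) device struct. This is only a sketch under stated assumptions: the toy_ names, the struct contents, and the alignment value of 32 are all invented for illustration and are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_ALIGN 32   /* assumed alignment, chosen for illustration only */

struct toy_net_device { char name[16]; unsigned int flags; };
struct toy_priv { unsigned long rx_packets; unsigned long tx_packets; };

/* Round n up to the next multiple of TOY_ALIGN. */
size_t toy_align(size_t n)
{
    return (n + TOY_ALIGN - 1) & ~(size_t)(TOY_ALIGN - 1);
}

/* Allocate the device and its private area as one zeroed block. */
struct toy_net_device *toy_alloc_dev(size_t sizeof_priv)
{
    return calloc(1, toy_align(sizeof(struct toy_net_device)) + sizeof_priv);
}

/* The private area starts right after the aligned device struct. */
void *toy_netdev_priv(struct toy_net_device *dev)
{
    assert(dev != NULL);
    return (char *)dev + toy_align(sizeof(struct toy_net_device));
}
```

The point of the accessor is that driver code never needs to know the offset; it only asks for the private pointer and casts it to its own struct type.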

A socket is an endpoint of an inter-process communication flow across a computer network. A socket is characterized by local address, remote address, protocol, and several other things. Packets are communicated through sockets. For this purpose, Linux maintains a struct sk_buff type which is defined as follows:

   struct sk_buff {
      struct sk_buff       *next;
      struct sk_buff       *prev;
      struct sk_buff_head  *list;
      struct sock          *sk;
      struct net_device    *dev;
      unsigned int          len, data_len;
      unsigned char        *head;
      unsigned char        *data;
      unsigned char        *tail;
      unsigned char        *end;
      __be16                protocol;
      u8                    ip_summed;
      ...
   };
Packets can exist on several kinds of doubly linked lists and queues, e.g., a TCP socket send queue. An sk_buff (SKB) object connects to a list with the next and prev fields. The list field identifies the list. The sk field points to the owning socket and the dev field to the net_device object. Fields head, data, tail, and end delimit regions in the SKB's buffer: the "headroom" lies between head and data, the packet "data" lies between data and tail, and the "tailroom" lies between tail and end. Not shown are variables to store the location of various protocol layer headers as outgoing packets are built and incoming ones are parsed plus other fields that control the packet transmission. A description of how SKB objects are manipulated is given below when SKB operations are described.

The data structure that holds an IP packet header is this (defined in linux/ip.h):

   struct iphdr {
      __u8    protocol; /* TCP (0x06) */
      __be32  saddr;    /* source address (e.g. 192.168.0.1) */
      __be32  daddr;    /* destination address (e.g. 192.168.1.1) */
      __u8    ihl;      /* header length */
      ...
   };
Field saddr is the source address and field daddr is the destination address of the packet. The protocol field specifies the protocol that the packet is using (see this for a list of accepted protocols and their numbers). The following:
   struct ethhdr {
      unsigned char  h_dest[ETH_ALEN];    /* MAC address of destination */
      unsigned char  h_source[ETH_ALEN];  /* MAC address of source */
      __be16         h_proto;             /* encapsulated protocol (IP=0x0800) */
   };
is the ethernet header, which precedes the IP header in the packet, and contains the source and destination ethernet addresses and the packet protocol. See this for a list of LAN data link layer protocols (including IP). See this for information on how a packet is received.

Functions: Kernel functions that will be used are described below. Many of these are SKB operations. Use the description found here to better understand what these functions are trying to do. Some of these are netif functions. See this (old) article for information on how these are used in packet reception.

  • dev_alloc_skb(int length) -
        allocate an empty SKB object and assign it a usage count of one. The head, data, and tail fields point to the beginning of the SKB.
  • dev_kfree_skb(struct sk_buff *skb) -
        free the skb memory because it will no longer be used
  • skb_reserve(struct sk_buff *skb, unsigned int len) -
        increase the headroom by reducing tail room by len bytes. Only allowed for empty SKBs. Fields data and tail are equal.
  • skb_put(struct sk_buff *skb, unsigned int len) -
        increase data space of a SKB by len at the rear of the data space.
  • skb_push(struct sk_buff *skb, unsigned int len) -
        increase data space of a SKB by len at the front of the data space.
  • skb_pull(struct sk_buff *skb, unsigned int len) -
        reduce data space of a SKB by len at the front of the data space.
  • skb_trim(struct sk_buff *skb, unsigned int len) -
        reduce data space of a SKB to len bytes by removing data at the rear of the data space. No effect if the SKB already holds len bytes or fewer.
  • eth_type_trans (struct sk_buff *skb, struct net_device *dev) -
        determine the packet's protocol ID. The default is IEEE 802.3 which is typically good enough for an application.
  • alloc_etherdev(int len) -
        allocate space for a net_device object where len is the space used for the private structure that is to be part of the device. Return value is a pointer to the allocated space.
  • netif_rx(struct sk_buff *skb) -
        receive a packet from a device driver and queue it for the upper protocol levels to process.
  • netif_queue_stopped(struct net_device *dev) -
        return true if and only if the queue on device dev is unable to send.
  • netif_wake_queue(struct net_device *dev) -
        allow upper layers to call the device's ndo_start_xmit function again; used to resume flow when transmit resources become available.
  • netif_stop_queue(struct net_device *dev) -
        stop upper layers from calling the device's ndo_start_xmit function to prevent data flow when transmit resources are unavailable.
  • netif_start_queue(struct net_device *dev) -
        allow upper layers to call the device's ndo_start_xmit function.
  • htons(u16 val) -
        converts the 16-bit value val from host byte order to network byte order.
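The SKB operations in the list above are easier to follow with a picture of the four pointers in mind. The following user-space model mimics them on a fixed 256-byte buffer. It is a sketch only: struct toy_skb and the toy_ functions are invented here and merely imitate the pointer arithmetic, and toy_trim follows the kernel's skb_trim convention of taking the new total length.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct sk_buff: four pointers into one buffer. */
struct toy_skb {
    unsigned char *head, *data, *tail, *end;
    unsigned int len;                 /* bytes between data and tail */
    unsigned char buf[256];
};

void toy_init(struct toy_skb *s)
{
    s->head = s->data = s->tail = s->buf;
    s->end = s->buf + sizeof(s->buf);
    s->len = 0;
}

/* Grow the headroom; only valid while the buffer is still empty. */
void toy_reserve(struct toy_skb *s, unsigned int n)
{
    assert(s->len == 0 && (size_t)(s->end - s->tail) >= n);
    s->data += n;
    s->tail += n;
}

/* Append n bytes at the rear; returns the start of the new space. */
unsigned char *toy_put(struct toy_skb *s, unsigned int n)
{
    unsigned char *old_tail = s->tail;
    assert((size_t)(s->end - s->tail) >= n);
    s->tail += n;
    s->len += n;
    return old_tail;
}

/* Prepend n bytes at the front, eating headroom; returns the new data. */
unsigned char *toy_push(struct toy_skb *s, unsigned int n)
{
    assert((size_t)(s->data - s->head) >= n);
    s->data -= n;
    s->len += n;
    return s->data;
}

/* Drop n bytes from the front; returns the new data pointer. */
unsigned char *toy_pull(struct toy_skb *s, unsigned int n)
{
    assert(n <= s->len);
    s->data += n;
    s->len -= n;
    return s->data;
}

/* Cut the data down to new_len bytes by moving tail back. */
void toy_trim(struct toy_skb *s, unsigned int new_len)
{
    if (s->len > new_len) {
        s->len = new_len;
        s->tail = s->data + new_len;
    }
}
```

A typical sequence would be: reserve headroom, put the payload, then push a 14-byte header into the reserved headroom, which is the same order of operations the driver relies on when a header is placed in front of an IP packet.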

Initialization: The objects that will be interfaces os0 and os1 are of type struct net_device* and are defined globally.

Everything needed to make the interfaces os0 and os1 active can be added to init_module. In what follows, space is allocated for struct net_device objects and the fields of those structs are initialized. Allocation of space is done like this:

      os0 = alloc_etherdev(sizeof(struct os_priv));
      os1 = alloc_etherdev(sizeof(struct os_priv));
The MAC address (dev_addr), broadcast (broadcast), and header length (hard_header_len) can be set like this:
   for (i=0 ; i < 6 ; i++) os0->dev_addr[i] = (unsigned char)i;
   for (i=0 ; i < 6 ; i++) os0->broadcast[i] = (unsigned char)0xFF;
   os0->hard_header_len = 14;
for os1 as well with the addition of the following line:
   os1->dev_addr[5]++;
Note the broadcast address is FF:FF:FF:FF:FF:FF and the MAC address is the fictitious 00:01:02:03:04:05 for os0 and 00:01:02:03:04:06 for os1 - you can make up your own MAC addresses as you please (but for the discussion here, the os1 and os0 MAC addresses must differ by 1) - maybe a hidden message where the ascii value of each letter becomes an octet in the address.
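The address setup described above can be checked in user space before it goes into the module. The sketch below fills a MAC address with 00:01:02:03:04:05 plus an offset in the last octet, builds the all-ones broadcast address, and formats either for printing. All toy_ names are hypothetical helpers written for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_ETH_ALEN 6

/* Fill mac with 00:01:02:03:04:05, then add offset to the last octet. */
void toy_make_mac(unsigned char mac[TOY_ETH_ALEN], unsigned char offset)
{
    int i;
    for (i = 0; i < TOY_ETH_ALEN; i++)
        mac[i] = (unsigned char)i;
    mac[TOY_ETH_ALEN - 1] += offset;
}

/* All-ones broadcast address FF:FF:FF:FF:FF:FF. */
void toy_make_broadcast(unsigned char bc[TOY_ETH_ALEN])
{
    memset(bc, 0xFF, TOY_ETH_ALEN);
}

/* Format as "xx:xx:xx:xx:xx:xx"; out must hold at least 18 bytes. */
void toy_format_mac(const unsigned char mac[TOY_ETH_ALEN], char out[18])
{
    int n = snprintf(out, 18, "%02x:%02x:%02x:%02x:%02x:%02x",
                     mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
    assert(n == 17);
    (void)n;
}
```

Calling toy_make_mac with offset 0 corresponds to os0 and with offset 1 to os1.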

The name fields of the interfaces can be filled in like this:

   memcpy(os0->name, "os0\0", 4);
A net_device_ops structure must be instantiated for both interfaces. This entails creating stubs:
   int os_open(struct net_device *dev) { return 0; }
   int os_stop(struct net_device *dev) { return 0; }
   netdev_tx_t os_start_xmit(struct sk_buff *skb, struct net_device *dev) { return NETDEV_TX_OK; }
   struct net_device_stats *os_stats(struct net_device *dev) {
      return &(((struct os_priv*)netdev_priv(dev))->stats);
   }
and mapping struct net_device_ops pointers to the above function pointers like this:
   static const struct net_device_ops os_device_ops = {
      .ndo_open = os_open,
      .ndo_stop = os_stop,
      .ndo_start_xmit = os_start_xmit,
      .ndo_get_stats = os_stats,
   };
The kernel gets stats on its own dime so this kernel module must be ready to supply some information, even if it is bogus.

Then this can be done in init_module:

    os0->netdev_ops = &os_device_ops;
    os1->netdev_ops = &os_device_ops;
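For readers who have not used tables of function pointers before, the following user-space sketch shows the pattern in miniature: one read-only table is shared by two device objects, and the caller reaches the driver's functions only through the table. The names (toy_ops, toy_ops_dev, toy_bring_up, and so on) are invented for illustration and do not correspond to kernel symbols.

```c
#include <assert.h>
#include <stddef.h>

struct toy_ops_dev;

/* Toy analogue of struct net_device_ops: a table of function pointers. */
struct toy_ops {
    int (*open)(struct toy_ops_dev *dev);
    int (*stop)(struct toy_ops_dev *dev);
};

struct toy_ops_dev {
    const struct toy_ops *ops;   /* which driver functions to use */
    int is_up;                   /* state changed by open/stop */
};

int toy_open(struct toy_ops_dev *dev) { dev->is_up = 1; return 0; }
int toy_stop(struct toy_ops_dev *dev) { dev->is_up = 0; return 0; }

/* One shared, read-only table, initialized with designated initializers. */
const struct toy_ops toy_device_ops = {
    .open = toy_open,
    .stop = toy_stop,
};

/* What the caller (the kernel, in the real case) does: dispatch via the table. */
int toy_bring_up(struct toy_ops_dev *dev)
{
    assert(dev != NULL && dev->ops != NULL && dev->ops->open != NULL);
    return dev->ops->open(dev);
}

int toy_bring_down(struct toy_ops_dev *dev)
{
    assert(dev != NULL && dev->ops != NULL && dev->ops->stop != NULL);
    return dev->ops->stop(dev);
}
```

Because each function receives the device it is acting on, a single table can serve both interfaces without any per-interface copies.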
There also needs to be a header creation function that is mapped to create in the struct header_ops field. Make the stub like this:
   int os_header(struct sk_buff *skb, struct net_device *dev,
                 unsigned short type, const void *daddr, const void *saddr,
                 unsigned int len) {
      return 0;
   }
use this to map:
   static const struct header_ops os_header_ops = {
      .create  = os_header,
   };
and tie it to the interfaces like this:
   os0->header_ops = &os_header_ops;
   os1->header_ops = &os_header_ops;
The header will have to be changed in this lab because source and destination addresses will be swapped - that is the reason for using a create header function.

You will also need to disable ARP recognition like this:

   os0->flags |= IFF_NOARP;
   os1->flags |= IFF_NOARP;
Otherwise, ARP requests will be sent before packets are transmitted and this driver will not have the code to respond to those requests - hence, no packets will be sent (try it). On the other hand, if you are interested, it is not that difficult to add code that responds to such requests.

Next in init_module, the private area can be initialized. It already has space due to alloc_etherdev. It probably should be zeroed out like this:

   priv = netdev_priv(os0);
   memset(priv, 0, sizeof(struct os_priv));
Note: there is no direct access to the priv area - access is only through the function netdev_priv().

This is a convenient time to add the space for the packet that will carry the header and payload.

   priv->pkt = kmalloc (sizeof (struct os_packet), GFP_KERNEL);
   priv->pkt->dev = os0;
and similarly for os1. The struct os_packet declaration can be this simple:
   struct os_packet {
      struct net_device *dev;
      int datalen;
      u8 data[ETH_DATA_LEN];
   };
where dev is needed to start and stop the transmission of packets, data will be the packet, and datalen will be the length of the packet.

The interfaces may now be registered like this:

   register_netdev(os0);
   register_netdev(os1);
Since resources are used, it is wise to free them in the exit module like this:
   struct os_priv *priv;
   if (os0) {
      priv = netdev_priv(os0);
      kfree(priv->pkt);
      unregister_netdev(os0);
      free_netdev(os0);   /* release what alloc_etherdev allocated */
   }
and similarly for os1.

First test: Compiling (do not include the ... ellipses in the code) and loading the module will be enough to activate and view the interfaces with ifconfig, check the routes with route, and even ping. If you do this:

   sudo tcpdump -i os0
in one shell window and do this in another
   ping 192.168.0.2
you see this
   tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
   listening on os0, link-type EN10MB (Ethernet), capture size 65535 bytes
   16:44:32.123205 40:00:40:01:b9:55 (oui Unknown) > 45:00:00:54:00:00 (oui Unknown), 
    ethertype Unknown (0xc0a8), length 84:
        0x0000:  0001 c0a8 0002 0800 2111 0ad3 0001 c0e5 ........!.......
        0x0010:  0a51 14e1 0100 0809 0a0b 0c0d 0e0f 1011 .Q..............
        0x0020:  1213 1415 1617 1819 1a1b 1c1d 1e1f 2021 ...............!
        0x0030:  2223 2425 2627 2829 2a2b 2c2d 2e2f 3031 "#$%&'()*+,-./01
        0x0040:  3233 3435 3637                           234567
in the tcpdump window because the packets are not yet formed. So the next step is to expand the stubs.
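The odd MAC addresses and the ethertype 0xc0a8 in that output can be explained by decoding the start of the IP header as if it were an ethernet header, which is what tcpdump does when no data-link header has been added. The sketch below uses the 14 bytes that tcpdump printed as addresses and ethertype; the struct and function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct toy_ethhdr {
    uint8_t  dest[6];
    uint8_t  source[6];
    uint16_t proto;        /* stored in host order by toy_parse_eth */
};

/* First 14 bytes of the ping's IPv4 header, as shown by tcpdump above:
   version/IHL, TOS, total length (0x0054 = 84), id, flags/fragment,
   TTL, protocol (1 = ICMP), checksum, then the first two octets
   (192.168 = c0 a8) of the source address. */
const uint8_t toy_ip_start[14] = {
    0x45, 0x00, 0x00, 0x54, 0x00, 0x00,
    0x40, 0x00, 0x40, 0x01, 0xb9, 0x55,
    0xc0, 0xa8
};

/* Interpret the first 14 bytes of frame as an ethernet header. */
void toy_parse_eth(const uint8_t *frame, size_t len, struct toy_ethhdr *out)
{
    assert(frame != NULL && out != NULL && len >= 14);
    memcpy(out->dest, frame, 6);
    memcpy(out->source, frame + 6, 6);
    out->proto = (uint16_t)((frame[12] << 8) | frame[13]);
}
```

Parsing toy_ip_start this way yields destination 45:00:00:54:00:00, source 40:00:40:01:b9:55, and protocol 0xc0a8, matching the tcpdump line: the "addresses" are really IP header fields.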

Expand stubs: The open and stop functions are straightforward - they just start and stop the transmit queue, that is, the ability of the upper layers to hand packets to the driver:

   int os_open(struct net_device *dev) { netif_start_queue(dev); return 0; }
   int os_stop(struct net_device *dev) { netif_stop_queue(dev); return 0; }

Every packet that is sent or received uses a socket buffer (struct sk_buff). The header function os_header receives an sk_buff object as argument skb from the OS when a packet is pushed into one of the interfaces os0 or os1. The header function needs to place a data-link header (MAC addresses) in front of the packet which currently begins with an IP header (IP addresses). The first operation, then, is

   struct ethhdr *eth = (struct ethhdr*)skb_push(skb,ETH_HLEN);
The space is filled with source MAC address, destination MAC address, which come from the net_device object as argument dev, and the protocol which comes from the unsigned short object as argument type. The MAC addresses may be filled with this:
   memcpy(eth->h_source, dev->dev_addr, dev->addr_len);
   memcpy(eth->h_dest, eth->h_source, dev->addr_len);
which makes the source and destination addresses the same so add this:
   eth->h_dest[ETH_ALEN-1] = (eth->h_dest[ETH_ALEN-1] == 5) ? 6 : 5;
or something like it if you created MAC addresses different from the ones above: the result is just to make the destination address differ from the source address by 1. The protocol is filled like this:
   eth->h_proto = htons(type);
to meet the internet standard for the order of data byte transmission. The return value of the header function is the number of bytes in the header. This can be done like this:
   return dev->hard_header_len;
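The header-filling steps above can be rehearsed in user space on a plain byte array. The sketch below writes the 14-byte header layout directly: destination first, then source, then the protocol in big-endian order (the effect htons has on a little-endian host). The function name and the TOY_ constants are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_ETH_ALEN 6
#define TOY_ETH_HLEN 14

/* Write an ethernet header at hdr. src_mac is the sending interface's
   address; the destination is the same address with the last octet
   toggled between 5 and 6. type is the protocol in host order.
   Returns the header length in bytes. */
int toy_build_header(uint8_t *hdr, const uint8_t src_mac[TOY_ETH_ALEN],
                     uint16_t type)
{
    assert(hdr != NULL && src_mac != NULL);
    memcpy(hdr, src_mac, TOY_ETH_ALEN);                   /* h_dest   */
    hdr[TOY_ETH_ALEN - 1] = (hdr[TOY_ETH_ALEN - 1] == 5) ? 6 : 5;
    memcpy(hdr + TOY_ETH_ALEN, src_mac, TOY_ETH_ALEN);    /* h_source */
    hdr[12] = (uint8_t)(type >> 8);                       /* h_proto, */
    hdr[13] = (uint8_t)(type & 0xFF);                     /* big-endian */
    return TOY_ETH_HLEN;
}
```

For a packet leaving os0 with type 0x0800, the result starts with 00:01:02:03:04:06 (destination), followed by 00:01:02:03:04:05 (source) and the bytes 08 00.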

Transmission of a packet is accomplished with os_start_xmit. This function is called by the OS after the header function above with the same struct sk_buff object as argument skb. Ordinarily, the next step would be to shove the packet out through the NIC. But, in this lab, the packet is merely going to be looped back. The action of receiving the looped back packet through a NIC will be simulated by directly calling another function which would be the receive interrupt handler if a NIC had requested an interrupt. The looped back packet will be prepared from the one held by skb. The first step is to extract the packet and its length from skb like this:

   char *data = skb->data;
   int len = skb->len;
The skb holds the packet data so it is needed until the handlers are called. So it gets put into the private area like this:
   priv->skb = skb;
The IP header needs to be changed: the source and destination networks need to be reversed (for example, 192.168.0.1 is reversed to 192.168.1.1 and 192.168.1.1 is reversed to 192.168.0.1). The IP header may be referenced like this:
   struct iphdr *ih = (struct iphdr *)(<data-of-skb>+sizeof(struct ethhdr));
since it follows the data-link header that was added earlier. Reversals are accomplished as follows:
   u32 *saddr = &ih->saddr;
   u32 *daddr = &ih->daddr;
   ((u8*)saddr)[2] ^= 1;
   ((u8*)daddr)[2] ^= 1;
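The byte-indexing trick works because the address sits in the packet in network byte order, so byte 2 is the third octet on any host. A minimal user-space check, using made-up helper names, might look like this:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Build an address as it sits in a packet: four bytes in network order. */
uint32_t toy_addr(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    uint8_t bytes[4] = { a, b, c, d };
    uint32_t v;
    memcpy(&v, bytes, sizeof v);
    return v;
}

/* Toggle the low bit of the third octet: 192.168.0.x <-> 192.168.1.x. */
void toy_flip_network(uint32_t *addr)
{
    assert(addr != NULL);
    ((uint8_t *)addr)[2] ^= 1;
}
```

Applying toy_flip_network twice returns the original address, which is why the same code serves packets travelling in either direction.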
Since IP requires checking a checksum, a new one should be created like this:
   ih->check = 0;
   ih->check = ip_fast_csum((unsigned char *)ih, ih->ihl);
where ihl is the IP header length. However, I have found this is not checked in this simple lab so it can be ignored. Before calling the pseudo-receive interrupt handler, put the modified packet into the private area like this if the destination is os1:
   priv = netdev_priv(os1);
   priv->pkt->datalen = len;
   memcpy(priv->pkt->data, data, len);
or like this if the destination is os0:
   priv = netdev_priv(os0);
   priv->pkt->datalen = len;
   memcpy(priv->pkt->data, data, len);
Next, call the receive interrupt handler like this if the destination is os1:
   os_rx_i_handler(os1);
or like this if the destination is os0:
   os_rx_i_handler(os0);
Typically, the NIC will generate a transmit interrupt and handling this can be simulated like this if the source is os1:
   os_tx_i_handler(os1);
or like this if the source is os0:
   os_tx_i_handler(os0);
After this, the skb object is freed with this:
   priv = netdev_priv(src);
   dev_kfree_skb(priv->skb);
where src is whichever of os0 or os1 that is the source of the packet.
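The checksum recomputation mentioned in the transmit steps can be tried outside the kernel with a portable one's-complement sum in the style of RFC 1071. This is not the kernel's ip_fast_csum; toy_ip_checksum is a hypothetical user-space stand-in that should agree with it on a well-formed header whose checksum field has been zeroed first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compute the IPv4 header checksum. hdr points at the header bytes with
   the checksum field (bytes 10 and 11) set to zero; ihl is the header
   length in 32-bit words. The result is in host order and would be
   stored in the header high byte first. */
uint16_t toy_ip_checksum(const uint8_t *hdr, unsigned int ihl)
{
    uint32_t sum = 0;
    size_t i, len = (size_t)ihl * 4;

    assert(hdr != NULL && ihl >= 5);
    for (i = 0; i < len; i += 2)              /* add 16-bit big-endian words */
        sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
    while (sum >> 16)                         /* fold carries back in */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}
```

Running the same function over a header that already contains a correct checksum returns 0, which is how a receiver verifies it. Flipping the third octet of both addresses changes two words of the header, so the stored checksum must be recomputed if anything is going to check it.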

All that is left to do is write the interrupt handlers. Consider os_rx_i_handler first. The objective is to create a socket buffer (struct sk_buff object) and fill it with packet data. This can be done like this:

   skb = dev_alloc_skb(length-of-data); 
   memcpy(skb_put(skb, length-of-data), data, length-of-data);
where data and length-of-data are obtained from the private area. Then add the metadata (the dev object was brought in as the argument):
   skb->dev = dev;
   skb->protocol = eth_type_trans(skb, dev);
and finally process the packet:
   netif_rx(skb);
   Note: netif_rx eventually frees space allocated to skb.   

But if the queue is stopped, enable it:

   if (netif_queue_stopped(priv->pkt->dev)) netif_wake_queue(priv->pkt->dev);
The transmit interrupt handler just does this:
   priv = netdev_priv(dev);
   if (netif_queue_stopped(priv->pkt->dev)) netif_wake_queue(priv->pkt->dev);
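The overall flow of transmit, simulated receive interrupt, and simulated transmit interrupt can be modelled in user space with two plain structs that point at each other. This sketch makes several assumptions beyond the text: the loop_ names are invented, the queue is a simple flag, and the handlers update packet counters (the lab leaves statistics optional).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LOOP_DATA_LEN 1500

struct loop_stats { unsigned long rx_packets, tx_packets, rx_bytes, tx_bytes; };

struct loop_dev {
    struct loop_stats stats;
    int queue_stopped;                    /* 1 while transmit is paused */
    int datalen;                          /* length of the held packet */
    unsigned char data[LOOP_DATA_LEN];    /* packet waiting to be "received" */
    struct loop_dev *peer;                /* the other interface */
};

/* Simulated receive interrupt: account for the packet held by dev. */
void loop_rx_handler(struct loop_dev *dev)
{
    dev->stats.rx_packets++;
    dev->stats.rx_bytes += (unsigned long)dev->datalen;
    if (dev->queue_stopped)
        dev->queue_stopped = 0;           /* resume the queue */
}

/* Simulated transmit-done interrupt: count the packet, resume the queue. */
void loop_tx_handler(struct loop_dev *dev, int len)
{
    dev->stats.tx_packets++;
    dev->stats.tx_bytes += (unsigned long)len;
    if (dev->queue_stopped)
        dev->queue_stopped = 0;
}

/* Loop a packet from src to its peer. Returns 0, or -1 if len is invalid. */
int loop_xmit(struct loop_dev *src, const unsigned char *pkt, int len)
{
    struct loop_dev *dst = src->peer;

    assert(dst != NULL && pkt != NULL);
    if (len < 0 || len > LOOP_DATA_LEN)
        return -1;
    dst->datalen = len;                   /* save into the peer's packet area */
    memcpy(dst->data, pkt, (size_t)len);
    loop_rx_handler(dst);
    loop_tx_handler(src, len);
    return 0;
}
```

With counters kept this way, every packet sent by one interface shows up as a packet received by the other, which is the symmetry visible in the ifconfig sample near the top of this lab (TX on os0 equals RX on os1 and vice versa).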

Organization: To be somewhat faithful to the way things are done in practice, the following organization of the above mentioned code might be tried - it might also organize your thoughts so that you finish the lab sooner:

   int your_open(dev) {  /* start the network queue */  }

   int your_stop(dev) {  /* stop the network queue */  }

   static void your_tx_i_handler(dev) {
      /* normally stats are kept - you do not need to do much
         here if you do not want to except resume the queue 
         if it is not accepting packets.
       */
   }

   static void your_rx_i_handler(dev) {
      /* allocate space for a socket buffer
         add two bytes of space to align on 16 byte boundary
         copy the packet from the private part of dev to the socket buffer
         set the protocol field of the socket buffer using 'eth_type_trans'
         set the dev field of the socket buffer to dev (argument)
         invoke 'netif_rx()' on the socket buffer
         resume the network queue if it is not accepting packets
       */
   }

   netdev_tx_t your_start_xmit(skb, dev) {
      /* pull the packet and its length from skb
         locate the IP header in the packet (after the eth header below)
         switch the third octet of the source and destination addresses
         save the modified packet (even add some data if you like)
           in the private space reserved for it
         simulate a receive interrupt by calling 'your_rx_i_handler'
         simulate a transmit interrupt by calling 'your_tx_i_handler'
         free skb
       */
   }

   int your_header(skb, dev, type, daddr, saddr, len) {
      /* make room for the header in the socket buffer (from argument 'skb')
         set the protocol field (from argument 'type')
         copy the address given by the device to both source and
           destination fields (from argument 'dev')
         reverse the LSB on the destination address
       */
   }

   static const struct header_ops your_header_ops = {
      .create  = your_header,
   };

   static const struct net_device_ops your_device_ops = {
      .ndo_open = your_open,
      .ndo_stop = your_stop,
      .ndo_start_xmit = your_start_xmit,
      .ndo_get_stats = your_stats,
   };

   int init_module (void) {
      /* allocate two ethernet devices
         set MAC addresses and broadcast values
         set device names
         set network device operations
         set network header creation operation
         set NOARP flags
         kmalloc space for a packet
         register both network devices
       */
   }

   void cleanup_module (void) {
      /* free the packet space
         unregister the network devices
       */
   }
Try this!:
    Implement both the ARP and PING response messages. Documentation for these is:
  ARP
  PING
Test as follows:
  prompt> ping 192.168.1.2
  PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
  64 bytes from 192.168.1.2: icmp_seq=1 ttl=64 time=0.108 ms
  64 bytes from 192.168.1.2: icmp_seq=2 ttl=64 time=0.048 ms
  64 bytes from 192.168.1.2: icmp_seq=3 ttl=64 time=0.068 ms
  64 bytes from 192.168.1.2: icmp_seq=4 ttl=64 time=0.053 ms
  64 bytes from 192.168.1.2: icmp_seq=5 ttl=64 time=0.067 ms
  ...
Also check results of wireshark: