
EECE-4029 Operating Systems
Electrical Engineering and Computer Science

Notes Contributed by Students
    William Kirovski
        question: What type is the second argument in function module_param?
        answer: It is just a plain old, built-in type name. But how can a type be an argument, you ask? For each primitive type the kernel defines an object such as param_ops_int for int, and so on. These objects are of type struct kernel_param_ops, which looks like this:
  struct kernel_param_ops {
     /* Returns 0, or -errno.  arg is in kp->arg. */
     int (*set)(const char *val, const struct kernel_param *kp);
     /* Returns length written or -errno.  Buffer is 4k (ie. be short!) */
     int (*get)(char *buffer, const struct kernel_param *kp);
     /* Optional function to free kp->arg when module unloaded. */
     void (*free)(void *arg);
  };
So, when int appears as the second argument to module_param, the parameter is wired to the kernel_param_ops object whose set, get, and free functions handle the int type.
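For example, a module might declare an integer parameter like this (a minimal sketch; the names are just illustrations):
  static int count = 1;
  module_param(count, int, 0644);   /* int selects param_ops_int */
  MODULE_PARM_DESC(count, "how many greetings to print");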
 
    Alain Kuchta
        question: What, exactly, is module_param?
        answer: module_param is a preprocessor macro: at compile time it is replaced with the appropriate code for setting up a given parameter for a module. A fair amount of magic happens behind the scenes of module_param. Because C is a statically typed language, the procedures that make module parameters have to be implemented once for each C type that a parameter can have. This makes for quite a few functions in moduleparam.h.

The kernel writers created the macro to decide which implementation to use for a specific "type" of parameter. This is handy for module developers, but the really important thing is that it lets the code that is the same for all kinds of parameters be reused.

The "type" argument is interpreted by the macro to select the appropriate kernel_param_ops object, as in the answer to the question above.
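For the curious, the mechanism is token pasting. A lightly simplified sketch of the idea behind moduleparam.h (not the exact source, which varies by kernel version):
  #define module_param(name, type, perm) \
     module_param_named(name, name, type, perm)

  #define module_param_named(name, value, type, perm) \
     param_check_##type(name, &(value)); \
     module_param_cb(name, &param_ops_##type, &value, perm)
The ## operator pastes tokens together, so writing int as the type yields a reference to param_ops_int.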

 
    Alain Kuchta
        question: What, exactly, is charp?
        answer: The type specified as an argument to the module_param macro is treated as a token: it is concatenated with other tokens to form the correct C identifier to reference. Since * is an operator in the C language, it can't be part of an identifier. So charp is used, because this type "alias" can be part of a valid C identifier.
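For example (a minimal sketch; the variable name is just an illustration):
  static char *whom = "world";
  module_param(whom, charp, 0644);  /* charp pastes into param_ops_charp; */
                                    /* "char *" could not form an identifier */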
 
    William Kirovski
        question: Does module_param have to appear at the top of a module?
        answer: No, it can appear anywhere. However, the parameter it refers to must obey the usual placement conventions for C code which means it must be defined ahead of code that uses it. Also, it must be initialized where it is declared.
 
    Daniel Griffin
        question: Does set_timer create a hardware timer or a software timer?
        answer: Timers that are set up in software depend on interrupts raised by a processor, but there are several ways these could be handled.

One possibility is to create and instantiate timer objects and put them into a priority queue (implemented with a red-black tree) that is maintained in the kernel. Interrupts occur at regular intervals, say every millisecond. When an interrupt is raised, the OS calls a (software) handler that compares the current "clock tick" (or jiffies) with the "firing time" in the first timer object in the queue. When the firing time is reached, the function associated with the first timer object is invoked and the timer object is removed from the queue. Of course, maintaining the red-black tree does not scale well, so such a scheme is suitable for a small number of high-precision timers.

A second possibility is to maintain interval blocks of, say, 10 milliseconds each. A priority queue is still used, but now each element in the queue is a list of timer objects, all of which are scheduled to fire within a particular interval of time. When the comparison indicates, all of the timers in the earliest interval fire (at least they try to)!

A third possibility is to make use of a hardware timer that is set to fire by machine instruction.

Linux has been using the second approach with a priority queue of fixed size - this, known as a timer wheel, has obvious limitations (for one, a timer too far into the future can overflow the queue, but this can be handled by maintaining an overflow queue whose timer objects are moved into the priority queue when the appropriate interval becomes available). The timer wheel scales wonderfully due to relatively efficient placement and execution of timer objects, but the price paid is a loss of accuracy: not only is the interval over which a timer is set to fire relatively large, but the times at which interrupts occur may not line up with the times at which intervals begin, and the latter results in as much as half an interval of error. See this article for more information on Linux timers.
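As a concrete illustration, a software timer is set up in a recent kernel roughly like this (a sketch using the timer_setup API of newer kernels; older kernels used init_timer or setup_timer instead):
  #include <linux/timer.h>

  static struct timer_list my_timer;

  static void my_timer_fn(struct timer_list *t) {
     pr_info("timer fired at jiffies=%lu\n", jiffies);
     mod_timer(&my_timer, jiffies + HZ);   /* re-arm one second out */
  }

  /* in module init: */
  timer_setup(&my_timer, my_timer_fn, 0);
  mod_timer(&my_timer, jiffies + HZ);
  /* in module exit: del_timer_sync(&my_timer); */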
 
    Jacob Chesley
        question: What happens to memory when it is not set free by a kernel module or a program running in kernel space? Does the kernel recognize this and return it to the system or is the memory still taken even when the program or kernel module ends? I have seen in user space on windows, when a program terminates without freeing memory, the operating system reclaims the memory, but is this the case in kernel space?
        answer: Two interesting things before the main question. First, in all reasonable OSes, when a user-space C program terminates, all stack space is returned. An OS is not required to return heap space when a user-space C program terminates, though some other languages do this for you. It is possible that some C compilers on Windows produce programs that return heap space on termination. Unfortunately, Microsoft has a history of distributing tools that do not comply with international standards, so we should discount talking about such tools. Jacob may be referring to C#. C# began as "polluted Java" with the intention of knocking Sun's Java out of existence. When that failed, Microsoft gave the language a new name and did all kinds of things that violated standard language specifications. Thus, although garbage collection is not part of C or C++, it is part of C#. Because Microsoft is not a standards agency, I tend not to bring up C# when talking about how user-space C or C++ behaves.

 
Second, the Linux OS defers allocation of heap space until it is actually used. In other words, doing this:

    int *x = (int*)malloc(10000);
 
does nothing except hand a pointer to variable x until something like this happens:
    x[2] = 42;
 
whereupon some space is actually allocated.

 
Now to the question at hand. Kernel memory is never freed automatically - not even when a module unloads - and this includes memory obtained with kmalloc. See this forum for more information.
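In practice this means that every kmalloc in a module must be paired with an explicit kfree, typically in the module's exit function. A minimal sketch:
  #include <linux/module.h>
  #include <linux/slab.h>

  static int *table;

  static int __init my_init(void) {
     table = kmalloc(100 * sizeof(int), GFP_KERNEL);
     return table ? 0 : -ENOMEM;
  }

  static void __exit my_exit(void) {
     kfree(table);   /* without this, the memory is lost until reboot */
  }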

 
    Jacob Chesley
        question: When executing a sync_fetch_and_add or similar function with built-in hardware support, we see that in user space the hardware instruction is executed. You said this is not the case in kernel space, and that instead it is executed in software. Is there a way to execute the hardware instruction in kernel space, or is that something that should be avoided? If it should be avoided, why?
        answer: What I should have said was that the operation may or may not be carried out by a direct atomic processor instruction, generally depending on the architecture the code is compiled for. The type atomic_t and functions such as atomic_set present an API that is uniform over all architectures.
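For instance, the same source compiles everywhere even though the generated instructions differ by architecture (a minimal sketch):
  #include <linux/atomic.h>

  static atomic_t counter = ATOMIC_INIT(0);

  static void demo(void) {
     atomic_inc(&counter);        /* on x86 this can be a single locked incl */
     atomic_add(5, &counter);
     pr_info("counter=%d\n", atomic_read(&counter));   /* prints 6 */
  }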
 
    Daniel Griffin
        question: Why aren't memory barriers found in user space? In kernel space, barriers are used to ensure that a set of loads and stores for concurrently accessed data is synchronized. Why isn't this the case for parallel programs?
        answer: From LWN.net: Most parallel code does not use explicit memory barriers. After all, the required memory barriers are coded into the underlying primitives, such as locking, atomic operations, and RCU. This allows most developers to use these primitives and cheerfully ignore memory barriers.
 
However, if you need the ultimate in performance and scalability, or if you need to work on the underlying primitives, then you will need a solid understanding of memory barriers.
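As an illustration of what those primitives hide, here is the classic publish/consume pattern written with explicit kernel barriers (a sketch; in real code, locking or RCU would normally do this for you):
  static int data, flag;

  static void writer(void) {
     data = 42;
     smp_wmb();   /* make data visible before the flag */
     flag = 1;
  }

  static void reader(void) {
     while (!flag)
        cpu_relax();
     smp_rmb();   /* do not read data until the flag has been seen */
     pr_info("data=%d\n", data);   /* guaranteed to be 42 */
  }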
 
    Daniel Griffin
        question: Is it possible to have a "master" lock for resources such that a process doesn't need to acquire sets of locks?
        answer: While this seems to be a good idea, I have not yet come across an operating system that lists this as a feature. If anyone does, they get 5 brownies if they can give me a valid citation to that information.
 
    Daniel Griffin
        question: Is there software available that can check code for circular locking and other causes of deadlock?
        answer: From UW in Seattle: The Microsoft SQL Server Database Engine automatically detects deadlock cycles within SQL Server. The Database Engine chooses one of the sessions as a deadlock victim and the current transaction is terminated with an error to break the deadlock.
 
But: the NT kernel architecture is a deadlock minefield. With the multi-threaded re-entrant kernel there is plenty of deadlock potential. Lock ordering is great in theory, and NT was originally designed with mutex levels, but they had to be abandoned.
 
    Daniel Griffin
        question: In the following example, used to demonstrate livelock, doesn't at least one thread have to acquire the trylock before other threads are "bounced" out? I do not see how two threads, say, can keep preventing each other from acquiring the trylock, even if they invoke it at the same time.
   top:
     lock(L1)
     if (trylock(L2) == -1) {
        unlock(L1);
        goto top;
     }
        answer: the above does not give the complete story. Consider the following:
   void Func1() {
      while(true) {
         Lock(L1);
         if (trylock(L2) == -1) {
            unlock(L1);
            continue;
         }
         /* do work */;
      }
   }

   void Func2() {
      while(true) {
        Lock(L2);
        if (trylock(L1) == -1) {
           unlock(L2);
           continue;
        }
        /* do work */
      }
   }
One thread runs Func1 and the other runs Func2. One acquires L1 at (more or less) the same moment the other acquires L2. Then both threads execute the trylock simultaneously (more or less). Both fail, causing the process to repeat. Sorry - brain not working well today. Minus 10 brownies for the professor. If I lose more brownies my punishment will be to give a hard final exam.
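A standard remedy, incidentally, is to break the lockstep with a random backoff before retrying (a sketch in the notation of the example; usleep and rand are the user-space names):
     if (trylock(L2) == -1) {
        unlock(L1);
        usleep(rand() % 1000);   /* random delay desynchronizes the threads */
        continue;
     }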
 
    Daniel Griffin
        question: Are hash tables used for directory structures on systems with large amounts of files?
        answer: Yes, but with some caveats. FreeBSD uses a hashing implementation called "Dirhash" for quickly finding files within a directory. It works by building a hash table in memory when a directory is first accessed. This makes Dirhash backwards compatible, because it builds the hash table at run time regardless of how the directory is represented on disk. Furthermore, the hash table is only built after the directory is accessed (it is demand based). If the directory is accessed multiple times, the feature is very effective; if the directory isn't accessed again, it wastes memory. Dirhash was ported to OpenBSD in 2003 and NetBSD in 2005. The FreeBSD website below gives some pretty cool graphical stats on the speed of Dirhash.
 
The Dirhash feature contrasts with the "HTree" feature used in the ext3 and ext4 Linux file systems, which is similar to a B-tree. Apparently, the chief difference between an HTree and a B-tree is that the HTree keys on hashes of file names.
 
Dirhash Sites
FreeBSD Dirhash Stats: http://wiki.freebsd.org/DirhashDynamicMemory
Wikipedia Article: http://en.wikipedia.org/wiki/Dirhash
 
HTree Sites
Wikipedia Article: http://en.wikipedia.org/wiki/HTree
 
    Alain Kuchta
        question: What's the deal with work objects not taking parameters?
        answer: Actually, one can set an atomic_long_t parameter (see the declaration of struct work_struct below), but a developer must keep their mitts off this variable as it is used by the OS. Anyway, it is possible to send any number of parameters of any types into a work function using some pretty sneaky casting. But note that doing so is potentially dangerous, and the reader should treat what follows as an academic exercise.

When using a workqueue in the linux kernel the work function must match a strict function signature:

  typedef void (*work_func_t)(struct work_struct *work); 
That is, the work function must (1) return nothing and (2) take only one parameter which is a pointer to a struct work_struct object. An example of how a work function is enqueued is the following:
 
  struct workqueue_struct* queue = create_workqueue("queue"); 
  struct delayed_work* dwork 
     = (struct delayed_work*)kmalloc(sizeof(struct delayed_work), GFP_KERNEL);
  INIT_DELAYED_WORK(dwork, function);
  queue_delayed_work(queue, dwork, HZ); 
Looking at the above, there is no apparent way to pass arbitrary parameters to the work function. The following shows how this might be done, although not in any best-practice style.

Consider the way that C structs are stored in memory: the address of a struct is also the address of its first member. This can be exploited by creating a new struct extended_work declaration which contains, as its first member, a struct delayed_work field, followed by fields for parameters to be passed to the work function. A struct extended_work object can be created instead of the usual struct delayed_work object, and a pointer to this object can be cast to struct delayed_work*, allowing it to be enqueued in the normal way while also holding the parameter values that the work function expects. The workqueue is happy because it has access to all the fields of an apparent struct delayed_work object and is not concerned with addresses beyond the declared fields of such an object: addresses that contain values to be used by the work function. But, inside the work function, the struct work_struct* work parameter is cast to struct extended_work* to enable easy access to those values.

The following is an example. First, observe the struct work_struct and struct delayed_work declarations from the kernel, and a struct extended_work declaration that applies only to a work function expecting two extra unsigned ints:

 
  struct work_struct { 
     atomic_long_t data; 
     struct list_head entry; 
     work_func_t func; 
  };

  struct delayed_work {
     struct work_struct work;
     struct timer_list timer;
     struct workqueue_struct *wq;
     int cpu;
  };

  typedef struct extended_work {
     struct delayed_work work;
     unsigned int param1;
     unsigned int param2;
  } extended_work;

The following sets up the work function in the workqueue:

  struct workqueue_struct* queue = create_workqueue("queue");
  extended_work* work 
    = (extended_work*)kmalloc(sizeof(extended_work), GFP_KERNEL);  
  work->param1 = 25;  /* parameters are set before INIT_DELAYED_WORK    */
  work->param2 = 78;  /* to ensure that 'function' is able to read them */
  INIT_DELAYED_WORK((struct delayed_work*)work, function);
  queue_delayed_work(queue, (struct delayed_work*)work, HZ);
The following is a work function that uses param1 and param2:
  void work_function(struct work_struct* work) {
     extended_work* ex_work = (extended_work*) work;
     printk(KERN_INFO "The parameters are: %u, %u", 
            ex_work->param1, ex_work->param2);
  }
Unfortunately, potentially more than one struct extended_work declaration will be needed using this approach. Observe that the atomic_long_t variable of struct work_struct may be set in a manner similar to that described above - but it is probably the case that doing so will screw things up big time.
 
    Jeff Hatton
        question: What is the relationship between flush_workqueue and cancel_delayed_work?
        answer: flush_workqueue pauses execution until all work that has been enqueued before flush_workqueue is called is run to completion.

cancel_delayed_work cancels specified delayed work. This function is used when scheduled work needs to be removed from the queue or doesn't need to be completed before execution continues.

In the context of the example in class, flush_workqueue should be called before cancel_delayed_work to avoid a possible race condition. Specifically, flush_workqueue is called first to allow unscheduled, enqueued work to be scheduled. Then cancel_delayed_work is called to remove that scheduled work. flush_workqueue may then be called again to ensure that all work has completed. Observe that flush_workqueue does not "need" to be called again because, theoretically, there should not be any work running or pending anymore.
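In code, that shutdown sequence might look like this (a sketch; queue and dwork are hypothetical names standing in for the objects of the class example):
  flush_workqueue(queue);        /* let already-enqueued work get scheduled */
  cancel_delayed_work(dwork);    /* remove the delayed work if still pending */
  flush_workqueue(queue);        /* for good measure: wait out any stragglers */
  destroy_workqueue(queue);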

 
    Jeff Hatton
        question: Is it safe to enqueue a delayed work that does not exist?
        answer: Function queue_delayed_work returns a nonzero value (true) if the work was newly enqueued and 0 (false) if that work object was already on the queue. It does not otherwise validate its arguments: applying it to NULL, or to a work object whose function was never initialized, does not return a tidy error number - it will typically crash (oops) the kernel. So no, it is not safe.
 
    Jeff Hatton
        question: What is the purpose of the second argument in
int sem_init(sem_t *sem, int pshared, unsigned int value)?
        answer: This function initializes a semaphore for use. The argument sem_t *sem is the semaphore that is to be initialized; the scope of this variable, and where in memory it lives, matters for the second argument. The second argument, int pshared, marks whether the semaphore is to be shared between threads of the same process or between threads of multiple processes. If pshared is zero, the semaphore is shared between threads of the same process and must be accessible to those threads, either as a global variable or allocated on the heap. If pshared is non-zero, the semaphore is shared between threads of multiple processes, generally forked children. In that case, the semaphore must be placed in shared memory, obtained with shm_open plus mmap, or with shmget. The third argument, unsigned int value, sets the semaphore's initial counter.
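A minimal sketch of the pshared == 0 case, with two threads of one process sharing a global semaphore:
  #include <semaphore.h>
  #include <pthread.h>
  #include <stdio.h>

  sem_t sem;   /* global, so both threads of this process can see it */

  void *worker(void *arg) {
     sem_wait(&sem);               /* decrement the counter, blocking at zero */
     puts("in critical section");
     sem_post(&sem);               /* increment the counter, waking a waiter */
     return NULL;
  }

  int main(void) {
     pthread_t t1, t2;
     sem_init(&sem, 0, 1);         /* pshared = 0, initial counter = 1 */
     pthread_create(&t1, NULL, worker, NULL);
     pthread_create(&t2, NULL, worker, NULL);
     pthread_join(t1, NULL);
     pthread_join(t2, NULL);
     sem_destroy(&sem);
     return 0;
  }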
 
    Jeff Hatton
        question: What is the point of hyper-threading?
        answer: Hyper-threading is implemented by duplicating the components of a processor that store the architectural state (registers, program counter, and so on) while sharing the execution resources (the cache, buses, ALUs, branch predictor, instruction fetch, and so on). Since each virtual core carries its own architectural state, the operating system can schedule work as if there were more cores than physically exist, and each virtual core can be individually halted, interrupted, or told to run a specific thread. It also allows more instructions to be in the pipelines than if the cores were not hyper-threaded.

The gain of this architecture is that it allows two independent processes to efficiently share the execution time of a processor. If one process stalls waiting for a resource to become available, the other process can use the execution components of the processor without much overhead in switching between the two. For hyper-threading to be exploited fully, either the OS or the process running the threads needs to be aware of the hyper-threads so it can schedule them efficiently.

Hyper-threading has the most gain when there are processes running on a system that do not require a lot of processing time but can stall waiting for resources. If a core is not hyper-threaded, a high-overhead context switch to another thread would be needed to keep the processor busy, and in the worst case the whole core could stall waiting for resources. But if the core is running, say, two hyper-threads, it can do a low-overhead switch to one thread when the other is stalled waiting for a resource.

A more detailed discussion is here.

 
    Matthew Kathman
        question: Can you interrupt a thread in user space?
        answer: reference

Matthew questions whether user-space drivers should be used in the first place. He researched this and found that Linux provides several advantages for user-space applications, including more robust and flexible process management, a standardized system-call interface, simpler resource management, and a large number of libraries for XML, regular-expression parsing, and more. It also makes applications more straightforward to debug by providing memory isolation and independent restarts.

Specific to the question at hand, Linux provides a standard User I/O (UIO) framework for developing user-space based device drivers. The UIO framework defines a small kernel-space component that performs the following key tasks:

  • indicate device memory regions to user space
  • register for device interrupts and provide interrupt indication to user space
The kernel-space UIO component then exposes the device via a set of sysfs entries (e.g. /sys/devices/pci0000:00/0000:00:00.0/...). The user-space component searches for these entries, reads the device address ranges, and maps them to user-space memory.

The user-space component can perform all device-management tasks including I/O from the device. For interrupts, however, it needs to perform a blocking read() on the device entry, which results in the kernel component putting the user-space application to sleep and waking it once an interrupt is received.
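For example, the user-space half typically waits for interrupts like this (a sketch; /dev/uio0 is the conventional device node):
  #include <fcntl.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <unistd.h>

  int main(void) {
     uint32_t count;
     int fd = open("/dev/uio0", O_RDONLY);
     if (fd < 0) return 1;
     /* each read blocks until an interrupt, then yields the interrupt count */
     while (read(fd, &count, sizeof(count)) == sizeof(count))
        printf("interrupt #%u\n", count);
     close(fd);
     return 0;
  }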

 
    Katie Dew
        question: Why can't we use locks for barriers and call it a day?
        answer: A memory barrier is an instruction used to enforce the order in which loads and stores issued before and after the barrier take effect. The need for memory barriers comes from optimizations made by the compiler and the processor that can cause memory operations to be performed in a different order than the programmer expects. Therefore, memory barriers are used when memory operations need to occur in a specific (partial) order.

Locks (mutexes) are used when only one thread at a time may execute a section of code, making that section behave like a single-threaded environment.

But locks are not as efficient as using memory barriers. Why?

Hypothesis: When a lock is used on a section of code, only one thread can execute that section of code at a time, effectively making that section of code a single threaded environment. This can take a while if there are a lot of threads. When a memory barrier is used, the threads can execute simultaneously but have to wait at the barrier until all of the threads reach that point. Then once all the threads have reached that point, they can all continue executing simultaneously. I (Katie) hypothesize that this method is much quicker than the method of using mutexes.

Answer from web: A memory barrier suspends all of the threads and the entire processor is stopped until a condition has been met, which takes a fraction of a second at most. However, when a thread tries to grab a mutex lock that another thread owns, the thread is suspended until the owner frees it, which could be a fraction of a second or minutes.
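Note that what the hypothesis and the web answer describe is a synchronization barrier (e.g. pthread_barrier), at which threads wait for one another - a different mechanism from the CPU memory barrier defined at the top of this answer. A minimal sketch of such a barrier:
  #include <pthread.h>
  #include <stdio.h>

  #define NTHREADS 4
  pthread_barrier_t barrier;

  void *phase_worker(void *arg) {
     long id = (long)arg;
     printf("thread %ld: phase 1\n", id);   /* all threads run concurrently */
     pthread_barrier_wait(&barrier);        /* nobody starts phase 2 early  */
     printf("thread %ld: phase 2\n", id);
     return NULL;
  }

  int main(void) {
     pthread_t t[NTHREADS];
     long i;
     pthread_barrier_init(&barrier, NULL, NTHREADS);
     for (i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, phase_worker, (void *)i);
     for (i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
     pthread_barrier_destroy(&barrier);
     return 0;
  }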

 
    Alain Kuchta
        question: When using workqueues in the linux kernel, what happens if the same work is re-enqueued?
        answer: There are two cases:
  1. The work is already in the queue: Queuing the work again has no effect and queue_work returns 0 (false), indicating that the work is already queued.
  2. The work was in the queue but has already run or is currently running: there are two points:
    • Queuing the work will put the work back into the queue. If the work is currently running, the work function will be executed two times in total.
    • However, running two or more times can cause problems, and some test needs to be made to determine whether a repeat execution should proceed. That test could be in the function that enqueues the work or in the work function itself. The following example illustrates the problem.
Example:
Consider the following kernel module:
/**
 * cap_queue
 * Author: nmoadev (Alain Kuchta)
 * Date: 2/12/2015
 * Description:
 *  This module is meant to demonstrate a pitfall of using workqueues 
 *  and requeuing the same work_struct.  This program takes input via 
 *  a proc file, processes it through a workqueue, and outputs through 
 *  the same proc file.  If the input is ASCII characters, they will 
 *  be capitalized, otherwise the bytes are passed through.
 *
 *  Usage (once the module is installed):
 *      $ echo "aaa" > /proc/cap_queue
 *      (wait a bit... see definition of 'do_work')
 *      $ cat /proc/cap_queue
 *      AAA
 *
 *  Try writing to the procfile quickly several times. Then try reading 
 *  from the proc file the same number of times.  Some data goes missing.
 */

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/slab.h>
#include <linux/delay.h>
#include <linux/proc_fs.h>
#include <asm/uaccess.h>

// Compiler Directive constants
// ============================
#define BUFF_BITE 20
#define PROC_FILE_NAME "cap_queue"
#define PROC_ROOT_DIR NULL
#define PROC_PERMISSIONS 438

// Type definitions
// ====================
typedef struct {
  char* bytes;
  size_t size;
} chunk_t;

typedef struct {
   void **data;
   int tail, head, size, nelem;
} queue;

/* Function prototypes *
 * =================== */
ssize_t write(struct file* f, const char __user *buf, size_t len, loff_t* off);
ssize_t read(struct file *fd, char __user *buf, size_t c, loff_t *off);
/* Silly function to do some fictitious work on a chunk */
void capitalize(chunk_t* chunk);
/* Work function to be used in work queue */
void do_work(struct work_struct* work);
/* Function to add work to the work queue, called by write interface */
ssize_t add_work(const char __user * buf, size_t len);
/* Utilities for a simple queue */
void drain_queue(queue* d_queue);
void init_queue(queue*);
void enqueue(queue*, void*);
void *dequeue(queue*);
/* Module init and cleanup */
int init_cap_queue(void);
void unload_cap_queue(void);

/* Globals *
 * ======= */
queue out_buf;
queue work_buf;
unsigned add_work_calls = 0;
unsigned do_work_calls = 0;
bool close_read = 0;
struct workqueue_struct* workqueue;
struct work_struct work;
struct proc_dir_entry* proc_entry;
static const struct file_operations file_ops = {
  .owner = THIS_MODULE,
  .read  = read,
  .write = write,
};

/* Module Configuration *
 * ==================== */
MODULE_LICENSE("GPL");
MODULE_AUTHOR("nmoadev");
module_init(init_cap_queue);
module_exit(unload_cap_queue);

/* Functions *
 * ========= */
ssize_t write(struct file* f, const char __user *buf, size_t len, loff_t* off) {
   return add_work(buf, len);
}

ssize_t read (struct file *fd, char __user *buf, size_t c, loff_t *off) {
   ssize_t size;
   chunk_t* chunk;
   if (close_read) {
     printk(KERN_INFO "{CAP_QUEUE} <read>(Already got chunk)<end:read>\n");
     close_read = 0;
     return 0;
   }
   printk(KERN_INFO "{CAP_QUEUE} <read>\n");
   chunk = (chunk_t*) dequeue(&out_buf);
   if (!chunk || !chunk->bytes) {
     printk(KERN_INFO "{CAP_QUEUE} <end:read> No chunks in queue.\n");
     return 0;
   }
   size = chunk->size;
   printk(KERN_INFO "{CAP_QUEUE} chunk->bytes=%s chunk->size=%lu\n",\
          chunk->bytes, (unsigned long)chunk->size);
   if (copy_to_user(buf, chunk->bytes, chunk->size)) return -EFAULT;

   kfree(chunk->bytes);
   kfree(chunk);

   close_read = 1;
   printk(KERN_INFO "{CAP_QUEUE} <end:read>\n");
   return size;
}

void capitalize(chunk_t* chunk) {
   size_t i;
   char* cur;
   printk(KERN_INFO "{CAP_QUEUE} <capitalize>\n");
   for (i = 0; i < chunk->size; i++) {
     cur =  (chunk->bytes + i);
     /* If the character is an ascii lower case character */
     if (96 < *cur && *cur < 123) *cur = *cur - 32;
   }
   printk(KERN_INFO "{CAP_QUEUE} <end:capitalize>\n");
}

void do_work(struct work_struct* work) {
   chunk_t* chunk;
   do_work_calls++;
   printk(KERN_INFO "{CAP_QUEUE} <do_work:%u>\n", do_work_calls);
   /* Take the next set of data from the work_buffer */
   chunk = (chunk_t*) dequeue(&work_buf);

   if (!chunk) return;

   /* In order for the demonstration to work correctly, 
      an artificial delay needs to be introduced. */
   msleep(5000);

   /* Capitalize the chars in the chunk */
   capitalize(chunk);
   /* Put chunk into the output queue */
   enqueue(&out_buf, chunk);
   printk(KERN_INFO "{CAP_QUEUE} <end:do_work:%u>\n", do_work_calls);
}

ssize_t add_work(const char __user * buf, size_t len) {
   chunk_t* chunk;
   int ret;

   /* Keep track of times add_work is called */
   add_work_calls++;
   printk(KERN_INFO "{CAP_QUEUE} <add_work:%u>\n", add_work_calls);

   /* Make a new chunk */
   chunk = (chunk_t*)kmalloc(sizeof(chunk_t), GFP_KERNEL);
   chunk->bytes = (char*)kmalloc(sizeof(char)*len, GFP_KERNEL);

   /* Copy bytes from user to chunk */
   if (copy_from_user(chunk->bytes, buf, len)) return -EFAULT;
   chunk->size = len;

   /* Put the chunk in the queue */
   enqueue(&work_buf, chunk);

   /* Queue some work */
   ret = queue_work(workqueue, &work);
   printk(KERN_INFO "{CAP_QUEUE}    queue_work returned: %d\n", ret);
   printk(KERN_INFO "{CAP_QUEUE} <end:add_work:%u>\n", add_work_calls);

   return len;
}

void drain_queue(queue* d_queue) {
   chunk_t* chunk;
   printk(KERN_INFO "{CAP_QUEUE} <drain_queue>\n");
   if (!d_queue) return;

   chunk = dequeue(d_queue);
   while (chunk) {
     if (chunk->bytes) {
       printk("{CAP_QUEUE} Freeing '%2s'\n", chunk->bytes);
       kfree(chunk->bytes);
     }
     kfree(chunk);
     chunk = dequeue(d_queue);
   }
   printk(KERN_INFO "{CAP_QUEUE} <end:drain_queue>\n");
}

int init_cap_queue(void) {
   printk(KERN_INFO "{CAP_QUEUE} <init_cap_queue>\n");
   proc_entry = proc_create(PROC_FILE_NAME, PROC_PERMISSIONS, \
                            PROC_ROOT_DIR, &file_ops);
   if (proc_entry == NULL) {
     printk(KERN_ERR "{CAP_QUEUE} Couldn't create proc entry\n");
     return -ENOMEM;
   }

   init_queue(&out_buf);
   init_queue(&work_buf);
   workqueue = create_workqueue("cap_queue_queue");
   INIT_WORK(&work, do_work);

   return 0;
}

void unload_cap_queue(void) {
   printk(KERN_INFO "{CAP_QUEUE} <unload_cap_queue>\n");
   printk(KERN_INFO "{CAP_QUEUE}    add_work called %u\n", add_work_calls);
   printk(KERN_INFO "{CAP_QUEUE}    do_work called %u\n", do_work_calls);

   drain_workqueue(workqueue);
   destroy_workqueue(workqueue);
   drain_queue(&out_buf);
   drain_queue(&work_buf);
   remove_proc_entry(PROC_FILE_NAME, PROC_ROOT_DIR);
   printk(KERN_INFO "{CAP_QUEUE} <end:unload_cap_queue>\n");
}

void init_queue(queue *q) {
   q->size = 10;
   q->data = (void**)kmalloc(q->size*sizeof(void*), GFP_KERNEL);
   q->head = q->tail = q->nelem = 0;
}

void enqueue(queue *q, void *item) {
   if (item == NULL) return;
   if ((q->head == 0 && q->tail == q->size-1) || q->head == q->tail+1) {
      printk(KERN_INFO "No space left - cannot add - 1\n");
      return;
   }
   q->data[q->tail] = item;
   q->nelem = q->nelem+1;
   /* Special case, move head to empty element */
   if (q->head == q->tail) q->head = q->size-1;
   if (q->tail == q->size-1) q->tail = 0; else q->tail++;
}

void *dequeue(queue *q) {
   void *object = NULL;

   if (q->tail == q->head) return NULL; /* empty list */
   if (q->tail > q->head) {
      q->head = q->head+1;
      object = q->data[q->head];
      /* an empty queue, initialize */
      if (q->head+1 == q->tail) q->head = q->tail = 0;
      q->nelem = q->nelem-1;
      return object;
   } else {
      q->head = (q->head+1) % q->size;
      object = q->data[q->head];
      /* an empty queue, initialize */
      if (((q->head+1) % q->size) == q->tail) q->head = q->tail = 0;
      q->nelem = q->nelem-1;
      return object;
   }
}
Use the following Python script to test it:
#! /usr/bin/python
from time import sleep

proc_file = "/proc/cap_queue"

def write_to_proc(string):
    with open(proc_file, "w") as f:
        f.write(string)
def read_from_proc():
    string = ""
    with open(proc_file, "r") as f:
        string = f.read()
    if string == "":
        string = "<empty>"

    return string


print "Writing 'aaa' to proc file"
write_to_proc("aaa")
print "Writing 'bbb' to proc file"
write_to_proc("bbb")
print "Writing 'ccc' to proc file"
write_to_proc("ccc")

print "Giving a little time for module to work"
sleep(7)
print "Reading from proc file..."
print "Expected = 'AAA' | Actual = '{0}'".format(read_from_proc())
sleep(5)
print "Expected = 'BBB' | Actual = '{0}'".format(read_from_proc())
sleep(5)
print "Expected = 'CCC' | Actual = '{0}'".format(read_from_proc())
What happens is that, although 3 chunks were written into the module, only two were read. The output of the Python script should look like this:
Writing 'aaa' to proc file
Writing 'bbb' to proc file
Writing 'ccc' to proc file
Giving a little time for module to work
Reading from proc file...
Expected = 'AAA' | Actual = 'AAA'
Expected = 'BBB' | Actual = 'BBB'
Expected = 'CCC' | Actual = '<empty>'
The third read is empty because only two executions of the work function are actually queued. The module tried to re-queue work that was already sitting in the work queue. 'ccc' is stuck in work_buf so 'CCC' is never put into out_buf. This is verified by looking at the module output with dmesg:
{CAP_QUEUE} <unload_cap_queue>
{CAP_QUEUE}    add_work called 3
{CAP_QUEUE}    do_work called 2
{CAP_QUEUE} <drain_queue>
Notice that while add_work was called 3 times, do_work was only called twice.
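One way to avoid the pitfall (a sketch, not part of the module above) is to allocate a fresh work object per chunk, carrying the chunk pointer inside the work object in the style of the extended_work trick earlier in these notes:
  typedef struct {
     struct work_struct work;   /* must be the first member, as before */
     chunk_t *chunk;
  } chunk_work_t;

  /* in add_work, instead of re-queuing the one global 'work': */
  chunk_work_t *cw = (chunk_work_t*)kmalloc(sizeof(chunk_work_t), GFP_KERNEL);
  cw->chunk = chunk;
  INIT_WORK(&cw->work, do_work);
  queue_work(workqueue, &cw->work);   /* a distinct object each time, so it
                                         is never already on the queue */

  /* do_work then casts back and frees the wrapper when finished: */
  void do_work(struct work_struct* w) {
     chunk_work_t* cw = (chunk_work_t*)w;
     capitalize(cw->chunk);
     enqueue(&out_buf, cw->chunk);
     kfree(cw);
  }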
 
    Jeffrey Hatton
        question: Device driver question: does open() create a lock on some /dev file?
        answer: By default, open() does not create a lock. To have a lock created on open(), the lock function in the corresponding struct file_operations object needs to be implemented. For device files, this is almost never done. Instead, if a lock is needed, a pseudo lock is created using semaphores or mutexes in the open() function. Alain Kuchta supplies the following links:
   Understanding concurrent file writes from multiple processes: here
   File locking: here
   Talking to Device Files (writes and IOCTLs): here
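For example, a driver that wants exclusive access can fake the lock with a mutex in open() and release() (a sketch of the common single-open pattern; the names are illustrative):
  static DEFINE_MUTEX(dev_lock);

  static int my_open(struct inode *inode, struct file *f) {
     if (!mutex_trylock(&dev_lock))
        return -EBUSY;          /* someone already has the device open */
     return 0;
  }

  static int my_release(struct inode *inode, struct file *f) {
     mutex_unlock(&dev_lock);
     return 0;
  }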
 
    Jeffrey Hatton
        question: Why does cat give a 64k buffer for reads into procfs?
        answer: The search turned up two possible reasons:
  1. cat creates a pipe between the terminal process and the procfs read. By looking into the source code for pipes Jeffrey found that the default pipe size is 16 memory pages and with each page being 4k each, it would come out to be a 64k buffer.
  2. The other possibility is that 64k is just the default size for cat probably following the same logic as above.
 
    Jeffrey Hatton
        question: Device driver question: I would like to see an example where the offset is actually used on read or write?
        answer: forthcoming
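Until that example appears, here is a minimal sketch (hypothetical, not the course's official answer) of a read handler that actually honors *off:
  static const char msg[] = "hello from the driver\n";

  static ssize_t my_read(struct file *f, char __user *buf,
                         size_t len, loff_t *off) {
     size_t avail = sizeof(msg) - 1;
     if (*off >= avail)
        return 0;                      /* offset past the end: report EOF */
     if (len > avail - *off)
        len = avail - *off;
     if (copy_to_user(buf, msg + *off, len))
        return -EFAULT;
     *off += len;                      /* advance, so the next read sees EOF */
     return len;
  }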
 
    Alain Kuchta
        question: Device driver question: why does driver-02.c work without using copy_from_user?
        answer: Using strcpy 'worked' because, luckily, the buffer pointer was a legitimate pointer to an actual user-space buffer. copy_from_user performs safety checks on the provided pointer to guard the kernel from crashing if the pointer is invalid or a page fault occurs while accessing the buffer. The pointer is not translated or changed at all during this process; once the checks are done, the kernel effectively does an optimized memcpy on the data (it's important to remember that the kernel can access any memory it wants to). So, strcpy 'works' in that, on modern x86 systems, a plain copy is all that is actually needed to move the data (on other architectures this may not be the case; for example, on ARM an interrupt is needed to change memory mode). But this is a serious safety issue, because trivial programming mistakes made in user space (like passing a null pointer) could crash the kernel if copy_from_user is not used.