Forks and Threads

A process created with the UNIX fork() system call is expensive in setup time and memory space; for this reason it is sometimes called a heavyweight process. A heavyweight process has the following properties:
  1. Heavyweight processes run independently and do not share resources
  2. They consist of code, stack, and data
The fork() system call in UNIX creates a new process. The new process (called the child process) is an exact copy of the calling process (called the parent process) except for the following:
  1. The child process has a unique process ID.
  2. The child process has a different parent process ID (i.e., the process ID of the parent process).
  3. The child process has its own copy of the parent's descriptors. These descriptors refer to the same underlying open files, so the file offsets are shared between the child and the parent: an lseek(2) on a descriptor in the child process can affect a subsequent read(2) or write(2) by the parent. The shell also relies on this descriptor copying to set up standard input and output for newly created processes, as well as to set up pipes.
  4. The child process' resource utilizations are set to 0; see setrlimit(2).
  5. All interval timers are cleared; see setitimer(2).
The return value from fork() is used to distinguish the parent from the child: the parent receives the child's process ID and the child receives zero. If the call fails, no child is created and the parent receives -1.

Often it is sufficient to duplicate only part of a process and share the rest. Such partial copies can be realized by a thread (an example of a lightweight process), which has the following properties:

  1. Resources and data can be shared between threads
  2. Each thread has its own stack
  3. Context switching between threads is fast, because the threads share one address space

A thread is a stream of instructions that can be scheduled as an independent unit. It is important to understand the difference between a thread and a process. A process contains two kinds of information: resources that are available to the entire process such as program instructions, global data and working directory, and schedulable entities, which include program counters and stacks. A thread is an entity within a process that consists of the schedulable part of the process.

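Under POSIX, calling fork() in a multithreaded process duplicates only the calling thread, although the child receives a copy of the entire address space. Some systems, such as Solaris, also provide forkall(), which duplicates all the threads of a process. The problem with duplicating all threads is that threads working with external resources may corrupt those resources (e.g., by writing duplicate records to a file), because neither copy of a thread may know that the fork has occurred.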

When a new Perl thread is created, all the data associated with the current thread is copied to the new thread and is subsequently private to that new thread, unless it has been explicitly shared (for example, with the threads::shared module). This is similar in feel to what happens when a UNIX process forks, except that the data is simply copied to a different part of memory within the same process rather than a real fork taking place.

A fork() induces a parent-child relationship between two processes. Thread creation induces a peer relationship between all the threads of a process.

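Even so, the main thread is not entirely like the others:
  1. The main() thread runs first, but has no other priority
  2. The main thread may have some special properties; for example, in C, returning from main() terminates the whole process, including all other threads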
Processes and threads can:
  1. Terminate
  2. Start a new process or thread
  3. Wait for each other to terminate
  4. Terminate each other
  5. Share data and communicate
  6. Run independently