Forks and Threads
A process created with the UNIX fork() function is expensive in
setup time and memory space, and is therefore sometimes called a
heavyweight process. Properties of a heavyweight process are:
- Heavyweight processes run independently, each in its own address
space, and do not share memory by default
- They consist of code, stack, and data
The fork() system call in UNIX causes creation of a new
process. The new process (called the child process) is an
exact copy of the calling process (called the parent
process) except for the following:
- The child process has a unique process ID.
- The child process has a different parent process ID (namely, the
process ID of the process that called fork()).
- The child process has its own copy of the parent's descriptors.
These descriptors reference the same underlying objects, so file
offsets are shared between the child and the parent: an lseek(2) on
a descriptor in the child process can affect a subsequent read(2) or
write(2) by the parent. The shell also uses this descriptor copying
to establish standard input and output for newly created processes
and to set up pipes.
- The child process's resource utilizations are set to 0; see
setrlimit(2).
- All interval timers are cleared; see setitimer(2).
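The shared file offset described in the descriptor item above can be
observed with a short Python sketch (Python's os module wraps fork(),
lseek(2) and read(2) directly). The helper name
child_seek_moves_parent_offset and the temporary file are assumptions
made for illustration, and a UNIX-like system is assumed.

```python
import os
import tempfile


def child_seek_moves_parent_offset():
    """Return the bytes the parent reads after the child has moved the offset."""
    fd, path = tempfile.mkstemp()          # hypothetical scratch file
    try:
        os.write(fd, b"0123456789")
        os.lseek(fd, 0, os.SEEK_SET)       # parent's offset is now 0
        pid = os.fork()
        if pid == 0:
            # Child: its descriptor is a copy, but the offset is shared.
            try:
                os.lseek(fd, 5, os.SEEK_SET)
            finally:
                os._exit(0)                # skip the parent's cleanup code
        os.waitpid(pid, 0)                 # wait until the child has seeked
        return os.read(fd, 3)              # parent never seeked past 0 itself
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    print(child_seek_moves_parent_offset())  # b'567'
```

The parent reads starting at offset 5 even though only the child
called lseek(2), because both descriptors refer to the same open file
object.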
The return value from fork() is used to distinguish the parent from
the child: the parent receives the child's process ID, while the
child receives zero.
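As a minimal sketch of how the return value is typically used, the
following Python code forks once and lets the child report its own
process ID and its parent process ID back through a pipe. The
function name fork_once and the pipe-based report are illustrative
assumptions, not part of the fork() interface itself.

```python
import os


def fork_once():
    """Fork once; return (fork result in parent, child's PID, child's PPID)."""
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: fork() returned zero here.
        try:
            os.close(read_end)
            report = f"{os.getpid()} {os.getppid()}"
            os.write(write_end, report.encode())
        finally:
            os._exit(0)                    # skip the parent's cleanup code
    # Parent: fork() returned the child's process ID.
    os.close(write_end)
    with os.fdopen(read_end) as pipe:
        child_pid, child_ppid = map(int, pipe.read().split())
    os.waitpid(pid, 0)                     # reap the child
    return pid, child_pid, child_ppid


if __name__ == "__main__":
    pid, child_pid, child_ppid = fork_once()
    print("parent", os.getpid(), "got", pid, "from fork()")
    print("child reported pid", child_pid, "and parent pid", child_ppid)
```

The value returned to the parent matches the process ID the child
sees for itself, and the child's parent process ID matches the
process that called fork().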
Often it is sufficient to duplicate only part of a process and share
the rest. A thread (an example of a lightweight process) provides
this; it has the following properties:
- Resources and data can be shared between threads
- Each thread has its own stack
- Context switching between threads of the same process is fast,
since the address space does not change
A thread is a stream of instructions that can be scheduled as
an independent unit. It is important to understand the difference
between a thread and a process. A process contains two kinds
of information: resources that are available to the entire process
such as program instructions, global data and working directory, and
schedulable entities, which include program counters and stacks. A
thread is an entity within a process that consists of the schedulable
part of the process.
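The split between process-wide resources and per-thread state can be
sketched in Python as follows. The names worker and run_threads, and
the shared dictionary, are hypothetical; the point is only that every
thread sees the same shared object while each keeps its own local
variables.

```python
import threading


def worker(shared, lock, results, index, n):
    local_sum = 0                  # local variable: private to this thread
    for _ in range(n):
        local_sum += 1
    with lock:                     # shared data needs synchronization
        shared["total"] += local_sum
    results[index] = local_sum


def run_threads(num_threads=4, n=1000):
    shared = {"total": 0}          # one object, visible to every thread
    lock = threading.Lock()
    results = [None] * num_threads
    threads = [
        threading.Thread(target=worker, args=(shared, lock, results, i, n))
        for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["total"], results


if __name__ == "__main__":
    print(run_threads())
```

Each thread accumulates into its own local_sum without interference,
and only the update of the shared total needs the lock.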
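Under POSIX, a fork() in a multithreaded process duplicates only the
calling thread; some older thread libraries (for example, Solaris
threads) duplicated all the threads of the process. Either way,
fork() is hazardous in a process whose threads work with external
resources. If all threads are duplicated, those resources may be
corrupted (e.g., duplicate records written to a file) because
neither copy of a thread may know that the fork() has occurred. If
only the calling thread is duplicated, a lock held by another thread
at the time of the fork() remains locked in the child, with no
thread left to release it.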
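What a child actually inherits can be checked with a small
experiment. On POSIX systems the child of a multithreaded process
contains only the thread that called fork(), and Python's thread
bookkeeping reflects this. The sketch below is an illustration under
that assumption; the name threads_seen_after_fork is hypothetical,
and the warning filter is there only because recent Python versions
warn about forking a multithreaded process.

```python
import os
import threading
import warnings


def threads_seen_after_fork():
    """Return (thread count in parent, thread count seen by the child)."""
    stop = threading.Event()
    helper = threading.Thread(target=stop.wait)   # idle second thread
    helper.start()
    try:
        in_parent = threading.active_count()
        with warnings.catch_warnings():
            warnings.simplefilter("ignore", DeprecationWarning)
            pid = os.fork()
        if pid == 0:
            count = 255                           # fallback if counting fails
            try:
                count = threading.active_count()
            finally:
                os._exit(count)                   # report via exit status
        _, status = os.waitpid(pid, 0)
        return in_parent, os.WEXITSTATUS(status)
    finally:
        stop.set()
        helper.join()


if __name__ == "__main__":
    print(threads_seen_after_fork())
```

The parent counts at least two threads, while the child counts one:
the helper thread simply does not exist in the child, so anything it
was holding or doing at the moment of the fork() is left as it was.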
When a new Perl thread is created, all the data associated with the
current thread is copied to the new thread and is subsequently
private to that new thread. This is similar in feel to what happens
when a UNIX process forks, except that the data is copied to a
different part of memory within the same process rather than by a
real fork.
A fork() induces a parent-child relationship between two
processes. Thread creation induces a peer relationship
between all the threads of a process.
- The main() thread runs first, but otherwise has no priority over
the other threads
- The main thread may have some special properties (in C, for
example, returning from main() ends the whole process)
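The peer relationship can be illustrated with a small sketch in which
one thread creates a second thread and a third, unrelated thread
waits for it. All names here (peer_join, creator, grandchild, the
handoff queue) are hypothetical and chosen only for the example.

```python
import queue
import threading


def peer_join():
    """A thread is joined by a peer that did not create it."""
    handoff = queue.Queue()
    log = []

    def grandchild():
        log.append("grandchild ran")

    def creator():
        t = threading.Thread(target=grandchild)
        t.start()
        handoff.put(t)             # hand the thread over, do not join it

    c = threading.Thread(target=creator)
    c.start()
    t = handoff.get()
    t.join()                       # the calling thread did not create t
    c.join()
    return log, t.is_alive()


if __name__ == "__main__":
    print(peer_join())
```

By contrast, a process can normally wait only for its own children,
which is one practical consequence of the parent-child relationship
that fork() creates.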
Processes and threads can:
- Terminate
- Start a new process or thread
- Wait for each other to terminate
- Terminate each other
- Share data and communicate
- Run independently
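For processes, starting, terminating and waiting in the list above
map onto fork(), kill(2) and waitpid(2). The following is a minimal
sketch using Python's wrappers for those calls; the function name
terminate_child is an assumption, and SIGKILL is used only because it
cannot be caught, which keeps the example deterministic.

```python
import os
import signal
import time


def terminate_child():
    """Start a child, terminate it, and wait for it to be gone."""
    pid = os.fork()                        # start a new process
    if pid == 0:
        try:
            time.sleep(60)                 # child runs independently
        finally:
            os._exit(0)
    os.kill(pid, signal.SIGKILL)           # terminate the other process
    _, status = os.waitpid(pid, 0)         # wait for it to terminate
    return os.WIFSIGNALED(status), os.WTERMSIG(status)


if __name__ == "__main__":
    killed, sig = terminate_child()
    print("killed by signal:", killed, "signal number:", sig)
```

The wait status tells the parent that the child ended because of a
signal rather than by exiting normally.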