A Historic Look at the fork() Call

The original UNIX fork() call required that the full scope of the calling process's kernel run environment be duplicated. This included the allocation of new process and thread table structures, the duplication of the entire vas/pregion structures, and creation of new region structures in the case of private memory regions and copies of their data pages. If the fork() was called in prelude to an exec() call, then much of this work was almost immediately deconstructed, as the kernel replaced the original process view with the new one.

Another Approach: The Berkeley Virtual Fork

With the introduction of Berkeley's virtual memory UNIX kernel, a new "virtual" fork call was added to the growing system call list. The Berkeley kernel hacks decided that in the case of a "fork() immediately followed by exec()" scenario, it would make sense to simply allow the new child process to run in the parent's virtual memory space until it was time to start building its own view. This methodology was implemented in the Berkeley vfork() call.

The idea behind the vfork() may seem very logical at first, but it flies in the face of many basic UNIX rules. The basic kernel memory management subsystem is designed specifically to make sure that one process doesn't corrupt the memory space of another. With the vfork(), there is no such protection between the parent and the newly created child. A child process of a vfork() should not modify any of the parent's data, but this is only a programming convention, and in actual practice there are no guarantees. To some, this aspect of Berkeley's vfork() was considered a bug, while other creative programmers considered it a feature and used it to provide a very primitive form of threading. We don't recommend this approach by any means!

To address these concerns, Hewlett-Packard made several modifications to the basic fork() and vfork() calls over the years.
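The "fork() immediately followed by exec()" pattern discussed above can be sketched in a few lines. This is only an illustration, not kernel code: it uses Python's os.fork() and os.execvp() wrappers around the underlying system calls, and the helper name spawn_and_wait and the exit code 127 for a failed exec are assumptions made for the example. Python does not expose vfork() directly, so only the plain fork() form is shown.

```python
import os
import sys


def spawn_and_wait(argv):
    """Fork a child, exec argv in it, and return the child's exit status."""
    pid = os.fork()
    if pid == 0:
        # Child: replace the freshly duplicated process image right away.
        try:
            os.execvp(argv[0], argv)
        finally:
            # Reached only if the exec failed; exit without running
            # any of the parent's cleanup handlers.
            os._exit(127)
    # Parent: wait for the child and decode its exit status.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


if __name__ == "__main__":
    # The interpreter itself is just a convenient program to exec here.
    code = spawn_and_wait([sys.executable, "-c", "raise SystemExit(3)"])
    print("child exited with", code)
```

Everything the kernel duplicated for the child at fork() time is discarded as soon as the exec succeeds, which is the wasted work that both the Berkeley vfork() and the later HP designs set out to avoid.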
HP's New and Improved fork()

To improve system performance and avoid the potential corruption of the parent data by the child during a vfork(), HP reworked the mechanics of the basic fork() call and redirected the vfork() call to the same entry point in the kernel. Hewlett-Packard's fork() was built around the concept of copy-on-write for the parent and copy-on-access for the child (referred to after this as copy-on-write/access).

As in the older fork model and as illustrated in Figure 9-2, a new process structure is allocated and the parent's vas, pregion, and private memory region structures are copied. What makes this version different is that while private memory region structures are copied, their current in-core pages are not. Active vfd entries in the newly copied child region point to the same physical pages as those of the parent. This means that the parent and child have access to the same physical data space. To ensure that no data corruption occurs, all the core-resident data pages have their pdir entries and tlb entries invalidated.

Figure 9-2. The fork()
When a page fault occurs for one of these pages, the kernel fault handler discovers a valid vfd entry. If the parent causes the fault, there is an entry in the pdir; if the child causes the fault, there is no pdir entry, as it is using a newly allocated space ID and thus has its own unique virtual page number. This seeming inconsistency in the synchronization of the various kernel and process memory-mapping structures is used to determine the disposition of the fault. If either thread (parent or child) is attempting a first write, a copy of the page data is made and mapped to the faulting thread's region. A first read request by the parent is allowed, but if the request is from the child (indicated by the lack of a valid pdir entry), the kernel makes a copy of the data page for the child and maps it in the appropriate kernel tables.

In this manner, each process ends up with its own private copy of the data, but the overhead of creating the page copies is postponed until they are actually accessed. If the fork() is followed by an exec(), we will not have wasted our time copying pages unnecessarily.

Later, Hewlett-Packard upgraded its fork() call to work with full copy-on-write access rules. Because the parent and the child each have their own private data quadrants with unique space IDs, the kernel needed the ability to alias a single physical page to multiple virtual pages. This feature was introduced with the HP-UX 10.0 kernel release. The advantage of true copy-on-write is that a read access by either the parent or the child is allowed to function in a shared mode, avoiding the overhead of a page copy. In the case of a write request, a copy is made if the page shows more than one virtual address currently mapped (this means that the last to access the page ends up with the original copy).

The only complication to the HP approach is that since we create new private memory regions, swap reservations must be made.
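The true copy-on-write rule described above can be illustrated with a small user-space model. This is a toy sketch under stated assumptions, not the HP-UX implementation: the Page and Process classes, the mappings counter, and the dictionary standing in for the per-process page mapping are all hypothetical names invented for the example, and the pdir, vfd, and tlb structures are not modeled at all.

```python
class Page:
    """A physical page plus a count of the virtual pages aliased to it."""

    def __init__(self, data):
        self.data = bytearray(data)
        self.mappings = 1


class Process:
    """A process whose private region maps virtual page numbers to pages."""

    def __init__(self, name, pages):
        self.name = name
        self.pages = pages  # virtual page number -> Page

    def fork(self, name):
        # Alias every physical page into the child instead of copying it.
        for page in self.pages.values():
            page.mappings += 1
        return Process(name, dict(self.pages))

    def read(self, vpn):
        # Reads are served from the shared page; no copy is made.
        return bytes(self.pages[vpn].data)

    def write(self, vpn, data):
        page = self.pages[vpn]
        if page.mappings > 1:
            # Still shared: drop our alias and give the writer a private copy.
            page.mappings -= 1
            page = Page(page.data)
            self.pages[vpn] = page
        # Sole mapping (or fresh private copy): write in place.
        page.data[:len(data)] = data


if __name__ == "__main__":
    parent = Process("parent", {0: Page(b"AAAA")})
    child = parent.fork("child")
    child.write(0, b"BB")   # page is shared, so the child gets a copy
    parent.write(0, b"CC")  # parent is now the only mapper and keeps the original
    print(parent.read(0), child.read(0))
```

In this model the second writer finds only one mapping left and writes to the original page in place, which mirrors the remark that the last process to access the page ends up with the original copy.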
When HP's reworked fork() was first introduced in a workstation release of HP-UX several years back, it created a bit of a predicament. If a large parent program, such as a computer-aided design program with a relatively large private data region, needed to fork a simple child, say, to perform some menial task such as checking the user's email, the initial fork() resulted in a large swap reservation, which was immediately deconstructed by the following exec() call. This required that the system be configured with a large amount of swap space that wasn't actually used. As disk space was still quite expensive at the time, the original Berkeley vfork() was patched back into the kernel to avoid having to overstate the system's swap space.

Hewlett-Packard also revisited the vfork() call and modified it to allow the child to simply map the parent's vas, pregion, and region structures to its proc table. Because this gives the child unrestricted access to the parent's data space, the only supported use of this call is when it is immediately followed by an exec() or exit() in the child's logic.

As you may be thinking, this gets to be a bit confusing, so let's clear the deck a bit and create a timeline of the various flavors of fork() and vfork() that have been and are being used. Then, we examine the semantics of the current implementation of each call.

HP-UX 7.x: fork() uses the original UNIX model and copies the parent's data area, and vfork() uses the original Berkeley model, sharing the data area between the parent and the child.
HP-UX 8.x: Servers and workstations use the new copy-on-write/access fork() model, which handles both fork() and vfork() calls. The previously mentioned issue with swap reservation led to a patch for the workstation release of the O/S that reimplements the original vfork() code.
HP-UX 9.x: Servers use the combined copy-on-write/access fork()/vfork() calls, while workstations use the copy-on-write/access fork() and the original Berkeley vfork().
HP-UX 10.x and beyond (at least through 11i, the current release at time of writing): Servers and workstations alike use the new, true copy-on-write fork() call and a newer, improved version of the vfork() call.