9.1. History and Overview

When networking first became widely available in 4.2BSD, users who wanted to share files all had to log in across the net to a central machine on which the shared files were located. These central machines quickly became far more loaded than the user's local machine, so demand quickly grew for a convenient way to share files on several machines at once. The most easily understood sharing model is one that allows a server machine to export its filesystems to one or more client machines. The clients can then import these filesystems and present them to the user as though they were just another local filesystem.

Numerous remote-filesystem protocols were designed and implemented. The implementations were attempted at all levels of the kernel. Remote access at the top of the kernel resulted in semantics that nearly matched the local filesystem but had terrible performance. Remote access at the bottom of the kernel resulted in awful semantics but great performance. Modern systems place the remote access in the middle of the kernel, at the vnode layer. This level gives reasonable performance and acceptable semantics.

An early remote filesystem, UNIX United, was implemented near the top of the kernel at the system-call dispatch level. It checked for file descriptors representing remote files and sent requests on those descriptors off to the server. No caching was done on the client machine. The lack of caching resulted in slow performance but in semantics nearly identical to those of a local filesystem. Because the current directory and executing files are referenced internally by vnodes rather than by descriptors, UNIX United did not allow users to change directory into a remote filesystem and could not execute files from a remote filesystem without first copying them to a local filesystem.

At the opposite extreme was Sun Microsystems' network disk, implemented near the bottom of the kernel at the device-driver level. Here, the client's entire filesystem and buffering code were used. Just as in the local filesystem, recently read blocks from the disk were stored in the page cache. Only when a file access requested a block that was not already in the cache would the client send a request for the needed physical disk block to the server. The performance was excellent because the cache serviced most of the file-access requests, just as it does for the local filesystem. Unfortunately, the semantics suffered because of incoherency between the client and server caches. Changes made on the server would not be seen by the client, and vice versa. As a result, the network disk could be used only by a single client or as a read-only filesystem.

The first remote filesystem shipped with System V was RFS [Rifkin et al., 1986]. Although it had excellent UNIX semantics, its performance was poor, so it was seldom used. Research at Carnegie-Mellon led to the Andrew filesystem [Howard, 1988]. The Andrew filesystem was commercialized by Transarc and eventually became part of the Distributed Computing Environment promulgated by the Open Software Foundation and was supported by many vendors. It is designed to handle widely distributed servers and clients and also to work well with mobile computers that operate while detached from the network for long periods. It has not seen wide commercial use, but it continues as a research vehicle.
In the Microsoft family of operating systems, remote filesystem access is provided by the Common Internet File System (CIFS), which runs on top of the Server Message Block (SMB) protocol [SNIA, 2002]. In FreeBSD, support for SMB and CIFS clients and servers is provided by Samba, which resides in /usr/ports/net/samba. Since this book deals with the kernel, and Samba runs mostly external to the kernel, we will not discuss it further.

The most commercially successful and widely available remote-filesystem protocol is the network filesystem (NFS), originally designed and implemented by Sun Microsystems [Sandberg et al., 1985; Walsh et al., 1985]. There are two important components to the success of NFS. First, Sun placed the protocol specification for NFS in the public domain. Second, Sun sells its implementation to anyone who wants it, for less than the cost of implementing it themselves. Thus, most vendors chose to buy the Sun implementation. They are willing to buy from Sun because they know that they can always legally write their own implementation if the price of the Sun implementation goes up unreasonably. The 4.4BSD implementation was written from the protocol specification, rather than being incorporated from Sun, because of the developers' desire to be able to redistribute it freely in source form.

The first widely released implementation of NFS was version 2, shipped by Sun in 1984. Although version 3 was expected to be released within a year or two of version 2, it suffered several iterations of hugely complicated proposals before an incremental improvement on version 2 was finally released in 1992. The final release of 4.4BSD included an implementation of NFS that supported both versions 2 and 3. FreeBSD's NFS implementation is a direct descendant of the code released in 4.4BSD; the changes involve bug fixes and performance improvements, along with keeping it working with the ever-evolving vnode interface.

Although versions 2 and 3 of NFS were done entirely within Sun, the growing set of companies providing NFS-based products put increasing pressure on Sun to bring others into the design of NFS version 4. After much political maneuvering, Sun agreed to pass the responsibility for defining the specification of NFS version 4 to the Internet Engineering Task Force (IETF). Version 4 greatly expands the functionality of the earlier versions of NFS. It incorporates the mount validation and locking that used to be done by separate daemons running their own protocols. File access and attribute semantics are expanded to interoperate more easily with other important protocols such as CIFS. At publication time (2004), the NFS version 4 specification is still a draft standard [Shepler et al., 2003]. People are actively working on an NFS version 4 implementation for BSD, but it has not yet been incorporated into FreeBSD. The rest of this chapter examines the NFS version 3 implementation currently deployed in FreeBSD.

NFS was designed as a client-server application. Its implementation is divided into a client part that imports filesystems from other machines and a server part that exports local filesystems to other machines. The general model is shown in Figure 9.1. In FreeBSD, the kernel can be configured to support just the client, just the server, or both client and server.

Many goals went into the NFS design:

• The protocol is designed to be stateless. Because there is no state to maintain or recover, NFS can continue to operate even during periods of client or server failures.
Thus, it is much more robust than a system that operates with state. (The request sketch following this list shows how a stateless operation carries everything the server needs.)

• NFS is designed to support UNIX filesystem semantics. However, its design also allows it to support the possibly less rich semantics of other filesystem types, such as MS-DOS.

• The protection and access controls follow the UNIX semantics of having the process present a UID and a set of groups that are checked against the file's owner, group, and other access modes. The security check is done by filesystem-dependent code that can do more or fewer checks based on the capabilities of the filesystem that it is supporting. For example, the MS-DOS filesystem cannot implement the full UNIX security validation and makes its access decisions based solely on the UID. (A sketch of the basic owner/group/other check also appears after this list.)

• The protocol design is transport independent. Although it was originally built using the UDP datagram protocol in version 2, it was easily moved to the TCP stream protocol in version 3. It has also been ported to run over numerous other non-IP-based protocols.

Figure 9.1. The division of NFS between client and server.
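To make the statelessness goal concrete, consider what a version 3 read request carries. Each request names the file by an opaque file handle and gives an explicit byte offset and count, so the server needs no record of previous opens and keeps no file position for the client. The C sketch below follows the argument layout in the NFS version 3 specification, but the type and field names are simplified for illustration and are not the ones used in the FreeBSD sources.

    #include <stdint.h>

    #define NFS3_FHSIZE 64                  /* maximum opaque file-handle size */

    struct nfs_fh3 {                        /* handle issued by the server */
            uint32_t len;                   /* number of significant bytes */
            uint8_t  data[NFS3_FHSIZE];     /* opaque to the client */
    };

    struct read3args {                      /* everything a READ needs */
            struct nfs_fh3 file;            /* which file: no open state on the server */
            uint64_t       offset;          /* absolute offset: no seek state on the server */
            uint32_t       count;           /* number of bytes requested */
    };

Because the handle, offset, and count arrive with every request, a server that crashes and reboots can answer the next read without running any recovery protocol.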
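The owner/group/other check mentioned in the third goal has the same shape in any UNIX-derived filesystem. The function below is a minimal sketch of that check under simplified assumptions; the names (can_access, struct cred) are hypothetical, and the real filesystem-dependent code must also handle the superuser, ACLs, and similar cases.

    #include <sys/types.h>
    #include <sys/stat.h>
    #include <stdbool.h>

    struct cred {                           /* hypothetical credential */
            uid_t uid;
            int   ngroups;
            gid_t groups[16];
    };

    /* want is a mask of the classic permission bits: 04 read, 02 write, 01 execute. */
    static bool
    can_access(const struct stat *sb, const struct cred *cr, mode_t want)
    {
            mode_t granted = sb->st_mode & 07;              /* default: "other" bits */
            int i;

            if (cr->uid == sb->st_uid)
                    granted = (sb->st_mode >> 6) & 07;      /* owner bits */
            else
                    for (i = 0; i < cr->ngroups; i++)
                            if (cr->groups[i] == sb->st_gid) {
                                    granted = (sb->st_mode >> 3) & 07;  /* group bits */
                                    break;
                            }
            return ((granted & want) == want);
    }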
Some of the design decisions limit the set of applications for which NFS is appropriate:

• The design envisions clients and servers being connected on a locally fast network. The NFS protocol does not work well over slow links, and when UDP is used as the transport, it does not work well between clients and servers with intervening gateways. It also works poorly for mobile computing that has extended periods of disconnected operation.

• The caching model assumes that most files will not be shared. Performance suffers when files are heavily shared.

• The stateless protocol requires some loss of traditional UNIX semantics. Filesystem locking (flock) has to be implemented by a separate stateful daemon. Deferral of the release of space in an unlinked file until the final process has closed the file is approximated with a heuristic that sometimes fails (see the sketch at the end of this section).

Despite these limitations, NFS proliferated because it makes a reasonable tradeoff between semantics and performance; its low cost of adoption has now made it ubiquitous.
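The last compromise in the list above deserves a concrete illustration. A local filesystem defers releasing the blocks of an unlinked file until the last process closes it, but a stateless server has no notion of an open file. Clients therefore commonly approximate the semantics: when a file that a local process still holds open is removed, the client renames it to a hidden temporary name and issues the real remove on the final close. The sketch below shows only the general idea; all of the names (nfs_remove_sketch and its helpers) are hypothetical stand-ins for client code that would really operate at the vnode layer, and the comments note the ways the heuristic can fail.

    #include <stdbool.h>
    #include <stdio.h>

    /* Hypothetical helpers standing in for real client operations. */
    bool is_open_locally(const char *path);            /* does a local process still have it open? */
    int  nfs_rename(const char *from, const char *to); /* RENAME request to the server */
    int  nfs_remove(const char *path);                  /* REMOVE request to the server */

    int
    nfs_remove_sketch(const char *path, unsigned unique_id)
    {
            char hidden[1024];

            if (!is_open_locally(path))
                    return (nfs_remove(path));      /* normal case: remove immediately */

            /*
             * Defer the remove: hide the file under a temporary name and
             * issue the real REMOVE when the last local process closes it.
             * The approximation fails if this client crashes before the
             * final close (the hidden file is left behind) or if another
             * client removes or recreates the name in the meantime.
             */
            snprintf(hidden, sizeof(hidden), ".nfs-removed-%u", unique_id);
            return (nfs_rename(path, hidden));
    }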