Operating Systems Design and Implementation (3rd Edition)
3.6. RAM Disks
Now we will get back to the individual block device drivers and study several of them in detail. The first one we will look at is the memory driver. It can be used to provide access to any part of memory. Its primary use is to allow a part of memory to be reserved for use like an ordinary disk, and we will also refer to it as the RAM disk driver. A RAM disk does not provide permanent storage, but once files have been copied to this area they can be accessed extremely quickly.
A RAM disk is also useful for initial installation of an operating system on a computer with only one removable storage device, whether a floppy disk, CD-ROM, or some other device. By putting the root device on the RAM disk, removable storage devices can be mounted and unmounted as needed to transfer data to the hard disk. Putting the root device on a floppy disk would make it impossible to save files on floppies, since the root device (the only floppy) cannot be unmounted. RAM disks also are used with "live" CD-ROMs that allow one to run an operating system for tests and demonstrations, without copying any files onto the hard disk. Having the root device on the RAM disk makes the system highly flexible: any combination of floppy disks or hard disks can be mounted on it. MINIX 3 and many other operating systems are distributed on live CD-ROMs.
As we shall see, the memory driver supports several other functions in addition to a RAM disk. It supports straightforward random access to any part of memory, byte by byte or in chunks of any size. Used this way it acts as a character device rather than as a block device. Other character devices supported by the memory driver are /dev/zero and /dev/null, otherwise known as the great bit bucket in the sky.
3.6.1. RAM Disk Hardware and Software
The idea behind a RAM disk is simple. A block device is a storage medium with two commands: write a block and read a block. Normally, these blocks are stored on rotating memories, such as floppy disks or hard disks. A RAM disk is simpler. It just uses a preallocated portion of main memory for storing the blocks. A RAM disk has the advantage of having instant access (no seek or rotational delay), making it suitable for storing programs or data that are frequently accessed.
As an aside, it is worth briefly pointing out a difference between systems that support mounted file systems and those that do not (e.g., MS-DOS and Windows). With mounted file systems, the root device is always present and in a fixed location, and removable file systems (i.e., disks) can be mounted in the file tree to form an integrated file system. Once everything has been mounted, the user need not worry at all about which device a file is on. In contrast, with systems like MS-DOS, the user must specify the location of each file, either explicitly as in B:\DIR\FILE or by using certain defaults (current device, current directory, and so on). With only one or two floppy disks, this burden is manageable, but on a large computer system, with dozens of disks, having to keep track of devices all the time would be unbearable. Remember that UNIX-like operating systems run on hardware ranging from small home and office machines to supercomputers such as the IBM Blue Gene/L supercomputer, the world's fastest computer as of this writing; MS-DOS runs only on small systems.
Figure 3-20 shows the idea behind a RAM disk. The RAM disk is split up into n blocks, depending on how much memory has been allocated for it. Each block is the same size as the block size used on the real disks. When the driver receives a message to read or write a block, it just computes where in the RAM disk memory the requested block lies and reads from it or writes to it, instead of from or to a floppy or hard disk.
Ultimately the system task is called to carry out the transfer. This is done by phys_copy, an assembly language procedure in the kernel that copies to or from the user program at the maximum speed of which the hardware is capable.
Figure 3-20. A RAM disk.
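The address computation just described can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the MINIX 3 code: the names ram_disk, ramdisk_read, and ramdisk_write, the 1024-byte block size, and the 64-block capacity are all hypothetical, and an ordinary memcpy stands in for the kernel's phys_copy.

```c
#include <assert.h>
#include <string.h>

#define BLOCK_SIZE 1024   /* assumed block size in bytes (illustrative) */
#define NR_BLOCKS  64     /* assumed capacity: n = 64 blocks (illustrative) */

/* The preallocated portion of main memory that holds the blocks. */
static unsigned char ram_disk[NR_BLOCKS * BLOCK_SIZE];

/* Read one block: compute where the block lies and copy it out.
 * Returns 0 on success, -1 if the block number is out of range. */
int ramdisk_read(unsigned block, unsigned char *buf)
{
    assert(buf != NULL);
    if (block >= NR_BLOCKS) return -1;
    memcpy(buf, ram_disk + (size_t)block * BLOCK_SIZE, BLOCK_SIZE);
    return 0;
}

/* Write one block: same address computation, copy in the other direction. */
int ramdisk_write(unsigned block, const unsigned char *buf)
{
    assert(buf != NULL);
    if (block >= NR_BLOCKS) return -1;
    memcpy(ram_disk + (size_t)block * BLOCK_SIZE, buf, BLOCK_SIZE);
    return 0;
}
```

Block k simply starts at byte offset k * BLOCK_SIZE from the base of the reserved area, which is why no seek or rotational delay is involved.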
A RAM disk driver may support several areas of memory used as RAM disk, each distinguished by a different minor device number. Usually, these areas are distinct, but in some fairly specific situations it may be convenient to have them overlap, as we shall see in the next section.
3.6.2. Overview of the RAM Disk Driver in MINIX 3
The MINIX 3 RAM disk driver is actually six closely related drivers in one. Each message to it specifies a minor device as follows:
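As a rough sketch, the six minor devices discussed in this section can be written as a C enumeration. The numeric values simply follow the order in which the text discusses the devices and should be read as an assumption; the table and the helpers minor_lookup and minor_is_block are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Minor device numbers; values follow the order of discussion (assumed). */
enum {
    RAM_DEV,    /* /dev/ram:  true RAM disk */
    MEM_DEV,    /* /dev/mem:  physical memory */
    KMEM_DEV,   /* /dev/kmem: kernel data memory */
    NULL_DEV,   /* /dev/null: discards writes, reads give EOF */
    BOOT_DEV,   /* /dev/boot: RAM disk initialized from the boot image */
    ZERO_DEV,   /* /dev/zero: endless source of zero bytes */
    NR_DEVS     /* number of minor devices: six */
};

struct minor_info {
    const char *path;   /* name of the special file */
    int is_block;       /* 1 = block special file, 0 = character special file */
};

static const struct minor_info minor_table[NR_DEVS] = {
    [RAM_DEV]  = { "/dev/ram",  1 },
    [MEM_DEV]  = { "/dev/mem",  0 },
    [KMEM_DEV] = { "/dev/kmem", 0 },
    [NULL_DEV] = { "/dev/null", 0 },
    [BOOT_DEV] = { "/dev/boot", 1 },
    [ZERO_DEV] = { "/dev/zero", 0 },
};

/* Return the table entry for a minor device, or NULL if it is invalid. */
const struct minor_info *minor_lookup(int minor)
{
    if (minor < 0 || minor >= NR_DEVS) return NULL;
    return &minor_table[minor];
}

/* Convenience predicate; the caller must pass a valid minor number. */
int minor_is_block(int minor)
{
    const struct minor_info *mi = minor_lookup(minor);
    assert(mi != NULL);
    return mi->is_block;
}
```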
The first special file listed above, /dev/ram, is a true RAM disk. Neither its size nor its origin is built into the driver. They are determined by the file system when MINIX 3 is booted. If the boot parameters specify that the root file system is to be on the RAM disk but the RAM disk size is not specified, a RAM disk of the same size as the root file system image device is created. A boot parameter can be used to specify a RAM disk larger than the root file system, or if the root is not to be copied to the RAM, the specified size may be any value that fits in memory and leaves enough memory for system operation. Once the size is known, a block of memory big enough is found and removed from the memory pool by the process manager during its initialization. This strategy makes it possible to increase or reduce the amount of RAM disk present without having to recompile the operating system. The next two minor devices are used to read and write physical memory and kernel memory, respectively. When /dev/mem is opened and read, it yields the contents of physical memory locations starting at absolute address zero (the real-mode interrupt vectors). Ordinary user programs never do this, but a system program concerned with debugging the system might possibly need this facility. Opening /dev/mem and writing on it will change the interrupt vectors. Needless to say, this should only be done with the greatest of caution by an experienced user who knows exactly what he is doing. The special file /dev/kmem is like /dev/mem, except that byte 0 of this file is byte 0 of the kernel's data memory, a location whose absolute address varies, depending on the size of the MINIX 3 kernel text segment. It too is used mostly for debugging and very special programs. Note that the RAM disk areas covered by these two minor devices overlap. 
If you know exactly how the kernel is placed in memory, you can open /dev/mem, seek to the beginning of the kernel's data area, and see exactly the same thing as reading from the beginning of /dev/kmem. But, if you recompile the kernel, changing its size, or if in a subsequent version of MINIX 3 the kernel is moved somewhere else in memory, you will have to seek a different amount in /dev/mem to see the same thing you now see at the start of /dev/kmem. Both of these special files should be protected to prevent everyone except the superuser from using them.
The next file in this group, /dev/null, is a special file that accepts data and throws them away. It is commonly used in shell commands when the program being called generates output that is not needed. For example,
    a.out >/dev/null
runs the program a.out but discards its output. The RAM disk driver effectively treats this minor device as having zero size, so no data are ever copied to or from it. If you read from it you will get an immediate EOF (End of File).
If you have looked at the directory entries for these files in /dev/ you may have noticed that, of those mentioned so far, only /dev/ram is a block special file. All the others are character devices. There is one more block device supported by the memory driver. This is /dev/boot. From the point of view of the device driver it is another block device implemented in RAM, just like /dev/ram. However, it is meant to be initialized by copying a file appended to the boot image after init into memory, rather than starting with an empty block of memory, as is done for /dev/ram. Support for this device is provided for future use and it is not used in MINIX 3 as described in this text.
Finally, the last device supported by the memory driver is another character special file, /dev/zero. It is sometimes convenient to have a source of zeros. Writing to /dev/zero is like writing to /dev/null; it throws data away. But reading /dev/zero gives you zeros, in any quantity you want, whether a single character or a disk full.
At the driver level, the code for handling /dev/ram, /dev/mem, /dev/kmem, and /dev/boot is identical. The only difference among them is that each one corresponds to a different region of memory, indicated by the arrays ram_origin and ram_limit, each indexed by minor device number. The file system manages devices at a higher level. The file system interprets devices as character or block devices, and thus can mount /dev/ram and /dev/boot and manage directories and files on these devices. For the devices defined as character devices the file system can only read and write streams of data (although a stream read from /dev/null gets only EOF).
3.6.3. Implementation of the RAM Disk Driver in MINIX 3
As with other disk drivers, the main loop of the RAM disk driver is in the file driver.c. The device-specific support for memory devices is in memory.c (line 10800). When the memory driver is compiled, a copy of the object file called drivers/libdriver/driver.o, produced by compiling drivers/libdriver/driver.c, is linked with the object file drivers/memory/memory.o, the product of compiling drivers/memory/memory.c.
It may be worth taking a moment to consider how the main loop is compiled. The declaration of the driver structure in driver.h (lines 10829 to 10845) defines a data structure, but does not create one. The declaration of m_dtab on lines 11645 to 11660 creates an instance of this with each part of the structure filled in with a pointer to a function. Some of these functions are generic code compiled when driver.c is compiled, for instance, all of the nop functions. Others are code compiled when memory.c is compiled, for instance, m_do_open. Note that for the memory driver seven of the entries are do-little or do-nothing routines and the last two are defined as NULL (which means these functions will never be called, so there is no need even for a do_nop). All this is a sure clue that the operation of a RAM disk is not terribly complicated.
The memory device does not require definition of a large number of data structures, either. The array m_geom[NR_DEVS] (line 11627) holds the base and size of each of the six memory devices in bytes, as 64-bit unsigned integers, so there is no immediate danger of MINIX 3 not being able to have a big enough RAM disk. The next line defines an interesting structure that will not be seen in other drivers. The array m_seg[NR_DEVS] is apparently just an array of integers, but these integers are indices that allow segment descriptors to be found. The memory device driver is unusual among user-space processes in having the ability to access regions of memory outside of the ordinary text, data, and stack segments every process owns.
This array holds the information that allows access to the designated additional memory regions. The variable m_device just holds the index into these arrays of the currently active minor device.
To use /dev/ram as the root device the memory driver must be initialized very early during startup of MINIX 3. The kinfo and machine structures that are defined next will hold data retrieved from the kernel during startup that is necessary for initializing the memory driver.
One other data structure is defined before the executable code begins. This is dev_zero, an array of 1024 bytes, used to supply data when a read call is made to /dev/zero.
The main procedure main (line 11672) calls one function to do some local initialization. After that, it calls the main loop, which gets messages, dispatches to the appropriate procedures, and sends the replies. There is no return to main upon completion.
The next function, m_name, is trivial. It returns the string "memory" when called.
On a read or write operation, the main loop makes three calls: one to prepare a device, one to do the actual data transfer, and one to do cleanup. For a memory device, a call to m_prepare is the first of these. It checks that a valid minor device has been requested and then returns the address of the structure that holds the base address and size of the requested RAM area. The second call is for m_transfer (line 11706). This does all the work. As we saw in driver.c, all calls to read or write data are transformed into calls to read or write multiple contiguous blocks of data; if only one block is needed the request is passed on as a request for multiple blocks with a count of one. So only two kinds of transfer requests are passed on to the driver, DEV_GATHER, requesting a read of one or more blocks, and DEV_SCATTER, a request to write one or more blocks. Thus, after getting the minor device number, m_transfer enters a loop, repeated for the number of transfers requested.
Within the loop there is a switch on the device type. The first case is for /dev/null, and the action is to return immediately on a DEV_GATHER request or on a DEV_SCATTER request to fall through to the end of the switch. This is so the number of bytes transferred (although this number is zero for /dev/null) can be returned, as would be done for any write operation. For all of the device types that refer to real locations in memory the action is similar. The requested offset is checked against the size of the device to determine that the request is within the bounds of the memory allocated to the device. Then a kernel call is made to copy data either to or from the memory of the caller. There are two chunks of code that do this, however. For /dev/ram, /dev/kmem, and /dev/boot virtual addresses are used, which requires retrieving the segment address of the memory region to be accessed from the m_seg array, and then making a sys_vircopy kernel call (lines 11640 to 11652). For /dev/mem a physical address is used and the call is to sys_physcopy. The remaining operation is a read or write to /dev/zero. For reading the data is taken from the dev_zero array mentioned earlier. You might ask, why not just generate zero values as needed, rather than copying from a buffer full of them? Since the copying of the data to its destination has to be done by a kernel call, such a method would require either an inefficient copying of single bytes from the memory driver to the system task, or building code to generate zeros into the system task. The latter approach would increase the complexity of kernel-space code, something that we would like to avoid in MINIX 3. A memory device does not need a third step to finish a read or write operation, and the corresponding slot in m_dtab is a call to nop_finish. Opening a memory device is done by m_do_open (line 11801). The job is done by calling m_prepare to check that a valid device is being referenced. 
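The switch inside m_transfer, as described above, can be approximated by the following sketch. It is a simplified user-space model, not the MINIX 3 source: it handles a single request rather than a vector of them, uses a reduced and hypothetical set of minor devices, lets memcpy stand in for the sys_vircopy and sys_physcopy kernel calls, and adopts its own return convention (the number of bytes the caller may regard as consumed). The name mem_transfer and the 4096-byte ram_area are assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>

enum { RAM_DEV, NULL_DEV, ZERO_DEV };   /* reduced, hypothetical device set */
enum { DEV_GATHER, DEV_SCATTER };       /* read request, write request */

#define ZERO_BUF_SIZE 1024              /* size of the zero buffer, as in the text */

static unsigned char ram_area[4096];                      /* stand-in RAM disk region */
static const unsigned char dev_zero[ZERO_BUF_SIZE] = {0}; /* buffer full of zeros */

/* Handle one request of nbytes at byte offset pos on the given minor device.
 * Returns the number of bytes consumed (this convention is an assumption). */
size_t mem_transfer(int minor, int opcode, size_t pos,
                    unsigned char *buf, size_t nbytes)
{
    size_t done = 0;

    assert(buf != NULL);
    switch (minor) {
    case NULL_DEV:
        /* A read sees EOF at once; a write is accepted and thrown away. */
        return opcode == DEV_GATHER ? 0 : nbytes;

    case ZERO_DEV:
        if (opcode == DEV_SCATTER) return nbytes;   /* discard, like /dev/null */
        while (done < nbytes) {                     /* copy zeros chunk by chunk */
            size_t chunk = nbytes - done;
            if (chunk > ZERO_BUF_SIZE) chunk = ZERO_BUF_SIZE;
            memcpy(buf + done, dev_zero, chunk);
            done += chunk;
        }
        return done;

    case RAM_DEV:
        /* Check the offset against the device size, then clip the count. */
        if (pos >= sizeof ram_area) return 0;
        if (nbytes > sizeof ram_area - pos) nbytes = sizeof ram_area - pos;
        if (opcode == DEV_GATHER) memcpy(buf, ram_area + pos, nbytes);
        else memcpy(ram_area + pos, buf, nbytes);
        return nbytes;

    default:
        return 0;                                   /* unknown minor device */
    }
}
```

Reading /dev/zero in chunks of at most ZERO_BUF_SIZE bytes mirrors the reason given above for keeping a buffer of zeros: each copy moves a whole chunk instead of a single byte.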
More interesting than the code that exists is a comment about code that was found here in older versions of MINIX. Previously a trick was hidden here. A call by a user process to open /dev/mem or /dev/kmem would also magically confer upon the caller the ability to execute instructions which access I/O ports. Pentium-class CPUs implement four privilege levels, and user processes normally run at the least-privileged level. The CPU generates a general protection exception when a process tries to execute an instruction not allowed at its privilege level. Providing a way to get around this was considered safe because the memory devices could only be accessed by a user with root privileges. In any case, this possibly risky "feature" is absent from MINIX 3 because kernel calls that allow I/O access via the system task are now available. The comment remains, to point out that if MINIX 3 is ported to hardware that uses memory-mapped I/O such a feature might need to be reintroduced. The function to do this, enable_iop, remains in the kernel code to show how this can be done, although it is now an orphan.
The next function, m_init (line 11817), is called only once, when mem_task is called for the first time. This routine uses a number of kernel calls, and is worth study to see how MINIX 3 drivers interact with kernel space by using system task services. First a sys_getkinfo kernel call is made to get a copy of the kernel's kinfo data. From this data it copies the base address and size of /dev/kmem into the corresponding fields of the m_geom data structure. A different kernel call, sys_segctl, converts the physical address and size of /dev/kmem into the segment descriptor information needed to treat the kernel memory as a virtual memory space. If an image of a boot device has been compiled into the system boot image, the field for the base address of /dev/boot will be non-zero.
If this is so, then information to access the memory region for this device is set up in exactly the same way it was done for /dev/kmem. Next the array used to supply data when /dev/zero is accessed is explicitly filled with zeros. This is probably unnecessary; C compilers are supposed to initialize newly created static variables to all zeros.
Finally, m_init uses a sys_getmachine kernel call to get another set of data from the kernel, the machine structure which flags various possible hardware alternatives. In this case the information needed is whether or not the CPU is capable of protected mode operation. Based on this information the size of /dev/mem is set to either 1 MB, or 4 GB - 1, depending upon whether MINIX 3 is running in 8088 or 80386 mode. These sizes are the maximum sizes supported by MINIX 3 and do not have anything to do with how much RAM is installed in the machine. Only the size of the device is set; the compiler is trusted to set the base address correctly to zero. Also, since /dev/mem is accessed as physical (not virtual) memory there is no need to make a sys_segctl kernel call to set up a segment descriptor.
Before we leave m_init we should mention another kernel call used here, although it is not obvious in the code. Many of the actions taken during initialization of the memory driver are essential to proper functioning of MINIX 3, and thus several tests are made and panic is called if a test fails. In this case panic is a library routine which ultimately results in a sys_exit kernel call. The kernel and (as we shall see) the process manager and the file system have their own panic routines. The library routine is provided for device drivers and other small system components.
Surprisingly, the function we just examined, m_init, does not initialize the quintessential memory device, /dev/ram. This is taken care of in the next function, m_ioctl (line 11863).
In fact, there is only one ioctl operation defined for the RAM disk; this is MIOCRAMSIZE, which is used by the file system to set the RAM disk size. Much of the job is done without requiring any services from the kernel. The call to allocmem on line 11887 is a system call, but not a kernel call. It is handled by the process manager, which maintains all of the information necessary to find an available region of memory. However, at the end one kernel call is needed. At line 11894 a sys_segctl call is made to convert the physical address and size returned by allocmem into the segment information needed for further access.
The last function defined in memory.c is m_geometry. This is a fake. Obviously, cylinders, heads, and sectors are irrelevant in addressing semiconductor memory, but if a request is made for such information for a memory device this function pretends it has 64 heads and 32 sectors per track, and calculates from the size how many cylinders there are.
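The arithmetic behind such a fake geometry might look like the sketch below. The 64 heads and 32 sectors per track come from the text; the 512-byte sector size, the struct geometry type, and the function name fake_geometry are assumptions made for illustration and are not taken from the MINIX 3 source.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE  512u   /* assumed sector size in bytes */
#define FAKE_HEADS    64u   /* pretended number of heads (from the text) */
#define FAKE_SECTORS  32u   /* pretended sectors per track (from the text) */

struct geometry {
    uint32_t cylinders;
    uint32_t heads;
    uint32_t sectors;
};

/* Derive a pretend disk geometry from the size of a memory device in bytes. */
struct geometry fake_geometry(uint64_t size_bytes)
{
    struct geometry g;
    uint64_t total_sectors = size_bytes / SECTOR_SIZE;
    uint64_t cylinders = total_sectors / (FAKE_HEADS * FAKE_SECTORS);

    assert(cylinders <= UINT32_MAX);   /* the count must fit the field */
    g.heads = FAKE_HEADS;
    g.sectors = FAKE_SECTORS;
    g.cylinders = (uint32_t)cylinders;
    return g;
}
```

Under these assumptions each pretend cylinder covers 64 x 32 x 512 bytes = 1 MB, so a 4-MB RAM disk would report 4 cylinders.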