How Linux Works: What Every Superuser Should Know

2.4 Filesystems

A filesystem is a database of files and directories that you can attach to a Unix system at the root (/) or some other directory (like /usr) in a currently attached filesystem. At one time, filesystems resided on disks and other physical media used exclusively for data storage. However, the tree-like directory structure and I/O interface of filesystems are quite versatile, so filesystems now perform a variety of tasks.

2.4.1 Filesystem Types

Linux supports an extraordinarily large number of filesystems, including native designs optimized for Linux, foreign types such as the Windows FAT family, universal filesystems like ISO9660, and others. The following list includes the most common types of filesystems for data storage; the type names as recognized by Linux are in parentheses next to the boldfaced filesystem names.

2.4.2 Creating a Filesystem

You cannot mount and store files on a partition that does not contain a filesystem. The partitioning process described in Section 2.3.4 does not create any filesystems; you must place the filesystems on the partitions in a separate step. To create a Second Extended (ext2) filesystem, use the mke2fs program on the target device, as in this example for /dev/hdc3:

mke2fs /dev/hdc3

The mke2fs program automatically determines the number of blocks in a device and sets some reasonable defaults. Unless you really know what you're doing and feel like reading the mke2fs(8) manual page in detail, you shouldn't change these.

When you create a filesystem, you initialize its database, including the superblock and the inode tables. The superblock is at the top level of the database, and it's so important that mke2fs creates a number of backups in case the original is destroyed. You may wish to record a few of the superblock backup numbers when mke2fs runs, in case you need to recover the superblock later in the event of a disk failure (see Section 2.4.8).
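As a low-risk way to try this, the following sketch builds an ext2 filesystem inside an ordinary file instead of a real partition and prints the superblock backup locations that mke2fs reports. The scratch file, its 32MB size, and the -F and -t ext2 options are assumptions made for illustration; none of them come from the text above. The block skips itself if mke2fs is not installed.

```shell
# Sketch: make an ext2 filesystem in a scratch file (not a real partition).
PATH="$PATH:/sbin:/usr/sbin"            # mke2fs often lives in an sbin directory
img=$(mktemp)                           # hypothetical scratch image file
result=skipped

if command -v mke2fs >/dev/null 2>&1; then
    # 32MB of zeros stands in for an empty partition.
    dd if=/dev/zero of="$img" bs=1024 count=32768 2>/dev/null
    # -F: proceed even though the target is a regular file, not a device.
    if out=$(mke2fs -F -t ext2 "$img" 2>&1); then
        result=created
        # Show the backup superblock locations worth writing down.
        printf '%s\n' "$out" | grep -A1 'Superblock backups'
    else
        result=failed
    fi
fi
echo "mke2fs result: $result"
rm -f "$img"
```

Working on a file keeps the experiment away from real data. On an actual partition you would give the device name instead and omit -F.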

Warning  

Filesystem creation is a rare task that you should only need to perform after adding a new disk or repartitioning an old disk. You should create a filesystem just once for each new partition that has no preexisting data (or data that you want to remove). Creating a new filesystem on top of an existing filesystem will effectively destroy the old data.

Creating ext3 Filesystems

The only substantial difference between ext2 and ext3 filesystems is that ext3 filesystems have a journal file containing changes not yet written to the regular filesystem database. To create an ext3 filesystem, use the -j option to mke2fs:

mke2fs -j /dev/disk_device

Don't worry if you forget the -j option when creating a filesystem. You can add a journal file to an existing filesystem with the tune2fs utility. Here's an example:

tune2fs -j /dev/hda1

When upgrading a filesystem to ext3, don't forget to change the ext2 to ext3 in the /etc/fstab file.
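The fstab change can be sketched as follows. This is only an illustration under stated assumptions: it edits a temporary copy with two sample entries rather than the real /etc/fstab, and the variable names are hypothetical.

```shell
# Sketch: switch one fstab entry from ext2 to ext3, working on a copy.
fstab=$(mktemp)                         # stand-in for /etc/fstab
cat > "$fstab" <<'EOF'
/dev/hda1 / ext2 defaults,errors=remount-ro 0 1
/dev/hda3 /usr ext2 defaults 0 2
EOF

dev=/dev/hda1                           # the device that just received a journal
# Rewrite only the type field (field 3) of the matching device line.
updated=$(awk -v dev="$dev" '$1 == dev && $3 == "ext2" { $3 = "ext3" } { print }' "$fstab")
printf '%s\n' "$updated"
rm -f "$fstab"
```

Matching on the device field leaves the other ext2 entries alone, which a blanket search-and-replace would not.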

2.4.3 Mounting a Filesystem

On Unix, the process of attaching a filesystem is called mounting. When the system boots, the kernel reads some configuration data and mounts / based on that data. To mount a filesystem, you must know the following:

When mounting a filesystem, the common terminology is "mount a device on a mount point." To learn the current filesystem status of your system, run mount. The output looks like this:

/dev/hda1 on / type ext2 (rw,errors=remount-ro)
proc on /proc type proc (rw)
/dev/hda3 on /usr type ext2 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/bus/usb type usbdevfs (rw)
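The layout of the listing above can be made explicit with a short awk sketch that splits each line into its parts. It assumes the form "device on mountpoint type fstype (options)" and that mount points contain no spaces. The sample file is a copy of the listing, not live output.

```shell
# Sketch: split each line of mount output into its fields.
sample=$(mktemp)                        # holds a copy of the listing above
cat > "$sample" <<'EOF'
/dev/hda1 on / type ext2 (rw,errors=remount-ro)
proc on /proc type proc (rw)
/dev/hda3 on /usr type ext2 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/bus/usb type usbdevfs (rw)
EOF

# Layout assumed: DEVICE on MOUNTPOINT type FSTYPE (OPTIONS)
parsed=$(awk '{
    opts = $6
    gsub(/[()]/, "", opts)              # drop the surrounding parentheses
    printf "device=%s mountpoint=%s type=%s options=%s\n", $1, $3, $5, opts
}' "$sample")
printf '%s\n' "$parsed"
rm -f "$sample"
```

To try it on a live system, pipe the output of mount into the same awk program instead of reading the sample file.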

Each line corresponds to one currently mounted filesystem, with items in this order:

To mount a filesystem, use the mount command as follows with the filesystem type, device, and desired mount point:

mount -t type device mountpoint

For example, to mount the Second Extended filesystem /dev/hdb3 on /home/extra, use this command:

mount -t ext2 /dev/hdb3 /home/extra

To unmount (detach) a filesystem, use the umount command:

umount mountpoint

See Section 2.4.6 for a few more long options.

2.4.4 Filesystem Buffering

Linux, like other versions of Unix, buffers (caches) all requested changes to filesystems in memory before actually writing the changes to the disk. This cache system is transparent to the user and improves performance because the kernel can perform a large collection of file writes at once instead of performing the changes on demand.

When you unmount a filesystem with umount, the kernel automatically synchronizes with the disk. At any other time, you can force the kernel to write the changes in its buffer to the disk by running the sync command. If (for whatever reason) you can't unmount a filesystem before you turn off the system, make sure that you run sync first.

2.4.5 Filesystem Mount Options

There are many ways to change the mount command behavior. This is often necessary with removable media or when performing system maintenance.

The total number of mount options is staggering. The very extensive mount(8) manual page is a good reference, but it's hard to know where to start and what you can safely ignore.

Options fall into two rough categories: general options and filesystem-specific options. General options include -t for specifying the filesystem type, which was mentioned earlier. By contrast, a filesystem-specific option pertains only to certain filesystem types. To activate a filesystem option, use the -o switch followed by the option. For example, -o norock turns off Rock Ridge extensions on an ISO9660 filesystem, but it has no meaning for any other kind of filesystem.

Short Options

The most important general options are the following:

Long Options

Short options like -r are too limited for the ever-increasing number of mount options; there are too few letters in the alphabet to accommodate all possible options. Short options are also troublesome because it is difficult to determine an option's meaning based on a single letter. Many general options and all filesystem-specific options use a longer, more flexible option format.

To use long options with mount on the command line, start with -o and supply some keywords. Here is a complete example, with the long options following -o:

mount -t vfat /dev/hda1 /dos -o ro,conv=auto

There are two long options here, ro and conv=auto. The ro option specifies read-only mode, and it is the same as the -r short option. The conv=auto option is a filesystem option telling the kernel to automatically convert certain text files from the DOS newline format to the Unix style (which will be explained shortly).
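The structure of a long-option string (comma-separated keywords, some carrying a value after an equal sign) can be sketched in plain shell. The variable names here are hypothetical, and the code only inspects the string; it does not call mount.

```shell
# Sketch: break a -o option string into its individual long options.
opts="ro,conv=auto"
flags=""      # options that are bare keywords
settings=""   # options of the form key=value

old_ifs=$IFS
IFS=,                                   # split the string on commas
for opt in $opts; do
    case $opt in
        *=*) settings="$settings $opt" ;;
        *)   flags="$flags $opt" ;;
    esac
done
IFS=$old_ifs

echo "keywords:$flags"
echo "key=value:$settings"
```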

The most useful long options are the following:

2.4.6 The /etc/fstab Filesystem Table

To mount filesystems at boot time and take the drudgery out of the mount command, Linux systems keep a permanent list of filesystems and options in /etc/fstab. This is a plain text file in a very simple format, as this example shows:

/dev/hda1  /       ext2     defaults,errors=remount-ro  0  1
/dev/hda2  none    swap     sw                          0  0
/dev/hda3  /usr    ext2     defaults                    0  2
proc       /proc   proc     defaults                    0  0
/dev/hdc   /cdrom  iso9660  ro,user,nosuid,noauto       0  0

Each line corresponds to one filesystem, broken into six fields:

When using mount, you can take some shortcuts if the filesystem you want to work with is in /etc/fstab. For the example fstab above, to mount a CD-ROM, you need only run

mount /cdrom
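The shortcut works because the remaining details can be looked up by mount point. Here is a minimal sketch of that lookup, assuming a temporary copy of the example fstab rather than the real file. It prints what it finds and does not mount anything.

```shell
# Sketch: find the fstab entry that a bare "mount /cdrom" relies on.
fstab=$(mktemp)                         # stand-in for /etc/fstab
cat > "$fstab" <<'EOF'
/dev/hda1 / ext2 defaults,errors=remount-ro 0 1
/dev/hda2 none swap sw 0 0
/dev/hda3 /usr ext2 defaults 0 2
proc /proc proc defaults 0 0
/dev/hdc /cdrom iso9660 ro,user,nosuid,noauto 0 0
EOF

target=/cdrom
# Field 2 is the mount point; report the device, type, and options stored with it.
entry=$(awk -v mp="$target" '$2 == mp { printf "device=%s type=%s options=%s\n", $1, $3, $4 }' "$fstab")
echo "$entry"
rm -f "$fstab"
```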

You can also try to mount all entries in /etc/fstab that do not contain the noauto option at once, with this command:

mount -a
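To preview which entries lack noauto, a filter along these lines may help. It is a simplification that applies only the noauto rule described above to a temporary copy of the example fstab. The real mount -a makes further decisions (swap entries, for instance, are not mounted by it), so treat the result as a rough preview.

```shell
# Sketch: list the fstab entries that do not carry the noauto option.
fstab=$(mktemp)                         # stand-in for /etc/fstab
cat > "$fstab" <<'EOF'
/dev/hda1 / ext2 defaults,errors=remount-ro 0 1
/dev/hda2 none swap sw 0 0
/dev/hda3 /usr ext2 defaults 0 2
proc /proc proc defaults 0 0
/dev/hdc /cdrom iso9660 ro,user,nosuid,noauto 0 0
EOF

auto_entries=$(awk '
    /^[ \t]*(#|$)/ { next }             # skip comments and blank lines
    {
        n = split($4, o, ",")           # field 4 holds the comma-separated options
        skip = 0
        for (i = 1; i <= n; i++) if (o[i] == "noauto") skip = 1
        if (!skip) print $1, $2
    }' "$fstab")
printf '%s\n' "$auto_entries"
rm -f "$fstab"
```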

You may have noticed some new options in the preceding fstab listing, namely defaults, errors, noauto, and user. These aren't covered in Section 2.4.5 because they don't make any sense outside of the /etc/fstab file. The meanings are as follows:

2.4.7 Filesystem Capacity

To view the size and utilization of your currently mounted filesystems, use the df command. The output looks like this:

Filesystem  1024-blocks     Used Available Capacity Mounted on
/dev/hda1       1011928    71400    889124       7% /
/dev/hda3      17710044  9485296   7325108      56% /usr

The listing has the following fields:

It is relatively easy to see that the two filesystems here are roughly 1GB and 17.5GB in size. However, the capacity numbers may look a little strange because 71400 + 889124 does not equal 1011928, and 9485296 does not constitute 56 percent of 17710044. In both cases, 5 percent of the total capacity is unaccounted for. Nevertheless, the space is there. These hidden blocks are called the reserved blocks, and only the superuser may use the space if the rest of the partition fills up. This keeps system servers from immediately failing when they run out of disk space.
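The arithmetic can be checked with a few lines of awk. This sketch uses the figures from the df listing above. The observation that the capacity column matches used / (used + available) is an inference from these two rows, not something the text states.

```shell
# Sketch: recover the reserved-block count from the df figures quoted above.
data=$(mktemp)                          # columns: filesystem total used available
cat > "$data" <<'EOF'
/dev/hda1 1011928 71400 889124
/dev/hda3 17710044 9485296 7325108
EOF

report=$(awk '{
    total = $2; used = $3; avail = $4
    reserved = total - used - avail     # blocks held back for the superuser
    printf "%s reserved=%d (%.1f%% of total) capacity=%.1f%%\n", $1, reserved, 100 * reserved / total, 100 * used / (used + avail)
}' "$data")
printf '%s\n' "$report"
rm -f "$data"
```

Both filesystems come out with about 5 percent reserved: 51404 and 899640 blocks, respectively.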

If your disk fills up and you need to know where all of those space-hogging, illegal MP3s are, use the du command. With no arguments, du prints the disk usage of every directory in the directory hierarchy, starting at the current working directory. (That's kind of a mouthful, so just run cd /; du to get the idea. Press CONTROL-C when you get bored.) The du -s command turns on summary mode to print only the grand total. If you want to evaluate a particular directory, change to that directory and run du -s *.
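One way to put du -s * to work is to sort its output so the largest entry lands at the bottom. The sketch below sets up a throwaway directory with one large and one small subdirectory. The directory names, file names, and sizes are made up for the demonstration, and piping into sort -n is an addition not mentioned above.

```shell
# Sketch: rank the entries of a directory by disk usage.
top=$(mktemp -d)                        # hypothetical directory to examine
mkdir "$top/music" "$top/notes"
dd if=/dev/urandom of="$top/music/song.mp3" bs=1024 count=1024 2>/dev/null  # about 1MB of filler
echo "todo" > "$top/notes/list.txt"

# -s: one total per argument; -k: report in 1024-byte blocks.
# sort -n puts the largest consumer last.
ranked=$(cd "$top" && du -sk * | sort -n)
printf '%s\n' "$ranked"
biggest=$(printf '%s\n' "$ranked" | tail -n 1 | awk '{ print $2 }')
echo "largest: $biggest"
rm -rf "$top"
```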

Note  

Reporting sizes in 1024-byte blocks, as df and du do here, is not the POSIX standard. Some systems insist on displaying the numbers in 512-byte blocks. To get around this, use the -k option (both utilities support this). The df program also supports the -m option to list capacities in one-megabyte blocks.

The following pipeline is a handy way to create a searchable output file (du_out) and see the results on the terminal at the same time.

du | tee du_out
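When a script needs a size from df, forcing the unit avoids guessing which block size a given system uses. Here is a minimal sketch. The -P option, which asks for the POSIX output layout with one line per filesystem, is an addition not covered above, and the variable name is hypothetical.

```shell
# Sketch: read the size of the filesystem holding / in 1024-byte blocks.
# -k forces 1024-byte units; -P keeps each filesystem on a single line.
kblocks=$(df -Pk / | awk 'NR == 2 { print $2 }')
echo "root filesystem: $kblocks blocks of 1024 bytes"
```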

2.4.8 Checking and Repairing Filesystems

The optimizations that Unix filesystems offer are made possible by a sophisticated database-like mechanism. For filesystems to work seamlessly, the kernel has to trust that there are no errors in a mounted filesystem. Otherwise, serious errors such as data loss and system crashes can happen.

The most frequent cause of a filesystem error is shutting down the system in a rude way (for example, with the power switch on the computer). The system's filesystem cache in memory may not match the data on the disk, and the system also may be in the process of altering the filesystem when you decide to give the computer a kick. Even though a new generation of filesystems supports journals to make filesystem corruption far less common, you should always shut the system down properly (see Section 3.1.5). Furthermore, filesystem checks are still necessary every now and then as sanity checks.

You need to remember one command name to check a filesystem: fsck. However, there is a different version of this tool for each filesystem type that Linux supports. The information presented here is specific to second and third extended (ext2/ext3) filesystems and the e2fsck utility. You generally don't need to type e2fsck, though, unless fsck can't figure out the filesystem type, or you're looking for the e2fsck manual page.

To run fsck in interactive manual mode, use the device or the mount point (as listed in /etc/fstab) as the argument. For example:

fsck /dev/hdd1

Warning  

Never use fsck on a mounted filesystem. The kernel may alter the disk data as you run the check, causing mismatches that can crash your system and corrupt files. There is only one exception. If you mount the root as read-only in single user mode, you may use fsck on the root filesystem.

In manual mode, fsck prints verbose status reports on its passes, which should look something like this when there are no problems:

Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/hdd1: 11/1976 files (0.0% non-contiguous), 265/7891 blocks

If fsck finds a problem in manual mode, it stops and asks you a question relevant to fixing the problem. These questions deal with the internal structure of the filesystem, such as reconnecting loose inodes and clearing blocks. The reconnection business means that fsck found a file that doesn't appear to have a name; reconnecting places the file in the filesystem's lost+found directory, with a number as its name. You need to guess the original name based on the content of the file.

In general, it's pointless to sit through the fsck process if you just made the mistake of an impolite shutdown. e2fsck has a -p option to automatically fix silly problems without asking you, aborting if there is a serious error. This is so common that Linux distributions run some variant of fsck -p at boot time (fsck -a is also common).

However, if you suspect that there is some major disaster, such as a hardware failure or device misconfiguration, you need to decide on a course of action, because fsck can really mess up a filesystem with larger problems. A telltale sign of a serious problem is a lot of questions in manual mode.

If you think that something really bad happened, try running fsck -n to check over the filesystem without modifying anything. If there's some sort of problem with the device configuration (an incorrect number of blocks in the partition table, loose cables, whatever) that you think you can fix, then fix it before running fsck for real. You're likely to lose a lot of data otherwise.
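A read-only check can be rehearsed safely on a scratch image file. The sketch below calls e2fsck directly (the ext2/ext3 checker that fsck runs) and adds -f to force a check of a filesystem marked clean. The image file, its 8MB size, and the -q and -F options to mke2fs are assumptions for the demonstration. The block skips itself if the tools are not installed.

```shell
# Sketch: a read-only check with -n against a scratch image file.
PATH="$PATH:/sbin:/usr/sbin"            # the tools often live in an sbin directory
img=$(mktemp)                           # hypothetical scratch image file
result=skipped

if command -v mke2fs >/dev/null 2>&1 && command -v e2fsck >/dev/null 2>&1; then
    dd if=/dev/zero of="$img" bs=1024 count=8192 2>/dev/null
    mke2fs -q -F "$img"                 # -q: quiet; -F: accept a regular file
    # -n: open the filesystem read-only and answer "no" to every question.
    # -f: check even though the filesystem is marked clean.
    if e2fsck -fn "$img"; then
        result=clean
    else
        result="problems found (exit status $?)"
    fi
fi
echo "check result: $result"
rm -f "$img"
```

Because -n never writes, a nonzero exit status here tells you that a real repair run would have work to do without committing you to it.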

If you suspect that only the superblock, a key filesystem database component, is corrupt (for example, someone wrote to the beginning of the disk partition), you might be able to recover the filesystem with one of the superblock backups that mke2fs creates. Use fsck -b num to replace the corrupted superblock with an alternate at block num.

You may not know where to find a backup superblock, because you didn't write the numbers down when mke2fs ran. If the filesystem was created with the default values, you can try mke2fs -n on the device to view a list of superblock backup numbers without destroying your data (again, make dead sure that you're using -n, because you'll really tear up the filesystem otherwise).

If the device still appears to function properly except for a few small parts, you can run fsck -c before a manual fsck to search for bad blocks. Such a failure is somewhat rare.

Checking ext3 Filesystems

You normally do not need to check ext3 filesystems because the journal ensures data integrity. However, you may wish to mount an ext3 filesystem in ext2 mode, and the kernel will not do so if the filesystem contains a non-empty journal (if you don't shut your system down cleanly, you can expect that the journal contains some data). To flush the journal in an ext3 filesystem to the regular filesystem database, run e2fsck as follows:

e2fsck -fy /dev/disk_device

The Worst Case

Disk problems that are worse in severity leave you with few choices:

In both cases, you still need to repair the filesystem before you mount it (unless you feel like picking through the raw data by hand). To answer y to all of the fsck questions, use fsck -y, but do this as a last resort.

Note  

There is an advanced utility called debugfs for users with in-depth knowledge of filesystems, or for those who feel like experimenting on a filesystem that isn't important.

If you're really desperate, such as in the event of a catastrophic disk failure without backups, there isn't a lot you can do other than try to get a professional service to "scrape the platters."

2.4.9 Special-Purpose Filesystems

Not all filesystems represent storage on physical media. Most versions of Unix have filesystems that serve as system interfaces. This idea goes back a long way; the /dev mechanism is an early model of using files for I/O interfaces. The /proc idea came from the eighth edition of research Unix [Killian]. Things really got rolling when the people at Bell Labs (including many of the original Unix designers) created Plan 9 [Bell Labs], a research operating system that took filesystem abstraction to a whole new level.

The special filesystem types in common use on Linux include the following:
