Upgrading and Repairing Servers

Servers are meant to run without much attention the great majority of the time. Therefore, you might think that there isn't much regular maintenance involved. That's largely true, but there are tasks that you need to do on a regular basis in order to protect your data. As your server chugs along, you have to pay attention to its status by doing a regular reading of the server's vital signs, its performance trends, and its event logs.

When it's time to do routine maintenance on your server, chances are that the work will disrupt the service that the server provides to your clients or users. This results in the dreaded "Server is down for maintenance" message. Timing and notice are, of course, important. You should give all users reasonable notice before bringing down any essential service. Given how busy people are these days, it's not a bad idea to start sending daily notices about a week before the maintenance. On the day of maintenance, you should probably post a couple of notices during the day.

There's some contention about when is the best time to perform server maintenance. Most people recommend after-hours, and often late at night, because that will affect the fewest people. Many people recommend doing maintenance on Friday and Saturday nights because if something goes wrong, you still have at least one weekend day to fix the problem before the work week begins. A minority recommendation is that Monday nights are the best time because if there is a problem, you are in the beginning of a work week, and you have additional help and service resources available to you to fix the problem.

Backup and replication are probably the most important pieces of routine maintenance. With modern server technologies, you shouldn't have to bring a server offline to perform a backup. However, a backup will affect your server's performance. Therefore, you should also do backup and replication at a low-activity time.

The sections that follow look at a few additional routine maintenance tasks:

  • Drive and media testing

  • Routine cleaning

  • Virus and spyware checking

  • Disk defragmentation

Drive Testing

If your server starts to experience read or write errors, or if you see stop errors from the operating system or random weird behavior, it's not a bad idea to test your hard drives to see if there is a problem with a drive or with some portion of it. In a high-end RAID or storage solution, diagnostic software is usually included as part of the package. If you are lucky, that software runs in the background or runs periodically. However, you might need to run such a utility manually.

All modern operating systems ship with a diagnostic disk utility for testing a drive, the file system that's on it, and the data structures contained on the drive, such as indexes. Because your operating system actually formats the disk, writes the file system, and writes the files, it's a good idea to start drive testing by using the tools that the operating system supplies.

It's good practice to run a disk checking utility from time to time because even if you aren't experiencing a problem at the moment, the utility can find damaged sectors and mark them so that they can't damage future files or I/O. It may take a while to do a disk check on a large drive, so it's best to perform these tasks at a time of low workload. Also, if your system is highly available, you may have to take the volume offline to test your drives, and despite the hassle, you should do so from time to time. Let's briefly look at a couple of examples of how this is done, first with Windows Server 2003 and then with Sun Solaris.

Checking a Disk in Windows Server 2003

When Windows uses a hard drive, particularly a system drive, it puts a lock on the drive, preventing it from being low-level tested on a bit-by-bit basis. The original DOS CHKDSK disk checking utility was carried over into Windows, where its graphical front end is called Check Disk. Check Disk can examine a volume that is in use in read-only mode, but to repair errors it needs exclusive access to the volume. For a system drive, or any other volume with open files, the repair therefore has to be scheduled, and it is most often run at startup.

You run Check Disk by doing the following:

1. Open My Computer, right-click the drive to be tested, and then select Properties.

2. Click the Tools tab and then click the Check Now button in the Error-checking section. The Check Disk dialog box appears, as shown in Figure 21.2.

Figure 21.2. The Check Disk dialog box shown in Windows Server 2003.

3. Select both the Automatically Fix File System Errors and the Scan for and Attempt Recovery of Bad Sectors check boxes and then click OK.

4. When the Check Disk utility posts a dialog box asking whether you would like to schedule the check for the next time Windows restarts, click Yes.

5. Close all dialog boxes and restart your system. Scanning a large drive can take some time, so it's best left for periods when the system will be lightly used, if at all.
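For scripted maintenance, the same check can be requested from the command line with chkdsk. The following Python sketch is an illustration under stated assumptions rather than a tested procedure: the /F switch asks chkdsk to fix file system errors, and /R asks it to locate bad sectors and recover readable data. The function names and the D: volume are hypothetical, chosen only for this example.

```python
import subprocess


def build_chkdsk_command(volume: str, fix_errors: bool = True,
                         scan_bad_sectors: bool = True) -> list[str]:
    """Build the chkdsk command line for one volume, such as 'D:'."""
    cmd = ["chkdsk", volume]
    if fix_errors:
        cmd.append("/F")  # fix file system errors
    if scan_bad_sectors:
        cmd.append("/R")  # locate bad sectors and recover readable data
    return cmd


def run_chkdsk(volume: str) -> int:
    """Run chkdsk and return its exit code.

    Assumes a Windows host and an administrative prompt. If the volume is
    in use, chkdsk offers to schedule the check for the next restart.
    """
    completed = subprocess.run(build_chkdsk_command(volume), check=False)
    return completed.returncode


if __name__ == "__main__":
    # Print the command rather than running it, so the sketch is safe to try.
    print(" ".join(build_chkdsk_command("D:")))
```

Printing the command first, and running it only during a maintenance window, keeps a long scan from starting by accident.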

Note

In the Windows 95, 98, and Me operating systems, the comparable utility is called ScanDisk. Windows 2000 uses Check Disk (CHKDSK), as Windows Server 2003 does.

Checking a Disk in Sun Solaris

With Sun Solaris, the disk check utility fsck is rather similar to its Windows equivalent. It is fully documented in the Solaris man pages, and you can find its procedure for use at www.cs.manchester.ac.uk/solaris/sun_docs/C/solaris_9/SUNWaadm/SYSADV1/p145.html#FSTROUBLEFSCK-28. In a nutshell, fsck checks the file system for data integrity. The drive being tested needs to be unmounted, and some file systems aren't supported. (However, UFS is supported.) fsck can be used with a number of switches, and those switches differ, depending on the version of UNIX you use.
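As a rough illustration of what an fsck invocation on Solaris looks like, the Python sketch below builds the command for a UFS file system. It assumes the Solaris form of the command, in which -F names the file system type, -m checks without repairing, and -n or -y answers "no" or "yes" to every repair prompt. The raw device path is a hypothetical example, and the switches should be confirmed against the man page for your release.

```python
import subprocess

# Modes assumed from the Solaris fsck usage; verify against your man page.
FSCK_MODE_FLAGS = {
    "check": "-m",  # check only, make no repairs
    "no": "-n",     # answer "no" to every repair prompt
    "yes": "-y",    # answer "yes" to every repair prompt
}


def build_fsck_command(raw_device: str, fstype: str = "ufs",
                       mode: str = "check") -> list[str]:
    """Build an fsck command line for an unmounted file system."""
    if mode not in FSCK_MODE_FLAGS:
        raise ValueError(f"unknown mode: {mode!r}")
    return ["fsck", "-F", fstype, FSCK_MODE_FLAGS[mode], raw_device]


def run_fsck(raw_device: str, mode: str = "check") -> int:
    """Run fsck and return its exit code; the file system must be unmounted."""
    completed = subprocess.run(build_fsck_command(raw_device, mode=mode),
                               check=False)
    return completed.returncode


if __name__ == "__main__":
    # Hypothetical raw device name, used only for illustration.
    print(" ".join(build_fsck_command("/dev/rdsk/c0t0d0s7")))
```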

One good utility for disk repairs and testing is Gibson Research's SpinRite 6.0 (see http://grc.com/spinrite.htm). This utility finds bad sectors, predicts the ones that are likely to fail, reads the data out of that region, and then marks off the problem areas. SpinRite is a replacement for the original DOS CHKDSK command and does a read-only surface scan when run. ScanDisk also does a read-only scan and can mark off a bad sector, but SpinRite does data pattern testing, defect scrubbing, data relocation, and sector repair and recovery in addition. At a price around $100, SpinRite pays for itself many times over and works with drives formatted for Windows, Linux, and other operating systems.

When a disk starts to fail, you should start to see more read and write errors in your event log. It's a good idea to set up an alert system so that when one of these factors starts to increase beyond what's normal, you get a message. At that point, you should run your standard disk diagnostics and make sure that your backups are in good order.
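One simple way to decide what "beyond what's normal" means is to compare the current error count with a baseline built from recent history. The Python sketch below is a minimal illustration, assuming you already collect a daily count of disk read and write errors from your event log. The function name, the sample counts, and the three-standard-deviation threshold are assumptions for the example, not a recommendation from any vendor.

```python
from statistics import mean, stdev


def exceeds_baseline(history: list[int], current: int,
                     sigmas: float = 3.0) -> bool:
    """Return True when the current count is well above the historical norm.

    The threshold is the mean of the history plus `sigmas` sample standard
    deviations; at least two historical values are needed to compute it.
    """
    if len(history) < 2:
        raise ValueError("need at least two historical counts")
    threshold = mean(history) + sigmas * stdev(history)
    return current > threshold


# Hypothetical daily disk error counts taken from the event log.
daily_disk_errors = [0, 1, 0, 2, 1, 0, 1]

if exceeds_baseline(daily_disk_errors, current=9):
    print("Disk errors are above normal: run diagnostics and verify backups.")
```

In practice the alert would be sent by email or through your monitoring system rather than printed.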

Disk checking utilities have some significant limitations. For one thing, they generally need exclusive access to a volume in order to repair it, so they can't fix a mounted volume that is in use. Also, they work only on drives that your operating system manages directly. Your operating system cannot perform a low-level disk check on a drive that sits behind a hardware RAID controller, because the controller presents the array to the operating system rather than the individual disks. For those types of disks, you need to use a compatible utility. SpinRite, for example, can work with some RAID 0 or RAID 1 configurations, where the RAID is an on-board chip and isn't managed by its own processor. As a general rule, SpinRite's maker says that if DOS can recognize a RAID volume, then SpinRite can check that volume. Chances are, however, that any server-class RAID controller you buy has its own on-board processor.

Defragmentation

Disk defragmentation is among the most common tasks that administrators perform. Although you can experience measurable performance improvement when you defragment a single user's single-disk workstation, when you move to a server with multiple users running software on multiple disks, the benefit of disk defragmentation becomes less apparent. Most servers run RAID configurations that stripe their files across several disks, and multiple heads access those disks to satisfy a complex workload, so rearranging files to make them contiguous doesn't have a lot of impact.

Note

Keep in mind that when you copy a disk's contents to another location on a file-oriented, or file-by-file, basis, the files are written contiguously, so the target is automatically defragmented. This is not true of a bit-level or sector-level imaging operation, where each bit is copied bit-by-bit and each sector is copied sector-by-sector. In that case, the fragmentation is preserved. Server backup programs fall under both categories: file oriented and bit oriented. If you do a file-level backup as a disk-to-disk copy, you end up with a defragmented disk. If you have backed up files to tape, then a restore from tape will also result in a defragmented disk.

In studies of servers running older versions of Windows Server, defragmentation has yielded system performance improvements of between 7% and 11%. That's not insignificant, but it isn't dramatic either. When many more disks are involved, the performance improvement is even smaller. Still, there is little downside to defragging your disk system other than the load that the operation places on your system. When you perform the operation at a time of low activity (at night, for example), the benefits justify the effort.

Keep in mind that most defragmentation tools need to have 15% of the drive free in order to perform a defragmentation.
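Before scheduling a defragmentation run, you can check the free-space requirement programmatically. The following Python sketch uses the standard library's shutil.disk_usage to read a volume's total and free space. The 15% figure comes from the text above, and the function names are assumptions made for this example.

```python
import shutil


def enough_free(total_bytes: int, free_bytes: int,
                required_fraction: float = 0.15) -> bool:
    """Return True when the free fraction meets the requirement."""
    if total_bytes <= 0:
        return False  # no usable capacity reported
    return free_bytes / total_bytes >= required_fraction


def volume_ready_for_defrag(path: str,
                            required_fraction: float = 0.15) -> bool:
    """Check the volume that holds `path`, for example 'D:\\' or '/export'."""
    usage = shutil.disk_usage(path)
    return enough_free(usage.total, usage.free, required_fraction)


if __name__ == "__main__":
    # "." checks whichever volume holds the current directory.
    if volume_ready_for_defrag("."):
        print("Enough free space to defragment.")
    else:
        print("Free up space before defragmenting.")
```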

Antivirus and Firewall Software and Systems

Unfortunately, it has become necessary to deploy antivirus software at several levels in the enterprise. Antivirus software should be costed into any server deployment and should be on virtually every computer on your network.

Many analysts believe the following:

  • General antivirus server software should be deployed at all firewalls (perimeter protection). The best protection is achieved using software that provides virus scanning capability and enforces a set of policy rules. When an antivirus software program scans a system's files, it is looking for the signatures of known viruses. Those signatures can be code snippets, registry entries, filenames, file types, file locations, and so forth.

  • Domain servers should be locked down as much as possible.

  • Application servers should run specialized antivirus software that may be effective in preventing viruses from attacking that type of application. Antivirus vendors sell specialized software for messaging applications such as Exchange and Domino; databases such as SQL Server, Oracle, and DB2; complete office solutions such as Microsoft Small Business Server; and so forth.

  • Antivirus software and personal firewalls should be installed on desktops and workstations. It's important to keep this software up-to-date so that it can detect current threats, and to make sure that its scanning protection is enabled.

  • Antivirus software and personal firewalls must be installed on laptops, handheld PDAs, and networked wireless devices, as well as any traveling systems.

This multilayered approach to virus management is illustrated in Figure 21.3.

Figure 21.3. Antivirus software should be deployed at several layers of a network in order to provide protection from attack from without and attack from within.
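The signature matching described in the first item of the list above can be pictured with a toy example. The Python sketch below searches data for known byte patterns. It is a deliberately simplified illustration: the signature names and patterns are invented for this example, and real antivirus engines use far larger signature databases along with other techniques.

```python
from pathlib import Path

# Hypothetical signatures, invented for illustration only.
SIGNATURES = {
    "Example.TestPattern.A": b"\xde\xad\xbe\xef-demo-payload",
    "Example.TestPattern.B": b"HYPOTHETICAL_MALWARE_MARKER",
}


def scan_bytes(data: bytes) -> list[str]:
    """Return the names of all signatures whose pattern occurs in the data."""
    return [name for name, pattern in SIGNATURES.items() if pattern in data]


def scan_file(path: Path) -> list[str]:
    """Read a file into memory and scan it; suitable for small files only."""
    return scan_bytes(path.read_bytes())


if __name__ == "__main__":
    sample = b"header HYPOTHETICAL_MALWARE_MARKER trailer"
    print(scan_bytes(sample))
```

A scanner that also enforces policy rules would act on a match, for example by quarantining the file, rather than only reporting it.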

The two most important places to stop viruses and spyware are at the firewall and at the desktop. If you can stop viruses and spyware before they gain entry to your network, you can save your systems from performing a lot of extra work. The desktop is also very important because it is impossible to stop 100% of all threats; mistakes and accidents happen. By being vigilant at the client level, you can catch problems before they make it through to the entire network.

In the past, most large ISPs considered virus and spyware protection to be something that their clients must do, and not the responsibility of the ISP itself. However, as the industry has matured and become even more competitive, many ISPs are now offering virus and spyware protection tools as part of their standard packages, to businesses as well as to consumers. Some companies offer a prescanning service either by redirecting traffic through their servers or as part of their hosting package. It's worth seeking out this additional protection.

It's useless to spend large sums of money installing antivirus software on servers if you don't also install appropriate firewalls, because a worm or Trojan horse can then gain access to your system through a back door. Similarly, it is useless to position firewalls and antivirus software on your servers to protect the front end of your network if you have mobile systems that can propagate viruses and other malware when they are reattached to your network. Failure to protect mobile systems is one of the major errors many companies make when they implement security systems. (Failure to properly back up mobile systems is another issue that doesn't get enough attention from network administrators.)

Antivirus software and firewalls that use a packet-sniffing approach have two cost components. The first cost is the up-front cost of the software and the yearly subscription costs for servers and clients. The second cost, one that most people don't account for, is the reduction in server performance for any server that is forced to run this type of software. You can figure that antivirus software will reduce your server's performance anywhere from 5% to 15% in most cases. However, when you are under attack, you may find that your antivirus software's consumption of server resources goes up dramatically.

Note

One common method for virus propagation is through network shares. When some viruses detect a network share to which they can copy themselves, they take full advantage of the opportunity. Propagation then proceeds as the virus searches for other network shares and copies executable software to those directories. When you set your shares' permissions, you should make this type of propagation mechanism difficult or impossible.

Rather than just guess, it's a good idea to measure the impact of your antivirus software by using a performance monitoring tool to gauge the increase in activity for your CPU(s), the increased memory usage, and, to a much lesser degree, disk I/O. Keep in mind that unusual situations occur when you are under attack. If you are experiencing an attack of an email worm or mass mailing, your messaging server's antivirus software may consume considerably more resources than it would under normal conditions.
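Once you have a measurement taken with the antivirus software disabled and another taken with it enabled, the impact is simple arithmetic. The Python sketch below assumes you have recorded some throughput figure, such as requests or messages per second, under comparable workloads. The function name and sample numbers are hypothetical.

```python
def performance_reduction_percent(baseline: float,
                                  with_antivirus: float) -> float:
    """Return the percentage drop in throughput relative to the baseline."""
    if baseline <= 0:
        raise ValueError("baseline throughput must be positive")
    return (baseline - with_antivirus) / baseline * 100.0


if __name__ == "__main__":
    # Hypothetical measurements in requests per second.
    baseline_rps = 1200.0
    antivirus_rps = 1080.0
    drop = performance_reduction_percent(baseline_rps, antivirus_rps)
    print(f"Throughput reduction: {drop:.1f}%")
```

Repeating the measurement during a simulated heavy mail load gives a sense of how much worse the figure becomes under attack conditions.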

More and more appliances are becoming available for front-end networks and server systems. Few companies put their most significant firewalls on their application or domain servers these days. The trend toward self-contained appliances probably means that more products will move their antivirus programs off servers. In fact, many higher-end firewalls also come with antivirus software that is either built in or can be activated with payment of an additional fee. If you deploy a system like that, be sure that your firewall's antivirus program doesn't interfere with anything you are deploying on workstations, desktops, or laptops. Often firewalls download client antivirus software (and in some cases, firewall clients as well), and they can interfere with other products. It's good to test these issues before you commit to a vendor's product.

Keeping Case Fans and Filters Clean

Very little attention is paid to keeping servers clean inside. For the most part, you can get away with negligence because modern electronics are often sealed devices and relatively impervious to dust. However, neglecting to keep a server clean may get you in trouble when you write to an optical disc inside the server, where a dust particle can interfere with the laser that is writing to the disc. Other components, particularly mechanical components, can fail. Also, dirty heat sinks aren't nearly as effective as clean surfaces at shedding heat.

The heat modern processors throw off is truly amazing. In demonstrations, people have cooked eggs on working processors. That may not be something you actually want to do, but it does speak to the temperatures involved. You only have to consider Intel's BTX platform, the replacement for the ATX case design, which Intel refers to as the "heat advantaged chassis." In BTX form factor-compliant cases, large (massive?) heat sinks sit on top of the processors and pull air in from outside the case to cool the processor.

Very few modern cases and fans filter the air coming into the case. You can dramatically lower the amount of dust and dirt in a system if you follow these simple rules:

  • Make sure your air flows in one direction. Systems usually have air flowing from the front of the case to the back of the case, with fans blowing in at the entrance point and fans blowing out at the exit point.

  • Close all holes in the case that don't have significant airflow out of them. Don't leave the opening of an expansion slot uncovered, for example.

  • Place filters so that all air coming into the system is filtered.

Filtered fans can be a little hard to find, and many server OEMs don't go to the trouble to use them. Still, it's worth the effort to add them. You can find fan filters at a number of online stores because they tend to be popular with gamers. A filter is a flat panel that screws onto the outside of your case, in front of the fan.

Tip

To filter incoming air in a low-cost way, consider cutting out an appropriate-sized cardboard frame and then gluing to it a cut piece of nylon panty hose. Don't stretch the nylon when you are fitting it to the frame; it's a more effective filter when the pores are smaller sized.

When you're adding fans to a server case or replacing a stock heat sink or CPU cooler, size matters. Bigger fans can move more air, even when they rotate more slowly than smaller fans. Therefore, bigger fans are quieter. Similarly, bigger heat sinks can absorb more heat, and they can shed more heat than smaller heat sinks. When looking for high-efficiency cooling solutions for CPUs, you should try to mate the two: large heat sinks with large but slower-moving fans. For sound absorption, you can get good (but not great) results by adding to systems sound absorption panels from manufacturers such as Asaka.
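The claim that a bigger, slower fan can move more air can be checked roughly with the fan affinity laws, under which airflow for geometrically similar fans scales with rotational speed and with the cube of the fan diameter. The Python sketch below is an approximation only, because real case fans differ in blade design and in the resistance of the case they blow into. The fan sizes and speeds are hypothetical examples.

```python
def relative_airflow(diameter_mm: float, rpm: float,
                     ref_diameter_mm: float, ref_rpm: float) -> float:
    """Estimate airflow relative to a reference fan.

    Uses the affinity-law proportionality: airflow ~ speed * diameter**3,
    which assumes the two fans are geometrically similar.
    """
    if min(diameter_mm, rpm, ref_diameter_mm, ref_rpm) <= 0:
        raise ValueError("diameters and speeds must be positive")
    return (rpm / ref_rpm) * (diameter_mm / ref_diameter_mm) ** 3


if __name__ == "__main__":
    # Hypothetical comparison: 120 mm fan at 1,200 rpm vs 80 mm at 2,500 rpm.
    ratio = relative_airflow(120, 1200, ref_diameter_mm=80, ref_rpm=2500)
    print(f"Estimated airflow ratio: {ratio:.2f}")
```

Under these assumptions, the larger fan moves more air even at less than half the rotational speed, which is why it can also run more quietly.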

At some point, you need to open your server and physically remove the dust that has accumulated inside it. The best way to do this is by using a canister of compressed gas specifically sold for dust removal. This canister should come with a long plastic tube spout that can be aimed into cramped spaces. It's important that the product you use leaves no residue. To see whether your compressed gas leaves a residue, spray it on a very clean piece of glass or a clean window and observe whether a film or particles are left behind. If the product you are considering using leaves a residue, you should replace it immediately with a better dust removal product.

When you are dusting out your case, it is fine to vigorously dust nonmechanical parts. Memory, hard drives, and other components aren't likely to be displaced by a stream of gas, and because they are enclosed, they aren't likely to be damaged by it either. However, you should make sure you don't displace a wire or connection. More importantly, if you aim a stream of compressed gas into a mechanism that is open, such as the door of a floppy disk drive, you can cause the drive to malfunction.

When you remove dust from a server, you need to be extremely careful around optical and mechanical drives. DVD drives, CD-ROM drives, tape drives, and floppy disk drives, among other components, contain delicate internal equipment such as aligned laser heads that can be damaged or misaligned if disturbed too vigorously. You should therefore clean internal drives like these with products that are designed for them. You can find tape cleaning cartridges for all types of tape drives, specially designed optical discs for cleaning CD/DVD drives, and so forth.
