Microsoft IIS 6.0 Administrator's Consultant

The Indexing Service extracts information from designated documents and organizes the results into a catalog that can be searched quickly and easily. The extracted information includes the content (text) within documents as well as document properties, such as the document title and author. To understand how the Indexing Service works, let’s look at the following subjects:

Using the Indexing Service

The Indexing Service indexes the following types of documents:

Other documents for which a document filter is installed can be indexed as well. The Indexing Service isn’t installed on your Web server by default, but you can install it using the Windows Components Wizard. To access and use this wizard, follow these steps:

  1. Log on to the computer using an account with administrator privileges.

  2. Access Control Panel. Double-click Add Or Remove Programs. This displays the Add Or Remove Programs dialog box.

  3. Start the Windows Components Wizard by clicking Add/Remove Windows Components.

  4. In the Components list box, select Indexing Service and then click Next to continue. The wizard then installs the Indexing Service.

  5. Click Finish when prompted.

Once you’ve installed the Indexing Service, you manage the service using the Indexing Service snap-in for the Microsoft Management Console (MMC) or the Indexing Service node in Computer Management. Regardless of the option you choose, you can work with both local and remote servers using the same techniques. The only task that’s different is connecting to remote servers.

With the Indexing Service snap-in, you set the server you want to work with when you add the snap-in to a management console. Here are the steps for adding the Indexing Service snap-in to a management console and selecting a server to work with:

  1. Open the Run dialog box by clicking Start and then clicking Run.

  2. Type mmc in the Open field and then click OK. This opens the MMC.

  3. In MMC, click File, and then click Add/Remove Snap-In. This opens the Add/Remove Snap-In dialog box.

  4. On the Standalone tab, click Add.

  5. In the Add Standalone Snap-In dialog box, click Indexing Service and then click Add.

  6. Select Local Computer to connect to the computer on which the console is running. Or select Another Computer and then type the name of a remote computer.

  7. Click Finish. Afterward, click Close and then click OK.

With the Computer Management console, you connect to the local server automatically when you start the utility. You can connect to a different computer by right-clicking the Computer Management node, selecting Connect To Another Computer, and then following the prompts. Figure 12-1 shows the Indexing Service node in the Computer Management console.

Figure 12-1: Use the Indexing Service node in the Computer Management console to manage the Indexing Service.

As you can see, selecting the Indexing Service node displays an overview of the currently installed catalogs, which include the default System and Web catalogs. The catalog summary provides the following information:

If you access the Indexing Service using Computer Management, you’ll find that two default catalogs were created when you installed the service. These catalogs are the following:

You can create additional catalogs at any time. When you create a catalog, you can associate the catalog with a Web site and an NNTP virtual server. The service then uses the indexing settings on the directories associated with the site or virtual server to determine which documents should be indexed. You configure indexing settings on directories as detailed in the section of this chapter entitled “Setting Web Resources to Index.”

Indexing Service Essentials

The Indexing Service stores catalog information in Unicode format. This allows the service to index and query content in multiple languages. The Indexing Service performs three main functions to process document contents:

Indexing and catalog building take place automatically in the background when the Indexing Service is running. When first started, the Indexing Service takes an inventory of the directories associated with each catalog to determine which documents should be indexed. This process is referred to as scanning. The Indexing Service can perform two types of scans:

Full scans take a complete look at all documents associated with a catalog. The Indexing Service performs a full scan under the following circumstances:

Incremental scans look only at documents modified since the last full or incremental scan. The Indexing Service performs incremental scans under the following circumstances:
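The difference between the two scan types can be modeled in a few lines of Python. This is a minimal sketch, not how the Indexing Service tracks changes internally. The `Document` class and the comparison against a last-scan timestamp are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Document:
    path: str
    modified: float  # hypothetical: last-modified time in seconds since the epoch

def full_scan(documents):
    """Full scan: queue every document associated with the catalog."""
    return sorted(doc.path for doc in documents)

def incremental_scan(documents, last_scan_time):
    """Incremental scan: queue only documents modified since the last scan."""
    return sorted(doc.path for doc in documents if doc.modified > last_scan_time)

docs = [
    Document("/docs/a.htm", 100.0),
    Document("/docs/b.doc", 250.0),
    Document("/docs/c.txt", 300.0),
]
print(full_scan(docs))                # ['/docs/a.htm', '/docs/b.doc', '/docs/c.txt']
print(incremental_scan(docs, 200.0))  # ['/docs/b.doc', '/docs/c.txt']
```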

After completing a scan of documents to be indexed, the Indexing Service begins to build the necessary catalogs. It does this by reading each document using a document filter. Filters are software components that interpret the structure of a particular kind of document, such as an ASCII text file, a Word document, or an HTML document. Using the appropriate filter, the Indexing Service extracts the document contents and property values, storing the property values and the path to the document in the index. Next, the Indexing Service uses the filter to determine the language in which the document is written and breaks the document body (content) into individual words. Each supported language has an exception list of words that the Indexing Service should ignore.
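The filtering step can be sketched as a function that returns what the indexer keeps for one document. This sketch makes several assumptions. The property names, the tiny exception list, and the regular-expression word breaker are hypothetical stand-ins. Language detection is skipped, and real word breakers are language-specific.

```python
import re

# Hypothetical, heavily shortened exception list; a real Noise.lang file
# has many more entries.
NOISE_WORDS = {"a", "an", "and", "of", "the", "to"}

def filter_document(path, properties, body, noise_words=NOISE_WORDS):
    """Model the filtering step: return the record kept for the document
    (its path plus property values) and the body's words minus noise words."""
    record = {"path": path, **properties}
    # Word breaking: lowercase runs of word characters (letters, digits,
    # underscore). This regular expression is only a stand-in.
    words = re.findall(r"\w+", body.lower())
    kept = [word for word in words if word not in noise_words]
    return record, kept

record, words = filter_document(
    "/docs/intro.htm",
    {"title": "Introduction", "author": "Web Team"},
    "The Indexing Service builds a catalog of the documents.",
)
print(record["title"])  # Introduction
print(words)            # ['indexing', 'service', 'builds', 'catalog', 'documents']
```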

You’ll find exception lists in the %SystemRoot%\System32 directory. These files are stored as ASCII text files and are named Noise.lang, where lang is a three-letter extension that indicates the language of the exception list. You can add entries to or remove entries from the exception list using a standard text editor or word processor.
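The following hedged sketch shows how the naming convention maps to a file path and how such a file could be read into a set of words to ignore. It assumes that entries are separated by spaces or line breaks, and it uses enu (U.S. English) only as an example extension. The function names are hypothetical. `ntpath` applies Windows path rules on any platform.

```python
import ntpath
import re

def noise_file_name(lang):
    """Return the exception-list file name for a three-letter language extension."""
    if not re.fullmatch(r"[A-Za-z]{3}", lang):
        raise ValueError("expected a three-letter language extension")
    return f"Noise.{lang.lower()}"

def noise_file_path(system_root, lang):
    """Build the full path in the System32 directory under the system root."""
    return ntpath.join(system_root, "System32", noise_file_name(lang))

def parse_noise_list(text):
    """Turn exception-list text into a set of words to ignore.
    Assumes entries are separated by spaces or line breaks."""
    return {word.lower() for word in text.split()}

print(noise_file_path(r"C:\Windows", "enu"))  # C:\Windows\System32\Noise.enu
print(sorted(parse_noise_list("about after\nall also")))  # ['about', 'after', 'all', 'also']
```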

The Indexing Service also stores values of selected document properties in the property cache. The property cache is a storage place for values of properties that you might want to search on or display in the list of search results. Within the property cache are two storage levels: primary and secondary. The primary storage level is for values that are frequently accessed, and, as such, these values are stored in a way that makes them quick and easy to retrieve. The secondary storage level is for additional values that are used infrequently.
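The two storage levels of the property cache can be pictured as a small two-tier data structure. This is only an illustrative model. Choosing the primary level by a fixed set of property names is an assumption. Both tiers are plain dictionaries here, and the secondary one merely stands in for slower storage.

```python
class PropertyCache:
    """Two-level store: a primary level for frequently accessed values and a
    secondary level for values that are used infrequently."""

    def __init__(self, primary_properties):
        self.primary_properties = set(primary_properties)
        self.primary = {}    # (doc_id, prop) -> value; fast lookups
        self.secondary = {}  # (doc_id, prop) -> value; stands in for slower storage

    def store(self, doc_id, prop, value):
        level = self.primary if prop in self.primary_properties else self.secondary
        level[(doc_id, prop)] = value

    def get(self, doc_id, prop):
        key = (doc_id, prop)
        if key in self.primary:
            return self.primary[key]
        return self.secondary.get(key)

cache = PropertyCache({"title"})
cache.store(1, "title", "Introduction")
cache.store(1, "comments", "draft")
print(cache.get(1, "title"))     # Introduction
print(cache.get(1, "comments"))  # draft
```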

After discarding words on the exception list and updating the property cache, the Indexing Service stores the remaining document content in a word list. Each document can have one or more word lists associated with it. Word lists are combined to form temporary indexes called shadow indexes. Shadow indexes are stored on disk in a compressed file format. Multiple shadow indexes can be, and usually are, in the catalog at any given time. The Saved Indexes entry, mentioned previously, lists the number of shadow and master indexes in a catalog. Over time, the number of shadow indexes can grow substantially as documents are added to and modified within indexed directories.
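Conceptually, word lists and shadow indexes are inverted indexes that map each word to the documents and positions where it occurs. The sketch below rests on assumptions. It uses a mapping of the form word to {document ID: positions} and leaves out the compressed on-disk format. The helper names are hypothetical.

```python
from collections import defaultdict

def make_word_list(doc_id, words):
    """Build a word list for one document: word -> {doc_id: [positions]}."""
    positions = defaultdict(list)
    for position, word in enumerate(words):
        positions[word].append(position)
    return {word: {doc_id: where} for word, where in positions.items()}

def combine_word_lists(*word_lists):
    """Combine word lists into one shadow index: word -> {doc_id: [positions]}."""
    shadow = defaultdict(dict)
    for word_list in word_lists:
        for word, postings in word_list.items():
            shadow[word].update(postings)
    return dict(shadow)

shadow = combine_word_lists(
    make_word_list(1, ["indexing", "service", "indexing"]),
    make_word_list(2, ["service", "catalog"]),
)
print(shadow["indexing"])  # {1: [0, 2]}
print(shadow["service"])   # {1: [1], 2: [0]}
```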

The Indexing Service uses a process called shadow merging to combine word lists and temporary indexes, thereby reducing the number of temporary resources used and improving the service’s overall responsiveness. Shadow merges occur during scans and as part of the normal housekeeping process implemented by the Indexing Service. The key events that trigger a shadow merge are when there are too many word lists stored in memory (1012 by default) or when the total size of all word lists exceeds a preset value (2560 KB by default).
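The two shadow-merge triggers described above amount to a simple check, sketched here in Python. The default thresholds are taken from the text, and the real service reads its thresholds from the registry. Treating "too many" as strictly greater than the limit is an assumption.

```python
# Defaults as stated in the text; the service itself reads these from the registry.
DEFAULT_MAX_WORD_LISTS = 1012
DEFAULT_MAX_TOTAL_KB = 2560

def needs_shadow_merge(word_list_sizes_kb,
                       max_word_lists=DEFAULT_MAX_WORD_LISTS,
                       max_total_kb=DEFAULT_MAX_TOTAL_KB):
    """Return True when either trigger is met: too many word lists in memory
    (assumed to mean more than the limit), or their combined size exceeds
    the preset value."""
    too_many = len(word_list_sizes_kb) > max_word_lists
    too_big = sum(word_list_sizes_kb) > max_total_kb
    return too_many or too_big

print(needs_shadow_merge([10] * 5))     # False
print(needs_shadow_merge([1300, 1300])) # True
```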

The result of the indexing process is a master index. Each catalog has one, and only one, master index. The master index is created the first time you create a catalog and is kept up to date by periodically merging it with shadow indexes to create a new master index. This process of merging shadow indexes with the master index is called master merging. Once a master merge has occurred, there’s only one saved index associated with a catalog—namely, the master index.
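A master merge can be sketched as folding each shadow index into the master index to produce a single new index. The sketch assumes that each shadow index carries the set of documents it was built from. When a re-indexed document appears in a shadow index, its stale postings in the master are dropped before the new ones are added. The structure and names are hypothetical.

```python
def master_merge(master, shadows):
    """Return a new master index (word -> {doc_id: positions}).

    shadows is a list of (doc_ids, index) pairs, oldest first; doc_ids is the
    set of documents each shadow index was built from."""
    result = {word: dict(postings) for word, postings in master.items()}
    for doc_ids, index in shadows:
        # Drop older postings for documents that were re-indexed.
        for postings in result.values():
            for doc_id in doc_ids:
                postings.pop(doc_id, None)
        # Add the newer postings from this shadow index.
        for word, postings in index.items():
            result.setdefault(word, {}).update(postings)
    # Discard words that no longer occur in any document.
    return {word: postings for word, postings in result.items() if postings}

master = {"old": {1: [0]}, "service": {1: [1], 2: [0]}}
shadow = ({1}, {"new": {1: [0]}, "service": {1: [1]}})  # document 1 re-indexed
print(master_merge(master, [shadow]))  # {'service': {2: [0], 1: [1]}, 'new': {1: [0]}}
```

After the merge, the returned dictionary is the only saved index and replaces both the old master and the shadow indexes.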

Master merges are triggered automatically based on the size of the shadow indexes, the amount of free disk space on the catalog drive, and the number of document changes in indexed directories. In addition, a master merge is scheduled to run every night at midnight, regardless of these conditions. If necessary, you can force a master merge. The key reason for forcing a master merge is to cause the Indexing Service to update a catalog so that all changes are reflected in search results immediately. Because the master merge process is resource-intensive, you normally wouldn’t force a master merge during peak usage hours.
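The trigger conditions for a master merge can be sketched as a single predicate. The text doesn't give the threshold values, so every limit in `MergeLimits` is hypothetical. Treating low free disk space as a trigger is also an interpretation. Only the nightly midnight schedule comes directly from the text.

```python
from dataclasses import dataclass
from datetime import datetime, time

@dataclass
class MergeLimits:
    # Hypothetical thresholds; the text does not give the real values.
    max_shadow_mb: float
    min_free_disk_mb: float
    max_changed_docs: int
    merge_time: time = time(0, 0)  # nightly merge at midnight

def scheduled_merge_due(now, last_merge, merge_time):
    """True once today's scheduled time has passed and no merge has run since."""
    scheduled = datetime.combine(now.date(), merge_time)
    return now >= scheduled and (last_merge is None or last_merge < scheduled)

def should_master_merge(now, last_merge, shadow_mb, free_disk_mb, changed_docs, limits):
    return (
        shadow_mb > limits.max_shadow_mb           # shadow indexes too large
        or free_disk_mb < limits.min_free_disk_mb  # catalog drive running low (assumed)
        or changed_docs > limits.max_changed_docs  # many documents changed
        or scheduled_merge_due(now, last_merge, limits.merge_time)
    )

limits = MergeLimits(max_shadow_mb=50, min_free_disk_mb=100, max_changed_docs=1000)
now = datetime(2024, 5, 2, 14, 0)
print(should_master_merge(now, datetime(2024, 5, 2, 0, 5), 10, 500, 10, limits))  # False
print(should_master_merge(now, datetime(2024, 5, 1, 0, 5), 10, 500, 10, limits))  # True
```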

Settings that control scanning, merging, and other Indexing Service processes are found in the Registry and are stored here:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ContentIndex

Registry settings, given in decimal value, that control scanning and merging include the following:

Searching Catalogs

Searching is the process of looking through the catalog to find information. Users can search the catalog in several ways. The technique most often used with Web servers is to build a query form that can be used to search the catalog. The Indexing Service includes a query form for each catalog that can be used to test the installation. You can also create query forms using Active Server Pages (ASP) and Internet data query (IDQ) files.

With ASP, you create the query form and handle the results using a combination of server-side scripts that use ASP objects, HTML, and client-side scripts. The scripts you use can be written in any installed scripting language, and both Microsoft VBScript and Microsoft JScript are installed by default. Typically, you’ll use the same page to implement the query form and display the results once the user has entered search parameters. For example, you could create a page called Query.asp that implements the query form and has an embedded script that submits the search parameters and then formats the search results.

IDQ, on the other hand, is a special language designed for submitting queries to the Indexing Service. With IDQ you create separate pages for handling each step in the query process. You use the following elements:

An advantage of IDQ over ASP is that IDQ queries are much faster and more efficient in their use of Indexing Service resources. Regardless of whether you use ASP or IDQ to handle searches, you must set basic parameters that provide default values for the Indexing Service. The parameters you should set are summarized in Table 12-1.

Table 12-1: Basic Parameters for the Indexing Service

Parameter | Description | Sample Value for IDQ

CiCatalog | Sets the file location of the catalog to be searched. If you don’t set this parameter, the Indexing Service searches the Inetpub directory for a default catalog. | CiCatalog = D:\Catalogs\WWW

CiFlags | Sets the search flags for the query. The DEEP flag tells the Indexing Service to search all subdirectories within the current scope. | CiFlags = DEEP

CiMaxRecordsInResultSet | Sets the maximum number of records to return in the result set. | CiMaxRecordsInResultSet = 100

CiMaxRecordsPerPage | Sets the maximum number of records to return in a single page. | CiMaxRecordsPerPage = 20

CiRestriction | Stores the search values entered by the user as passed from the query form. | CiRestriction = %CiRestriction%

CiScope | Sets the scope of the query within the catalog. If scope is set to /, the search begins at the top (or root) of the document tree. | CiScope = /Docs
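To show how the Table 12-1 parameters fit together, the following Python sketch writes them out as the [Query] section of an .idq file in the Name = Value form used in the table. It also fills the %CiRestriction% placeholder from submitted form values and works out how many result pages the two record limits allow. The helper names are hypothetical, and treating unknown placeholders as empty is an assumption. A working .idq file typically also sets parameters outside this table, such as CiColumns and CiTemplate.

```python
import re

def render_idq(params):
    """Write parameters as the [Query] section of an .idq file,
    using the Name = Value form shown in Table 12-1."""
    lines = ["[Query]"] + [f"{name} = {value}" for name, value in params.items()]
    return "\n".join(lines) + "\n"

def substitute(value, form_fields):
    """Replace %Name% placeholders with values from the submitted query form.
    Unknown placeholders become empty strings (an assumption of this sketch)."""
    return re.sub(r"%(\w+)%", lambda m: form_fields.get(m.group(1), ""), value)

def page_count(records_in_result_set, records_per_page):
    """Number of result pages needed, using ceiling division."""
    return -(-records_in_result_set // records_per_page)

params = {
    "CiCatalog": r"D:\Catalogs\WWW",
    "CiScope": "/Docs",
    "CiFlags": "DEEP",
    "CiRestriction": "%CiRestriction%",
    "CiMaxRecordsInResultSet": 100,
    "CiMaxRecordsPerPage": 20,
}
print(render_idq(params))
print(substitute(params["CiRestriction"], {"CiRestriction": "master merge"}))  # master merge
print(page_count(100, 20))  # 5
```

With the sample values, a full result set of 100 records is split across at most 5 pages of 20 records each.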

Note

Most organizations have Web developers whose job is to create the Web pages needed for searching, handling, and displaying results. As the Web administrator, you assist the development team in setting parameters and publishing the Web pages when they’re completed.
