Programming Microsoft Outlook and Microsoft Exchange, Second Edition (DV-MPS Programming)
Exchange Server 2000 provides built-in support for content indexing. If you plan to use Exchange Server as a repository for a large amount of information, or if you think that your users will require extensive search features, you should consider using content indexing. You can also use content indexing to search for text in attachments on items. Content indexing supports indexing Office and HTML documents as well as standard text attachments. My only caveat for content indexing is that you make sure the requirements for processor and disk resources don't affect the performance of your server. You can schedule incremental crawls of the data sources for content indexing through the Exchange System Manager, shown in Figure 19-6.
Content indexing allows you to perform quick queries against a full-text index inside your Exchange Server applications. Using the full-text index is as easy as generating ADO code that uses the CONTAINS or FREETEXT predicate. You cannot use these predicates until you turn on content indexing for your Exchange server. Be aware that content indexing operates on a per store basis, meaning there is no top-level index that allows you to search across stores. Let's see how a query containing the CONTAINS or FREETEXT predicate looks.
Figure 19-6. Using the Exchange System Manager for content indexing.
CONTAINS Predicate
The CONTAINS predicate allows you to perform text-matching operations against the full-text index. With CONTAINS, you can perform simple queries, such as "Show me all items that contain the word Bob in the subject," as well as complex queries with weighting on the terms they contain (you can use a weighted query to indicate relative importance of the terms you search for). The following code snippet shows several ways you can use the CONTAINS predicate. The code includes a simple version of CONTAINS; it also shows you how to use the NEAR keyword, prefix matching, linguistic matching (such as drive, driving, and so on), and weighted queries.
'Contains Bob strSQL = "Select ""urn:schemas:httpmail:subject"" From " & _ "scope('shallow traversal of """ & strURL & _ """') WHERE CONTAINS(""urn:schemas:httpmail:subject"",' ""Bob"" ')" 'Contains Bob AND Cool, could also be OR strSQL = "Select ""urn:schemas:httpmail:subject"" From " & _ "scope('shallow traversal of """ & strURL & _ """') WHERE CONTAINS(""urn:schemas:httpmail:subject"",' ""Bob"" " & _ "AND ""Cool""')" 'Word prefix match strSQL = "Select ""urn:schemas:httpmail:subject"" From " & _ "scope('shallow traversal of """ & strURL & _ """') WHERE CONTAINS(""urn:schemas:httpmail:subject"",' ""*Bob*"" ')" 'Linguistic matching strSQL = "Select ""urn:schemas:httpmail:subject"" From " & _ scope('shallow traversal of """ & strURL & _ """') WHERE CONTAINS('FORMSOF(INFLECTIONAL,""drive"") ')" 'Bob NEAR cool, where ~ is same as NEAR (within 50 words) strSQL = "Select ""urn:schemas:httpmail:subject"" From " & _ "scope('shallow traversal of """ & strURL & _ """') WHERE CONTAINS(""urn:schemas:httpmail:subject""," & _ "'""Bob"" ~ ""cool""')" 'Weighted Match strSQL = "Select ""urn:schemas:httpmail:subject"" From " & _ "scope('shallow traversal of """ & strURL & _ """') WHERE CONTAINS(""urn:schemas:httpmail:subject""," & _ "'ISABOUT (""Bob"" WEIGHT(0.9), ""Cool"" WEIGHT(0.1))')" |
FREETEXT Predicate
You can use the FREETEXT predicate to search columns based on the meaning of the search words rather than the exact wording. When you use FREETEXT, the query engine breaks the string you specify into a number of search terms, assigns weights to the terms, and then attempts to find a match. The following code performs a search for the meaning "best server on the planet." Since I have in my Inbox a message with the subject "Exchange Server is the best server in the world," the search finds that message even though the wording is slightly different.
NOTE
You can use the AND, OR, and FREETEXT predicates together.
'Freetext strSQL = "Select ""urn:schemas:httpmail:subject"" From " & _ "scope('shallow traversal of """ & strURL & _ """') WHERE FREETEXT(""urn:schemas:httpmail:subject""," & _ "'best server on the planet')" |
Working with Ranking
When using content indexing with a query, you might want to retrieve the rank value of a document as compared to the search terms in your query. The search engine will assign a rank of 0_1000 to your items depending on how well they match the query. In your ADO SELECT statement, you can request the rank property by adding the property urn:schemas.microsoft.com:fulltextqueryinfo:rank . You can use the ORDER BY predicate with this property to sort the items returned from the query based on their relevance to the search terms. Also, you can force Exchange to coerce the values for the rank either by using clause weighting or by using rank coercion.
The idea of clause weighting is the similar to the idea used in the weighted column example we examined earlier. The difference is that instead of applying the weight to only the column, you apply the weight to the entire search term using the RANK BY predicate. This predicate takes a number of options such as WEIGHT and COERCION.
The next example uses clause ranking, so the WEIGHT option is used. This option takes a decimal value from 0 through 1 plus up to three digits past the decimal, such as 0.832. Using the technique of clause weighting, you can assign certain search terms a fraction of the weight that other search terms have. The following example searches all properties on the items for the term transportation and the terms heavy and trains. As you can see by the ranking, if the search engine finds only transportation, it should rank that item at one-quarter the value of an item containing heavy trains. When using weighting in this manner, the search engine applies weighting to the terms in the pre-processing stage.
StrSQL = "Select ""urn:schemas:httpmail:subject"", " _ & """urn:schemas.microsoft.com:fulltextqueryinfo:rank"" FROM " _ & "scope('shallow traversal of "file://./backofficestorage/ _ & "thomriznt5dom.extest.microsoft.com/apps/items/""') WHERE " _ & "CONTAINS(*,'""transportation""') RANK BY WEIGHT(0.25) OR " _ & "CONTAINS(*,'""heavy trains""') RANK BY WEIGHT(1.0)" |
Coercion, especially rank coercion, is a post-processing concept in which after the search engine finds matches for the search terms, your application can tell the search to recalculate the rank according to your specifications. Coercion is best illustrated with an example. Suppose you are searching for a document that contains the word Exchange. If the word Exchange is in a particular property that you think will make the item containing the word very relevant, such as the subject property, you can coerce the search engine into ranking the item you are searching for very high. If the search engine finds an item with the word Exchange in another property (that is, a property other than the subject property), you can have the search engine readjust the ranking so that the items containing Exchange found in the other properties are ranked lower than those found in the subject property.
You can perform the coercion using one of two approaches. One approach is absolute coercion, in which you assign an absolute value such as 500 to the items that meet your criteria for coercion. But what if you have more complex scenarios and absolute coercion will not meet your needs? For example, you want items with the word Exchange in the subject property to be ranked from 900 through 1000. (Remember that rank can range from 0 through 1000.) Using a coercion formula—the second approach—you can tell the search engine to make the coerced rank of these items according to this formula: 900 + the uncoerced rank multiplied by 0.1. For the items containing the word Exchange in a property different from subject, you can coerce the rank to be in the range of 0 through 900 by making the coerced rank equal to the uncoerced rank multipled by 0.9.
Using a coercion formula requires that your users know which columns they should have the search engine rank higher when their search criterion involves those columns. You could implement some logic in your application to take a shot at defining which columns should be coerced as ranking higher if search terms are found in those columns. The following code shows both absolute coercion and using a coercion formula for the example we just looked at:
'Use absolute coercion '1000 - Exchange in Subject '500 - Exchange anywhere else StrSQL = "Select ""urn:schemas:httpmail:subject""," _ & """urn:schemas.microsoft.com:fulltextqueryinfo:rank"" FROM " _ & "scope('shallow traversal of "file://./backofficestorage/" _ & "thomriznt5dom.extest.microsoft.com/apps/items/""') WHERE " _ & "CONTAINS(""urn:schemas:httpmail:subject"",'""Exchange""') " _ & "RANK BY COERCION(ABSOLUTE,1000) OR CONTAINS(*,'""Exchange""') " _ & "RANK BY COERCION(ABSOLUTE,500)" 'Use coercion formula '900 - 1000 - Exchange in Subject using MULTIPLY and ADD '0 - 900 - Exchange anywhere else using MULTIPLY 'MULTIPLY takes a decimal number from 0 to 1 with 3 digits after 'the decimal 'ADD takes an integer 'You cannot go above 1000 StrSQL = "Select ""urn:schemas:httpmail:subject""," _ & """urn:schemas.microsoft.com:fulltextqueryinfo:rank"" FROM " _ & "scope('shallow traversal of "file://./backofficestorage/ _ & "thomriznt5dom.extest.microsoft.com/apps/items/""') WHERE " _ & "(CONTAINS(""urn:schemas:httpmail:subject"",'""Exchange""') " _ & "RANK BY COERCION(MULTIPLY,0.1)) RANK BY COERCION(ADD,900) OR " _ & "CONTAINS(*,'""Exchange""') RANK BY COERCION(MULTIPLY,0.9)" |
Indexing Default Properties
The content indexing engine by default indexes a certain set of built-in properties, which are listed in Table 19-5. Note that at this time, there is no simple way to tell the engine to index your custom properties, such as setting a fulltextindexed property in your schema. The only way to ensure that your custom properties are full-text indexed is to create a text file, such as http://thomriz.com/schema/myprop, that contains the fully qualified names for your properties on separate lines. You then need to set the following registry key to point at that text file:
HKLM\Software\Microsoft\Search\1.0\ExchangeParameters\SchemaTextFilePathName |
Table 19-5. The set of built-in properties indexed by the content indexing engine by default.
MAPI Property | urn:schemas:httpmail Property |
---|---|
PR_SUBJECT, PR_SUBJECT_W | urn:schemas:httpmail:subject |
PR_BODY, PR_BODY_W | urn:schemas:httpmail:textdescription |
PR_SENDER_NAME, PR_SENDER_NAME_W | urn:schemas:httpmail:textdescription |
PR_SENDER_NAME_W | urn:schemas:httpmail:sendername |
PR_SENT_REPRESENTING_NAME, PR_SENT_REPRESENTING_NAME_W | urn:schemas:httpmail:fromname |
PR_DISPLAY_TO, PR_DISPLAY_TO_W | urn:schemas:httpmail:displayto |
PR_DISPLAY_CC, PR_DISPLAY_CC_W | urn:schemas:httpmail:displaycc |
PR_DISPLAY_BCC, PR_DISPLAY_BCC_W | urn:schemas:httpmail:displaybcc |
PR_SENDER_EMAIL_ADDRESS, PR_SENDER_EMAIL_ADDRESS_W | N/A. |
The following code, taken from the Training application, can be used when you turn on content indexing for the application. For right now, I've commented out this code.
'Enable this if you have content indexing enabled on your system 'Uses CONTAINS instead of LIKE '***************************** BEGIN 'strCategoryText = "CONTAINS (""" & strSchema & "category""," 'if strCategories = "all" then ' arrCategories = Session("arrCategories") 'Select the first one 'Generate the rest 'strCategoriesSQL = strCategoryText ' for i=lbound(arrCategories) to UBound(arrCategories) ' if i=LBound(arrCategories) then 'First one, start the ' ' strCategoriesSQL = strCategoryText & "'""" & arrCategories(i) ' if lbound(arrCategories) = UBound(arrCategories) then 'Only one, end the statement ' strCategoriesSQL = strCategoriesSQL & """')" ' else ' strCategoriesSQL = strCategoriesSQL & """ OR " ' end if ' elseif (i<UBound(arrCategories) AND i>LBound(arrCategories)) then ' strCategoriesSQL = strCategoriesSQL & """" & _ ' arrCategories(i) & """ OR " ' else 'it's the last one, drop the OR ' if Right(arrCategories(i),1) = chr(10) then 'must be a carriage return/linefeed ' arrCategories(i) = Mid(arrCategories(i), _ 1,(len(arrCategories(i))-2)) ' end if ' strCategoriesSQL = strCategoriesSQL & """" & _ Trim(Cstr(arrCategories(i))) & """')" ' end if ' next 'else 'Need to create the category search string 'Grab the querystring value which should be separated by $ ' strCats = Request.QueryString("Categories") ' arrCats = Split(strCats,"$") 'Always going to be at least one ' for i=lbound(arrCats) to UBound(arrCats) ' if i=LBound(arrCats) then 'First one, start the ' ' strCategoriesSQL = strCategoryText & "'""" & arrCats(i) ' if lbound(arrCats) = UBound(arrCats) then 'Only one, end the statement ' strCategoriesSQL = strCategoriesSQL & """')" ' else ' strCategoriesSQL = strCategoriesSQL & """ OR " ' end if ' elseif (i<UBound(arrCats) AND i>LBound(arrCats)) then ' strCategoriesSQL = strCategoriesSQL & """" & arrCats(i) & _ """ OR " ' else 'it's the last one, drop the OR ' if Right(arrCats(i),1) = chr(10) then 'must be a carriage return/linefeed ' arrCats(i) = Mid(arrCats(i),1,(len(arrCats(i))-2)) ' end if ' strCategoriesSQL = strCategoriesSQL & """" & _ Trim(Cstr(arrCats(i))) & """')" ' end if ' next 'end if '************************************** END |