Data Mining: Opportunities and Challenges
DM TRENDS: METHODS AND TECHNIQUES
Constraint-Based DM
Many existing DM techniques are useful but give the user no way to guide or control the mining process. Constraint-based DM addresses this by letting user-specified constraints guide the process. It is frequently combined with multidimensional mining to add greater power to the process (Han, Lakshmanan, & Ng, 1999).
Several categories of constraints can be used, each with its own characteristics and purpose. Knowledge-type constraints specify the "type of knowledge" that is to be mined, such as clustering, association, or classification, and are typically specified at the beginning of a DM query. Data constraints identify the data to be used in a specific DM query. Since constraint-based mining is ideally conducted within an ad hoc, query-driven system, data constraints can be specified in a form similar to that of a SQL query. Because much of the information being mined resides in a DB or multidimensional data warehouse, dimension/level constraints can identify the dimensions or levels to be included in the current query. Interestingness constraints specify which ranges of a particular variable or measure are considered interesting and should be included in the query. Finally, rule constraints specify the particular rules that should be applied in a DM query or application.
One application of the constraint-based approach is the Online Analytical Mining Architecture (OLAM) developed by Han, Lakshmanan, and Ng (1999), which is designed to support multidimensional and constraint-based mining of DBs and data warehouses.
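The constraint categories described above can be made concrete with a small sketch. The Python below is an illustration only, not the OLAM design or any published algorithm: the names `Transaction`, `MiningQuery`, and `mine` are hypothetical, the sample data are invented, and only pairwise association rules are handled. Dimension/level constraints are left out for brevity. Interestingness is represented here by minimum support and confidence thresholds, which is one common choice rather than the only one.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, FrozenSet, List, Tuple


@dataclass
class Transaction:
    """One purchase record with two descriptive attributes (hypothetical schema)."""
    region: str
    year: int
    items: FrozenSet[str]


@dataclass
class MiningQuery:
    """Bundles the constraints that guide a single mining run."""
    knowledge_type: str                          # knowledge-type constraint
    data_filter: Callable[[Transaction], bool]   # data constraint, like a SQL WHERE clause
    min_support: float                           # interestingness constraint
    min_confidence: float                        # interestingness constraint
    rule_filter: Callable[[str, str], bool]      # rule constraint on (antecedent, consequent)


Rule = Tuple[str, str, float, float]  # (antecedent, consequent, support, confidence)


def mine(transactions: List[Transaction], query: MiningQuery) -> List[Rule]:
    """Return single-item association rules that satisfy every constraint in the query."""
    if query.knowledge_type != "association":
        raise NotImplementedError("this sketch only mines association rules")

    # Data constraint: keep only the transactions the user asked for.
    selected = [t for t in transactions if query.data_filter(t)]
    if not selected:
        return []
    n = len(selected)

    def support(itemset: FrozenSet[str]) -> float:
        return sum(itemset <= t.items for t in selected) / n

    items = sorted(set().union(*(t.items for t in selected)))
    rules: List[Rule] = []
    for a, b in combinations(items, 2):
        pair_support = support(frozenset((a, b)))
        if pair_support < query.min_support:      # interestingness: support
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support(frozenset((lhs,)))
            if confidence < query.min_confidence:  # interestingness: confidence
                continue
            if query.rule_filter(lhs, rhs):        # rule constraint
                rules.append((lhs, rhs, pair_support, confidence))
    return rules


# Invented sample data for illustration.
TRANSACTIONS = [
    Transaction("west", 2000, frozenset({"bread", "milk"})),
    Transaction("west", 2000, frozenset({"bread", "milk", "eggs"})),
    Transaction("west", 2000, frozenset({"bread", "eggs"})),
    Transaction("west", 1999, frozenset({"milk", "eggs"})),
    Transaction("east", 2000, frozenset({"bread", "milk"})),
]

QUERY = MiningQuery(
    knowledge_type="association",
    data_filter=lambda t: t.region == "west" and t.year == 2000,
    min_support=0.5,
    min_confidence=0.6,
    rule_filter=lambda lhs, rhs: lhs == "bread",  # only rules with bread as antecedent
)

for lhs, rhs, s, c in mine(TRANSACTIONS, QUERY):
    print(f"{lhs} -> {rhs}  support={s:.2f}  confidence={c:.2f}")
```

Each constraint narrows the search at a different stage: the data constraint shrinks the input, the interestingness thresholds prune candidate rules, and the rule constraint filters what is finally reported.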
In short, constraint-based DM is a developing area in which user-specified constraints guide the mining process, which should make the results more relevant to the user. A number of studies have been conducted in this area (Cheung, Hwang, Fu, & Han, 2000; Lakshmanan, Ng, Han, & Pang, 1999; Lu, Feng, & Han, 2000; Pei & Han, 2000; Pei, Han, & Lakshmanan, 2001; Pei, Han, & Mao, 2000; Tung, Han, Lakshmanan, & Ng, 2001; Wang, He, & Han, 2000; Wang, Zhou, & Han, 2000).
Phenomenal DM
Phenomenal DM is not a term for a DM project that went extremely well. Rather, it focuses on the relationships between data and the phenomena that are inferred from the data (McCarthy, 2000). For example, receipts from cash purchases at a supermarket can be used to identify various aspects of the customers who made those purchases. Such phenomena could include age, income, ethnicity, and purchasing habits. Inferring phenomena from data requires access to some facts about the relations between the data and their related phenomena. These facts could be built into the program that examines the data for phenomena, or stored in a knowledge base or DB that can be drawn upon during the DM. Part of the challenge in creating such a knowledge base is encoding common sense into a DB, which has so far proved to be a difficult problem (Lyons & Tseytin, 1998).
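The idea of a knowledge base that relates data to phenomena can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not McCarthy's formulation: the `KNOWLEDGE_BASE` entries, their weights, the `infer_phenomena` function, and the threshold are all hypothetical, and real common-sense knowledge is far harder to encode than this lookup table suggests.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical knowledge base: each fact links an observed item on a receipt
# to a phenomenon it hints at, with an invented evidence weight.
KNOWLEDGE_BASE: Dict[str, List[Tuple[str, float]]] = {
    "diapers": [("household with infant", 0.8)],
    "baby formula": [("household with infant", 0.9)],
    "dog food": [("pet owner", 0.9)],
    "cat litter": [("pet owner", 0.9)],
}


def infer_phenomena(
    receipts: Iterable[Iterable[str]], threshold: float = 1.5
) -> Dict[str, float]:
    """Accumulate evidence across receipts and keep phenomena above the threshold."""
    scores: Dict[str, float] = defaultdict(float)
    for receipt in receipts:
        for item in set(receipt):  # count each item at most once per receipt
            for phenomenon, weight in KNOWLEDGE_BASE.get(item, []):
                scores[phenomenon] += weight
    return {p: s for p, s in scores.items() if s >= threshold}


# Invented receipts attributed to a single anonymous customer.
RECEIPTS = [
    ["diapers", "milk"],
    ["baby formula", "bread"],
    ["dog food"],
]

print(infer_phenomena(RECEIPTS))
```

Keeping the facts in a separate table rather than in the inference code mirrors the second option described above, in which the DM program draws on an external knowledge base.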