Data Mining: Opportunities and Challenges
Data mining has often been defined as a "process of extracting previously unknown, valid and actionable information from large databases and then using the information to make critical business decisions" (Cabena, Hadjinian, Stadler, Verhees, & Zanasi, 1998). This definition rests on the premise that an organization can link its myriad sources of data into a data warehouse (potentially including data marts). Further, these data sources can support progressively deeper analysis, including exploration through on-line analytical processing (OLAP), statistical analyses, and querying. One mechanism that lets organizations move beyond exploration and engage in information discovery is the Sample, Explore, Modify, Model and Assess (SEMMA) methodology, developed by the SAS Institute (http://www.sas.com/). In sum, SEMMA enables its users to: 1) draw a segment of the data (sample data) to be mined for preliminary outcomes; 2) explore this sample data for outliers, trends, etc.; 3) modify the sample data set for model testing, which can include neural networks, linear regression, and decision trees; 4) evaluate the model for accuracy; and 5) validate the model using the complete data set (Groth, 1998). In general, the rationale for mining data is a function of organizational needs and of the firm's role in its industry. Moreover, researchers (Hirji, 2001; Keim, 2001) have established that data-mining applications, when implemented effectively, can support strategic planning and yield competitive advantage. To minimize the collection and storage of vast amounts of useless data, mining can identify and monitor which data are most critical to an organization, thereby enabling efficient collection, storage, and exploration of data (Keim, 2001).
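The five SEMMA steps listed above can be illustrated with a minimal Python sketch. It is not SAS code and does not reproduce any SAS product. The synthetic data set, the function names (make_records, fit_line, flag_outliers, run_semma), the 20 percent sample fraction, and the residual-based outlier rule are all assumptions made for illustration. Linear regression stands in for the model, since it is one of the techniques the text names.

```python
import random
import statistics


def make_records(n=500, seed=0):
    """Build hypothetical (x, y) records: y is roughly 3x + 5, plus a few gross outliers."""
    rng = random.Random(seed)
    records = []
    for i in range(n):
        x = rng.uniform(0.0, 100.0)
        y = 3.0 * x + 5.0 + rng.gauss(0.0, 4.0)
        if i % 50 == 0:
            y += 500.0  # injected outlier (illustrative assumption)
        records.append((x, y))
    return records


def fit_line(rows):
    """Ordinary least squares fit of y = slope * x + intercept."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def flag_outliers(rows, cutoff=5.0):
    """Flag rows whose residual from a preliminary fit is far from the median residual."""
    slope, intercept = fit_line(rows)
    residuals = [y - (slope * x + intercept) for x, y in rows]
    center = statistics.median(residuals)
    mad = statistics.median([abs(r - center) for r in residuals])
    if mad == 0:
        return [False] * len(rows)
    limit = cutoff * 1.4826 * mad  # 1.4826 scales the MAD to a normal standard deviation
    return [abs(r - center) > limit for r in residuals]


def run_semma(records, fraction=0.2, seed=1):
    rng = random.Random(seed)
    # 1) Sample: draw a segment of the data.
    subset = rng.sample(records, max(3, int(len(records) * fraction)))
    # 2) Explore: look for outliers in the sample.
    flags = flag_outliers(subset)
    # 3) Modify: revise the sample by dropping the flagged rows.
    cleaned = [row for row, bad in zip(subset, flags) if not bad]
    # 4) Model and evaluate: fit on the cleaned sample and measure its error there.
    slope, intercept = fit_line(cleaned)

    def abs_errors(rows):
        return [abs(y - (slope * x + intercept)) for x, y in rows]

    # 5) Validate: check the same model against the complete data set.
    return {
        "slope": slope,
        "intercept": intercept,
        "rows_dropped": len(subset) - len(cleaned),
        "sample_mean_abs_error": statistics.fmean(abs_errors(cleaned)),
        "full_median_abs_error": statistics.median(abs_errors(records)),
    }


if __name__ == "__main__":
    for name, value in run_semma(make_records()).items():
        print(name, value)
```

The validation step reports a median rather than a mean error because the complete data set still contains the injected outliers. A production workflow would likely hold out separate validation data and compare several model types.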
The efficiencies associated with mining of "most vital data" can result in improved business intelligence, heightened awareness of often overlooked relationships in data, and discovery of patterns and rules among data elements (Hirji, 2001). Others (http://www.sas.com/; Groth, 1998) offer that data mining permits improved precision when executing:
In particular, data mining offers the health care industry the capability to tackle imminent challenges germane to its domain, among them: