(ed.) Intelligent Agents for Data Mining and Information Retrieval

Following the definition of data warehouses (Inmon, 1994), we define a textual warehouse as a source of information that is subject-oriented, filtered, integrated, archived (with versions), and organized for retrieval, interrogation, or analysis.

The information contained in a document warehouse must be organized as follows:

The architecture we propose for textual warehouses is presented in Figure 1. It comprises two stages: warehouse storage and warehouse exploitation.

Figure 1: Architecture of Textual Warehouses

The first stage extracts the structure and content of each document and stores them in the warehouse. Each textual element of content must be indexed to extract the information that information retrieval techniques will use afterward.
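The storage stage described above can be illustrated with a minimal sketch. It is not the implementation behind Figure 1. It assumes, purely for illustration, that documents are plain text, that paragraphs separated by blank lines are the structural elements, and that indexing means building a simple inverted index over lowercase word tokens. The names `Element`, `Warehouse`, `extract_elements`, and `tokenize` are hypothetical.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Element:
    """One textual element of a document (hypothetical model)."""
    doc_id: str
    path: str   # position in the document structure, e.g. "para[2]"
    text: str


def tokenize(text):
    """Assumed indexing unit: lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_elements(doc_id, raw):
    """Assumed structure extraction: one element per blank-line-separated paragraph."""
    blocks = [b.strip() for b in re.split(r"\n\s*\n", raw) if b.strip()]
    return [Element(doc_id, f"para[{i}]", b) for i, b in enumerate(blocks, start=1)]


@dataclass
class Warehouse:
    """Stores elements and an inverted index from terms to element keys."""
    elements: dict = field(default_factory=dict)
    index: dict = field(default_factory=lambda: defaultdict(set))

    def store(self, element):
        # Keep the content together with its structural position.
        key = (element.doc_id, element.path)
        self.elements[key] = element
        # Index every term of the element for later retrieval.
        for term in tokenize(element.text):
            self.index[term].add(key)

    def lookup(self, term):
        # Return the keys of all elements containing the term, in sorted order.
        return sorted(self.index.get(term.lower(), set()))


warehouse = Warehouse()
raw_doc = "Data warehouses store facts.\n\nTextual warehouses store documents."
for element in extract_elements("doc1", raw_doc):
    warehouse.store(element)
```

With this sketch, `warehouse.lookup("documents")` returns the key of the second paragraph only, while `warehouse.lookup("warehouses")` returns both paragraphs. A real warehouse would replace the paragraph splitter with a parser for the documents' actual structure and would likely use a richer indexing scheme.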

The second stage manipulates the information contained in the warehouse. For that task, we propose three techniques:

Such textual warehouses then become the basic tool for company employees who need to exploit information in their daily professional tasks (e.g., administrative intranets, digital libraries, technical documentation).
