Distributed Multimedia Database Technologies Supported by MPEG-7 and MPEG-21



4.4 Content-Based Retrieval

The key functionality of a multimedia database is the effective retrieval of continuous and noncontinuous multimedia information. One broadly used method, content-based retrieval (CBR) of multimedia objects, relies on features extracted directly from the multimedia objects. [144], [145], [146] In contrast, concept-based content retrieval [147] adds a semantic interpretation of the objects; for example, an object is identified as a named person. This semantic interpretation can be added by the indexer or obtained through a semiautomatic indexing process. We discussed this issue in Section 4.1.

To account for the broad use of CBR and the need for semantically rich indexing and querying, an MMDBMS must provide means for CBR as well as optimizer and processor engines for complex structured queries. This implies that next-generation MMDBMSs must provide generic and reusable engines for different processing purposes, built preferably on top of existing database engines. Two main description classes for CBR are distinguished and can be used in combination.

4.4.1 Image CBR

Many image CBR (CBIR) systems exist, as discussed in Rui et al., [150] Smith and Chang, [151] Flickner et al., [152] Das et al., [153] Shanbehazadeh et al., [154] and Kelly and Cannon. [155] A broadly used system is IBM's QBIC (http://wwwqbic.almaden.ibm.com). [156]

4.4.2 Use Case: The QBIC System from IBM

The similarity search in QBIC relies principally on feature vectors generated from color, texture, and histogram information. Exhibit 4.10 shows the graphical CBIR query mask of QBIC, in which, in addition to keywords, values for the RGB (red, green, and blue) color system can be entered. The computation of the features is the following:

Exhibit 4.10: Image content-based retrieval query mask of Query by Image Content. [158]

QBIC is integrated in the current Image Extender of IBM's commercial database system DB2; the QBIC functionality in the Image Extender is detailed in Section 4.6. QBIC is also used in CBR Web applications; for instance, it is integrated in a Web-based system for finding artwork in the digital collection of the Hermitage Museum (http://hermitagemuseum.org/).
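As an illustration of the kind of feature computation involved, the following sketch compares two images by quantized color histograms. The bin count, the quantization scheme, and the L1 distance are illustrative assumptions, not QBIC's actual implementation.

```python
# Illustrative color-histogram similarity in the spirit of QBIC's color
# features; bin count, quantization, and distance metric are assumptions.
import numpy as np

def color_histogram(rgb_pixels, bins=8):
    """Quantize each RGB channel into `bins` levels and count occurrences."""
    q = (np.asarray(rgb_pixels) * bins // 256).clip(0, bins - 1)
    idx = q[:, 0] * bins * bins + q[:, 1] * bins + q[:, 2]
    hist = np.bincount(idx, minlength=bins ** 3).astype(float)
    return hist / hist.sum()          # normalize so image size does not matter

def histogram_distance(h1, h2):
    """L1 distance between normalized histograms (0 = identical)."""
    return np.abs(h1 - h2).sum()

# Two tiny synthetic "images": one mostly red, one mostly blue.
red  = [(250, 10, 10)] * 90 + [(10, 10, 250)] * 10
blue = [(10, 10, 250)] * 90 + [(250, 10, 10)] * 10
d = histogram_distance(color_histogram(red), color_histogram(blue))
```

In a real system the histograms would be precomputed and indexed; the query engine then ranks database images by ascending distance to the query image's histogram.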

4.4.3 Video CBR

Video CBR is supported by fewer systems than CBIR. [159], [160] For instance, VideoQ incorporates, in addition to traditional keyword search methods, the possibility of searching video based on a rich set of visual features: color, texture, shape, motion, and spatio-temporal relationships.

The VideoQ [161] system extracts regions and objects from video shots using color and edge information and tracks their motion. Queries are entered in a graphical user interface by specifying the color, texture, and motion of objects, as well as their spatio-temporal relationships to other objects. Exhibit 4.11 shows a snapshot of the VideoQ query interface during the specification of a spatio-temporal query.
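The spatio-temporal part of such a query can be pictured as trajectory matching. The sketch below ranks stored object trajectories by their distance to a drawn query trajectory; the linear resampling and the mean point-wise distance are illustrative assumptions, not VideoQ's published algorithm.

```python
# Hedged sketch of spatio-temporal trajectory matching in the spirit of
# VideoQ-style motion queries; resampling and distance measure are
# illustrative assumptions, not VideoQ's published algorithm.
import math

def resample(traj, n=16):
    """Linearly resample a trajectory [(x, y), ...] to n points."""
    out = []
    for i in range(n):
        t = i * (len(traj) - 1) / (n - 1)
        j = min(int(t), len(traj) - 2)
        f = t - j
        x = traj[j][0] * (1 - f) + traj[j + 1][0] * f
        y = traj[j][1] * (1 - f) + traj[j + 1][1] * f
        out.append((x, y))
    return out

def trajectory_distance(a, b, n=16):
    """Mean point-wise Euclidean distance between resampled trajectories."""
    ra, rb = resample(a, n), resample(b, n)
    return sum(math.dist(p, q) for p, q in zip(ra, rb)) / n

query = [(0, 0), (5, 0), (10, 0)]           # object moving left to right
shot1 = [(0, 0), (3, 0), (6, 0), (10, 0)]   # similar rightward motion
shot2 = [(0, 0), (0, 5), (0, 10)]           # downward motion
best = min((shot1, shot2), key=lambda s: trajectory_distance(query, s))
```

Resampling makes trajectories of different lengths comparable; a production system would additionally normalize for position, scale, and playback speed.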

Exhibit 4.11: Query mask of VideoQ. [162]

Virage [163] is a broadly used commercial Video CBR system (http://www.virage.com) that includes facilities for video analysis, annotation, and retrieval. It also provides a speech-recognition tool, based on a large set of speaker profiles, that encodes audio content as ASCII (American Standard Code for Information Interchange) text. This encoded data can then be used as query content. However, Virage does not currently support any spatio-temporal querying.

Similar CBR technologies are currently in use in various applications. [164], [165], [166], [167], [168] An interesting approach is MetaSEEk (http://www.ctr.columbia.edu/MetaSEEk/), a content-based meta-search engine for images that queries CBIR systems located at IBM, Virage, and Columbia University servers. Furthermore, MetaSEEk [169] allows searching in categories, similar to WebSEEk [170] (http://www.ctr.columbia.edu/webseek/).
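A meta-search engine such as MetaSEEk must merge the ranked result lists returned by its back ends into a single ranking. The sketch below uses reciprocal-rank fusion, a generic rank-merging technique chosen here purely for illustration; it is not MetaSEEk's documented method, and the engine names are hypothetical.

```python
# Illustrative rank merging for a CBIR meta-search engine; reciprocal-rank
# fusion is a generic technique, not MetaSEEk's documented method.

def reciprocal_rank_fusion(result_lists, k=60):
    """Score each item by the sum of 1/(k + rank) over all lists containing it."""
    scores = {}
    for results in result_lists:
        for rank, item in enumerate(results, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

engine_a = ["img3", "img1", "img7"]   # ranking from one hypothetical back end
engine_b = ["img3", "img9", "img1"]   # ranking from another
merged = reciprocal_rank_fusion([engine_a, engine_b])
```

Items ranked highly by several back ends rise to the top of the merged list, which is the behavior a user of a meta-search engine expects.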

In this context, Colin C. Venters and Matthew Cooper provided a very comprehensive overview of currently used CBIR systems, [171] which can be found online at http://www.jisc.ac.uk/uploaded_documents/jtap-039.doc. In 2000, R.C. Veltkamp and M. Tanase [172] compared 43 available products; their report may be found at http://give-lab.cs.uu.nl/cbirsurvey/. Finally, A.W.M. Smeulders et al. [173] surveyed CBIR and discussed open problems, such as the "semantic gap" that exists between low-level and high-level indexing and retrieval (as already discussed in Section 4.1). The focus of these overview articles is on algorithmic details; the metadata models used for retrieval are not detailed.

4.4.4 Audio CBR

One of the first systems available was QBH (Query by Humming), described at the ACM Multimedia Conference in 1995. [174] The system allows the user to query an audio database by humming the desired tune into a microphone. QBH tracks pitches with a modified autocorrelation method and converts them into a melody pattern encoded as a string of U (up), D (down), and S (same) characters. The system then uses an approximate string-matching algorithm to match this pattern against the audio database.
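The U/D/S encoding and approximate matching described above can be sketched as follows; the pitch values, the toy database, and the Levenshtein matcher are illustrative (QBH's exact matching algorithm may differ in detail).

```python
# Sketch of the QBH idea: pitches are reduced to a U/D/S (up/down/same)
# contour string and matched approximately against stored contours.
# The two-song "database" and the Levenshtein matcher are illustrative.

def uds_contour(pitches):
    """Encode successive pitch changes as U (up), D (down), or S (same)."""
    return "".join("U" if cur > prev else "D" if cur < prev else "S"
                   for prev, cur in zip(pitches, pitches[1:]))

def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[len(b)]

# MIDI-like pitch numbers for a hummed query; stored contours for two songs.
query = uds_contour([60, 62, 62, 59, 60])        # rises, holds, falls, rises
songs = {"songA": "USDUUD", "songB": "DDSSUU"}
best = min(songs, key=lambda s: edit_distance(query, songs[s]))
```

Because the contour abstracts away absolute pitch, an out-of-tune but correctly shaped hum still finds the right song; the edit distance then tolerates a few wrong ups and downs.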

The Melody Tools of MPEG-7 (see Section 2.8) can be used for query by humming. In a simple scenario, the audio input of a humming person is analyzed for contour and beat (using the MelodyContour DS) and is then compared with the melodies stored in the database. The song with the most similar contour and beat is retained, and additional MPEG-7 information on the author, the genre, and so forth, is returned as well. In some cases, however, it is useful to employ the MelodySequence DS for audio similarity matching; for instance, to match melodies hummed by imperfect musicians. The MelodySequence DS can also be used to reconstruct a melodic line and lyrics from its MPEG-7 description. A working query-by-humming system can be found at www.musicline.de/de/melodiesuche.
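The contour comparison can be pictured with a small sketch. MPEG-7's MelodyContour DS quantizes each interval into a five-level contour; the threshold separating small from large intervals below is an illustrative assumption, not taken from the standard's normative tables, and the similarity measure is likewise only a stand-in.

```python
# Sketch of a 5-level melody contour in the spirit of MPEG-7's MelodyContour
# DS.  The "small interval" threshold and the similarity measure are
# illustrative assumptions, not normative values from the standard.

def contour5(pitches, small=2):
    """Map each interval (in semitones) to -2, -1, 0, +1, or +2."""
    steps = []
    for prev, cur in zip(pitches, pitches[1:]):
        d = cur - prev
        if d == 0:
            steps.append(0)
        elif abs(d) <= small:
            steps.append(1 if d > 0 else -1)
        else:
            steps.append(2 if d > 0 else -2)
    return steps

def contour_similarity(a, b):
    """Fraction of positions where two contours agree."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))

hummed = contour5([60, 62, 67, 67, 64])   # a hummed phrase
stored = contour5([62, 64, 69, 69, 65])   # same shape, transposed up
sim = contour_similarity(hummed, stored)
```

Because the contour records only the direction and rough size of each interval, the transposed version of the phrase yields an identical contour, which is exactly why contour matching is robust against hummers who do not hit the right key.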

In general, Audio CBR is the task of finding occurrences of a musical query pattern in a music database. A natural way to provide such a query pattern is by humming (as described above), but other forms have recently attracted interest. In the sample scenario from Chapter 1, the query pattern was recorded with a mobile phone. In such a case, the AudioSignature DS is a good representative for both the content in the database and the query pattern, as it provides a unique fingerprint. It has been shown that 10 seconds of recorded pattern are sufficient to achieve a 90 percent recognition rate. [175] Another form of Audio CBR is introduced in the Parfait Olé Melody Search Engine (http://www.parfaitole.com/). This engine allows one to identify an unknown melody from its first few notes, provided that one can describe the melody in terms of standard musical pitches. The pitches may be relative, that is, the exact key need not be known, and the rhythm is ignored.
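Key-independent lookup of the Parfait Olé kind can be sketched by reducing notes to successive intervals, which are invariant under transposition. The one-entry database and the prefix match below are assumptions for illustration, not the engine's actual implementation.

```python
# Illustrative key-independent melody lookup: notes are reduced to successive
# intervals, so a melody matches regardless of the key it is hummed in.
# The one-song "database" and prefix matching are illustrative assumptions.

def intervals(pitches):
    """Successive pitch differences in semitones (transposition-invariant)."""
    return [b - a for a, b in zip(pitches, pitches[1:])]

melodies = {
    # "Frere Jacques" opening, stored in C major (C D E C ...)
    "Frere Jacques": [60, 62, 64, 60, 60, 62, 64, 60],
}

def identify(query_pitches):
    """Return the titles whose interval sequence starts with the query's."""
    q = intervals(query_pitches)
    return [title for title, notes in melodies.items()
            if intervals(notes)[:len(q)] == q]

# The same opening entered a fourth higher (F G A F) still matches.
found = identify([65, 67, 69, 65])
```

Ignoring rhythm and absolute key, as the engine's description suggests, reduces the query to exactly this interval sequence.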

Jonathan Foote [176] gives a good overview of Audio CBR systems. He also realized several Audio CBR systems himself; in 1997, [177] he proposed a system for content-based retrieval of music and audio that retrieves audio documents by acoustic similarity. In his approach, a template is generated from cepstral coefficients for each audio item in the database and for the user's acoustic input. The query template is then matched against the templates in the database to determine the query results. Later, he extended this methodology to retrieval by rhythmic similarity. [178]
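Foote's template idea can be sketched as follows: each audio item is summarized by a template over its per-frame cepstral coefficients, and queries are ranked by template distance. Here the mean vector serves as the template, Euclidean distance as the match score, and random arrays stand in for real cepstral features; all three are simplifying assumptions, not Foote's actual design.

```python
# Hedged sketch of template matching over cepstral-style features.  Each item
# is summarized by the mean of its per-frame coefficient vectors; random
# arrays stand in for real cepstral coefficients.
import numpy as np

def template(frames):
    """Summarize a (num_frames x num_coeffs) feature matrix by its mean vector."""
    return np.asarray(frames).mean(axis=0)

def rank_by_similarity(query_frames, database):
    """Return database keys sorted by template distance to the query."""
    q = template(query_frames)
    return sorted(database,
                  key=lambda k: np.linalg.norm(template(database[k]) - q))

rng = np.random.default_rng(0)
db = {
    "speech": rng.normal(loc=0.0, scale=1.0, size=(100, 13)),  # stand-in features
    "music":  rng.normal(loc=5.0, scale=1.0, size=(100, 13)),
}
query = rng.normal(loc=5.0, scale=1.0, size=(50, 13))          # music-like query
ranking = rank_by_similarity(query, db)
```

Collapsing each item to one vector makes matching cheap; richer templates (e.g., per-frame statistics or histograms over a learned quantizer) trade cost for discrimination.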

4.4.5 Influence of MPEG-7 on CBR

MPEG-7 has a major influence on CBR implementations. It offers the unique possibility of describing low-level content in a standardized way and thus enables the realization of interoperable querying mechanisms. MPEG-7 applications for CBR may be found at http://mpeg-industry.com. Most of them focus on similarity search; some try to integrate semantically meaningful annotation and advanced query masks for this purpose.

For instance, Caliph and Emir from the Know-Center in Austria is an MPEG-7-based photo annotation and retrieval tool. Basic semantic entities, such as AgentObjectType and EventType, may be entered and reused, and SemanticRelations may be specified graphically by connecting the respective semantic entities. The search mask follows the annotation mask and offers the same graphical concept. Caliph and Emir is available on a 90-day license from http://www.know-center.at/en/divisions/div3demos.htm.

Another interesting approach is taken by VAnnotator, developed in the IST VIZARD (Video Wizard) project. This MPEG-7-based annotation tool employs the concept of "video lenses" to provide different views on the material being annotated: there are lenses for viewing and adding textual annotations, and lenses for viewing the results of image-processing algorithms such as cut detection. VAnnotator may be downloaded from http://www.video-wizard.com/index-n.htm on a 30-day trial license.

4.4.6 CBR Online Demos

Finally, the reader is advised to form his or her own judgment by looking at the following (incomplete) list of available demos:

Online advanced CBIR systems include:

Online MPEG-7-based retrieval and indexing systems include:

[144]Amato, G., Mainetto, G., and Savino, P., An approach to a content-based retrieval of multimedia data, Multimedia Tools Appl., 7, 9–36, 1998.

[145]Rui, Y., Huang, T.S., and Chang, S.-F., Image retrieval: past, present and future, J. Visual Comm. Image Representation, 10, 1–23, 1999.

[146]Yoshitaka, A. and Ichikawa, T., A survey on content-based retrieval for multimedia databases, IEEE Trans. Knowl. Data Eng., 11, 81–93, 1999.

[147]Apers, P.M.G. and Kersten, M.L., Content-based retrieval in multimedia databases based on feature models, in Proceedings of the International Conference on Advanced Multimedia Content Processing, Osaka, Japan, November 1998, Springer-Verlag, Heidelberg, LNCS 1554, pp. 119–130.

[148]Subrahmanian, V.S., Principles of Multimedia Database Systems, Morgan Kaufmann, 1998.

[149]Apers, P.M.G., Blanken, H.M., and Houtsma, M.A.W., Multimedia Databases in Perspective. Springer-Verlag, Heidelberg, 1997.

[150]Rui, Y., Huang, T.S., and Chang, S.-F., Image retrieval: past, present and future, J. Visual Comm. Image Representation, 10, 1–23, 1999.

[151]Smith, J. and Chang, S.-F., VisualSEEk: a fully automated content-based image query system, in Proceedings of the Fourth ACM Multimedia Conference (MULTIMEDIA'96), New York, November 1996, ACM Press, pp. 87–98.

[152]Flickner, M., Sawhney, H., Niblack, W., Ashley, J., Huang, Q., Dom, B., Gorkani, M., Hafner, J., Lee, D., Petkovic, D., Steele, D., and Yanker, P., Query by image and video content: the QBIC system. IEEE Comput., 28, 23–32, 1995.

[153]Das, M., Riseman, E.M., and Draper, B.A., FOCUS: searching for multi-colored objects in a diverse image database, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1997, pp. 756–761.

[154]Shanbehazadeh, J., Moghadam, A.M.E., and Mahmoudi, F., Image indexing and retrieval techniques: past, present, and next, in Proceedings of SPIE: Storage and Retrieval for Media Databases 2000, Yeung, M.M., Yeo, B.-L., and Bouman, C.A., eds., Vol. 3972, 2000, pp. 461–470.

[155]Kelly, P.M. and Cannon, T.M., Query by image example: the CANDID approach, in Proceedings of SPIE: Storage and Retrieval for Media Databases 1995, 1995, pp. 238–248.

[156]Flickner, M., Sawhney, H., Niblack, W., Ashley, J., Huang, Q., Dom, B., Gorkani, M., Hafner, J., Lee, D., Petkovic, D., Steele, D., and Yanker, P., Query by image and video content: the QBIC system. IEEE Comput., 28, 23–32, 1995.

[157]Ortega, M., Rui, Y., Chakrabarti, K., Porekaew, K., Mehrotra, S., and Huang, T.S., Supporting ranked Boolean similarity queries in MARS, IEEE Trans. Knowl. Data Eng., 10, 905–925, 1998.

[158]Veltkamp, R.C. and Tanase, M., Content-based image retrieval systems: a survey, Technical Report UU-CS-2000-34, Utrecht University, Utrecht, March 2001.

[159]Chang, S.-F., Chen, W., Meng, H., Sundaram, H., and Zhong, D., VideoQ: an automated content-based video search system using visual cues, in Proceedings of the 5th ACM International Multimedia Conference (MULTIMEDIA '97), New York/Reading, November 1998. ACM Press/Addison-Wesley, pp. 313–324.

[160]Bach, J.R., The VIRAGE image search engine: an open framework for image management, in Proceedings of SPIE-96, 4th Conference on Storage and Retrieval for Still Image and Video Databases, San Jose, 1996, pp. 76–87.

[161]VideoQ has been developed within the ADVENT Project of the Image and Advanced TV Lab at Columbia University. A Web demonstration may be found at http://www.ctr.columbia.edu/VideoQ/.

[162]Chang, S.-F., Chen, W., Meng, H., Sundaram, H., and Zhong, D., VideoQ: an automated content-based video search system using visual cues, in Proceedings of the 5th ACM International Multimedia Conference (MULTIMEDIA '97), New York/Reading, November 1998. ACM Press/Addison-Wesley, pp. 313–324.

[163]Bach, J.R., The VIRAGE image search engine: an open framework for image management, in Proceedings of SPIE-96, 4th Conference on Storage and Retrieval for Still Image and Video Databases, San Jose, 1996, pp. 76–87.

[164]Das, M., Riseman, E.M., and Draper, B.A., FOCUS: searching for multi-colored objects in a diverse image database, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1997, pp. 756–761.

[165]Shanbehazadeh, J., Moghadam, A.M.E., and Mahmoudi, F., Image indexing and retrieval techniques: past, present, and next, in Proceedings of SPIE: Storage and Retrieval for Media Databases 2000, Yeung, M.M., Yeo, B.-L., and Bouman, C.A., eds., Vol. 3972, 2000, pp. 461–470.

[166]Kelly, P.M. and Cannon, T.M., Query by image example: the CANDID approach, in Proceedings of SPIE: Storage and Retrieval for Media Databases 1995, 1995, pp. 238–248.

[167]Chang, S.-F., Chen, W., Meng, H., Sundaram, H., and Zhong, D., VideoQ: an automated content-based video search system using visual cues, in Proceedings of the 5th ACM International Multimedia Conference (MULTIMEDIA '97), New York/Reading, November 1998. ACM Press/Addison-Wesley, pp. 313–324.

[168]Bach, J.R., The VIRAGE image search engine: an open framework for image management, in Proceedings of SPIE-96, 4th Conference on Storage and Retrieval for Still Image and Video Databases, San Jose, 1996, pp. 76–87.

[169]Beigi, M., Benitez, A.B., and Chang, S.-F., Metaseek: a content-based metasearch engine for images, in Proceedings of the International Conference of Storage and Retrieval for Image and Video Databases (SPIE), 1998, pp. 118–128.

[170]Smith, J. and Chang, S.F., Visually searching the web for content, IEEE MultiMedia, 4, 12–20, 1997.

[171]Venters, C.C. and Cooper, M., A review of content-based image retrieval systems. Technical Report JTAP-054, JISC Technology Applications Programme (JTAP) at University Manchester, Manchester, 1999.

[172]Veltkamp, R.C. and Tanase, M., Content-based image retrieval systems: a survey, Technical Report UU-CS-2000-34, Utrecht University, Utrecht, March 2001.

[173]Smeulders, A.W.M., Worring, M., Santini, S., Gupta, A., and Jain, R., Content based image retrieval at the end of the early years. IEEE Trans. Pattern Anal. Machine Intell., 22, 1349–1380, 2000.

[174]Ghias, A., Logan, J., Chamberlain, D., and Smith, B.C., Query by humming: musical information retrieval in an audio database, in Proceedings of the ACM Multimedia, San Francisco, 1995, pp. 231–236.

[175]Crysandt, H. and Wellhausen, J., Music classification with MPEG-7, Proceedings of the SPIE International Conference on Electronic Imaging—Storage and Retrieval for Media Databases, Santa Clara, January 2003.

[176]Foote, J., An overview of audio information retrieval, ACM Multimedia Syst., 7, 2–10, 1999.

[177]Foote, J.T., Content-based retrieval of music and audio, in Multimedia Storage and Archiving Systems II, Proceedings of SPIE, Kuo, C.C.J., et al., eds., Vol. 3229, 1997, pp. 138–147.

[178]Foote, J., Cooper, M., and Nam, U., Audio retrieval by rhythmic similarity, in Proceedings of the Third International Symposium on Musical Information Retrieval, Paris, September 2002.

