Distributed Multimedia Database Technologies Supported by MPEG-7 and MPEG-21



4.4 Content-Based Retrieval

The key functionality of a multimedia database is the effective retrieval of continuous and noncontinuous multimedia information. One broadly used method, content-based retrieval (CBR) of multimedia objects, relies on features extracted directly from the multimedia objects. [144], [145], [146] In contrast, concept-based content retrieval [147] adds a semantic interpretation of the objects; for example, an object is identified as a named person. This semantic interpretation can be added by the indexer or obtained through a semiautomatic indexing process. We discussed this issue in Section 4.1.

To account for the broad use of CBR and the need for semantically rich indexing and querying, an MMDBMS must provide means for CBR as well as optimizer and processor engines for complex structured queries. This implies that next-generation MMDBMSs must provide generic and reusable engines for different processing purposes, built preferably on top of existing database engines. Two main description classes for CBR are distinguished and can be used in combination.

4.4.1 Image CBR

Many image CBR (CBIR) systems exist, as discussed in Rui et al., [150] Smith and Chang, [151] Flickner et al., [152] Das et al., [153] Shanbehazadeh et al., [154] and Kelly and Cannon. [155] A broadly used system is IBM's QBIC (http://wwwqbic.almaden.ibm.com). [156]

4.4.2 Use Case: The QBIC System from IBM

The similarity search in QBIC relies principally on feature vectors generated from color, texture, and histogram information. Exhibit 4.10 shows the graphical CBIR query mask of QBIC, in which, in addition to keywords, values for the RGB (red, green, and blue) color system can be entered. The computation of the features is the following:

Exhibit 4.10: Image content-based retrieval query mask of Query by Image Content. [158]

QBIC is integrated in the current Image Extender of IBM's commercial database system DB2; the QBIC functionality in the Image Extender is detailed in Section 4.6. QBIC is also used in CBR Web applications; for instance, it is integrated in a Web-based system for finding artwork in the digital collection of the Hermitage Museum (http://hermitagemuseum.org/).
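As an illustration of the kind of feature computation involved, the following sketch compares two images by quantized color histograms. The bin count, the quantization scheme, and the L1 distance are illustrative assumptions, not QBIC's actual implementation.

```python
# Illustrative color-histogram similarity in the spirit of QBIC's color
# features; bin count, quantization, and distance metric are assumptions.
import numpy as np

def color_histogram(rgb_pixels, bins=8):
    """Quantize each RGB channel into `bins` levels and count occurrences."""
    q = (np.asarray(rgb_pixels) * bins // 256).clip(0, bins - 1)
    idx = q[:, 0] * bins * bins + q[:, 1] * bins + q[:, 2]
    hist = np.bincount(idx, minlength=bins ** 3).astype(float)
    return hist / hist.sum()          # normalize so image size does not matter

def histogram_distance(h1, h2):
    """L1 distance between normalized histograms (0 = identical)."""
    return np.abs(h1 - h2).sum()

# Two tiny synthetic "images": one mostly red, one mostly blue.
red  = [(250, 10, 10)] * 90 + [(10, 10, 250)] * 10
blue = [(10, 10, 250)] * 90 + [(250, 10, 10)] * 10
d = histogram_distance(color_histogram(red), color_histogram(blue))
```

In a real system the histograms would be precomputed and indexed; the query engine then ranks database images by ascending distance to the query image's histogram.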

4.4.3 Video CBR

Video CBR is supported by fewer systems than CBIR. [159], [160] For instance, VideoQ incorporates, in addition to traditional keyword search methods, the possibility of searching video based on a rich set of visual features: color, texture, shape, motion, and spatio-temporal relationships.

The VideoQ [161] system extracts regions and objects from video shots using color and edge information and tracks their motion. Queries are entered in a graphical user interface by specifying the color, texture, and motion of objects, as well as their spatio-temporal relationships to other objects. Exhibit 4.11 shows a snapshot of the VideoQ query interface during the specification of a spatio-temporal query.
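The spatio-temporal part of such a query can be pictured as trajectory matching. The sketch below ranks stored object trajectories by their distance to a drawn query trajectory; the linear resampling and the mean point-wise distance are illustrative assumptions, not VideoQ's published algorithm.

```python
# Hedged sketch of spatio-temporal trajectory matching in the spirit of
# VideoQ-style motion queries; resampling and distance measure are
# illustrative assumptions, not VideoQ's published algorithm.
import math

def resample(traj, n=16):
    """Linearly resample a trajectory [(x, y), ...] to n points."""
    out = []
    for i in range(n):
        t = i * (len(traj) - 1) / (n - 1)
        j = min(int(t), len(traj) - 2)
        f = t - j
        x = traj[j][0] * (1 - f) + traj[j + 1][0] * f
        y = traj[j][1] * (1 - f) + traj[j + 1][1] * f
        out.append((x, y))
    return out

def trajectory_distance(a, b, n=16):
    """Mean point-wise Euclidean distance between resampled trajectories."""
    ra, rb = resample(a, n), resample(b, n)
    return sum(math.dist(p, q) for p, q in zip(ra, rb)) / n

query = [(0, 0), (5, 0), (10, 0)]           # object moving left to right
shot1 = [(0, 0), (3, 0), (6, 0), (10, 0)]   # similar rightward motion
shot2 = [(0, 0), (0, 5), (0, 10)]           # downward motion
best = min((shot1, shot2), key=lambda s: trajectory_distance(query, s))
```

Resampling makes trajectories of different lengths comparable; a production system would additionally normalize for position, scale, and playback speed.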

Exhibit 4.11: Query mask of VideoQ. [162]

Virage [163] is a broadly used commercial Video CBR system (http://www.virage.com) that includes facilities for video analysis, annotation, and retrieval. It also provides a speech-recognition tool, based on a large set of speaker profiles, that encodes audio content as ASCII (American Standard Code for Information Interchange) text. This encoded data can then be used as query content. However, Virage does not currently support any spatio-temporal querying.

Similar CBR technologies are currently in use in various applications. [164], [165], [166], [167], [168] An interesting approach is MetaSEEk (http://www.ctr.columbia.edu/MetaSEEk/), a content-based meta-search engine for images that queries CBIR systems located at IBM, Virage, and Columbia University servers. Furthermore, MetaSEEk [169] allows searching in categories, similar to WebSEEk [170] (http://www.ctr.columbia.edu/webseek/).
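A meta-search engine such as MetaSEEk must merge the ranked result lists returned by its back ends into a single ranking. The sketch below uses reciprocal-rank fusion, a generic rank-merging technique chosen here purely for illustration; it is not MetaSEEk's documented method, and the engine names are hypothetical.

```python
# Illustrative rank merging for a CBIR meta-search engine; reciprocal-rank
# fusion is a generic technique, not MetaSEEk's documented method.

def reciprocal_rank_fusion(result_lists, k=60):
    """Score each item by the sum of 1/(k + rank) over all lists containing it."""
    scores = {}
    for results in result_lists:
        for rank, item in enumerate(results, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

engine_a = ["img3", "img1", "img7"]   # ranking from one hypothetical back end
engine_b = ["img3", "img9", "img1"]   # ranking from another
merged = reciprocal_rank_fusion([engine_a, engine_b])
```

Items ranked highly by several back ends rise to the top of the merged list, which is the behavior a user of a meta-search engine expects.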

In this context, Colin C. Venters and Matthew Cooper provided a very comprehensive overview of currently used CBIR systems, [171] which can be found online at http://www.jisc.ac.uk/uploaded_documents/jtap-039.doc. In 2000, R.C. Veltkamp and M. Tanase [172] compared 43 available products; their report may be found at http://give-lab.cs.uu.nl/cbirsurvey/. Finally, A.W.M. Smeulders et al. [173] surveyed CBIR and discussed open problems, such as the "semantic gap" that exists between low-level and high-level indexing and retrieval (as already discussed in Section 4.1). The focus of these overview articles is on algorithmic details; the metadata models used for retrieval are not detailed.

4.4.4 Audio CBR

One of the first systems available was QBH (Query by Humming), described at the ACM Multimedia Conference in 1995. [174] The system allows the user to query an audio database by humming the desired tune into a microphone. QBH tracks pitches with a modified autocorrelation method and converts them into a melody pattern encoded as a string of U (up), D (down), and S (same) characters. The system then uses an approximate string-matching algorithm to match this pattern against the audio database.
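The U/D/S encoding and approximate matching described above can be sketched as follows; the pitch values, the toy database, and the Levenshtein matcher are illustrative (QBH's exact matching algorithm may differ in detail).

```python
# Sketch of the QBH idea: pitches are reduced to a U/D/S (up/down/same)
# contour string and matched approximately against stored contours.
# The two-song "database" and the Levenshtein matcher are illustrative.

def uds_contour(pitches):
    """Encode successive pitch changes as U (up), D (down), or S (same)."""
    return "".join("U" if cur > prev else "D" if cur < prev else "S"
                   for prev, cur in zip(pitches, pitches[1:]))

def edit_distance(a, b):
    """Classic dynamic-programming Levenshtein distance."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[len(b)]

# MIDI-like pitch numbers for a hummed query; stored contours for two songs.
query = uds_contour([60, 62, 62, 59, 60])        # rises, holds, falls, rises
songs = {"songA": "USDUUD", "songB": "DDSSUU"}
best = min(songs, key=lambda s: edit_distance(query, songs[s]))
```

Because the contour abstracts away absolute pitch, an out-of-tune but correctly shaped hum still finds the right song; the edit distance then tolerates a few wrong ups and downs.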

The Melody Tools of MPEG-7 (see Section 2.8) can be used for query by humming. In a simple scenario, the audio input of a humming person is analyzed for contour and beat (using the MelodyContour DS) and is then compared with the melodies stored in the database. The song with the most similar contour and beat is retained, and additional MPEG-7 information on the author, the genre, and so forth, is returned as well. In some cases, however, it is useful to employ the MelodySequence DS for audio similarity matching; for instance, to match melodies hummed by imperfect musicians. The MelodySequence DS can also be used to reconstruct a melodic line and lyrics from its MPEG-7 description. A working query-by-humming system can be found at www.musicline.de/de/melodiesuche.
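The contour comparison can be pictured with a small sketch. MPEG-7's MelodyContour DS quantizes each interval into a five-level contour; the threshold separating small from large intervals below is an illustrative assumption, not taken from the standard's normative tables, and the similarity measure is likewise only a stand-in.

```python
# Sketch of a 5-level melody contour in the spirit of MPEG-7's MelodyContour
# DS.  The "small interval" threshold and the similarity measure are
# illustrative assumptions, not normative values from the standard.

def contour5(pitches, small=2):
    """Map each interval (in semitones) to -2, -1, 0, +1, or +2."""
    steps = []
    for prev, cur in zip(pitches, pitches[1:]):
        d = cur - prev
        if d == 0:
            steps.append(0)
        elif abs(d) <= small:
            steps.append(1 if d > 0 else -1)
        else:
            steps.append(2 if d > 0 else -2)
    return steps

def contour_similarity(a, b):
    """Fraction of positions where two contours agree."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))

hummed = contour5([60, 62, 67, 67, 64])   # a hummed phrase
stored = contour5([62, 64, 69, 69, 65])   # same shape, transposed up
sim = contour_similarity(hummed, stored)
```

Because the contour records only the direction and rough size of each interval, the transposed version of the phrase yields an identical contour, which is exactly why contour matching is robust against hummers who do not hit the right key.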

In general, Audio CBR is the task of finding occurrences of a musical query pattern in a music database. A natural way to provide such a query pattern is by humming (as described above), but other forms have recently attracted interest. In the sample scenario from Chapter 1, the query pattern was recorded with a mobile phone. In such a case, the AudioSignature DS is a good representative for both the content in the database and the query pattern, as it provides a unique fingerprint. It has been shown that 10 seconds of recorded pattern are sufficient to achieve a 90 percent recognition rate. [175] Another form of Audio CBR is introduced in the Parfait Olé Melody Search Engine (http://www.parfaitole.com/). This engine allows one to identify an unknown melody from its first few notes, provided that one can describe the melody in terms of standard musical pitches. The pitches may be relative, that is, the exact key need not be known, and the rhythm is ignored.
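Key-independent lookup of the Parfait Olé kind can be sketched by reducing notes to successive intervals, which are invariant under transposition. The one-entry database and the prefix match below are assumptions for illustration, not the engine's actual implementation.

```python
# Illustrative key-independent melody lookup: notes are reduced to successive
# intervals, so a melody matches regardless of the key it is hummed in.
# The one-song "database" and prefix matching are illustrative assumptions.

def intervals(pitches):
    """Successive pitch differences in semitones (transposition-invariant)."""
    return [b - a for a, b in zip(pitches, pitches[1:])]

melodies = {
    # "Frere Jacques" opening, stored in C major (C D E C ...)
    "Frere Jacques": [60, 62, 64, 60, 60, 62, 64, 60],
}

def identify(query_pitches):
    """Return the titles whose interval sequence starts with the query's."""
    q = intervals(query_pitches)
    return [title for title, notes in melodies.items()
            if intervals(notes)[:len(q)] == q]

# The same opening entered a fourth higher (F G A F) still matches.
found = identify([65, 67, 69, 65])
```

Ignoring rhythm and absolute key, as the engine's description suggests, reduces the query to exactly this interval sequence.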

Jonathan Foote [176] gives a good overview of Audio CBR systems. He also realized several Audio CBR systems himself; in 1997, [177] he proposed a system for content-based retrieval of music and audio that retrieves audio documents by acoustic similarity. In his approach, a template is generated from cepstral coefficients for each audio item in the database and for the user's acoustic input. The query template is then matched against the templates in the database to determine the query results. Later, he extended this methodology to retrieval by rhythmic similarity. [178]
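Foote's template idea can be sketched as follows: each audio item is summarized by a template over its per-frame cepstral coefficients, and queries are ranked by template distance. Here the mean vector serves as the template, Euclidean distance as the match score, and random arrays stand in for real cepstral features; all three are simplifying assumptions, not Foote's actual design.

```python
# Hedged sketch of template matching over cepstral-style features.  Each item
# is summarized by the mean of its per-frame coefficient vectors; random
# arrays stand in for real cepstral coefficients.
import numpy as np

def template(frames):
    """Summarize a (num_frames x num_coeffs) feature matrix by its mean vector."""
    return np.asarray(frames).mean(axis=0)

def rank_by_similarity(query_frames, database):
    """Return database keys sorted by template distance to the query."""
    q = template(query_frames)
    return sorted(database,
                  key=lambda k: np.linalg.norm(template(database[k]) - q))

rng = np.random.default_rng(0)
db = {
    "speech": rng.normal(loc=0.0, scale=1.0, size=(100, 13)),  # stand-in features
    "music":  rng.normal(loc=5.0, scale=1.0, size=(100, 13)),
}
query = rng.normal(loc=5.0, scale=1.0, size=(50, 13))          # music-like query
ranking = rank_by_similarity(query, db)
```

Collapsing each item to one vector makes matching cheap; richer templates (e.g., per-frame statistics or histograms over a learned quantizer) trade cost for discrimination.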

4.4.5 Influence of MPEG-7 on CBR

MPEG-7 has a major influence on CBR implementations. It offers the unique possibility of describing low-level content in a standardized way and thus enables the realization of interoperable querying mechanisms. MPEG-7 applications for CBR may be found at http://mpeg-industry.com. Most of them focus on similarity search; some try to integrate semantically meaningful annotation and advanced query masks for this purpose.

For instance, Caliph and Emir from the Know-Center in Austria is an MPEG-7-based photo annotation and retrieval tool. Basic semantic entities, such as AgentObjectType and EventType, may be entered and reused, and SemanticRelations may be specified graphically by connecting the respective semantic entities. The search mask follows the annotation mask and offers the same graphical concept. Caliph and Emir is available on a 90-day license from http://www.know-center.at/en/divisions/div3demos.htm.

Another interesting approach is taken by VAnnotator, developed in the IST VIZARD (Video Wizard) project. This MPEG-7-based annotation tool employs the concept of "video lenses" to provide different views on the material being annotated: there are lenses for viewing and adding textual annotations, and lenses for viewing the results of image-processing algorithms such as cut detection. VAnnotator may be downloaded from http://www.video-wizard.com/index-n.htm on a 30-day trial license.

4.4.6 CBR Online Demos

Finally, the reader is advised to form his or her own judgment by looking at the following (incomplete) list of available demos:

Online advanced CBIR systems include:

Online MPEG-7-based retrieval and indexing systems include:

[144]Amato, G., Mainetto, G., and Savino, P., An approach to a content-based retrieval of multimedia data, Multimedia Tools Appl., 7, 9–36, 1998.

[145]Rui, Y., Huang, T.S., and Chang, S.-F., Image retrieval: past, present and future, J. Visual Comm. Image Representation, 10, 1–23, 1999.

[146]Yoshitaka, A. and Ichikawa, T., A survey on content-based retrieval for multimedia databases, IEEE Trans. Knowl. Data Eng., 11, 81–93, 1999.

[147]Apers, P.M.G. and Kersten, M.L., Content-based retrieval in multimedia databases based on feature models, in Proceedings of the International Conference on Advanced Multimedia Content Processing, Osaka, Japan, November 1998, Springer-Verlag, Heidelberg, LNCS 1554, pp. 119–130.

[148]Subrahmanian, V.S., Principles of Multimedia Database Systems, Morgan Kaufmann, 1998.

[149]Apers, P.M.G., Blanken, H.M., and Houtsma, M.A.W., Multimedia Databases in Perspective. Springer-Verlag, Heidelberg, 1997.

[150]Rui, Y., Huang, T.S., and Chang, S.-F., Image retrieval: past, present and future, J. Visual Comm. Image Representation, 10, 1–23, 1999.

[151]Smith, J. and Chang, S.-F., VisualSEEk: a fully automated content-based image query system, in Proceedings of the Fourth ACM Multimedia Conference (MULTIMEDIA'96), New York, November 1996, ACM Press, pp. 87–98.

[152]Flickner, M., Sawhney, H., Niblack, W., Ashley, J., Huang, Q., Dom, B., Gorkani, M., Hafner, J., Lee, D., Petkovic, D., Steele, D., and Yanker, P., Query by image and video content: the QBIC system. IEEE Comput., 28, 23–32, 1995.

[153]Das, M., Riseman, E.M., and Draper, B.A., FOCUS: searching for multi-colored objects in a diverse image database, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1997, pp. 756–761.

[154]Shanbehazadeh, J., Moghadam, A.M.E., and Mahmoudi, F., Image indexing and retrieval techniques: past, present, and next, in Proceedings of SPIE: Storage and Retrieval for Media Databases 2000, Yeung, M.M., Yeo, B.-L., and Bouman, C.A., eds., Vol. 3972, 2000, pp. 461–470.

[155]Kelly, P.M. and Cannon, T.M., Query by image example: the CANDID approach, in Proceedings of SPIE: Storage and Retrieval for Media Databases 1995, 1995, pp. 238–248.

[156]Flickner, M., Sawhney, H., Niblack, W., Ashley, J., Huang, Q., Dom, B., Gorkani, M., Hafner, J., Lee, D., Petkovic, D., Steele, D., and Yanker, P., Query by image and video content: the QBIC system. IEEE Comput., 28, 23–32, 1995.

[157]Ortega, M., Rui, Y., Chakrabarti, K., Porekaew, K., Mehrotra, S., and Huang, T.S., Supporting ranked Boolean similarity queries in MARS, IEEE Trans. Knowl. Data Eng., 10, 905–925, 1998.

[158]Veltkamp, R.C. and Tanase, M., Content-based image retrieval systems: a survey, Technical Report UU-CS-2000-34, Utrecht University, Utrecht, March 2001.

[159]Chang, S.-F., Chen, W., Meng, H., Sundaram, H., and Zhong, D., VideoQ: an automated content-based video search system using visual cues, in Proceedings of the 5th ACM International Multimedia Conference (MULTIMEDIA '97), New York/Reading, November 1998. ACM Press/Addison-Wesley, pp. 313–324.

[160]Bach, J.R., The VIRAGE image search engine: an open framework for image management, in Proceedings of SPIE-96, 4th Conference on Storage and Retrieval for Still Image and Video Databases, San Jose, 1996, pp. 76–87.

[161]VideoQ has been developed within the ADVENT Project of the Image and Advanced TV Lab at Columbia University. A Web demonstration may be found at http://www.ctr.columbia.edu/VideoQ/.

[162]Chang, S.-F., Chen, W., Meng, H., Sundaram, H., and Zhong, D., VideoQ: an automated content-based video search system using visual cues, in Proceedings of the 5th ACM International Multimedia Conference (MULTIMEDIA '97), New York/Reading, November 1998. ACM Press/Addison-Wesley, pp. 313–324.

[163]Bach, J.R., The VIRAGE image search engine: an open framework for image management, in Proceedings of SPIE-96, 4th Conference on Storage and Retrieval for Still Image and Video Databases, San Jose, 1996, pp. 76–87.

[164]Das, M., Riseman, E.M., and Draper, B.A., FOCUS: searching for multi-colored objects in a diverse image database, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1997, pp. 756–761.

[165]Shanbehazadeh, J., Moghadam, A.M.E., and Mahmoudi, F., Image indexing and retrieval techniques: past, present, and next, in Proceedings of SPIE: Storage and Retrieval for Media Databases 2000, Yeung, M.M., Yeo, B.-L., and Bouman, C.A., eds., Vol. 3972, 2000, pp. 461–470.

[166]Kelly, P.M. and Cannon, T.M., Query by image example: the CANDID approach, in Proceedings of SPIE: Storage and Retrieval for Media Databases 1995, 1995, pp. 238–248.

[167]Chang, S.-F., Chen, W., Meng, H., Sundaram, H., and Zhong, D., VideoQ: an automated content-based video search system using visual cues, in Proceedings of the 5th ACM International Multimedia Conference (MULTIMEDIA '97), New York/Reading, November 1998. ACM Press/Addison-Wesley, pp. 313–324.

[168]Bach, J.R., The VIRAGE image search engine: an open framework for image management, in Proceedings of SPIE-96, 4th Conference on Storage and Retrieval for Still Image and Video Databases, San Jose, 1996, pp. 76–87.

[169]Beigi, M., Benitez, A.B., and Chang, S.-F., Metaseek: a content-based metasearch engine for images, in Proceedings of the International Conference of Storage and Retrieval for Image and Video Databases (SPIE), 1998, pp. 118–128.

[170]Smith, J. and Chang, S.F., Visually searching the web for content, IEEE MultiMedia, 4, 12–20, 1997.

[171]Venters, C.C. and Cooper, M., A review of content-based image retrieval systems. Technical Report JTAP-054, JISC Technology Applications Programme (JTAP) at University Manchester, Manchester, 1999.

[172]Veltkamp, R.C. and Tanase, M., Content-based image retrieval systems: a survey, Technical Report UU-CS-2000-34, Utrecht University, Utrecht, March 2001.

[173]Smeulders, A.W.M., Worring, M., Santini, S., Gupta, A., and Jain, R., Content based image retrieval at the end of the early years. IEEE Trans. Pattern Anal. Machine Intell., 22, 1349–1380, 2000.

[174]Ghias, A., Logan, J., Chamberlain, D., and Smith, B.C., Query by humming: musical information retrieval in an audio database, in Proceedings of the ACM Multimedia, San Francisco, 1995, pp. 231–236.

[175]Crysandt, H. and Wellhausen, J., Music classification with MPEG-7, Proceedings of the SPIE International Conference on Electronic Imaging—Storage and Retrieval for Media Databases, Santa Clara, January 2003.

[176]Foote, J., An overview of audio information retrieval, ACM Multimedia Syst., 7, 2–10, 1999.

[177]Foote, J.T., Content-based retrieval of music and audio, in Multimedia Storage and Archiving Systems II, Proceedings of SPIE, Kuo, C.C.J., et al., eds., Vol. 3229, 1997, pp. 138–147.

[178]Foote, J., Cooper, M., and Nam, U., Audio retrieval by rhythmic similarity, in Proceedings of the Third International Symposium on Musical Information Retrieval, Paris, September 2002.

