Handbook of Video Databases: Design and Applications (Internet and Communications)

2. XML DTD for Animation Databases

An XML DTD specifies the allowed set of elements, the attributes of each element, and the valid content of each element. By defining a DTD that incorporates the superset of the nodes specified in VRML, MPEG-4, SMIL, and PowerPoint, an XML mediator can handle multiple databases of animations stored in any of these formats, as shown in Figure 17.3.

Figure 17.3: Proposed XML mediator for databases of animations

The DTD for the XML mediator is based on the augmented scene graphs proposed in [1]. A scene graph is a hierarchical graph structure for representing 3D models and, in its generalized form, animations [12]. A scene graph defines complete information for all 3D models and for the other entities that affect the display: lights, colors, sounds, background, and views. Typically, leaf nodes represent the geometry of 3D models (shape, structure, size, position, connection, etc.), and interior nodes represent other information, such as transformations applied to their child nodes. A sub-tree in the scene graph may represent a complex 3D model in the scene with multiple sub-models. If multiple copies of a complex model are to populate the scene, it is efficient to reference (link to) a single sub-tree from different parts of the graph; any change to that sub-tree is then reflected in all references. An example of a scene graph is illustrated in Figure 17.4.
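The sharing of sub-trees can be illustrated with a short Python sketch. It is a simplified model under stated assumptions, not the mediator's actual data structure: the `Node` class and the `world_positions` function are hypothetical names, and transformations are reduced to translations only. Both chairs reference the same `chair` sub-tree, so a single edit to that sub-tree appears at both positions in the scene.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """Hypothetical scene-graph node: leaves carry geometry, interior nodes carry transforms."""
    name: str
    translation: tuple = (0.0, 0.0, 0.0)   # simplified transform applied to all children
    geometry: Optional[str] = None         # set only on leaf nodes, e.g. "box"
    children: List["Node"] = field(default_factory=list)


def world_positions(node, offset=(0.0, 0.0, 0.0), out=None):
    """Depth-first walk that accumulates translations and records every leaf it reaches."""
    if out is None:
        out = []
    pos = tuple(o + t for o, t in zip(offset, node.translation))
    if node.geometry is not None:
        out.append((node.name, node.geometry, pos))
    for child in node.children:
        world_positions(child, pos, out)
    return out


# The complex "chair" model is defined once as a sub-tree.
chair = Node("chair", children=[
    Node("seat", geometry="box"),
    Node("leg", translation=(0.0, -0.5, 0.0), geometry="cylinder"),
])
# Two interior nodes place two copies by referencing the same sub-tree.
room = Node("room", children=[
    Node("chair1", translation=(1.0, 0.0, 0.0), children=[chair]),
    Node("chair2", translation=(3.0, 0.0, 0.0), children=[chair]),
])

print(world_positions(room))
chair.children[0].geometry = "cushion"   # edit the shared sub-tree once...
print(world_positions(room))             # ...and both copies reflect the change
```

Because the graph stores references rather than copies, it is strictly a directed acyclic graph rather than a tree; a traversal simply visits the shared sub-tree once per reference.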

Figure 17.4: An example of scene graph illustrating the model

The basic scene graph was augmented in [1] to represent animations of 3D models by introducing an internal node, called the Interpolator node, that represents motion sequences of 3D models, e.g., a walking sequence for a human body model. An Interpolator node is associated with a 3D model, which can be a complex model such as an articulated figure whose parts are linked by joints with multiple degrees of freedom (DOFs). Each joint can have up to 6 DOFs: 3 for translation and 3 for rotation. An articulated figure such as a human body may have hundreds of DOFs. In an Interpolator node, key frames are used to represent the motion for all DOFs: key $= [k_1, k_2, \ldots, k_i, \ldots, k_m]$ and keyvalue $= [v_1, v_2, \ldots, v_i, \ldots, v_m]$, where $k_i$ is the key frame number or key frame time. The key frame number can be mapped to time according to the frame rate of a given standard, such as PAL, SECAM, or NTSC. Each $v_i$ is a vector in the motion configuration space, $v_i = [u_{i,1}, u_{i,2}, \ldots, u_{i,j}, \ldots, u_{i,n}]$, where $u_{i,j}$ is the key value of DOF $j$ at $k_i$ (a displacement for a translational DOF, an angle for a rotational DOF). Here $m$ and $n$ are the numbers of key frames and DOFs of the model, respectively. Together, the key and keyvalue of an Interpolator node define the animation of a 3D model.
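To make the key/keyvalue layout concrete, the following Python sketch stores m keys and m vectors of n DOF values and evaluates the motion between key frames. The text does not fix an interpolation scheme, so per-DOF linear interpolation (clamped outside the key range) is an assumption, as are the names `Interpolator`, `value_at`, and `frame_to_time`. The frame rates are the nominal rates of the standards mentioned: 25 frames/s for PAL and SECAM and 30000/1001 (about 29.97) frames/s for NTSC.

```python
from bisect import bisect_right
from typing import List, Sequence

# Nominal frame rates (frames per second) of the standards named in the text.
FRAME_RATES = {"PAL": 25.0, "SECAM": 25.0, "NTSC": 30000 / 1001}


def frame_to_time(frame: int, standard: str) -> float:
    """Map a key frame number to a time in seconds for the given standard."""
    return frame / FRAME_RATES[standard]


class Interpolator:
    """Hypothetical container: key = [k_1..k_m]; keyvalue = m vectors of n DOF values."""

    def __init__(self, key: Sequence[float], keyvalue: Sequence[Sequence[float]]):
        if len(key) == 0 or len(key) != len(keyvalue):
            raise ValueError("need one keyvalue vector per key")
        if any(b <= a for a, b in zip(key, key[1:])):
            raise ValueError("keys must be strictly increasing")
        n = len(keyvalue[0])
        if any(len(v) != n for v in keyvalue):
            raise ValueError("every keyvalue vector must hold the same n DOFs")
        self.key = list(key)
        self.keyvalue = [list(v) for v in keyvalue]

    def value_at(self, k: float) -> List[float]:
        """Per-DOF linear interpolation at k, clamped to the first/last key frame."""
        if k <= self.key[0]:
            return list(self.keyvalue[0])
        if k >= self.key[-1]:
            return list(self.keyvalue[-1])
        i = bisect_right(self.key, k) - 1            # key[i] <= k < key[i + 1]
        t = (k - self.key[i]) / (self.key[i + 1] - self.key[i])
        lo, hi = self.keyvalue[i], self.keyvalue[i + 1]
        return [a + t * (b - a) for a, b in zip(lo, hi)]


# Example: m = 3 key frames, n = 2 DOFs (one translation, one rotation angle in radians).
walk = Interpolator(key=[0, 10, 20], keyvalue=[[0.0, 0.0], [1.0, 0.5], [1.0, 1.0]])
print(walk.value_at(5))            # → [0.5, 0.25]
print(frame_to_time(10, "PAL"))    # → 0.4
```

Linear interpolation of each rotational DOF independently is the simplest choice; production formats often interpolate orientations differently (for example, VRML's OrientationInterpolator interpolates along the arc between rotations).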

Most animation formats are based on the scene graph structure described above. VRML uses about 54 types of scene graph nodes to describe animation models and sequences [7]. Nodes are connected through fields of type SFNode (a single-valued field containing one node) and MFNode (a multiple-valued field containing a list of nodes), whose values in turn contain node (or USE) statements. Each arc in the graph from A to B means that node A has an SFNode or MFNode field whose value directly contains node B. Prototypes and routes are other important constructs in VRML. MPEG-4 BIFS can be considered a compressed version of VRML that additionally provides nodes for body and facial animation; MPEG-4 defines 100 nodes for animation sequences [7, 9]. The Synchronized Multimedia Integration Language (SMIL), a standard proposed by the World Wide Web Consortium (W3C), also defines formats for representing animation sequences. Apart from providing constructs for describing spatial and temporal relationships, SMIL defines an animation as a time-based function of a target element (more specifically, of some attribute of the target element, the target attribute). PowerPoint, Microsoft's popular presentation software, can include sound and video files in a presentation and also supports animations within it. The PowerPoint design is based on the Component Object Model (COM), which can be easily mapped onto the XML DTD.
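SMIL's view of an animation as a time-based function of a target attribute can be illustrated with a simple element such as `<animate attributeName="width" from="100" to="300" begin="2s" dur="4s"/>`. The Python function below is a simplified sketch, not a SMIL engine: it assumes numeric attribute values, linear interpolation, a single play with no repeats or additive behavior, and only two `fill` values, `remove` (which restores the underlying value after the animation ends) and `freeze` (which holds the final value). The function name and parameter names are hypothetical.

```python
def smil_linear_animate(t, begin, dur, from_value, to_value, base_value, fill="remove"):
    """Value of the target attribute at time t (seconds) for a from/to animation.

    Simplified: linear interpolation, a single play (no repeats), no additive behavior.
    """
    if dur <= 0:
        raise ValueError("dur must be positive")
    if t < begin:                      # before the animation starts: underlying value
        return base_value
    if t >= begin + dur:               # after the active duration ends
        return to_value if fill == "freeze" else base_value
    progress = (t - begin) / dur       # fraction of the duration elapsed
    return from_value + (to_value - from_value) * progress


# The <animate> element above, applied to a target whose underlying width is 50.
for t in (1, 4, 7):
    print(t, smil_linear_animate(t, begin=2, dur=4, from_value=100, to_value=300, base_value=50))
```

The point of the sketch is that the animation is nothing more than a function from time to an attribute value; the scene graph formats express the same idea through interpolator nodes driven by a time source.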

2.1 Mapping XML Animation to Database

We adopt a relational database approach for representing and storing the DTD of the XML mediator. As shown in the ER diagram of Figure 17.5, the geometric models and the motions of each object are stored in two separate tables of the database ("Model" and "Motion"), which are linked by SceneID, the key that identifies the scene. These are not the only tables in the database. There are separate tables for storing the interpolator, sensor, and route nodes, which are the main nodes to be considered when searching for a motion. To support model comparison, information about the nodes representing a model is stored in the Model_Nodes table. The scene also has its own table, which is linked to the XML_Text table by a Transformation relationship and also to the VRML_MPEG4 and PowerPoint tables. The Model_Nodes and PowerPoint tables each have a metadata entry that provides descriptions of objects and presentations. These descriptions include the type of object (man, woman, cow, etc.) and its nature (color, size, shape), as well as descriptions of presentation content, such as the text, MPEG-4, VRML, or other elements a presentation contains and the order in which they are presented.
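One possible rendering of part of this ER model as SQL tables is sketched below with Python's built-in `sqlite3` module. The table names follow the text; every column name is an assumption, and the XML_Text, VRML_MPEG4, and PowerPoint tables are omitted for brevity. The composite (identifier, name) keys on Model and Motion anticipate the primary key described in the next paragraph, and the final join shows how models and motions are connected through SceneID.

```python
import sqlite3

# Table names follow the ER diagram; every column name here is an assumption.
SCHEMA = """
CREATE TABLE Scene (SceneID INTEGER PRIMARY KEY, Title TEXT);

-- Models and motions of a scene live in separate tables, linked through SceneID.
CREATE TABLE Model (
    ModelID TEXT, ModelName TEXT,
    SceneID INTEGER NOT NULL REFERENCES Scene(SceneID),
    PRIMARY KEY (ModelID, ModelName));
CREATE TABLE Motion (
    MotionID TEXT, MotionName TEXT,
    SceneID INTEGER NOT NULL REFERENCES Scene(SceneID),
    PRIMARY KEY (MotionID, MotionName));

-- Per-node information used for model comparison, including object metadata.
CREATE TABLE Model_Nodes (
    NodeID TEXT PRIMARY KEY, ModelID TEXT, ModelName TEXT, NodeType TEXT, Metadata TEXT,
    FOREIGN KEY (ModelID, ModelName) REFERENCES Model(ModelID, ModelName));

-- The nodes that matter most when searching for a motion.
CREATE TABLE Interpolator (
    NodeID TEXT PRIMARY KEY, MotionID TEXT, MotionName TEXT, KeyList TEXT, KeyValueList TEXT,
    FOREIGN KEY (MotionID, MotionName) REFERENCES Motion(MotionID, MotionName));
CREATE TABLE Sensor (
    NodeID TEXT PRIMARY KEY, MotionID TEXT, MotionName TEXT, SensorType TEXT,
    FOREIGN KEY (MotionID, MotionName) REFERENCES Motion(MotionID, MotionName));
CREATE TABLE Route (
    NodeID TEXT PRIMARY KEY, MotionID TEXT, MotionName TEXT, FromNode TEXT, ToNode TEXT,
    FOREIGN KEY (MotionID, MotionName) REFERENCES Motion(MotionID, MotionName));
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when enabled
conn.executescript(SCHEMA)

conn.execute("INSERT INTO Scene VALUES (1, 'park')")
conn.execute("INSERT INTO Model VALUES ('Body', 'man', 1)")
conn.execute("INSERT INTO Model_Nodes VALUES ('BodyShape', 'Body', 'man', 'Shape', 'man; tall')")
conn.execute("INSERT INTO Motion VALUES ('WalkPath', 'walk', 1)")
conn.execute("INSERT INTO Interpolator VALUES "
             "('WalkPath', 'WalkPath', 'walk', '0 0.5 1', '0 0 0 1 0 0 2 0 0')")

# Which models and motions belong to scene 1?
rows = conn.execute("""
    SELECT mo.ModelName, mt.MotionName
    FROM Model AS mo JOIN Motion AS mt ON mo.SceneID = mt.SceneID
    WHERE mo.SceneID = ?""", (1,)).fetchall()
print(rows)
```

Storing key and keyvalue lists as text keeps the sketch short; a real implementation might normalize them into one row per key frame so that motions can be compared in SQL.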

Figure 17.5: The ER diagram of the XML Mediator DTD

The VRML_Text, MPEG4_Text, and PwerPt_Text entries are mapped onto XML_Text by parsing their content according to the specified DTD. The result describes a scene and is therefore related to the Scene table, from which the content is mapped onto the respective node tables. Names for the different models and motions are assigned based on the content of the nodes, i.e., the metadata associated with them. For the motion table, the identifiers of the interpolator, sensor, and route nodes are used to identify the metadata (and hence the name). For the model table, the identifiers of the nodes that contribute to the generation of the model are used to identify the metadata (and hence the name). Metadata identification is discussed in more detail in Section 3.1.2. The motion/model identifier and the motion/model name together form the primary key that identifies each model or motion. These fields characterize a general animation scene in any format. (Because SMIL is an XML-based language, it can easily be incorporated into the above ER model. We do not mention it explicitly here, since the current implementation of the animation toolkit does not handle SMIL animations.)
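The naming step for motions can be sketched in Python with the standard `xml.etree.ElementTree` and `sqlite3` modules. Everything specific here is hypothetical: the XML_Text fragment uses VRML-like element names chosen for illustration (the mediator DTD itself is not reproduced in this section), each motion-related node is assumed to carry its metadata in a `metadata` attribute, and route handling is left out. Motion nodes are grouped by their metadata name, and the identifiers of the contributing nodes together with that name form the composite primary key.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML_Text fragment; element and attribute names are illustrative only.
XML_TEXT = """
<Scene id="1">
  <Transform DEF="Body" metadata="man">
    <Shape geometry="humanoid"/>
  </Transform>
  <TimeSensor DEF="Clock" cycleInterval="2" metadata="walk"/>
  <PositionInterpolator DEF="WalkPath" key="0 0.5 1" metadata="walk"/>
</Scene>
"""

# Node types treated as contributing to a motion in this sketch (routes omitted).
MOTION_TAGS = {"TimeSensor", "PositionInterpolator", "OrientationInterpolator"}


def extract_motions(xml_text):
    """Group motion-related node identifiers by the metadata name they carry."""
    root = ET.fromstring(xml_text.strip())
    motions = {}                                   # motion name -> node identifiers
    for elem in root.iter():                       # document order, root included
        name = elem.get("metadata")
        if elem.tag in MOTION_TAGS and name is not None:
            motions.setdefault(name, []).append(elem.get("DEF"))
    return motions


def store_motions(conn, scene_id, motions):
    """Insert one row per motion, keyed by (identifier, name)."""
    conn.execute("""CREATE TABLE IF NOT EXISTS Motion (
        MotionID TEXT, MotionName TEXT, SceneID INTEGER,
        PRIMARY KEY (MotionID, MotionName))""")
    for name, node_ids in motions.items():
        motion_id = "+".join(node_ids)             # identifier built from contributing nodes
        conn.execute("INSERT INTO Motion VALUES (?, ?, ?)", (motion_id, name, scene_id))


motions = extract_motions(XML_TEXT)
conn = sqlite3.connect(":memory:")
store_motions(conn, 1, motions)
print(conn.execute("SELECT * FROM Motion").fetchall())
```

Models could be named the same way, by collecting the identifiers of the nodes under a model's sub-tree and looking up their metadata.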
