Handbook of Video Databases: Design and Applications (Internet and Communications)

Shahram Ghandeharizadeh Department of Computer Science University of Southern California Los Angeles, California, USA <shahram@cs.usc.edu>

Seon Ho Kim Department of Computer Science University of Denver Denver, Colorado, USA <seonkim@cs.du.edu>

1. Introduction

Continuous media objects, such as audio and video clips, are large and must be retrieved at a pre-specified rate to ensure their hiccup-free display [9, 29]. Even with 100 gigabyte disk drives, a video library consisting of 1000 MPEG-2 titles (with an average display time of 90 minutes) requires thirty such disks for data storage. [1] Over time, such a storage system will evolve to consist of a heterogeneous collection of disk drives. There are several reasons why a system administrator might be forced to buy new disk drives over time. First, the application might require either a larger storage capacity due to the introduction of new titles or a higher bandwidth due to a larger number of users accessing the library. Second, existing disks might fail and need to be replaced. [2] The system administrator may not be able to purchase the original disk models because of technological trends in magnetic disks: approximately every 12 to 18 months, the cost per megabyte of disk storage drops by 50%, storage capacity doubles, and the average transfer rate increases by 40% [17, 24]. Older disk models are discontinued because they cannot compete in the marketplace. For example, a single manufacturer introduced three disk models in the span of six years, a new model every two years; see Table 35.1. The oldest model (introduced in 1994) costs more than the other two while providing both a lower storage capacity and a lower bandwidth.
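The storage estimate above can be reproduced with a few lines of arithmetic. The sketch below assumes the 4 Mbps average bit rate mentioned in footnote [1], constant bit rate clips, and decimal units (1 gigabyte = 1000 megabytes); the function name and its parameters are illustrative and do not come from the chapter.

```python
import math

def disks_needed(titles, minutes, rate_mbps, disk_gb):
    """Return (total storage in GB, number of disks) for a clip library.

    Assumes constant bit rate clips and decimal units (1 GB = 1000 MB).
    """
    seconds = minutes * 60
    megabytes_per_title = rate_mbps * seconds / 8  # megabits -> megabytes
    total_gb = titles * megabytes_per_title / 1000
    return total_gb, math.ceil(total_gb / disk_gb)

total_gb, disks = disks_needed(titles=1000, minutes=90, rate_mbps=4, disk_gb=100)
print(f"{total_gb:.0f} GB of clips -> at least {disks} disks of 100 GB")
```

Under these assumptions the raw requirement is 2700 GB, or 27 disks, which is in line with the roughly thirty disks quoted above.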

Table 35.1: Three different Seagate disk models and their zone characteristics.

Seagate Barracuda 4LP

Introduced in 1994, 2 Gbytes capacity, with a $1,200 price tag

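| Zone Id | Size (MB) | Track Size (MB) | No. of Tracks | Rate (MB/s) |
|---------|-----------|-----------------|---------------|-------------|
| 0       | 506.7     | 0.0908          | 5579          | 10.90       |
| 1       | 518.3     | 0.0903          | 5737          | 10.84       |
| 2       | 164.1     | 0.0864          | 1898          | 10.37       |
| 3       | 134.5     | 0.0830          | 1620          | 9.96        |
| 4       | 116.4     | 0.0796          | 1461          | 9.55        |
| 5       | 121.1     | 0.0767          | 1579          | 9.20        |
| 6       | 119.8     | 0.0723          | 1657          | 8.67        |
| 7       | 103.2     | 0.0688          | 1498          | 8.26        |
| 8       | 101.3     | 0.0659          | 1536          | 7.91        |
| 9       | 92.0      | 0.0615          | 1495          | 7.38        |
| 10      | 84.6      | 0.0581          | 1455          | 6.97        |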

Seagate Cheetah 4LP

Introduced in 1996, 4 Gbytes capacity, with a $1,100 price tag

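| Zone Id | Size (MB) | Track Size (MB) | No. of Tracks | Rate (MB/s) |
|---------|-----------|-----------------|---------------|-------------|
| 0       | 1017.8    | 0.0876          | 11617         | 14.65       |
| 1       | 801.6     | 0.0840          | 9540          | 14.05       |
| 2       | 745.9     | 0.0791          | 9429          | 13.23       |
| 3       | 552.6     | 0.0745          | 7410          | 12.47       |
| 4       | 490.5     | 0.0697          | 7040          | 11.65       |
| 5       | 411.4     | 0.0651          | 6317          | 10.89       |
| 6       | 319.6     | 0.0589          | 5431          | 9.84        |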

Seagate Barracuda 18

Introduced in 1998, 18 Gbytes capacity, with a $900 price tag

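| Zone Id | Size (MB) | Track Size (MB) | No. of Tracks | Rate (MB/s) |
|---------|-----------|-----------------|---------------|-------------|
| 0       | 5762      | 0.1268          | 45429         | 15.22       |
| 1       | 1743      | 0.1214          | 14355         | 14.57       |
| 2       | 1658      | 0.1157          | 14334         | 13.88       |
| 3       | 1598      | 0.1108          | 14418         | 13.30       |
| 4       | 1489      | 0.1042          | 14294         | 12.50       |
| 5       | 1421      | 0.0990          | 14353         | 11.88       |
| 6       | 1300      | 0.0923          | 14092         | 11.07       |
| 7       | 1268      | 0.0867          | 14630         | 10.40       |
| 8       | 1126      | 0.0807          | 13958         | 9.68        |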

With a heterogeneous disk subsystem, a continuous media server must continue to deliver the data to a client at the bandwidth pre-specified by the clip. For example, if a user references a movie that requires 4 megabits per second (Mbps) for its continuous display, then, once the system initiates its display, it must be rendered at 4 Mbps for the duration of its presentation. [3] Otherwise, a display may suffer from frequent disruptions and delays, termed hiccups. In this chapter, we investigate techniques that ensure continuous display of audio and video clips with heterogeneous disk drives. These are categorized into two groups: partitioning and non-partitioning techniques. With the partitioning techniques, disks are grouped based on their model. To illustrate, assume a system that has evolved to consist of three types of disks: Seagate Barracuda 4LP, Seagate Cheetah 4LP, and Seagate Barracuda 18; see Table 35.1. With this approach, the system constructs three disk groups, one per model. Each group is managed independently. A frequently accessed (hot) clip might be replicated on different groups in order to avoid the formation of hot spots and bottlenecks [7, 16, 28]. With the non-partitioning techniques, the system constructs a logical representation of the physical disks. This logical abstraction provides the illusion of a homogeneous disk subsystem to those software layers that ensure a continuous display.
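As a minimal sketch of the partitioning approach described above, the following groups a disk inventory by model so that each group can be managed independently. The model names come from Table 35.1; the disk identifiers, the number of disks per model, and the function name are made up for illustration.

```python
from collections import defaultdict

# Hypothetical inventory: (disk id, model name). Only the model names are
# taken from Table 35.1; ids and counts are illustrative assumptions.
inventory = [
    ("d0", "Seagate Barracuda 4LP"),
    ("d1", "Seagate Cheetah 4LP"),
    ("d2", "Seagate Barracuda 18"),
    ("d3", "Seagate Cheetah 4LP"),
    ("d4", "Seagate Barracuda 18"),
]

def partition_by_model(disks):
    """Group disks so that every group holds a single disk model."""
    groups = defaultdict(list)
    for disk_id, model in disks:
        groups[model].append(disk_id)
    return dict(groups)

groups = partition_by_model(inventory)
for model, members in groups.items():
    print(model, members)
```

With three models in the inventory, the function yields three groups, matching the three disk groups in the example above.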

In general, the non-partitioning schemes are superior because the resources (i.e., bandwidth and storage space) are combined into a unified pool, eliminating the need for techniques that detect bottlenecks and replicate data in order to remove them. Hence, non-partitioning techniques are sensitive to neither the frequency of access to objects nor the distribution of requests as a function of time. Moreover, scheduling of resources is simple with non-partitioning schemes. With the partitioning techniques, the system must monitor the load on each disk partition when activating a request in order to balance the load evenly across partitions. This becomes a difficult task when all partitions are almost completely utilized. The non-partitioning techniques have two disadvantages. First, the design and implementation of availability techniques that guarantee a continuous display in the presence of disk failures become somewhat complicated. Second, deciding on the configuration parameters of a system with a non-partitioning technique is not a trivial task.
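To make the load-monitoring burden of the partitioning techniques concrete, here is a hedged sketch of how a request might be assigned to a partition. The chapter does not prescribe a policy; choosing the replica holder with the lowest relative utilization is an assumption, as are the function name, the partition names, and the capacity figures.

```python
def choose_partition(replicas, load, capacity):
    """Pick the least-utilized partition that holds a replica and has room.

    replicas: partition names that store the requested clip
    load:     partition name -> number of active streams
    capacity: partition name -> maximum number of simultaneous streams
    Returns the chosen partition name, or None if the request must wait.
    """
    candidates = [p for p in replicas if load[p] < capacity[p]]
    if not candidates:
        return None
    # Assumed policy: lowest fraction of capacity currently in use.
    return min(candidates, key=lambda p: load[p] / capacity[p])

# Illustrative numbers only; they are not measurements from the chapter.
capacity = {"barracuda_4lp": 10, "cheetah_4lp": 20, "barracuda_18": 40}
load = {"barracuda_4lp": 9, "cheetah_4lp": 12, "barracuda_18": 30}

chosen = choose_partition(["barracuda_4lp", "barracuda_18"], load, capacity)
if chosen is not None:
    load[chosen] += 1  # activate the request on the chosen partition
print(chosen, load)
```

When every partition holding a replica is full, the function returns None and the request has to wait, which is the situation the text describes as difficult to manage when all partitions are almost completely utilized.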

Figure 35.1 shows a taxonomy of the possible non-partitioning approaches. Among them, many studies [20, 23, 2, 27, 3, 13, 14] have described techniques in support of a hiccup-free display assuming a fixed transfer rate for each disk model. One may apply these studies to multi-zone disk drives by assuming the average transfer rate (weighted by the space contributed by each zone) for each disk model. The advantage of this approach is its simplicity and straightforward implementation. However, its performance may degrade when requests arrive in such a way that they reference data residing in the slowest disk zones.
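The space-weighted average transfer rate mentioned above can be computed directly from the zone data in Table 35.1. The sketch below is one straightforward reading of "weighted by the space contributed by each zone": each zone's rate is weighted by its size in MB. The dictionary layout and function name are illustrative choices, not part of the chapter.

```python
# (zone size in MB, transfer rate in MB/s) per zone, copied from Table 35.1.
ZONES = {
    "Seagate Barracuda 4LP": [
        (506.7, 10.90), (518.3, 10.84), (164.1, 10.37), (134.5, 9.96),
        (116.4, 9.55), (121.1, 9.20), (119.8, 8.67), (103.2, 8.26),
        (101.3, 7.91), (92.0, 7.38), (84.6, 6.97),
    ],
    "Seagate Cheetah 4LP": [
        (1017.8, 14.65), (801.6, 14.05), (745.9, 13.23), (552.6, 12.47),
        (490.5, 11.65), (411.4, 10.89), (319.6, 9.84),
    ],
    "Seagate Barracuda 18": [
        (5762, 15.22), (1743, 14.57), (1658, 13.88), (1598, 13.30),
        (1489, 12.50), (1421, 11.88), (1300, 11.07), (1268, 10.40),
        (1126, 9.68),
    ],
}

def weighted_average_rate(zones):
    """Average transfer rate, weighting each zone by its storage space."""
    total_space = sum(size for size, _ in zones)
    return sum(size * rate for size, rate in zones) / total_space

for model, zones in ZONES.items():
    avg = weighted_average_rate(zones)
    slowest = min(rate for _, rate in zones)
    print(f"{model}: average {avg:.2f} MB/s, slowest zone {slowest:.2f} MB/s")
```

Printing the slowest zone's rate next to the average shows the gap that causes the degradation noted above: a server sized for the average rate has less bandwidth than planned whenever requests concentrate on the slowest zones.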

Figure 35.1: Taxonomy of techniques.

Disk Grouping and Disk Merging techniques [30] support a heterogeneous collection of single-zone disks with deterministic performance. Another approach is to utilize multi-zone disks [4] as they are. For example, the Seagate Barracuda 18 provides 18 gigabytes of storage and consists of 9 zones, with each zone providing a different data transfer rate (see Table 35.1). To the best of our knowledge, there are only four techniques in support of hiccup-free display with multi-zone disk drives: IBM's Logical Track [25], Hewlett Packard's Track Pairing [4], and USC's FIXB [11] and deadline driven [12] techniques. Logical Track, Track Pairing, and FIXB provide deterministic performance guarantees, while the deadline driven approach guarantees a hiccup-free display only stochastically. Studies that investigate stochastic analytical models in support of admission control with multi-zone disks, e.g., [22], are orthogonal because they investigate only admission control (while the above four techniques describe how the disk bandwidth should be scheduled, the block size for each object, and admission control). Moreover, we are not aware of a single study that investigates hiccup-free display using a heterogeneous collection of multi-zone disk drives.

This study extends the four techniques to a heterogeneous disk subsystem. While these extensions are novel and a contribution in their own right, we believe that the primary contribution of this study is the performance comparison of these techniques and quantification of their tradeoffs. This is because three of the described techniques assume certain characteristics about the target platform. Our performance results enable a system designer to evaluate the appropriateness of a technique in order to decide whether it is worthwhile to refine its design by eliminating its assumptions.

The rest of this chapter is organized as follows. Section 2 introduces hiccup-free display techniques for a heterogeneous disk subsystem. Section 3 quantifies the performance tradeoffs associated with these techniques. Our results demonstrate tradeoffs between cost per simultaneous stream supported by a technique, its startup latency, throughput, and the amount of disk space that it wastes.

For example, while USC's FIXB results in the best cost/performance ratio, the potential maximum latency incurred by each user is significantly larger than with the other techniques. The choice of a technique is application dependent: one must analyze the requirements of an application and choose a technique accordingly. For example, with nonlinear editing systems, the deadline driven technique is more desirable than the other alternatives because it minimizes the latency incurred by each user [19]. Our conclusions and future research directions are contained in Section 4.

Table 35.2: List of terms used repeatedly in this chapter.

| Terms | Definition |
|-------|------------|
| K | Number of disk models |
| Di | Disk model i, 0 ≤ i < K |
| qi | Number of disks for disk model Di |
| | jth disk drive of disk model Di, 0 ≤ j < qi |
| mi | Number of zones for each disk of disk model Di |
| Zi() | Zone i of disk , 0 ≤ i < mi |
| #TRi | Number of tracks in disk model i |
| NT(Zi()) | Number of tracks in zone i |
| PTi(Zj()) | Track i of zone j, 0 ≤ i < NT(Zj()) |
| LTi | Logical track i |
| AvgRi | Average transfer rate of disk model i |
| Bi | Block size for disk model i |
| Tw_Seek | Worst seek time of a zone (including the maximum rotational latency) |
| Tcseek | Seek time required to make a complete span |
| R(Zi) | Transfer rate of Zi |
| S(Zi) | Storage capacity of Zi |
| Tscan | Time to perform one sweep of m zones |
| TMUX(Zi) | Time to read N blocks from zone Zi |
| Rc | Display bandwidth requirement, consumption rate |
| N | Max. number of simultaneous displays, throughput |
| l | Max. startup latency |

[1] Assuming an average bandwidth requirement of 4 Mbps for each clip, the system designer might utilize additional disk drives to satisfy the bandwidth requirement of this library, i.e., the number of simultaneous users accessing the library.

[2] Disks are so cheap and commonplace that replacing failed disks is cheaper than fixing them.

[3] This study assumes constant bit rate media types. Extensions of this work in support of variable bit rate can be accomplished by extending our proposed techniques with those surveyed in [1].

[4] A more detailed discussion of multi-zone disks can be found in [18, 10].
