Hash Trees

2017-11-03 09:05:07

Instead of having directory entries in an unsorted order, a hash tree can be used to sort some of the entries. Each block in the directory corresponds to a node in the tree. The non-leaf nodes have data structures that point to the next layer. In a smaller directory, there are two layers, and the first block is the only node at the top layer.

To show which blocks correspond to the nodes in the next layer, there are node descriptor data structures. Before the node descriptors is a header data structure, and it starts following the '..' directory entry. The node descriptor header has the fields given in Table 15.25.

Table 15.25. Data structure for the hash tree node descriptor header.
Byte Range	Description	Essential
03	Unused	No
44	Hash version	Yes
55	Length of this structure	Yes
66	Levels of leaves	No
77	Unused	No

Following the header is a list of node descriptors that identifies what hash values are in each block. Each entry in the list has the data structure given in Table 15.26.

Table 15.26. Data structure for the hash tree node descriptor entries.
Byte Range	Description	Essential
0 3	Minimum hash value in node	Yes
4 7	Block address	Yes

Each entry contains the smallest hash value and the directory block of the node. The first node descriptor does not need a minimum because it should be 0. Therefore, those four bytes are used for another purpose. They store the current number of node descriptors and the maximum number of node descriptors that can fit in the block. Therefore, the first node descriptor has the fields given in Table 15.27.

Table 15.27. Data structure for the first node descriptor entry.
Byte Range	Description	Essential
01	Maximum number of node descriptors	No
23	Current number of node descriptors	Yes
47	Block address of first node	Yes

The remainder of the block after the last node descriptor typically contains data from previous directory entries.

Here we see the contents of the first block in a large directory using a hash index:

# icat -f linux-ext3 ext3-3.dd 16 0000000: 1000 0000 0c00 0100 2e00 0000 0200 0000 ................ 0000016: f403 0200 2e2e 0000 0000 0000 0208 0000 ................ 0000032: 7c00 0400 0100 0000 3295 6541 0400 0000 |.......2.eA.... 0000048: 88d5 fa92 0200 0000 86e7 50be 0300 0000 ..........P..... 0000064: 3738 3930 2e31 3233 3400 0000 1200 0000 7890.1234.......

In this output, bytes 0 to 9 are for the '.' directory entry, and bytes 12 to 23 are for the '..' entry. Notice that the record length field of the '..' entry in bytes 16 to 17 is 1,012 bytes (0x03f4). That points to the end of the 1,024-byte block.

The hash index header starts at byte 24, and the first four bytes are unused. Byte 28 shows us that hash version 2 is being used, and byte 29 shows us that the structure is eight bytes long. The first node descriptor is in bytes 32 to 39. Bytes 32 to 33 show that it can have 124 (0x7c) descriptors in the block, but bytes 34 to 35 show that only 4 are used. Bytes 36 to 39 show that block 1 of the directory contains the first node.

The second node descriptor starts in byte 40. We see that this node contains files with a file name hash greater than 0x41659532, and the names are located in block 4 of the directory. To find the upper bound of the hashes in this node, we look at the entry for the next node, which starts in byte 48, and see that its hash value is 0x92fad588. The entry for the fourth node starts in byte 63.

Категории