File Name Category
The file name category of data includes the data structures that store the name of each file and directory. This section describes where the data are stored and how to analyze them.
Overview
ExtX has several methods for assigning names to a file or directory, and this section examines three of them. The first subsection looks at directory entries, which are the basic data structure used to assign names. We next look at hard and soft links and then hash trees.
Directory Entries
An ExtX directory is just like a regular file except that it has a special type value in its inode. Directories allocate blocks that contain a list of directory entry data structures. A directory entry is a simple data structure that contains the file name and the inode address where the file's metadata can be found. The size of the directory corresponds to the number of blocks that it has allocated and is independent of how many files actually exist in it.
Every directory starts off with directory entries for the '.' and '..' directories, which are for the current and parent directory. Following them are entries for every file and subdirectory in the directory. The root directory is always located in inode 2.
A directory entry has a dynamic length because the file name can be anywhere from 1 to 255 characters long. Therefore, the data structure has a field that identifies how long the name is and where the next directory entry can be found. The length of the entry is rounded up to a multiple of four. We can see this in Figure 14.6(A), where we have three files in a directory. The first two entries are for '.' and '..' and the last entry points to the end of the allocated block. The space after the c.txt file is unused.
Figure 14.6. Directory entries contain the file name and inode. They also contain a pointer to the next entry. Unused entries are skipped over by increasing the pointer of the previous entry.
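To make the layout concrete, here is a minimal parsing sketch in Python. It assumes the original entry layout with a 4-byte inode address, a 2-byte record length, and a 2-byte name length (the two versions of the structure are discussed later in this section), and it takes a single directory block as bytes; the function name parse_dirents is ours, not part of any tool.

```python
import struct

def parse_dirents(block):
    """Walk the allocated entries in one directory block by following the
    record length pointers, as in Figure 14.6(A)."""
    offset = 0
    while offset + 8 <= len(block):
        inode, rec_len, name_len = struct.unpack_from("<IHH", block, offset)
        if rec_len < 8 or offset + rec_len > len(block):
            break                              # corrupt entry; stop walking
        # masking the low byte also tolerates the newer layout, where the
        # high byte of this field stores the file type
        name = block[offset + 8:offset + 8 + (name_len & 0xFF)]
        yield offset, inode, name.decode("latin-1", "replace")
        offset += rec_len                      # record length points to the next entry
```

Walking the chain this way sees only the allocated names; recovering unallocated names is discussed in the "Analysis Techniques" section.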
When a file or directory is deleted, its name needs to be modified so that the OS does not print it. The OS hides a directory entry by increasing the record length of the previous directory entry so that it points to the entry after the one being hidden. We can see this in Figure 14.6(B) where the b.txt file was deleted and the pointer in a.txt was incremented to point to c.txt. A directory listing would skip over it, but the data still exists.
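The hiding step itself is only a two-byte update. The sketch below, under the same assumptions as the previous one (and using a bytearray so the block can be modified in place), shows how the record length of the previous entry grows to cover the deleted one; hide_entry and its offset parameters are illustrative names.

```python
import struct

def hide_entry(block, prev_offset, victim_offset):
    """Hide the entry at victim_offset by extending the record length of the
    entry at prev_offset, as in Figure 14.6(B).  block is a bytearray."""
    victim_rec_len = struct.unpack_from("<H", block, victim_offset + 4)[0]
    new_rec_len = (victim_offset - prev_offset) + victim_rec_len
    struct.pack_into("<H", block, prev_offset + 4, new_rec_len)
    # The victim's inode, name length, and name bytes are left untouched,
    # which is why deleted names can still be recovered.
```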
When a new entry is created, the OS examines each existing entry and compares its record length with its name length. Each directory entry has eight bytes of static fields in addition to the name, so the minimum record length can be determined by adding eight to the name length and rounding up to a multiple of four. If the record length of an entry is larger than it needs to be and the difference is at least the size of the entry being added, the new entry is placed in the unused space. For example, consider a new directory that has only '.' and '..' entries in a 4,096-byte block. There will be two entries with the values given in Table 14.1.
| Name | Name Length | Record Length |
|---|---|---|
| . | 1 | 12 |
| .. | 2 | 4,084 |
The '..' entry has a record length of 4,084 bytes because it must point to the end of the block, even though it needs only 12 bytes. Therefore, the new entry will be added 12 bytes after the start of the '..' entry, and the record length of the new entry will point to the end of the block. The new entries would have the fields given in Table 14.2.
| Name | Name Length | Record Length |
|---|---|---|
| . | 1 | 12 |
| .. | 2 | 12 |
| File1.dat | 9 | 4,072 |
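The arithmetic behind these two tables can be checked with a short sketch; needed_length is a hypothetical helper that applies the "eight bytes plus the name, rounded up to a multiple of four" rule described above.

```python
def needed_length(name):
    """Minimum record length: 8 bytes of fixed fields plus the name,
    rounded up to a multiple of four."""
    return (8 + len(name) + 3) & ~3

# Reproducing the values for a 4,096-byte block:
print(needed_length("."))           # 12    -> '.' keeps its record length of 12
print(needed_length(".."))          # 12    -> '..' shrinks from 4,084 to 12
print(needed_length("File1.dat"))   # 20    -> minimum size of the new entry
print(4096 - 12 - 12)               # 4,072 -> its actual record length, because it
                                    #          now points to the end of the block
```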
There are actually two versions of the directory entry structure. The older version has only the name, inode address, and length values. The updated version takes one of the two bytes in the name length field and uses it to store the file type, such as regular file, directory, or character device. We will later see how this can be used to detect when an inode has been reallocated since a file name was deleted. The data structures of both types of directory entries are given in the next chapter.
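As a sketch of the difference, the updated layout can be parsed by splitting the old two-byte name length field into a one-byte name length and a one-byte file type. The constant values shown below are the conventional Ext2/Ext3 file type codes; verify them against the data structures in the next chapter, and treat parse_dirent_v2 as an illustrative helper.

```python
import struct

# Conventional file type codes stored in the updated directory entry.
FT_UNKNOWN, FT_REG_FILE, FT_DIR, FT_CHRDEV, FT_BLKDEV, FT_FIFO, FT_SOCK, FT_SYMLINK = range(8)

def parse_dirent_v2(block, offset):
    """Parse one entry in the updated layout: inode, record length,
    one-byte name length, one-byte file type, then the name."""
    inode, rec_len, name_len, file_type = struct.unpack_from("<IHBB", block, offset)
    name = block[offset + 8:offset + 8 + name_len].decode("latin-1", "replace")
    return inode, rec_len, name, file_type
```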
Here is the output of running fls on a directory in our file system image. It shows one deleted file, the file with the "*" before it. The first column shows the file type according to the directory entry and then according to the inode. Notice that the types for the deleted file are still the same, which means that the inode may not have been reallocated yet.
```
# fls -f linux-ext3 -a ext3.dd 69457
d/d 69457:  .
d/d 53248:  ..
r/r 69458:  abcdefg.txt
r/r * 69459:  file two.dat
d/d 69460:  subdir1
r/r 69461:  RSTUVWXY
```
Links and Mount Points
ExtX provides both hard and soft links so that users can define multiple names for a file or directory. A hard link is an additional name for a file or directory in the same file system. After a hard link is created, you will not be able to tell if it is the original name or a link. To make a hard link, the OS allocates a new directory entry and points it to the original inode. The link count in the inode is incremented by one to account for the new name. A file will not be deleted until all its hard links are deleted.
Note that the '.' and '..' entries in each directory are hard links to the current and parent directory. Therefore, the link count for a directory is equal to at least two plus the number of subdirectories it has.
Soft links are also a second name for a file or directory, but they can span different file systems. The OS creates a soft link using a symbolic link, which is a special type of file. The full path of the destination file or directory is stored in either blocks allocated to the file or in the inode if the path is less than 60 characters long. The next chapter shows example data structures that contain symbolic links.
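A small sketch of how a symbolic link's target could be read, assuming the inode fields have already been parsed: symlink_target and read_first_block are hypothetical helpers, and the block pointer area is assumed to be the 60 bytes normally used for the 15 block addresses.

```python
def symlink_target(block_pointer_bytes, file_size, read_first_block):
    """Return the destination path of a symbolic link.

    block_pointer_bytes: the 60-byte block pointer area of the link's inode
    file_size:           the size recorded in the inode (length of the path)
    read_first_block:    assumed helper that returns the link's first data block
    """
    if file_size < 60:
        # Short target: the path is stored directly inside the inode.
        return block_pointer_bytes[:file_size].decode("latin-1")
    # Longer target: the path is stored in a block allocated to the link.
    return read_first_block()[:file_size].decode("latin-1")
```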
We can see an example of hard and soft links in Figure 14.7. Part A shows a hard link named hardlink.txt that points to the file1.txt file. Part B shows a soft link to the same file, but this time there is another level of indirection. softlink.txt has its own inode that contains the path of the file. Notice that in Part B both inodes have a link count of one. In reality, the symbolic link would store the /file1.txt path in its block pointers because the path is shorter than 60 characters.
Figure 14.7. An example of A) a hard link and B) a soft link to the 'file1.txt' file.
In Unix, directories can be used both for storing files and as volume mount points, as we discussed in Chapter 4. Consider a directory dir1 in a file system named FS1. If file system FS2 is mounted on the dir1 directory, then when a user changes into that directory and lists the contents, the files from FS2 are shown. Even if the dir1 directory has its own files in FS1, they will not be shown while FS2 is mounted on it. We can see this in Figure 14.8, where part A shows the three pictures in dir1 and part B shows that dir1 now contains the three files in the root directory of volume FS2.
Figure 14.8. Example where a directory in FS1 contains three files, but when FS2 is mounted on the directory, they are not seen.
For investigators, this means that you need to know where file systems were mounted. If you are looking for a specific file, you might need to reference several file systems before you find the file because different directories could have been on different volumes. Many current post-mortem investigation tools do not show volumes at their mount point, and therefore you will need to determine which volume should be there. On the plus side, because the tools do not show volumes at their mount points, you can see the directory contents of the mount points. One hiding technique is to create files in a directory and then mount a volume on the directory so that a casual observer would not notice them.
Hash Trees
When the file system is created, the user can choose to use a hash tree to organize the files instead of the unsorted list that I just described. If a file system is using hash trees, then a compatible feature flag will be set in the superblock. The hash trees still use directory entry data structures, but they are in a sorted order.
The hash trees in ExtX are similar to the B-trees that were discussed in Chapter 11, "NTFS Concepts," so refer to that section for an overview of how trees are used in a directory. The major difference between hash and B-trees is that hash trees sort the files based on a hash of the file name and not based on the name itself. Note that there is also experimental support for B-Trees in ExtX that are like the NTFS B-Trees, but we do not discuss them in this chapter because they are not yet standard.
If a directory is using a hash tree, it will have multiple blocks and each will be a node in the tree. Each node contains the files whose hash value is in a given range. The first block of the directory is the root node, and it contains the '.' and '..' directory entries. The rest of the first block contains node descriptors, which contain a hash value and a block address. The OS uses the node descriptors to determine to which block it should jump for a given hash value.
We can see this in Figure 14.9 where we have several files in two leaves. The first block contains the header and node descriptors, and the second and third blocks contain the file directory entries. An OS that did not know about hash trees would process the entries in all the blocks without knowing that they were in a sorted order.
Figure 14.9. A directory with hash trees and two leaves. The tree uses directory entries so it can be processed as a normal directory.
There can be up to three layers of nodes in a hash index tree. The data structures for the node descriptors and an actual directory are given in Chapter 15.
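A sketch of how the node descriptors could be used during a lookup, assuming the hash of the file name has already been computed (the hash function itself is defined by the file system and is not shown here) and that the descriptors from the root node are available as sorted (hash value, block address) pairs; find_leaf_block and the example hash values are illustrative.

```python
import bisect

def find_leaf_block(node_descriptors, name_hash):
    """node_descriptors: sorted (hash_value, block_address) pairs from the
    root node; the first descriptor is assumed to cover hash value 0.
    Returns the block address whose range contains name_hash."""
    hashes = [h for h, _ in node_descriptors]
    index = bisect.bisect_right(hashes, name_hash) - 1
    return node_descriptors[index][1]

# With two leaves as in Figure 14.9 (hypothetical hash values):
descriptors = [(0x00000000, 1), (0x80000000, 2)]
print(find_leaf_block(descriptors, 0x12345678))   # block 1
print(find_leaf_block(descriptors, 0x9ABCDEF0))   # block 2
```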
Allocation Algorithms
When a new file name is created, Linux uses a first-available strategy. The OS starts at the beginning of the directory and examines each directory entry. Using the name length, it calculates how long the entry needs to be. It compares that with the actual record length. If they are different, it assumes that either it is at the end of a block or that the record length was increased to cover a deleted entry. In either case, the OS tries to add the name in the unused area. If no unused areas exist that are large enough for the name, then the name is appended to the list. New blocks are added as needed, and they are wiped before use. Linux will not allow an entry to cross a block boundary. Other OSes could choose a different strategy. If hash trees are being used, the file is added to the block that corresponds to the file's hash value.
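A sketch of the first-available search within one directory block, under the same parsing assumptions as before; find_slot is a hypothetical helper that returns a byte offset with enough unused space for the new name, or None if the name must be appended elsewhere.

```python
import struct

def find_slot(block, new_name):
    """First-available search for space for a new directory entry."""
    want = (8 + len(new_name) + 3) & ~3            # record length the new name needs
    offset = 0
    while offset + 8 <= len(block):
        inode, rec_len, name_len = struct.unpack_from("<IHH", block, offset)
        if rec_len < 8 or offset + rec_len > len(block):
            break                                  # corrupt entry; stop walking
        used = (8 + (name_len & 0xFF) + 3) & ~3    # space this entry actually needs
        if rec_len - used >= want:
            return offset + used                   # unused space is large enough
        offset += rec_len
    return None                                    # append to the list / a new block
```

Running this against the two-entry block from Table 14.1 returns offset 24, which matches the placement of the new entry in Table 14.2.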
When a file is deleted, the record length of the previous entry is incremented so that it points to the entry after the one being deleted. That is the only action that places the entry in an unallocated status. Linux will clear the inode pointer in an Ext2 file system, but not an Ext3 file system. Unused entries are not rearranged to compress the size of a directory.
Analysis Techniques
Analysis of the file name category of data involves listing the names in a directory so that we can find a specific file or files that match a given pattern. The first step in this process is to locate the root directory, which is easy in ExtX because it is always located in inode 2. Directories are similar to files, except that they have a special type set in their inode. After locating the directory content, we process it as a sequence of directory entry data structures. To examine only the allocated names, we process a directory entry structure, jump ahead by its reported record length, and process the next entry. This process repeats until the end of the block.
If we want to view unallocated file names as well, we ignore the reported record length of each entry, calculate how long the entry should be, and advance to that point. For example, if the last character of a name is in byte 34, we advance to byte 36. After we advance to the boundary, we apply the directory entry data structure and perform sanity checks to determine whether the data could be a directory entry. If it could be, it is processed; if not, we advance four more bytes and test that location. This process eventually brings us to where the previous directory entry would have pointed us.
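The recovery walk can be sketched as follows, under the same parsing assumptions as earlier; recover_names is an illustrative helper, and the sanity checks shown are examples of the kind of tests an analysis tool might apply.

```python
import struct

def recover_names(block):
    """Advance through a directory block in 4-byte steps, ignoring each
    entry's record length, so that unallocated names are found as well."""
    offset = 0
    while offset + 8 <= len(block):
        inode, rec_len, name_len = struct.unpack_from("<IHH", block, offset)
        name_len &= 0xFF
        plausible = (
            rec_len % 4 == 0 and 8 <= rec_len <= len(block) - offset and
            0 < name_len <= 255 and offset + 8 + name_len <= len(block)
        )
        if plausible:
            name = block[offset + 8:offset + 8 + name_len]
            yield offset, inode, name.decode("latin-1", "replace")
            offset += (8 + name_len + 3) & ~3      # jump only past the name
        else:
            offset += 4                            # test the next 4-byte boundary
```

Allocation status is not decided here; as described below, an entry is considered allocated only if an allocated entry's record length points to it.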
Figure 14.10 shows an example where we have two unallocated directory entries in between two allocated entries. There is unused space between each of the directory entries, so simply advancing to the next 4-byte boundary after each entry would not find the names.
Figure 14.10. A list of directory entries where two unallocated names are in between two allocated names, and we must advance through the unused space to find the unallocated names.
The allocation status of a directory entry structure is determined based on whether it is pointed to by an allocated entry. The first two entries in each directory will always be allocated because they are for the '.' and '..' entries. When we find a file in which we are interested, we can look up its metadata using the inode address.
In some cases, we might want to search the file system for blocks that were previously used by a directory. To do so, we can examine the first 24 bytes of each block to determine whether the '.' and '..' entries exist. If so, the block was the first block in a directory. Linux always allocates directory entries on block boundaries, so a more general search would test whether the first bytes of each block form a valid directory entry for any file name, not only '.'.
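A sketch of the 24-byte test described above; looks_like_dir_start is a hypothetical helper, and the check that the '.' entry has a record length of exactly 12 is an assumption based on how Linux creates directories.

```python
import struct

def looks_like_dir_start(block):
    """Return True if the block begins with '.' and '..' entries and was
    therefore probably the first block of a directory."""
    if len(block) < 24:
        return False
    dot_inode, dot_rec_len, _ = struct.unpack_from("<IHH", block, 0)
    if dot_inode == 0 or dot_rec_len != 12 or block[8:9] != b".":
        return False
    dotdot_inode, _, _ = struct.unpack_from("<IHH", block, 12)
    return dotdot_inode != 0 and block[20:22] == b".."
```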
It might be possible to infer the order in which files were deleted by using the pointer values. Figure 14.11 shows the six possible combinations of how three consecutive files could be deleted. The top of the figure gives the starting state, where there are four allocated directory entries. The number below each block is the "address" of the entry, which makes it easier to describe each scenario. Each shaded block is an unallocated entry, and the number corresponds to the order in which it was deleted. For example, in scenario (A) entry 1 was deleted first, and the length of entry 0 was increased to point to entry 2. Entry 2 was then deleted, and the length of entry 0 was increased to point to entry 3. Lastly, entry 3 was deleted, and the length of entry 0 was increased to point to the entry after 3. The final state of scenario (A) is unique among the six combinations.
Figure 14.11. The six combinations of the relative order that three consecutive directory entries could be unallocated. Only the 1-3-2 and 2-3-1 sequences have the same final state.
If we look at the final states of the six scenarios, only (B) and (D) are the same. In both of these scenarios, we can still determine that the middle file was the last to be deleted. We will see an example of this analysis procedure in the following scenario section.
Analysis Considerations
Deleted file names are easy to locate in ExtX, and because Linux does not clear the inode pointer in an Ext3 directory entry, you might also be able to obtain temporal information about when the file was deleted. In Linux, the directory entry structure will remain in the unallocated state until a new file is created whose name is the same length or smaller. Therefore, a file with a short name might not have its directory entry overwritten as quickly as a file whose name is long. Other OSes could use different allocation methods or even compact the directories to make them smaller. The fsck tool in Linux can repackage directories to make them smaller and remove the unused space.
When a deleted file name is found, care must be taken when analyzing it. If the file's inode has been reallocated, the metadata is no longer relevant to the deleted name. There is no easy method for determining if an inode has been reallocated since the file name was deleted. One method is to use the file type value in the directory entry and compare it with the type in the inode. If the directory entry type is for a directory and the inode has a regular file, it is likely that the inode has been reallocated. This is why the fls output in TSK gives both the directory entry and inode types.
It could be possible for data to be hidden in directory entry structures. The space between the last directory entry and the end of the block is unused and could contain data. This is especially true when hash trees are used because the first block contains only a small amount of administrative data, and the rest is unused. This is a dangerous hiding technique, though, because the OS could overwrite the data when a new file name is created.
Analysis Scenarios
To show how we can use the low-level details of ExtX directory entries, two example scenarios are given in this section. The first identifies the original location of a file, and the second identifies a possible order in which files were deleted.
Source of a Moved File
While investigating a Linux system that has been compromised, we come across a file called snifferlog-1.dat that contains network packets. The other files in the directory have names similar to log-001.dat and do not contain network data. The listing is shown here:
```
# fls -f linux-ext3 ext3-8.dd 1840555
r/r 1840556:  log-001.dat
r/r 1840560:  log-002.dat
r/r 1840566:  log-003.dat
r/r 1840569:  log-004.dat
r/r 32579:  snifferlog-1.dat
r/r 1840579:  log-005.dat
r/r 1840585:  log-006.dat
```
To find the executable that could have created this file, we search for the full path of the file. Executable files sometimes contain the names of the files that they open, but our search is unsuccessful. It is trivial for an executable to obfuscate the names of the files that it opens. We are about to move on to another part of the system when we notice the odd sequence of inode addresses in the listing.
The parent directory and the log files all have inode addresses around 1,840,500, but the snifferlog-1.dat file has an address of 32,579. We know from the allocation strategy of Linux that files are typically allocated an inode in the same block group as the parent directory. Therefore, snifferlog-1.dat was either originally allocated to a different parent directory and moved to its current location, or it was created in the current directory but the block group was full.
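The block group of an inode follows directly from the inodes-per-group value in the superblock; inode_group is an illustrative helper, and the 16,288 inodes per group used in the example is derived from the fsstat output shown below.

```python
def inode_group(inode_address, inodes_per_group):
    """Block group that an inode belongs to; inode numbering starts at 1."""
    return (inode_address - 1) // inodes_per_group

# The fsstat output below implies 16,288 inodes per group for this image:
print(inode_group(1840556, 16288))   # 113 -> same group as the log directory
print(inode_group(32579, 16288))     # 2   -> the snifferlog-1.dat inode
```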
We look at the fsstat output and determine that the log directory is in block group 113, which has 99% of its inodes and 48% of its blocks free. Therefore, it is unlikely that it was full when the file was created unless there were a lot of files that were deleted since then.
```
Group: 113:
  Inode Range: 1840545 - 1856832
  Block Range: 3702784 - 3735551
  Free Inodes: 16271 (99%)
  Free Blocks: 15728 (48%)
```
We now investigate the sniffer log inode, which is 32,579, in more detail and determine that it belongs to block group 2.
```
Group: 2:
  Inode Range: 32577 - 48864
  Block Range: 65536 - 98303
  Free Inodes: 16268 (99%)
  Free Blocks: 0 (0%)
```
One theory is that it was created in a directory in block group 2 and later moved to the directory in block group 113. Therefore, we will look for directories in block group 2 that could have been the parent directory of snifferlog-1.dat. We will do this with the ils tool in TSK. ils lists details about inodes in a given range, so we supply the inode range of the block group and filter out all non-directory entries. The -m flag is given so that the mode will be converted to a human-readable format, and the -a flag is given to list only allocated inode entries. We also use grep to filter out all entries that are not directories (the mode string is in the fifth column).
```
# ils -f linux-ext3 -m -a ext3-8.dd 32577-48864 | grep "|d"
|0|32577|16893|drwxrwxr-x|4|500|500|0|4096|
|0|32655|16893|drwxrwxr-x|2|500|500|0|4096|
|0|32660|16877|drwxr-xr-x|2|500|500|0|4096|
```
The first column is a fake name for each entry where it is "dead" if the entry is unallocated and "alive" if the entry is allocated. The third column is the inode address. This output is normally processed using the mactime tool to make timelines, so it is not user friendly.
We view the three allocated directories using fls. The directory at inode 32577 is the most promising.
```
# fls -f linux-ext3 ext3-8.dd 32577
r/r 32578:  only_live_twice.mp3
r/r 32582:  goldfinger.mp3
r/r 32580:  lic_to_kill.mp3
r/r 32581:  diamonds_forever.mp3
```
This might look like an innocent directory of James Bond MP3 files, but notice the inode numbers and consider what we know about inode and directory entry allocation. Inodes are allocated on a first-available basis in the block group. The inode address of our sniffer log is 32,579, which could have been allocated between only_live_twice.mp3 and lic_to_kill.mp3. Also notice that goldfinger.mp3 has a larger inode address than the other files, yet it sits in the middle of the directory. Further, the name goldfinger.mp3 is 14 characters long and the name snifferlog-1.dat is 16 characters long, which means they can use the same sized directory entry.
When we examine each of these files, we notice that only_live_twice.mp3 is an executable, and the other files are network sniffer logs in the same format as snifferlog-1.dat. Also the network packets in only_live_twice.mp3 have timestamps before the packets in snifferlog-1.dat, and the timestamps in lic_to_kill.mp3 are after the snifferlog-1.dat times.
Using this information, our theory is that the snifferlog-1.dat file was created after the only_live_twice.mp3 file, and then lic_to_kill.mp3 was created. Sometime after the diamonds_forever.mp3 file was created, the snifferlog-1.dat file was moved to the directory in block group 113. After it was moved, the goldfinger.mp3 file was created; it overwrote the old directory entry and took the next available inode. The M-time and C-time of the inode 32577 directory are the same as those of the goldfinger.mp3 file, and both are after the times for the snifferlog-1.dat file. We can see this relationship in Figure 14.12, where the directory entry in block group 113 points to the inode in block group 2. We still do not know how the file was moved and whether it always had that name or whether it once had an MP3 name. Analyzing the executable may answer those questions.
Figure 14.12. Scenario where the snifferlog-1.dat file was moved from a directory in block group 2.
In this scenario, we have tied a file back to its original directory using its inode address. If the file had been moved within the same block group or to a different file system, this technique would not have worked, but it illustrates how knowing the allocation algorithms lets you notice and explain these minor details.
File Deletion Order
While investigating a Linux system, we find a directory named /usr/local/.oops/. This is not typical in a Linux system, and we look at its contents. It contains eight deleted file names, and all of the corresponding inodes have been reallocated to new files, so file recovery will not be trivial. We are curious about how the files were deleted, though, and we parse the directory entries to determine the values given in Table 14.3. The names are listed in the order in which they exist in the directory.
| Byte Location | Name | Record Length | Needed Length |
|---|---|---|---|
| 12 | .. | 1,012 | 12 |
| 24 | config.dat | 20 | 20 |
| 44 | readme.txt | 104 | 20 |
| 64 | allfiles.tar | 20 | 20 |
| 84 | random.dat | 64 | 20 |
| 104 | mytools.zip | 44 | 20 |
| 124 | delete-files.sh | 24 | 24 |
| 148 | sniffer | 876 | 16 |
| 164 | passwords.txt | 860 | 860 |
We can make some general observations from this output. If the record length of an unallocated directory entry equals the length it actually needs (based on the length of its name), then the next directory entry was deleted after it was. For example, config.dat was deleted before readme.txt because, if readme.txt had been deleted first, the record length of the config.dat entry would have been increased by 20 to cover readme.txt. Therefore, we know that config.dat was deleted before readme.txt, allfiles.tar was deleted before random.dat, and delete-files.sh was deleted before sniffer.
We also can observe that mytools.zip was deleted after delete-files.sh but before sniffer was deleted. We know this because the record length of mytools.zip was increased to 44 to account for delete-files.sh being deleted. If sniffer had been deleted before mytools.zip, the record length of mytools.zip would have been increased to account for it as well. We also can see that passwords.txt was deleted before sniffer and that random.dat was deleted after mytools.zip. Finally, readme.txt was deleted before sniffer because the record length of readme.txt points to sniffer.
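The "deleted before its neighbor" rule used in the last two paragraphs can be expressed as a short sketch; deleted_before_next is an illustrative helper that takes the unallocated entries of Table 14.3 as (name, record length, needed length) tuples in on-disk order.

```python
def deleted_before_next(entries):
    """Yield 'X was deleted before Y' facts: if an unallocated entry's record
    length equals the length it actually needs, it still points at its
    neighbor, so the neighbor was unallocated after it."""
    for (name, rec_len, needed), (next_name, _, _) in zip(entries, entries[1:]):
        if rec_len == needed:
            yield f"{name} was deleted before {next_name}"

# With the Table 14.3 values, this reports that config.dat was deleted before
# readme.txt, allfiles.tar before random.dat, and delete-files.sh before sniffer.
```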
If we spend some more time determining the relative order of these deletions, we conclude that the files may have been deleted in alphabetical order. We do not know the order in which allfiles.tar, config.dat, and delete-files.sh were deleted, but we know they were deleted before the other files, and they are the first three files when the names are sorted. Therefore, the files might have been deleted from a window sorted by name or by a command such as rm *; my tests show that rm * deletes files in alphabetical order. With these results, it is likely that the files were not deleted individually one by one and that a script or a file manager window was used instead. Note that the results are more difficult to interpret when file creations occurred in between the file deletions.