Investigator's Guide to Steganography

Still Images

The methods of steganography are quite varied; in still images, least-significant bit insertion and spread-spectrum techniques are used.

Texture block uses low bit-rate data hiding, and is accomplished by copying a region from a random texture pattern in a picture to an area of similar texture, resulting in a pair of identically textured regions in a picture.

Patchwork is a low bit-rate data-hiding method based on a pseudorandom, statistical process. It invisibly embeds a specific statistic in a host image by choosing two patches within the picture and lightening one while darkening the other.
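The Patchwork idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the published algorithm's parameters: the image is modeled as a flat list of 8-bit grayscale values, and the function names, the pair count, and the brightness step `delta` are all hypothetical choices.

```python
import random

def patchwork_pairs(key, num_pixels, n_pairs):
    # Both sides derive the same pseudorandom, non-overlapping pixel pairs from the key.
    rng = random.Random(key)
    idx = rng.sample(range(num_pixels), 2 * n_pairs)
    return list(zip(idx[:n_pairs], idx[n_pairs:]))

def patchwork_embed(pixels, key, n_pairs=1000, delta=2):
    # Lighten the first pixel of each pair and darken the second (assumed step size).
    out = list(pixels)
    for a, b in patchwork_pairs(key, len(out), n_pairs):
        out[a] = min(255, out[a] + delta)
        out[b] = max(0, out[b] - delta)
    return out

def patchwork_statistic(pixels, key, n_pairs=1000):
    # Sum of brightness differences over the keyed pairs.
    pairs = patchwork_pairs(key, len(pixels), n_pairs)
    return sum(pixels[a] - pixels[b] for a, b in pairs)
```

For an unmarked image the statistic is expected to sit near zero; embedding shifts it upward by about 2 × delta × n_pairs, which is what a detector holding the key would test for.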

Other methods include dithering manipulation, perceptual masking, and DCT coefficients manipulation.

Moving Images

Steganography, when applied to a video file such as an .avi or .mpeg, typically uses discrete cosine transform (DCT) manipulation. Westfeld and Wolf have described a method for data hiding in a videoconferencing system. Because videoconferencing needs to have a high frame rate on often narrow-band digital networks, DCT manipulation is a necessary and valuable part of the process. Basically, videoconferencing applications compress each frame with "differential lossy compression," meaning only the differences between successive stills are compressed, then broadcast.

This renders the embedding technique almost invisible. While differences between an original and a stego-image can be detected, it is likely that no one can tell which is which. Because the videoconferencing system would broadcast only the differences between successive frames, the threat of detection by comparison between successive similar frames would not be a factor. Other attacks are equally unlikely to succeed in detecting or extracting the stego-message. Added noise would be very similar to the original noise and the data would be embedded before encryption, making it that much more difficult to find.

The data rate for this technique could be as high as 8 kbps if embedded into an ISDN videoconference. This technique could be a very valuable and effective steganography method due to a high data rate and ease of stealth.

Audio Files

When developing data-hiding methods for audio, the first consideration is the most likely environment the sound signal will travel between encoding and decoding. There are two main areas of modification that we will consider: first, the storage environment, or digital representation of the signal, and second, the transmission pathway the signal might travel.

There are several methods for adding steganographic information to audio files:

  1. The high bit-rate LSB insertion is easily destroyed by anything other than a pure digital transmission.

  2. Differential phase variation, which is based on the sensitivity of the human auditory system. The human ear is sensitive to differential phase variation, but is relatively insensitive to the initial phase. The sound file is divided into blocks and each block's initial phase is modified using the embedded message. This preserves the subsequent phase shifts, so there are fewer differences for the ear to detect. This technique is very good when dealing with perceived signal-to-noise ratio.

    Phase coding works by substituting the phase of an initial audio segment with a reference phase. The reference phase represents the data. The phase of all the following segments is adjusted to preserve the relative phase between segments while allowing data to be embedded. Phase coding is one of the most effective coding methods when it comes to signal-to-perceived noise ratio.

    Absolute phases can withstand a fair amount of modification; however, if the relative phase differences between the blocks are preserved, the ear will be less likely to detect any changes. As long as phase modification is small, inaudible coding can be achieved.

  3. Spread spectrum can be used in a couple of ways. It can remain effective even when perceivable noise is added to the sound. Because the embedding itself can add audible noise, the embedded signal can be filtered through a perceptual mask, which reduces the power of the most audible components of the added noise. Then there is the basic spread-spectrum technique, which is designed to encode a stream of information by spreading the data across as much of the frequency spectrum as possible. This allows for signal reception, even if there is interference.

  4. Adding echo to the audio signal. Echo hiding is a robust and high data-rate method of embedding information into an audio signal. Adding an echo uses two different delays to encode the bits. Both delays are short enough that the ear perceives the echo as something that enriches the sound rather than distorts it. This method is the only one that can resist a jitter attack. When adding echo, the data is hidden by varying initial amplitude, decay rate, and offset.

    When the delay between the original sound and the echo decreases, the signals blend and the human ear cannot distinguish between the two. Information is embedded by echoing the original signal with one of two delay kernels. A binary one is represented by an echo kernel with a delay of δ1 seconds; a binary zero is represented by a kernel with a delay of δ0 seconds. The extraction of the embedded information involves detecting the spacing between the signal and its echo.

    Using this method you can see it is possible to encode and decode information with minimal alteration of the original audio signal. Minimal alteration means the signal has been changed in such a way that the average human cannot hear any significant difference between the original and altered signal. If there is an alteration, it actually works in the encoder's favor by giving the signal a richer sound.
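The two-delay scheme described for echo hiding can be illustrated with a toy Python sketch. It rests on several assumptions that are not from the text: the audio is a plain list of float samples, one bit is hidden per fixed-length segment, the delays `d0` and `d1` are given in samples, and the decoder simply compares the segment's correlation with itself at the two candidate lags (practical decoders typically use more careful analysis). All names and parameter values are hypothetical.

```python
def echo_embed(samples, bits, seg_len, d0, d1, alpha=0.5):
    # Add a scaled, delayed copy of the signal; the delay chosen encodes the bit.
    if len(samples) < len(bits) * seg_len:
        raise ValueError("cover signal too short for the message")
    out = list(samples)
    for k, bit in enumerate(bits):
        delay = d1 if bit else d0
        for i in range(k * seg_len, (k + 1) * seg_len):
            if i - delay >= 0:
                out[i] = samples[i] + alpha * samples[i - delay]
    return out

def echo_extract(stego, n_bits, seg_len, d0, d1):
    # For each segment, decide which lag shows the stronger self-correlation.
    def corr(start, lag):
        return sum(stego[i] * stego[i - lag]
                   for i in range(max(start, lag), start + seg_len))
    return [1 if corr(k * seg_len, d1) > corr(k * seg_len, d0) else 0
            for k in range(n_bits)]
```

The sketch works best on noise-like cover signals; strongly periodic audio would need a more robust detector than this lag comparison.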

Text Files

Open-Space Method

The open-space method uses white space on the printed page.

  1. Inter-sentence spacing: Encodes a binary message by placing one or two spaces after each terminating character (period or semicolon). The problem with this method is that it is very inefficient as it requires a lot of space for a small message, and inconsistent use of white space is easily spotted.

  2. End-of-line spacing: Data is inserted in the form of spaces at the end of a line. This allows for much more room to insert a message, but can present problems if a program automatically removes extra spaces or the document is turned into hard copy.

  3. Inter-word spacing: Uses right justification. The justification spaces are adjusted to allow for binary encoding. One space between words is a 0, two spaces are a 1. Open space works as long as text remains ASCII.
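As a rough illustration of the inter-word variant (one space for 0, two spaces for 1), the following Python sketch hides bits in the gaps between words. It ignores right justification entirely, and the function names are hypothetical.

```python
import re

def hide_bits_in_spacing(cover_text, bits):
    # One gap between consecutive words carries one bit: " " is 0, "  " is 1.
    words = cover_text.split()
    if len(bits) > len(words) - 1:
        raise ValueError("cover text has too few word gaps for the message")
    parts = [words[0]]
    for i, word in enumerate(words[1:]):
        gap = "  " if i < len(bits) and bits[i] else " "
        parts.append(gap + word)
    return "".join(parts)

def read_bits_from_spacing(stego_text, n_bits):
    # Recover the bits by measuring the length of each run of spaces.
    gaps = re.findall(r" +", stego_text)
    return [1 if len(gap) == 2 else 0 for gap in gaps[:n_bits]]
```

The sketch also makes the capacity problem visible: a cover text of n words carries at most n - 1 bits.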

Syntactic Method

Deriving from "syntax," this method uses the manipulation of punctuation to hide information. The syntactic method exploits optional punctuation and contractions. For example, the following two phrases read the same but differ in a single comma:

bread, cereal, and milk
bread, cereal and milk

Semantic Method

A final category of data hiding in text involves changing the words themselves. Semantic methods are similar to the syntactic method. Rather than encoding binary data by exploiting ambiguity of form, these methods assign two synonyms primary or secondary value. For example, the word "big" could be considered primary and "large" secondary. Whether a word has primary or secondary value bears no relevance to how often it will be used, but primary words will be read as ones, secondary words as zeros when decoding.
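A small Python sketch can make the primary/secondary scheme concrete. The synonym table below is a made-up example (only "big"/"large" comes from the text), and the function names are hypothetical.

```python
# (primary, secondary) synonym pairs; primary words decode as 1, secondary as 0.
PAIRS = [("big", "large"), ("quick", "fast"), ("begin", "start")]
PRIMARY = {p: s for p, s in PAIRS}    # primary word -> its secondary synonym
SECONDARY = {s: p for p, s in PAIRS}  # secondary word -> its primary synonym

def embed_in_synonyms(text, bits):
    # Rewrite each word that belongs to a pair so that it carries the next bit.
    remaining = list(bits)
    out = []
    for word in text.split():
        if remaining and (word in PRIMARY or word in SECONDARY):
            primary = word if word in PRIMARY else SECONDARY[word]
            word = primary if remaining.pop(0) else PRIMARY[primary]
        out.append(word)
    if remaining:
        raise ValueError("cover text has too few synonym slots")
    return " ".join(out)

def extract_from_synonyms(text, n_bits):
    # Read a 1 for every primary word and a 0 for every secondary word.
    bits = [1 if w in PRIMARY else 0
            for w in text.split() if w in PRIMARY or w in SECONDARY]
    return bits[:n_bits]
```

For instance, embedding the bits 0, 1, 1 into "a big dog will start a fast run" yields "a large dog will begin a quick run".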

Steganographic File Systems

A steganographic file system is a method of storing files that encrypts data and hides it so that it cannot be proven to be there. A steganographic file system can:

A stego file system can protect from some threats:

To elaborate on this concept, for example, a user of a steganographic file system is put in a position to reveal three different passwords used to protect different directories with his or her e-mail archive, tax records, and love letters, but keeps quiet about the directory containing his or her trade secrets. The person who is getting these passwords would have no way of proving that such a directory exists.

The classical way of hiding information in a deniable way would be to use a steganographic program to embed the information in large files such as audio or video, although there are some problems with this approach:

Steganographic file systems are designed to overcome the draw-backs of using individual files for hiding information. A stego file system aims to create a secure file system where the risk of users being forced to reveal private data is eliminated by giving the users the ability to truthfully say that there is no encrypted data hiding on the disk. Following is a discussion of the two ways of constructing a stego file system.

Method #1

Problems

Method #2

Stego File System Construction

In this explanation, the stego file system design is based on the second method of construction because it is more practical and efficient. This system does not use a separate partition of a hard disk, but instead places hidden files into unused blocks of a partition that also contains normal files, managed under a standard file system.

This has shown a practical implementation of a steganographic file system. It offers the following functionality:

Hiding in Disk Space

In this section we will discuss three different methods for hiding information steganographically in disk space: S-tools, hidden partitions, and slack space.

S-Tools

Similar to the method used in the stego file system, S-Tools will spread the file bits out throughout the free space on the floppy. This is undetectable in the normal Windows viewer, but the file is there.

S-Tools Version 3 has the ability to embed information in unused tracks of a floppy disk. While this program is not widely available on the Internet these days, it is still possible to find it and you may encounter this particular function.

How It Is Done

S-Tools will allow you to hide files in the unused space on floppy disks. To understand what is meant by unused space, look at the way DOS organizes the files on a disk. Every floppy disk, when formatted, is divided into sectors. Each sector on a disk can hold 512 bytes of information. On a 1.44 MB disk, there are 1440 × 1024 / 512 = 2880 sectors. When you write a file to the disk, DOS computes how many sectors it will need to hold the file and writes this information into the file allocation table (FAT).

S-Tools' FDD (floppy disk drive) module will look at the FAT to decide which disk sectors have not been used, and will allow you to hide information on them. S-Tools will not hide information in consecutive sectors on disk because this would be too easy to detect. Instead it uses a random number generator to choose which free sectors to use. S-Tools will add additional security by allowing you to fill all other unused sectors on the disk with random data.
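The sector arithmetic and the idea of scattering data over randomly chosen free sectors can be sketched as follows. This is not S-Tools' actual algorithm, which the text does not specify; the SHA-256 seeding, the Python `random` generator, and the function name are assumptions made purely for illustration.

```python
import hashlib
import random

SECTOR_SIZE = 512
TOTAL_SECTORS = 1440 * 1024 // SECTOR_SIZE  # 2880 sectors on a 1.44 MB floppy

def choose_sectors(free_sectors, passphrase, payload_len):
    # Number of whole sectors needed to hold the payload (rounded up).
    needed = -(-payload_len // SECTOR_SIZE)
    free = sorted(free_sectors)
    if needed > len(free):
        raise ValueError("not enough free sectors for the payload")
    # Assumed scheme: seed a PRNG from the pass phrase so that hide and
    # reveal pick the same non-consecutive sectors in the same order.
    seed = int.from_bytes(hashlib.sha256(passphrase.encode()).digest(), "big")
    return random.Random(seed).sample(free, needed)
```

Because the selection depends only on the free-sector list and the pass phrase, anyone who supplies the same pass phrase against the same disk layout arrives at the same sectors, which is why writing ordinary files afterward can break recovery.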

Using This Module

There are a few tips that you might want to be aware of when using the FDD module. If you want to be able to plausibly deny having any concealed data on your disks, it would make sense to fill the unused space on all your newly formatted disks with random data. This way any concealed data will appear to be "lost in the noise."

One point to remember with this feature of S-Tools: Do not write any ordinary files to the disk after you have concealed information on it. Depending on the amount of space you have left on the disk, it is very likely that DOS will overwrite your hidden information. This point can also work in your favor because there may be a situation where you want the hidden information destroyed.

Analyze Disk

This option displays a usage map of the floppy and tells you how much information you can hide on it. S-Tools will work with any capacity of disk that DOS can use, up to a maximum of 1.44 MB. Sectors marked in red are the ones that S-Tools cannot use because files are already stored there. The status bar at the bottom of the screen will tell you how much information you can hide on the disk (Figure 4.5 through Figure 4.7).

Fill Free Space

This option allows you to fill the unused sectors on a disk with random data. This will mask the presence of any file that you want to hide on the disk. S-Tools automatically asks you whether you want to fill the free space after hiding a file.

Figure 4.5

Figure 4.6

Figure 4.7

A Word of Warning

If you fill the free space on a disk after hiding a file, you will lose that file. After hiding, S-Tools will forget about its presence until you use the reveal operation. If at any time you decide you want to stop the process, hit the Escape key (Figure 4.8).

Figure 4.8

Hide File

This is the option that you use when you want to hide a file on disk. If you are not sure whether the disk has enough free space to hold the hidden file, then you can use the Analyze Disk option to find out.

First you are asked to choose the file that you want to hide. If you have asked to be prompted for encryption options, you will be asked whether the file should be encrypted before hiding. Using encryption is recommended even if the file is already encrypted because the pass phrase that you enter is also used to seed the random number generator that is used to choose the sectors that will hold the hidden file. Again, if you want to cancel the operation press the Escape key (Figure 4.9 through Figure 4.11).

Figure 4.9

Figure 4.10

Figure 4.11

Reveal File

This is the option that you should use to reveal a file that has been hidden on a disk. Simply insert the disk into the disk drive and select this option. If encryption was selected as an option when the file was embedded, then you must supply the correct pass phrase in order to reveal it. If everything works as planned, S-Tools will look at the disk and decide whether a file is hidden on it. If there is a hidden file, the program will tell you the size of the file and give you the option of viewing it or saving it.

Hidden Partitions

A hidden partition on a hard drive is another way of hiding large amounts of information in plain sight. The simplest explanation is a Linux partition chock full of secret messages hiding on a hard drive with only a Windows operating system. While this likely would not fool someone who was actively looking for hidden information, it would fool a casual user or someone unfamiliar with the computer's setup.

Slack Space

Slack space is the unused space at the end of the last cluster allocated to a file. Even if the actual data being stored requires less storage than the cluster size, an entire cluster is reserved for the file. The unused remainder is called the slack space.

For example, suppose the minimum space allocated on the hard drive (the cluster size) is 32 KB and we have a file that is 6 KB. This leaves 26 KB unused and considered unavailable by the operating system. This slack space could be used to hide information without that information showing up in any directory or file listing.
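The arithmetic in this example generalizes to files that span several clusters. A minimal Python sketch, with a hypothetical function name and the assumption that an empty file occupies no cluster:

```python
def slack_bytes(file_size, cluster_size):
    # Slack is whatever remains in the last cluster allocated to the file.
    if file_size == 0:
        return 0  # assumption: an empty file is allocated no clusters
    clusters = -(-file_size // cluster_size)  # round up to whole clusters
    return clusters * cluster_size - file_size
```

With a 32 KB cluster, a 6 KB file leaves 26 KB of slack, and a 70 KB file (three clusters) also leaves 26 KB.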

Hiding in Network Packets

A covert channel is described as "any communication channel that can be exploited by a process to transfer information in a manner that violates the system's security policy." Essentially, it is a method of communication that is not part of an actual computer system design, but can be used to transfer information to users or system processes that normally would not be allowed access to the information.

In TCP/IP, there are a number of methods available whereby covert channels can be established and data can be surreptitiously passed between hosts.

This method can be used in a variety of areas:

Background Terminology

For our purposes, it is important to realize that TCP is a "connection-oriented" or "reliable" protocol. Simply put, TCP has certain features that ensure data arrives at the remote host in an intact manner (usually). The basic operation of this relies on the initial TCP "three-way handshake":

The entire connection process happens in a matter of milliseconds, and each packet from this point on is independently acknowledged by both sides. This handshake method ensures a reliable connection between hosts and is why TCP is considered a connection-oriented protocol. It should be noted that only TCP packets exhibit this negotiation process. This is not so with UDP packets, which are considered unreliable and do not attempt to correct errors nor negotiate a connection before sending to a remote host. This chapter deals with the TCP protocol primarily to exploit the acknowledgment feature, which will be described next. The thrust of these methods, however, could be easily supported on the UDP protocol type.

Encoding Information in a TCP/IP Header

Within each header, there are several areas that are not used for normal transmission or are "optional" fields to be set as needed by the sender of the datagrams.

An analysis of the areas of a typical IP header that are either unused or optional reveals many possibilities where data can be stored and transmitted (Figure 4.12 and Figure 4.13).

Figure 4.12

Figure 4.13

For our purposes, we will focus on encapsulation of data in the more mandatory fields. This is not because they are any better than the other optional areas; rather, these fields are not as likely to be altered in transit as the IP or TCP options fields, which are sometimes changed or stripped off by packet-filtering mechanisms or through fragment reassembly.

Rowland excellently describes three methods of adding information in his article, "Covert Channels in the TCP/IP Protocol Suite." [1] He describes encode and decode in the following fields:

Method One: Manipulation of the IP Identification Field

The identification field of the IP Protocol helps with reassembly of packet data by remote routers and host systems. Its purpose is to give a unique value to packets so that if fragmentation occurs along a route, they can be accurately reassembled. The first encoding method simply replaces the IP identification field with the numerical ASCII representation of the character to be encoded. This allows for easy transmission to a remote host, which simply reads the IP identification field and translates the encoded ASCII value to its printable counterpart. The lines below show a tcpdump representation of the packets on a network between two hosts, "nemesis.psionic.com" and "blast.psionic.com." A coded message consisting of the letters H-E-L-L-O was sent between the two hosts in packets appearing to be destined for the Web server on blast.psionic.com. The actual packet data does not matter.

The field in question is the IP portion of the packet, called the ID field, located in the parentheses. Note that the ID field is represented by an unsigned integer during the packet generation process of the included program. This program does not perform any type of byte-ordering functions normally used in this process; therefore, packet data is converted to the ASCII equivalent by dividing by 256.

This method is used by having the client host construct a packet with the appropriate destination host and source host information and encoded IP ID field. This packet is sent to the remote host, which is listening on a passive socket that decodes the data. This method is relatively straightforward and easy to implement, as shown in the included covert_tcp program. You should note that this method relies on manipulation of the IP header information, and may be more susceptible to packet filtering and network address translation where the header information may be rewritten in transit, especially if located behind a firewall. If this happens, loss of the encoded data may occur.
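The divide-by-256 decoding rule can be reproduced with a short Python sketch. It only models the arithmetic and sends no packets. It assumes, as one reading of the missing byte-ordering step, that the character value is stored in the 16-bit ID field in little-endian host order and then read in network (big-endian) order; the function names are hypothetical and this is not the covert_tcp source.

```python
import struct

def encode_ip_id(ch):
    # Store the ASCII value in host (little-endian) order, with no htons()-style swap.
    raw = struct.pack("<H", ord(ch))
    # A network-order reader such as a packet sniffer sees the bytes swapped.
    return struct.unpack("!H", raw)[0]

def decode_ip_id(ip_id):
    # Undo the implicit multiplication by 256.
    return chr(ip_id // 256)
```

Under these assumptions the letter "H" (ASCII 72) appears on the wire as ID 18432, and 18432 / 256 = 72 recovers it.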

Method Two: Initial Sequence Number Field

The Initial Sequence Number field (ISN) of the TCP/IP Protocol suite enables a client to establish a reliable protocol negotiation with a remote server. As part of the negotiation process for TCP/IP, several steps are taken in what is commonly called a "three-way handshake," as described earlier. For our purposes, the sequence number field serves as a perfect medium for transmitting clandestine data because of its size (a 32-bit number). In this light, there are a number of possible methods to use. The simplest is to generate the sequence number from the actual ASCII character we wish to have encoded. This is the method used by covert_tcp, as shown in the following packets. (The "S" indicates a synchronize packet; the ten-digit number following is the sequence number being sent.) Again, no byte-ordering functions are used by covert_tcp to generate the sequence numbers. This enables a more realistic looking sequence number. Therefore, in our example the sequence numbers are converted to ASCII by dividing by 16777216, which is a representation of 65536 × 256. Again, our message of H-E-L-L-O is being sent:

Using this method, the packet is constructed with the appropriate data in the SYN field and sent to the destination host. The destination host, expecting to receive information from the client, simply grabs the SYN field of each incoming packet to reconstruct the encoded data. This is done with a passive listening socket on the remote end, as described earlier.

Because of the sheer amount of information one can represent in a 32-bit address space (4,294,967,296 numbers), the sequence number makes an ideal location for storing data. Aside from the obvious example given previously, one can use a number of other techniques to store information in either a byte fashion or as bits of information represented through careful manipulation of the sequence number. The simple algorithm of the covert_tcp program takes the ASCII value of our data and converts it to a usable sequence number (which is actually done by the packet generation functions and is converted back to ASCII in a symmetrical manner). Note that this method is similar to a "substitution cipher," whereby packets containing the same information will display the same sequence number (note packets three and four, which contain the letter "L" in the encoding and their sequence numbers). Methods that incorporate a random number generation of the sequence number with a subsequent inclusion of the data to be encoded through an XOR or similar operation may yield a more random result. Inclusion of encrypted data to perform the same function is a logical extension of this idea.
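Both the simple encoding and the XOR variation suggested above can be sketched in Python. The sketch models only the sequence-number arithmetic. The masked variant is a hypothetical illustration of the XOR idea: it assumes both ends share a seed, and it uses Python's `random` module, which is not cryptographically secure. All names are made up for the example.

```python
import random

SHIFT = 16777216  # 65536 * 256, i.e. 2**24

def encode_isn(ch):
    # Place the ASCII value in the top byte of the 32-bit sequence number.
    return ord(ch) * SHIFT

def decode_isn(seq):
    # Integer division discards the three low-order bytes.
    return chr(seq // SHIFT)

def encode_isn_masked(message, shared_seed):
    # XOR each encoded value with a fresh 32-bit pseudorandom mask so that
    # repeated characters no longer produce identical sequence numbers.
    rng = random.Random(shared_seed)
    return [encode_isn(ch) ^ rng.getrandbits(32) for ch in message]

def decode_isn_masked(seqs, shared_seed):
    # Regenerate the same mask stream from the shared seed and undo the XOR.
    rng = random.Random(shared_seed)
    return "".join(decode_isn(seq ^ rng.getrandbits(32)) for seq in seqs)
```

In the unmasked form, "H" becomes 72 × 16777216 = 1207959552 and both "L" characters map to the same number, which is the substitution-cipher weakness noted above.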

Method Three: The TCP Acknowledge Sequence Number Field "Bounce"

This method relies on basic spoofing of IP addresses to enable a sending machine to "bounce" a packet of information off a remote site and have that site return the packet to the real destination address. This has the benefit of concealing the sender of the packet, as it appears to come from the "bounce" host. This method could be used to set up an anonymous one-way communication network that would be difficult to detect, especially if the bounce server is very busy.

This method relies on the characteristic of TCP/IP where the destination server responds to an initial connect request (SYN packet) with a SYN/ACK packet containing the original initial sequence number plus one (ISN + 1). In this method, the sender constructs a packet that contains the following information:

The source and destination ports chosen do not matter (except if you want to conceal the traffic as a well-known service such as HTTP, and you are having the receiving server listening for data on a predetermined port, in which case you will want to forge the source port as well). The DESTINATION IP address should be the server you wish to bounce information off of and the SOURCE IP should be the address of the server you wish to communicate with.

The packet is sent from the client's computer system and routed to the forged destination IP address in the header ("bounce server"). The bounce server receives the packet and sends either a SYN/ACK or a SYN/RST, depending on the state of the port the packet was destined for on the bounce server. The return packet is sent to the forged source address with the ISN number plus one. The listening destination server takes this incoming packet and decodes the information by transforming the returned sequence number minus one back into the ASCII equivalent. It should be noted that the low-order bits are dropped in the translation process of covert_tcp because of the method used to "encode" and "decode" information, so the program does not need to adjust for the incremented SYN packet number.
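The reason the incremented number needs no correction can be checked with a few lines of Python. The sketch models only the arithmetic of the bounce, with hypothetical function names, and sends no traffic.

```python
SHIFT = 16777216   # 2**24: the character sits in the top byte
MODULUS = 2 ** 32  # sequence numbers wrap at 32 bits

def bounce_reply_ack(isn):
    # The bounce server acknowledges the forged SYN with ISN + 1.
    return (isn + 1) % MODULUS

def decode_ack(ack):
    # Integer division drops the low-order bits, so the +1 has no effect.
    return chr(ack // SHIFT)
```

Since the character occupies only the top byte and the low three bytes start at zero, adding one can never carry into the encoded value.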

Following is a step-by-step representation of the bounce method:

This method is essentially tricking the remote server into sending the packet and encapsulated data back to the forged source IP address, which it rightfully thinks is legitimate. From the receiving end, the packet appears to originate from the bounce server, and indeed it does. As a side note, if the receiving system is behind a packet filter that allows communication only to certain sites, this method can be used to bounce packets off the trusted sites; this will then relay them to the system behind the packet filter with a legitimate source address. This could be vital in communicating with receiving servers in heavily protected or scrutinized networks.

Bouncing a packet off a well-known Internet site (.mil, .gov, .com, etc.) is also a useful technique for concealing operations in ordinary traffic. Be sure the bounce site is not using round-robin DNS (stable IP address) or, if it is, that the receiving server is passively listening on a predetermined port to decode the transmissions from multiple sites (i.e., send out a forged source address and source port of 1234 so the bounce server returns the packet to the listening server on port 1234). Using this technique, the sending client can bounce packets off hundreds of Internet hosts while the receiving server listens and writes out any data destined for the predefined port number regardless of IP address.

If your network site has a correctly configured router, it may not allow a forged packet with a network number that is not from its network to traverse outbound. Alas, many routers are not configured with this protection in mind and will happily pass the data, so you can generally expect this technique to work.

Implications, Protection, and Detection

The implications of these methods depend on the intent and purposes they are being used for. This method of covert channel could be used immediately as an alternative to encryption in countries that have a stricter stance on cryptography, such as China and France. Additionally, this technique could be used quite effectively for data smuggling and anonymous communication.

Protection from this technique would start with the use of an application proxy firewall system. An application proxy firewall is designed to keep packets from logically separated networks from passing directly to each other. A packet-filter firewall is another option, but is not as effective as the application proxy firewall.

Detection of these techniques can be difficult. If the information in the packet data is encrypted or is "bounced" from another server, it can be very difficult to determine where the packet originated. One way to determine where a forged packet originated is to put a sniffer on the inbound side of the server.

[1] Available at www.firstmonday.dk/issues/issue2_5/rowland/#dep2
