Removing Weird Characters from Imported Text

Problem

You've imported data from a text file but find weird (i.e., nonprintable) characters in the imported text. You'd like to remove the nonprintable characters.

Solution

Use the CLEAN worksheet function.

Discussion

If you've tried some of the text-importing recipes in this chapter, you may have come across some unusual characters in your imported data, as illustrated in Figure 3-14.

Notice the unusual rectangle characters mixed with the data. These characters are so-called nonprintable characters because under normal circumstances you can't see them. They include control characters such as line-feed and carriage-return characters, tabs, and others. You may or may not encounter such characters, depending on where your text files originate. I find I'm more likely to encounter such characters when importing data from text files that originated on a non-Windows system. This is because there are different standards for specifying formatting and control instructions on different systems. For example, Windows uses both a line feed and carriage return to denote the end of a line of text, whereas Macintosh systems use a single carriage return.

Figure 3-14. Nonprintable characters

You can remove these nonprintable characters using Excel's built-in worksheet function CLEAN. Simply type =CLEAN(cell reference) in a cell to remove nonprintable characters from the text contained in the referenced cell. You should replace cell reference with a valid cell reference, such as C1. Figure 3-15 shows how CLEAN deals with the first several rows of data.

Columns G and H show the cleaned data. Notice the cell formula for the selected cell, G1, displayed in the formula bar. In this case, the formula is =CLEAN(B1). The result is that the nonprintable character was removed from the number 0.8116.

Категории