Hack 83. Map Numerical Data the Easy Way

Hacking the color palette of a GIF image offers a cheap and simple route to mapping all sorts of quantitative data.

Geographical maps are often used to represent some quantifiable property of each country or region depicted, such as population, GDP, health, and so on. Making similar maps of your own involves finding a base map and then coloring each country or region based on the value in question. A number of commercial applications are able to do something along these lines, but for a price. And, of course, it ain't much of a hack if you let your MS Excel plug-in do the work.

This hack is based on the fact that pixels in a GIF are internally just indexes into a palette, and it is therefore somewhat similar to the old palette animation trick. A simple GIF image starts with 13 bytes of header data, followed by the palette, which is then followed by the actual image data. Modifying the image data on the fly is hard, since it is compressed. However, modifying the palette information is not difficult. The palette is basically just a list of byte triplets, each describing the RGB color of one palette entry, holding a maximum of 256 entries.
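To make the layout concrete, here is a minimal sketch in Python 3 (the listings in this hack are Python 2) that pulls the palette out of a GIF held in memory. The function name `read_global_palette` and the hand-built demo bytes are illustrative assumptions, not part of the original hack. The check of the top bit of the byte at offset 10, which says whether a global palette is present at all, comes from the GIF format itself rather than from the text above.

```python
def read_global_palette(gif_bytes):
    """Return the global palette of a GIF as a list of (r, g, b) tuples."""
    header = gif_bytes[:13]
    if header[:3] != b"GIF":
        raise ValueError("not a GIF file")
    packed = header[10]                    # packed flags byte at offset 10
    if not packed & 0x80:                  # top bit: global palette present?
        return []
    n_entries = 1 << ((packed & 7) + 1)    # low 3 bits hold log2(size) - 1
    table = gif_bytes[13:13 + 3 * n_entries]
    return [tuple(table[i:i + 3]) for i in range(0, len(table), 3)]


# Hypothetical demo data: a 13-byte header announcing a 4-entry palette,
# followed by the palette. No image data follows, so this is only enough
# to exercise the parser, not a complete GIF.
demo = (b"GIF89a" + bytes([1, 0, 1, 0, 0x81, 0, 0])
        + bytes([0, 0, 0,  255, 0, 0,  0, 255, 0,  0, 0, 255]))
palette = read_global_palette(demo)
print(palette)
```

Each tuple in the returned list is one palette entry; every pixel in the image data refers to one of these entries by position.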

Note that hacking the GIF palette works for mapping countries of the world or, say, states of the U.S., because there are fewer than 256 of each. If you were interested in mapping voter turnout across San Francisco's more than 400 voting precincts, you'd need to use a different technique, such as the one described in [Hack #44].

 

7.7.1. Hacking the GIF Palette

Let's work our way up from a simple example. Say we want to convert a color GIF into grayscale. Using Python and some image library, this can perhaps be done more elegantly, but if we don't care too much about image size, the following little script will do. If we assume that f is a file-like object containing a GIF and o is a file-like object able to receive a GIF, then this bit of Python will recolor the image in grayscale and dump it to o:

header = f.read(13)
o.write( header )
for i in range(1 << ( (ord(header[10]) & 7) +1 )):
    gray = (ord(f.read(1)) + ord(f.read(1)) + ord(f.read(1)))/3
    o.write(3*chr(gray))
o.write( f.read( ) )

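Not bad for six lines of code! In the third line, we calculate the size of the palette: the lowest 3 bits of the byte at offset 10 of the header (the eleventh byte) hold a number n, and the palette has 2^(n+1) entries, which we get by left-shifting 1 by n + 1. Then we loop through the palette, average each RGB triple, and write that average three times to the output. We finish by writing what's left in the input, the compressed image data, unchanged to the output.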
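The same idea can be sketched in Python 3, where reading a binary file yields integers directly and `ord` and `chr` are no longer needed. This is a port under assumptions, not the author's code: the function name `gray_gif` and the tiny in-memory demo are made up for illustration, and, like the original, the sketch assumes the file has a global palette immediately after the 13-byte header.

```python
import io


def gray_gif(f, o):
    """Copy a GIF from f to o, turning every palette entry gray."""
    header = f.read(13)
    o.write(header)
    # Low 3 bits of the byte at offset 10 give n; the palette has 2**(n+1) entries.
    for _ in range(1 << ((header[10] & 7) + 1)):
        r, g, b = f.read(3)
        gray = (r + g + b) // 3          # integer average of the triple
        o.write(bytes([gray] * 3))
    o.write(f.read())                    # compressed image data, untouched


# Hypothetical demo: header announcing a 2-entry palette, the palette,
# and a single trailing byte standing in for the rest of the file.
src = io.BytesIO(b"GIF89a" + bytes([1, 0, 1, 0, 0x80, 0, 0])
                 + bytes([30, 60, 90,  255, 255, 254]) + b";")
dst = io.BytesIO()
gray_gif(src, dst)
result = dst.getvalue()
print(list(result[13:19]))
```

With real files, `f` and `o` would be opened in binary mode (`'rb'` and `'wb'`).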

In order to make this work for a map, obviously we're going to need a map where each distinct region/country is color-coded in a special way. If we had a list of regions numbered 0 through n, and we could use palette index i for country i, then this would be easy. Unfortunately, most imaging tools work with colors and not with palette indexes. Furthermore, these imaging tools tend to reshuffle the palette as they see fit, which makes this setup rather risky. Instead, we can use the RGB color scheme, using a unique color for each country, which we can later replace on the fly with the color needed.

7.7.2. Getting the Data

Let's say we want to create maps showing the population growth in different shades of red for all countries. The CIA Factbook supplies this information in a nice parseable HTML format at http://www.cia.gov/cia/publications/factbook/fields/2002.html. In addition to population growth, other interesting CIA-gathered facts can be used for mapping. (You can find a list of some possibilities at http://www.cia.gov/cia/publications/factbook/docs/notesanddefs.html.)

The following Python code harvests the population data by downloading the page and scraping the HTML. The output is, interestingly, another Python script, which contains the population growth values and can be imported into yet another Python script to generate the imagery. A more advanced version of this hack might save the data in a database somewhere for later use, but...this is a hack, after all:

import urllib
res = [ ]
html = urllib.urlopen('http://www.cia.gov/cia/publications/factbook/fields/2002.html').read( )
for tag in html.split('>')[1:]:
    country, tag = tag.split('',1)[1].split('%')[0].strip( )
    if growth[0]=='N':
        growth = None
    else:
        growth = float(growth)
    res.append( [country,growth] )
print "countryList = %s" % `res`

Note the use of the backticks in the last line, which cause Python to produce the representation of the associated list as Python code. We save the Python code generated by this script as countryList.py:

$ python getCountryList.py > countryList.py
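Backticks are Python 2 syntax; in Python 3 the built-in `repr()` (or the `%r` format code) does the same job. The sketch below shows the round trip from a list to Python source and back. The two rows are placeholder values invented for the example, not Factbook data, and `exec` merely stands in for importing the generated countryList.py.

```python
# Placeholder rows in the same [country, growth] shape the scraper builds.
res = [["Examplia", 1.5], ["Nowhereland", None]]

# %r inserts repr(res), which is valid Python source for this list.
source = "countryList = %r" % (res,)
print(source)

# Executing the generated source recreates the list, just as
# "from countryList import countryList" would after saving it to a file.
namespace = {}
exec(source, namespace)
restored = namespace["countryList"]
```

This works because lists, strings, floats, and None all have representations that Python can read back in unchanged.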

 

7.7.3. Tying It All Together

Now, fire up your favorite image editor, load a world map with countries on it, and give country i in the produced list RGB color (i, 255 - i, 238). Yes, it might take some time, and each country needs to be done in the exact order it's listed in your data set, but the task will be quite good for your sense of geography! Alternatively, if you're feeling lazy, you can download such a map from http://mappinghacks.com/maps/worldmap.gif. Save the map as worldmap.gif. Figure 7-4 shows what the "pristine" version of this file looks like, with one shade per country.
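The color scheme is easy to express as a pair of helper functions, which can be handy for checking a hand-painted map. This is only a sketch in Python 3; the names `country_color` and `country_index` are hypothetical and do not appear in the original hack.

```python
def country_color(i):
    """Marker color to paint on the base map for country number i."""
    if not 0 <= i <= 255:
        raise ValueError("country index must fit in one byte")
    return (i, 255 - i, 238)


def country_index(r, g, b):
    """Return the country number encoded in a color, or None if the
    color is not one of the marker colors."""
    if b == 238 and r == 255 - g:
        return r
    return None


print(country_color(17))            # color to use for entry 17 of the list
print(country_index(17, 238, 238))  # and the way back
print(country_index(0, 0, 0))       # an ordinary color such as black
```

Because red and green always sum to 255 and blue is fixed at 238, an ordinary map color such as black or white is never mistaken for a country marker.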

Figure 7-4. worldmap.gif, with its original color palette

Our main code is going to do something very similar to the "grayer" in the first code fragment we looked at. However, instead of replacing all palette entries with averages of red, green, and blue, it checks whether blue is 238 and red equals 255 minus green. If so, then we'll replace the entry by the target color of the country: in this case, the result of an equation converting the growth of the country to RGB:

for i in range(1 << ( (ord(header[10]) & 7) +1 )):
    r,g,b = [ord(c) for c in f.read(3)]
    if r==255-g and b==238:
        growth = countryList[r][1]
        if growth!=None:
            r = int(30*max(0,growth+2))+64
            g = b = 92
        else:
            r = g = b = 64
    o.write( chr(r)+chr(g)+chr(b) )
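The equation is easier to follow when it is pulled out into a function of its own. The Python 3 sketch below uses the same formula; the name `growth_to_rgb` is hypothetical, and the clamp to 255 is an addition that the original code does not have (without it, a growth rate above roughly 4.4 percent would produce a red value that no longer fits in one byte).

```python
def growth_to_rgb(growth):
    """Map a growth percentage to a shade of red; None means no data."""
    if growth is None:
        return (64, 64, 64)                      # dark gray for missing data
    red = int(30 * max(0, growth + 2)) + 64      # -2% or less gives the darkest red
    return (min(red, 255), 92, 92)               # clamp added here, not in the original


for value in (-3.0, 0.0, 2.5, 10.0, None):
    print(value, growth_to_rgb(value))
```

Growth of -2 percent or lower maps to red 64, each additional percentage point adds 30, and countries without data come out gray.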

Variations of this code can be used to generate all kinds of dynamic maps. It is probably most useful as a CGI script on a web site: for example, to draw a map showing where your web site visitors are coming from. The following code is an implementation of the population growth map as a CGI script.

#!/usr/bin/python
print "Content-Type: image/gif\n"
from countryList import countryList
f = open( 'worldmap.gif', 'rb' )
output = f.read(13)
for i in range(1 << ( (ord(output[10]) & 7) +1 )):
    r,g,b = [ord(c) for c in f.read(3)]
    if r==255-g and b==238:
        growth = countryList[r][1]
        if growth!=None:
            r = int(30*max(0,growth+2))+64
            g = b = 92
        else:
            r = g = b = 64
    output += chr(r)+chr(g)+chr(b)
print output + f.read( )
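Since variations of this script can drive many kinds of maps, one possible way to reuse it is to separate the palette rewriting from the choice of color. The Python 3 sketch below does that. The names `recolor` and `send_as_cgi`, the `color_for` callback, and the demo bytes are all assumptions made for illustration. Note that in Python 3 the image has to be written as bytes (for a real CGI script, to `sys.stdout.buffer`) rather than printed as text.

```python
import io
import sys


def recolor(gif_bytes, color_for):
    """Return a copy of gif_bytes in which every marker color
    (i, 255 - i, 238) in the palette is replaced by color_for(i)."""
    out = bytearray(gif_bytes[:13])
    n_entries = 1 << ((gif_bytes[10] & 7) + 1)
    pos = 13
    for _ in range(n_entries):
        r, g, b = gif_bytes[pos:pos + 3]
        if r == 255 - g and b == 238:
            r, g, b = color_for(r)
        out += bytes((r, g, b))
        pos += 3
    out += gif_bytes[pos:]               # compressed image data, untouched
    return bytes(out)


def send_as_cgi(gif_bytes, stream=None):
    """Write a CGI response; defaults to the binary side of stdout."""
    if stream is None:
        stream = sys.stdout.buffer
    stream.write(b"Content-Type: image/gif\r\n\r\n")
    stream.write(gif_bytes)


# Hypothetical demo: a 2-entry palette holding the marker for country 0
# and one ordinary color, followed by one byte standing in for image data.
demo = (b"GIF89a" + bytes([1, 0, 1, 0, 0x80, 0, 0])
        + bytes([0, 255, 238,  10, 20, 30]) + b";")
shades = [(200, 92, 92)]                 # made-up color for country 0
result = recolor(demo, lambda i: shades[i])

response = io.BytesIO()                  # stands in for the web server's pipe
send_as_cgi(result, response)
print(list(result[13:19]))
```

Swapping in a different `color_for` callback is all it takes to map visitor counts, GDP, or any other per-country value.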

As it happens, the exact same technique is used to make the maps explored in [Hack #3]. The result is shown in Figure 7-5.

Figure 7-5. worldmap.gif recolored to show population growth by country

 

7.7.4. See Also

Douwe Osinga
