Hack 79. Geocode a U.S. Street Address
You know the address, but where is that in GPS terms?
You know your friend's address, but that won't help you program your GPS or aim your ICBM. For that, you need her latitude and longitude: you want to "geocode" her address! Geocoding is the process of adding geographic coordinates, such as latitude/longitude, to other information. You can geocode street addresses, or any other information that has a geographic component.
One Saturday we were sitting around thinking that we really ought to go see the Power Tool Drag Races. We knew they were put on by Qbox (http://www.qbox.org/), and we even knew the address, but where exactly is that? Sure, we could use a commercial mapping service and have it tell us to turn left here and go in circles there, but what I wanted was to program my GPS and have it just point the way. In one sense, a pointer is much harder to follow than turn-by-turn directions, but directions only work as long as you follow them correctly. Since I have little confidence in my ability to follow directions in San Francisco, I am very happy to have the safety net of the GPS pointer.
To cut to the chase, just enter this URL (Figure 7-1 shows what it should return):
http://geocoder.us/demo.cgi?address=950+Hudson+Street%2C+san+francisco%2C+ca
Figure 7-1. The Power Tool Drag Races at Qbox.com
We plugged (37.734085, -122.377589) into our GPS unit, and off we went for a day of power-tool debauchery.
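The demo URL above is simply the address form-encoded into a query string. As a minimal sketch in Python (our choice of language; the helper name `demo_query_url` is hypothetical), the standard library's `urllib.parse.urlencode` reproduces it exactly. It only builds the URL and assumes nothing about what the server sends back.

```python
from urllib.parse import urlencode

# Base URL of the geocoder.us demo form, taken from the example above.
DEMO_URL = "http://geocoder.us/demo.cgi"


def demo_query_url(address: str) -> str:
    """Build a demo.cgi query URL for a free-form U.S. street address.

    urlencode() form-encodes the value: spaces become '+', commas become '%2C'.
    """
    return DEMO_URL + "?" + urlencode({"address": address})


if __name__ == "__main__":
    print(demo_query_url("950 Hudson Street, san francisco, ca"))
```

Running it prints the same URL shown earlier, so you can geocode other addresses by changing only the string.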
There are commercial services that geocode U.S. addresses as well as addresses in other parts of the world; a Google search for "Geocode Addresses" will turn up several.
A geocoder is also at the heart of all the online map services. When you enter a street address into MapQuest, it is geocoded and the map you get is generated from the returned coordinates. In the good old days of the Web, pretty much all of the online map services returned the lat/long for addresses as a "freebie." And then they decided that geocoding had added value, and one by one they pulled the plug.
There is a strong movement of people who believe in open data and open data formats. Mapping sites' removal of free geocoding led directly to the creation of the free geocoder.us site. As William Gibson famously noted, "the street finds its own uses for things," and that use can transcend and exceed the original vision of the tool.
7.3.1. The Birth of geocoder.us
Strangely enough, online map services began removing these useful features just before a surge of interest in free sources of geodata swept through the free and open source software community.
Collecting street data, and keeping it up to date with "ground truth squads" who go around verifying that streets are where they are supposed to be and that houses haven't up and run off, is quite expensive.
An alternative to paying the full cost of this data comes from the U.S. Census Bureau, which has compiled TIGER (Topologically Integrated Geographic Encoding and Referencing system) data. The Bureau uses TIGER data in the normal course of its duty to conduct an actual enumeration of the population every 10 years. The data is imperfect, but the regular tasks of census workers are similar to our own needs: they want to identify the location of a residence from its street address, just as we do when we geocode.
Again, it is important to stress that TIGER data is imperfect, but "imperfect but free" has its own charm! TIGER data is also used as the basis for the free TIGER Map Server offered by the Census Bureau at http://tiger.census.gov/cgi-bin/mapsurfer.
There is a lot of interesting information about geography and the challenges of capturing complex and inconsistent information to be found in the TIGER documentation. But for simple geocoding, all you really need to know is that the TIGER data endeavors to include information on every street segment in the U.S. For each block, the TIGER data includes the street name, the latitude and longitude at each end of the block, and the range of address numbers for the left and the right side of the street.
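To make those per-block fields concrete, here is a minimal Python sketch of one street segment and a lookup that finds the block and side of the street holding a given house number. The class and function names are hypothetical, the linear scan stands in for whatever index a real geocoder uses, and the parity check assumes (as in the Gravenstein Highway record that follows) that one side of the street carries odd numbers and the other even.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

LatLon = Tuple[float, float]


@dataclass(frozen=True)
class StreetSegment:
    """One block of one street, with the fields needed for simple geocoding."""
    name: str                 # street name
    start: LatLon             # (lat, lon) at one end of the block
    end: LatLon               # (lat, lon) at the other end
    left: Tuple[int, int]     # (first, last) address number on the left side
    right: Tuple[int, int]    # (first, last) address number on the right side

    def side_for(self, number: int) -> Optional[str]:
        """Return 'left' or 'right' if `number` falls on that side, else None.

        Assumes each side holds only odd or only even numbers, matching the
        parity of its first address.
        """
        for side, (first, last) in (("left", self.left), ("right", self.right)):
            in_range = min(first, last) <= number <= max(first, last)
            if in_range and (number - first) % 2 == 0:
                return side
        return None


def find_segment(segments: Iterable[StreetSegment], name: str,
                 number: int) -> Optional[Tuple[StreetSegment, str]]:
    """Linear scan for the block (and side) containing `number` on `name`."""
    for seg in segments:
        if seg.name.lower() == name.lower():
            side = seg.side_for(number)
            if side is not None:
                return seg, side
    return None


if __name__ == "__main__":
    # Values from the Gravenstein Hwy record; the left/right labels are assumed.
    seg = StreetSegment("Gravenstein Hwy N",
                        (38.390313, -122.816102), (38.389814, -122.815686),
                        left=(1001, 1019), right=(1000, 1018))
    print(find_segment([seg], "gravenstein hwy n", 1005))
```

Without the parity check, the overlapping ranges 1001-1019 and 1000-1018 would both claim house 1004.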
Here is the entry that includes 1005 Gravenstein Hwy N, Sebastopol, CA 95472 (O'Reilly Media's headquarters):
11003 67518936 A Gravenstein Hwy A31 1001 1019 1000 101801009547295472 06060970979298092980 707707077015340315340320124009-122816102+38390313-122815686+38389814
This street segment goes from (38.390313, -122.816102) to (38.389814, -122.815686); one side of the street includes addresses from 1001 through 1019, and the other covers addresses from 1000 through 1018. We can interpolate that 1005 is roughly a fifth of the way from 1001 to 1019 ((1005 - 1001) / (1019 - 1001) = 4/18, or about 0.22) and, assuming the street is straight, that it will be about the same fraction of the way between the ends of the block.
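The interpolation just described fits in a few lines. This Python sketch (the function name is hypothetical) assumes the street is straight, as the text does, and also assumes the address range runs in the same direction as the endpoints, so that 1001 sits at the first coordinate pair. Treating lat/long as flat over a single block is a simplification that works fine at this scale.

```python
from typing import Tuple

LatLon = Tuple[float, float]


def interpolate(number: int, first: int, last: int,
                start: LatLon, end: LatLon) -> LatLon:
    """Estimate where `number` sits on a straight block.

    `first`/`last` are the address range on the house's side of the street,
    assumed to run in the same direction as `start` -> `end`.
    """
    if first == last:
        fraction = 0.5  # single-address range: put it mid-block (our choice)
    else:
        fraction = (number - first) / (last - first)
    lat = start[0] + fraction * (end[0] - start[0])
    lon = start[1] + fraction * (end[1] - start[1])
    return lat, lon


if __name__ == "__main__":
    lat, lon = interpolate(1005, 1001, 1019,
                           (38.390313, -122.816102), (38.389814, -122.815686))
    print(f"{lat:.6f}, {lon:.6f}")
```

For 1005 Gravenstein Hwy N this prints 38.390202, -122.816010.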
There is a lot of other information in this line, and in the other files that make up the data set for a county. TIGER/Line comprises some 24 gigabytes of data for the whole country, including information on curves in the road that fall between the ends of street segments. In the interests of compressing those 24 GB into something searchable, we simplify away that extra information.
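One way to simplify away everything except the block's endpoints is to read just the coordinates at the end of the record. This Python sketch infers the layout from the sample line above: it assumes the record ends with four signed fields, each with six implied decimal places, namely a 10-character longitude, a 9-character latitude, a 10-character longitude, and a 9-character latitude. It deliberately ignores every other field; the function name is hypothetical.

```python
from typing import Tuple

LatLon = Tuple[float, float]

# Widths of the four trailing coordinate fields, as seen in the sample record:
# longitude (10 chars), latitude (9), longitude (10), latitude (9).
COORD_WIDTHS = (10, 9, 10, 9)


def endpoints_from_record(record: str) -> Tuple[LatLon, LatLon]:
    """Return the (lat, lon) of each end of the segment, ignoring other fields."""
    tail = record.rstrip()[-sum(COORD_WIDTHS):]
    values = []
    pos = 0
    for width in COORD_WIDTHS:
        # Signed integer with six implied decimals, e.g. "+38390313".
        values.append(int(tail[pos:pos + width]) / 1_000_000)
        pos += width
    from_lon, from_lat, to_lon, to_lat = values
    return (from_lat, from_lon), (to_lat, to_lon)


if __name__ == "__main__":
    sample = ("11003 67518936 A Gravenstein Hwy A31 1001 1019 1000 "
              "101801009547295472 06060970979298092980 "
              "707707077015340315340320124009"
              "-122816102+38390313-122815686+38389814")
    print(endpoints_from_record(sample))
```

Reading from the end of the line keeps the sketch working even though the sample's internal spacing may not match the original fixed-width file.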
Fortunately for us, Schuyler Erle has stripped away all of that complexity at http://geocoder.us/, a free geocoding web site and web service for U.S. addresses based on the U.S. Census TIGER/Line data.
You may use the web site to geocode individual addresses, or use one of three web-service interfaces to geocode from your own code, as illustrated in [Hack #80]. You can even download the source code from CPAN, the Perl code repository at http://cpan.org, download the TIGER/Line data from the Census Bureau, and create your own geocoding service.
The site provides a text box for entry of an address or an intersection. So entering "1005 Gravenstein Highway North, Sebastopol, CA" will return the location of O'Reilly Media. You can also enter an intersection, like "Hollywood and Vine, Hollywood, CA" or "Florence Ave and Wilton, Sebastopol, CA 95472."
geocoder.us successfully geocodes the majority of addresses; if yours is one of them, the site returns its latitude and longitude. As a bonus, it displays a map, created dynamically by the TIGER/Line Map Server, with your address marked and centered.
The results with lat/long appear quickly, but it can take longer for the map to be fetched from the TIGER/Line Map Server. The map will be blank and the little circle on the right will be red until the map is loaded.
In Seattle, Washington, you can indirectly use the geocoder at http://seattle.wifimug.org/nearby.cgi to get "Caffeinated and Unstrung" by finding the nearest location that offers coffee and free wireless access, as illustrated in Figure 7-2.
Figure 7-2. Caffeinated and Unstrung: building on Geocoder.us
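The text doesn't say how that site ranks locations. One common approach, assuming you already have coordinates for your own address (for instance from geocoder.us) and for each café, is to compute great-circle distances with the haversine formula and take the minimum. In this Python sketch the function names, the spherical-Earth radius, and the café names and coordinates are all hypothetical placeholders, not data from the site.

```python
import math
from typing import Iterable, Tuple

LatLon = Tuple[float, float]
EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to guard against rounding pushing the argument just above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def nearest(origin: LatLon,
            places: Iterable[Tuple[str, LatLon]]) -> Tuple[str, LatLon]:
    """Return the (name, (lat, lon)) entry closest to `origin`."""
    return min(places, key=lambda place: haversine_km(origin, place[1]))


if __name__ == "__main__":
    here = (47.6062, -122.3321)  # roughly downtown Seattle
    cafes = [("Cafe A", (47.61, -122.33)),   # hypothetical
             ("Cafe B", (47.66, -122.31))]   # hypothetical
    print(nearest(here, cafes))
```

A linear scan is fine for a city's worth of cafés; a much larger list would call for a spatial index.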
7.3.2. See Also
The U.S. Census Bureau's Geography page provides lots of great information (http://www.census.gov/geo/www/index.html).