Hack 62. Find the Latitude and Longitude of a Street Address
The Google Maps API won't do it for you, but there are other ways to find the coordinates of a given street address.
As we've seen all through this book, Google Maps makes it easy to make custom maps of anything for which you have a latitude and a longitude. However, people don't tend to think of places in terms of geographic coordinates; more often, they know and refer to places by a street or mailing address. In order to find and show these places on a map, we need to be able to turn a given address into the corresponding latitude/longitude coordinates. The process of turning addresses into map coordinates is generally referred to as geocoding.
Unfortunately, for all the great things that the Google Maps API offers, the very common task of geocoding street addresses simply isn't among them. Although the Google Maps web UI will show you the location of an address, often with great accuracy, Google doesn't own this data and has been unable to negotiate permission with its data providers to offer address lookups as a service in the API. Furthermore, screen-scraping the Google Maps results page is strictly a no-no, according to the Terms of Use. If you want to stay legit, and not risk angry takedown letters from you-know-who, how can you get the lat/long coordinates for a given street address?
7.2.1. The Hack
Fortunately, within the United States, the Census Bureau collects street address information as part of its constitutionally assigned duties of enumerating the populace of the country every ten years. What's more, the Census Bureau publishes this information in the public domain, in the form of the TIGER/Line data set. This data is freely available from the Census Bureau web site at http://www.census.gov/geo/www/tiger/. As of 2004, updated versions are published twice a year.
The problem is that the TIGER/Line data set is composed of 3,233 separate ZIP files, one for each county in the entire United States. The entire data set is 4.3 GB compressed and runs to almost 16 GB uncompressed. That's a lot of data to struggle with if you just want to look up a few lousy addresses. This is where Geocoder.us comes in.
Geocoder.us offers a web service (http://geocoder.us/) for geocoding U.S. street addresses from TIGER/Line data. Actually, several styles of web service are offered, including SOAP, XML-RPC, and REST. All of these services have but one goal, which is to take a street address or intersection in the U.S. and turn it into latitude and longitude coordinates that can be displayed on a map.
Figure 7-1 shows the working demo application of the Geocoder.us service at http://geocoder.us/demo.cgi. The latitude and longitude returned from the lookup can be trivially turned into a marker on a map, using the Google Maps API. You can use this URL to test out the service manually, but if you want to do address lookups in your program, you should definitely use the web service interfaces instead.
7.2.2. The Code
7.2.2.1. XML-RPC.
The easiest way we know of to access the Geocoder.us web service is to use the XML-RPC interface from within Perl. The outline of the code looks like this:

    use XMLRPC::Lite;

    my $result = XMLRPC::Lite
        -> proxy( 'http://rpc.geocoder.us/service/xmlrpc' )
        -> geocode( '1005 Gravenstein Hwy, Sebastopol, CA 95472' )
        -> result;

Figure 7-1. The geocoder.us site at work
Either a properly formatted U.S. street address or an intersection of the form "Hollywood Blvd & Vine St, Hollywood, CA" will be accepted by the web service. You must supply either a city and state or a ZIP Code, though providing both doesn't usually hurt.
If the lookup succeeds, then the $result variable will contain a reference to an array. Each item in the array is a hash, or associative array, that contains key/value pairs describing the results of the lookup. The following outlines the structure of the data returned by the XML-RPC request:

    $result = [
        {
            'number' => '1005',
            'prefix' => '',
            'street' => 'Gravenstein',
            'type'   => 'Hwy',
            'suffix' => 'N',
            'city'   => 'Sebastopol',
            'state'  => 'CA',
            'zip'    => '95472',
            'lat'    => '38.411908',
            'long'   => '-122.842232',
        }
    ];

As you can see, the Geocoder.us web service attempts to break an address into its components and then normalizes those components before doing the lookup, and then returns the normalized components along with the coordinates. Here's a bit of code that prints the latitude/longitude pairs returned:

    if ($result) {
        for my $address (@$result) {
            if ($address->{lat}) {
                print "Address found: $address->{lat} $address->{long}\n";
            }
            else {
                print "Couldn't locate the address!\n";
            }
        }
    }
    else {
        print "Couldn't parse the address!\n";
    }

The $result variable should be tested for truth before accessing its contents, because the service will return an undefined value if the address can't be parsed. If the address can be parsed, but no match is found in the database, the result will be an array containing a single hash, with an empty string in place of the latitude and longitude values. Finally, if the address given matches multiple addresses in the database, the array will contain a separate hash for each match found.
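The three cases just described (unparseable address, no match, one or more matches) can be folded into a single helper. What follows is only a sketch under stated assumptions: the function name classify_geocode_result and its status strings are our own invention for illustration, not part of any Geocoder.us client, and the sample data simply mimics the structure shown earlier rather than a live response.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of any Geocoder.us client library):
# sort the value returned by the XML-RPC call into one of the cases
# described in the text. Returns a status string followed by zero or
# more [lat, long] pairs.
sub classify_geocode_result {
    my ($result) = @_;

    # An undefined (or non-array) result means the address couldn't be parsed.
    return ('unparseable') unless ref($result) eq 'ARRAY';

    # Keep only the hashes that carry coordinates; an unmatched address
    # comes back with empty strings in place of lat and long.
    my @points;
    for my $match (@$result) {
        next unless defined( $match->{lat} ) && length( $match->{lat} );
        push @points, [ $match->{lat}, $match->{long} ];
    }

    return ('not_found') unless @points;
    return ( ( @points > 1 ? 'ambiguous' : 'found' ), @points );
}

# Sample data shaped like the structure shown earlier (not a live response).
my $sample = [ { lat => '38.411908', long => '-122.842232' } ];
my ( $status, @points ) = classify_geocode_result($sample);
print "$status: @{ $points[0] }\n";    # prints "found: 38.411908 -122.842232"
```

Returning a status alongside the coordinates keeps the "couldn't parse" and "couldn't locate" cases distinct, which matters if you want to show the user a different message for each.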
7.2.2.2. REST.
If, for whatever reason, you prefer not to use XML-RPC, you can always use the Geocoder.us REST service, by sending an HTTP GET request to http://rpc.geocoder.us/service/rest/geocode?address=[your address here]. (Don't forget to URI-escape the address, e.g., turn whitespace into %20, before passing it to your HTTP client, if your client library doesn't do it for you. The URI::Escape module from the CPAN can help with this.)
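To make the escaping step concrete, here is a minimal sketch of building the request URL. The helper names escape_component and rest_url_for are hypothetical, and the hand-rolled escaping is a stand-in for URI::Escape's uri_escape, written out so the example needs nothing beyond core Perl; it assumes a plain ASCII address string.

```perl
use strict;
use warnings;

# Endpoint as given in the text above.
my $REST_ENDPOINT = 'http://rpc.geocoder.us/service/rest/geocode';

# Percent-encode every character outside the RFC 3986 "unreserved" set.
# Hypothetical stand-in for uri_escape() from URI::Escape; assumes the
# input is a plain ASCII (byte) string.
sub escape_component {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return $text;
}

# Build the GET URL for a single address lookup.
sub rest_url_for {
    my ($address) = @_;
    return $REST_ENDPOINT . '?address=' . escape_component($address);
}

print rest_url_for('1005 Gravenstein Hwy, Sebastopol, CA 95472'), "\n";
# prints http://rpc.geocoder.us/service/rest/geocode?address=1005%20Gravenstein%20Hwy%2C%20Sebastopol%2C%20CA%2095472
```

Escaping matters most for intersections, since an unescaped & would otherwise be read as the start of a second query parameter.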
The REST interface returns an RDF/XML document with geo:Point elements for each address match. Here's an example:
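
    <rdf:RDF
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    >
      <geo:Point>
        <dc:description>1005 Gravenstein Hwy N, Sebastopol CA 95472</dc:description>
        <geo:long>-122.842232</geo:long>
        <geo:lat>38.411908</geo:lat>
      </geo:Point>
    </rdf:RDF>
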
Multiple geo:Point elements will be returned in the document if multiple matches for the requested address are found.
Here's a bit of Perl code that makes use of the REST service. You will need the XML::Simple, LWP::Simple, and URI::Escape modules from the CPAN.

    use strict;
    use warnings;
    use XML::Simple;
    use LWP::Simple;
    use URI::Escape;

    sub geocode_rest {
        my ($address) = @_;

        # URI-escape the address and pass it as the "address" query parameter.
        my $xml = get( "http://rpc.geocoder.us/service/rest/geocode?address="
                . uri_escape($address) );
        if ($xml) {
            # XMLin() dies on malformed XML, so trap the error with eval.
            my $result = eval { XMLin( $xml, ForceArray => ['geo:Point'] ) };
            if ($result) {
                my $points = $result->{'geo:Point'};
                for my $point (@$points) {
                    my $lat = $point->{'geo:lat'};
                    my $lon = $point->{'geo:long'};
                    if ($lat and $lon) {
                        ### Success! Do something with the coordinates.
                    }
                    else {
                        ### Couldn't find a match for the address.
                    }
                }
            }
            else {
                ### Couldn't parse the XML, so the service spit
                ### out an error message, meaning it couldn't parse the address.
            }
        }
        else {
            ### The HTTP GET failed, indicating a network error.
        }
    }

7.2.2.3. SOAP.
The SOAP interface to the Geocoder.us web service is probably the most difficult to use, entirely owing to the subtle complexities of SOAP itself. However, if you're a Java or C# user, it might actually be easier for you to use SOAP than the other web service interfaces, because you can usually autogenerate result classes from the WSDL description. Accordingly, you may be pleased to know that a WSDL file exists for the Geocoder.us SOAP interface at http://geocoder.us/dist/eg/clients/GeoCoder.wsdl.
7.2.3. The Caveats
In principle, you should be able to access the Geocoder.us web services directly from the browser by using JavaScript, but in practice, the browser security model of the better web browsers out there will prevent you from using JavaScript to access Internet domains aside from the one that your page originates from. One way around this, which many sites are currently using, is to set up a server-side script that accesses the Geocoder.us site, as shown above. You can then use the GXmlHttp class from the Google Maps API to request address lookups via your site's geocoding proxy, and get the results back in JavaScript as either XML or plain text, depending on your preferences.
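A server-side geocoding proxy of the kind described here might look something like the following CGI sketch. Treat it as an illustration under assumptions rather than a finished script: the function names are hypothetical, the choice of HTTP::Tiny (in the Perl core since 5.14) in place of LWP::Simple is ours, and the status codes and text/xml content type are simply reasonable guesses at what a JavaScript client would want.

```perl
use strict;
use warnings;
use HTTP::Tiny;    # in the Perl core since 5.14

my $REST_ENDPOINT = 'http://rpc.geocoder.us/service/rest/geocode';

# Decode the "address" parameter from a CGI query string, or return undef.
sub address_from_query {
    my ($query) = @_;
    for my $pair ( split /[&;]/, defined $query ? $query : '' ) {
        my ( $key, $value ) = split /=/, $pair, 2;
        next unless defined $value && $key eq 'address';
        $value =~ tr/+/ /;
        $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
        return length($value) ? $value : undef;
    }
    return undef;
}

# Re-escape the address for the outgoing request (same idea as uri_escape).
sub escape_component {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return $text;
}

# Build the whole CGI response as a string. $fetch is a code reference that
# takes a URL and returns the body, or undef on failure; passing it in
# makes it easy to swap out the network call.
sub proxy_response {
    my ( $query, $fetch ) = @_;
    my $address = address_from_query($query);
    return "Status: 400 Bad Request\nContent-Type: text/plain\n\nMissing address parameter\n"
        unless defined $address;

    my $body = $fetch->( $REST_ENDPOINT . '?address=' . escape_component($address) );
    return "Status: 502 Bad Gateway\nContent-Type: text/plain\n\nLookup failed\n"
        unless defined $body;

    return "Content-Type: text/xml\n\n" . $body;
}

# The real fetcher: a plain HTTP GET with a ten-second timeout.
sub fetch_with_http_tiny {
    my ($url) = @_;
    my $response = HTTP::Tiny->new( timeout => 10 )->get($url);
    return $response->{success} ? $response->{content} : undef;
}

# When run under a web server as a CGI script, answer the current request.
print proxy_response( $ENV{QUERY_STRING}, \&fetch_with_http_tiny )
    if defined $ENV{GATEWAY_INTERFACE};
```

Because the script relays the RDF/XML untouched, the page's JavaScript can read the geo:Point elements from the XML response directly; a proxy that returned plain text instead would only need a different final return line.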
One caveat you should be aware of is that the TIGER/Line data set is less complete and/or less accurate in some areas than the commercial data sources that Google Maps uses to locate addresses. The flip side, of course, is that TIGER/Line is freely available. It's a trade-off! In general, residential addresses are more likely to be accurate than addresses in industrial areas or commercial office parks.
Also, you need to examine the terms and conditions of service for Geocoder.us at http://geocoder.us/terms.shtml, before using their service. In particular, use of the free web services for for-profit commercial ends is strictly prohibited. If you are a commercial user, you will need to subscribe to the commercial service instead.
The final caveat is that the Geocoder.us web services are throttled by IP address, to keep the service from being hammered by a single user. As of this writing, a delay of 15 seconds between requests is in place, but Locative Technologies, the maintainers of the Geocoder.us service, reserves the right to adjust this upwards or downwards as necessary. If the service seems unduly slow, it's probably because your requests aren't spaced more than 15 seconds apart. Of course, commercial account users don't suffer this restriction.
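If you have a batch of addresses to look up, one way to stay on the right side of the throttle is to space the requests yourself. The sketch below is an assumption-laden illustration: seconds_to_wait and geocode_politely are hypothetical names, the 15-second figure is the one quoted above and may change, and the extra second of margin is our own allowance for the whole-second resolution of time().

```perl
use strict;
use warnings;

# Spacing between requests: the 15 seconds quoted in the text, plus one
# second of margin because time() only has whole-second resolution.
# The operators may change the limit, so treat this value as an assumption.
my $MIN_INTERVAL = 15 + 1;

# Hypothetical helper: how many seconds to sleep before the next request,
# given when the previous one was sent (epoch seconds, or undef if none).
sub seconds_to_wait {
    my ( $last_sent, $now, $interval ) = @_;
    return 0 unless defined $last_sent;
    my $remaining = $interval - ( $now - $last_sent );
    return $remaining > 0 ? $remaining : 0;
}

# Call $lookup->($address) for each address, pausing between calls.
# $lookup is whatever code reference performs the actual request.
sub geocode_politely {
    my ( $lookup, @addresses ) = @_;
    my ( $last_sent, @results );
    for my $address (@addresses) {
        my $pause = seconds_to_wait( $last_sent, time(), $MIN_INTERVAL );
        sleep($pause) if $pause > 0;
        $last_sent = time();
        push @results, $lookup->($address);
    }
    return @results;
}
```

You would pass in a reference to whichever lookup routine you prefer, for example geocode_politely( \&geocode_rest, @addresses ). Keeping the arithmetic in seconds_to_wait separate from the sleeping makes the spacing rule easy to check without actually waiting.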
If the throttling or the expense are a problem for you, you can always set up your own local Geocoder.us database, by downloading the Geo::Coder::US module and its prerequisites from the CPAN. The documentation for the Geo::Coder::US::Import module, which comes with the distribution, contains all the instructions you need for getting the TIGER/Line data from the Census Bureau, and using it to build your own local database. Although the Geocoder.us database for the whole country runs to almost 800 MB, you can simplify things for yourself by constructing a database composed only of the counties you're interested in.
7.2.4. Geocoding Addresses Outside the U.S.
Geocoding street addresses outside the U.S. for free is considerably harder, because the data sets simply aren't freely available. (Denmark is one notable exception.) In Canada, you may be able to use the Geocoder.ca web site at http://geocoder.ca/, but that service is based on government data that itself is not freely available. For Europe, Japan, and the rest of the world, you may have to purchase access to a commercial data set in order to geocode addresses in your country. The alternative is to petition your government's legislators to change its geospatial data access policies, so that citizens in your country can access information whose collection has already been subsidized with your tax money!
7.2.5. See Also
- If you plan to use the Geocoder.us web services, you should definitely read the developer documentation at http://geocoder.us/help/, as well as the terms and conditions of service at http://geocoder.us/terms.shtml.
- If you want to set up your own U.S. address geocoder, you can start by getting the Geo::Coder::US backend code from http://search.cpan.org/~sderle/Geo-Coder-US/, or by using the CPAN shell to install the module and its prerequisites automatically. Be sure to read the documentation for the Geo::Coder::US::Import module.
- The Geo::StreetAddress::US module contains all of the rules for how street addresses are parsed. We always welcome patches to this module, to help improve our hit rate!
- The Census Bureau's TIGER/Line data set lives on the Web at http://www.census.gov/geo/www/tiger/.