Hack 82. Find Nearby Things Using U.S. ZIP Codes
ZIP Codes are everywhere, and ideal for taking advantage of "What's near me?" services.
Everywhere on the Web are forms asking for your ZIP Code and promising to get you in touch with your nearest latte, sandwich, or golf course. Now you can join the in-crowd and provide your own location-based services (LBS) built on ZIP Codes.
If you know your ZIP Code, then you can figure out where you are...sort of. United States ZIP Codes were established to help deliver the mail. They were not optimized to ease the burden of 21st century geowankers, which is a pity, since they provide a ubiquitous and useful label for most people in the United States. There is a method to the madness of U.S. ZIP Codes, but sadly for those of us with a locative bent, that method has nothing to do with topography and everything to do with efficient mail delivery.
ZIP Codes describe arbitrary, irregular shapes and sizes. Numerically similar ZIP Codes tend to be close to each other, but in some places successive ZIP Codes are scattered across a metropolitan area. Because the boundaries are irregular, you can be inside one ZIP Code, right on its border with a second, and yet be closer to the center of a third ZIP Code than to the center of the second.
In spite of these obstacles to perfection, we can do a lot with ZIP Codes! The first step is to acquire a geocoded ZIP Code database: a database of ZIP Codes to which the latitude and longitude of each ZIP Code's centroid have been added. To picture the centroid, imagine that you could slice a ZIP Code off the ground's surface and hold it suspended in the air, balanced on the tip of a pencil (a very, very strong pencil).
The centroid is the point where the ZIP Code would balance, that is, the center of mass of the area covered by the ZIP Code. This usually works well for geocoding ZIP Codes, except when it doesn't. For example, depending on how you calculate it, the centroid of ZIP Code 94123 in San Francisco is roughly 15 miles out to sea. Why? Because that ZIP Code also includes the Farallon Islands, which are 27 miles straight out from the Golden Gate. So the "mass" we are interested in is not always the mass we get!
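To make "depending on how you calculate it" concrete, here is a minimal Python sketch that computes an area-weighted centroid (center of mass) with the shoelace formula and compares it with the midpoint of the bounding box, a cruder notion of "center." The shapes are hypothetical planar coordinates in miles, not real ZIP Code boundaries, and this is not the method any particular data vendor uses.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def piece_area_and_centroid(ring: List[Point]) -> Tuple[float, Point]:
    """Area and centroid of one simple polygon (shoelace formula, planar)."""
    area2 = cx = cy = 0.0  # area2 is twice the signed area
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return abs(area2) / 2.0, (cx / (3.0 * area2), cy / (3.0 * area2))


def centroid(pieces: List[List[Point]]) -> Point:
    """Center of mass of a region made of separate pieces, weighted by area."""
    total = sx = sy = 0.0
    for ring in pieces:
        area, (x, y) = piece_area_and_centroid(ring)
        total += area
        sx += x * area
        sy += y * area
    return sx / total, sy / total


def bbox_center(pieces: List[List[Point]]) -> Point:
    """Midpoint of the bounding box around all pieces."""
    xs = [x for ring in pieces for x, _ in ring]
    ys = [y for ring in pieces for _, y in ring]
    return (min(xs) + max(xs)) / 2.0, (min(ys) + max(ys)) / 2.0


# Hypothetical shapes: a 10 x 10 mile "mainland" and a 1 x 1 mile island
# lying 30 miles offshore.
mainland = [(0, 0), (10, 0), (10, 10), (0, 10)]
island = [(40, 0), (41, 0), (41, 1), (40, 1)]
print(centroid([mainland, island]))     # about (5.35, 4.96): barely moves
print(bbox_center([mainland, island]))  # (20.5, 5.0): out at sea
```

A small island hardly shifts the true center of mass, but a bounding-box midpoint lands far offshore; how a data source defines "center," and what its polygons include, decides where the point ends up.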
The easiest way to get a current clean geocoded database is to buy it from a company like Melissa Data Corp (http://www.melissadata.com). Their ZIP*Data product is a high-quality geocoded ZIP Code database. You can get more information about the product, as well as download the manual for free, at http://www.melissadata.com/zd.html. I recommend reading the manual even if you don't decide to buy the database. It includes lots of examples of distance and bearing calculations.
The only bad thing about ZIP*Data is the price: $150 per quarter, or $395 for four quarterly updates. Note: The data doesn't "expire" or get locked in some fashion; it just gets more and more out of date. This is actually a good price for quality geocoded ZIP Code data, but it is quite a hurdle for casual experiments or for use by nonprofits.
Fortunately the U.S. government again comes to the rescue of the itinerant hacker. The U.S. Census Bureau maintains a geocoded ZIP Code database. It is not subject to regular updates, so the data can be "stale." But for many applications, "free" is more important than "perfect."
Schuyler Erle has compiled a completely free ZIP Code database from 100% public domain sources. It is available at http://civicspacelabs.org/zipcodedb. The database comes in two forms: a MySQL dump file that can be used to directly create and load a ZIP Code table within MySQL, and a comma-separated-values version that is useful for simple scripts.
If you download the CSV version and uncompress it, you can query the data with the following script, ziplookup.pl. (You could also use the Unix grep utility; the point is to show that dealing with ZIP Codes need not be complicated.)
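#!/usr/bin/perl
use strict;
use warnings;

my ($search_zip, $zipcodefile) = @ARGV;
open ZIP, $zipcodefile or die "can't open zip code file $!\n";

# find the lat/long for this zip code
my ($zip, $city, $state, $zip_lat, $zip_long);
my $found = 0;
while (my $st = <ZIP>) {
    chomp $st;
    $st =~ s/"//g;    # strip the quotes around each field
    ($zip, $city, $state, $zip_lat, $zip_long) = split(/,/, $st);
    next unless defined $zip_long;    # skip blank or short lines
    if ($zip eq $search_zip) { $found = 1; last; }    # exact match, not a regex
}
close ZIP;
die "zip code $search_zip not found\n" unless $found;
print "$zip_lat, $zip_long, $city, $state, $zip\n";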
Then, you can run the script as follows:
ziplookup.pl 95472 zipcodes.csv
38.393314, -122.83666, Sebastopol, CA, 95472
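For comparison, here is a minimal Python sketch of the same lookup. It uses the standard csv module, which, unlike a plain split on commas, copes with quoted fields that themselves contain commas. It assumes the column order ziplookup.pl relies on (zip, city, state, latitude, longitude, then any extra columns); `lookup_zip` is a hypothetical helper name.

```python
import csv
from typing import Iterable, Optional, Tuple


def lookup_zip(zip_code: str,
               lines: Iterable[str]) -> Optional[Tuple[float, float, str, str, str]]:
    """Return (lat, long, city, state, zip) for zip_code, or None if absent.

    `lines` is any iterable of CSV text lines, such as an open file.
    Assumed column order: zip, city, state, latitude, longitude, ...
    """
    for row in csv.reader(lines):
        if len(row) >= 5 and row[0] == zip_code:  # exact string match
            return float(row[3]), float(row[4]), row[1], row[2], row[0]
    return None
```

Usage: `with open("zipcodes.csv", newline="") as f: print(lookup_zip("95472", f))`. Accepting lines rather than a filename keeps the function easy to test and lets it read from any source.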
We can then use a second script, nearest.pl, to return all the points in a datafile that lie within a specified distance of a given point. Each line of the datafile holds a latitude, a longitude, and a label:
40.70175, -103.68998, 100 MILES
39.22394, -123.76648, ALBION
And here is the code for ./nearest.pl:
#!/usr/bin/perl
use strict;
use warnings;
use Geo::Distance;

# This is a linear scan of the whole datafile.
my ($search_lat, $search_long, $datafile, $distance) = @ARGV;
print "distance: $distance\n";

# compute the distance from the search point to every point in the datafile
my $geo = Geo::Distance->new;
my $points = {};
open IN, $datafile or die "can't open datafile $!\n";
while (my $st = <IN>) {
    chomp $st;
    my ($lat, $long, @rest) = split(/,\s*/, $st);
    next unless defined $long;    # skip blank or malformed lines
    my $rest = join " ", @rest;   # the label, e.g. "ALBION"
    $points->{$rest}->{lat}  = $lat;
    $points->{$rest}->{long} = $long;
    # Geo::Distance expects longitude before latitude for each point
    my $dist = $geo->distance('mile', $long, $lat => $search_long, $search_lat);
    $points->{$rest}->{dist} = sprintf("%5.2f", $dist);
}
close IN;

# print points within the passed distance, nearest first
foreach my $p (sort { $points->{$a}->{dist} <=> $points->{$b}->{dist} } keys %$points) {
    last if $points->{$p}->{dist} > $distance;
    print "$points->{$p}->{dist} $p\n";
}
We can run the script as follows, specifying the latitude, longitude, datafile name, and distance in miles:
nearest.pl 38.393314 -122.83666 datafile.txt 10
distance: 10
0.69 SEBASTOPOL
0.93 HWY 116
3.06 HOPYARD
5.79 STONYPOINT RD
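The distance calculation behind nearest.pl can be sketched in Python with the haversine great-circle formula. This is a minimal illustration, not Geo::Distance's own code; the Earth radius of 3958.8 miles is an assumed mean value, so results may differ slightly from the module's output. The parser assumes the "lat, long, label" datafile format used by nearest.pl.

```python
import math
from typing import Iterable, List, Tuple

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))


def nearest(search_lat: float, search_long: float,
            lines: Iterable[str], max_miles: float) -> List[Tuple[float, str]]:
    """(distance, label) pairs within max_miles of the search point, nearest first.

    Each line is expected in the nearest.pl datafile format: "lat, long, label".
    """
    found = []
    for line in lines:
        parts = [p.strip() for p in line.split(",")]
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        lat, lon = float(parts[0]), float(parts[1])
        label = " ".join(parts[2:])
        dist = haversine_miles(search_lat, search_long, lat, lon)
        if dist <= max_miles:
            found.append((dist, label))
    return sorted(found)
```

Usage: `with open("datafile.txt") as f:` then `for d, label in nearest(38.393314, -122.83666, f, 10): print(f"{d:5.2f} {label}")`.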
7.6.1. Who Is Nearby?
The president of a former employer of mine gave presentations for our customers a few times a year. Each time, I was asked to produce a list of our customers whom we should invite to the event, based on their proximity to the speaking location. I did this in the old days using FoxPro for DOS, but we can bring this method up to date with MySQL.
First download the MySQL dump version of the Civic Space ZIP Code database (http://civicspacelabs.org/zipcodedb) and uncompress it. Then, create a MySQL database (or you can simply add the ZIP Codes table to an existing database):
mysqladmin create zipcodes
Load it up:
mysql zipcodes < zipcodes-mysql-10-Aug-2004/zipcodes.mysql
You can now start MySQL:
mysql zipcodes
And start looking at ZIP Codes:
select * from zipcodes where zip="95472";
+-------+------------+-------+-----------+-------------+----------+-----+
| zip   | city       | state | latitude  | longitude   | timezone | dst |
+-------+------------+-------+-----------+-------------+----------+-----+
| 95472 | Sebastopol | CA    | 38.393314 | -122.836660 |       -8 |   1 |
+-------+------------+-------+-----------+-------------+----------+-----+
1 row in set (0.00 sec)
The simplest way to select the nearby items is to define a bounding box: a rectangular area, specified by its corners, that is meant to enclose your area of interest. At the equator, one degree of latitude and one degree of longitude are both about 69 miles. A degree of latitude stays close to 69 miles everywhere, but a degree of longitude shrinks as you move toward the Poles, in proportion to the cosine of the latitude: at Sebastopol's latitude (about 38.4° N) it is only about 54 miles. The quick approximation that follows uses 69 miles per degree for both, which makes the box narrower east to west than intended, so it can miss records near its east and west edges. If that matters to you, divide the longitude offset by 69 times the cosine of the latitude instead; the wider box only adds a few extra records, which are cheap to include.
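The cosine rule is easy to check numerically. This minimal sketch uses the text's round figure of 69 miles per degree (an approximation; the true length of a degree varies slightly) and Sebastopol's latitude from the MySQL output; `miles_per_degree_longitude` is a hypothetical helper name.

```python
import math

MILES_PER_DEGREE = 69.0  # the text's round figure for one degree at the equator


def miles_per_degree_longitude(lat_deg: float) -> float:
    """Approximate east-west length of one degree of longitude at lat_deg."""
    return MILES_PER_DEGREE * math.cos(math.radians(lat_deg))


lat = 38.393314  # Sebastopol, CA (95472)
per_degree = miles_per_degree_longitude(lat)
print(f"{per_degree:.1f} miles per degree of longitude")          # 54.1
print(f"25/69 degrees reaches {25 / 69 * per_degree:.1f} miles")  # 19.6
print(f"25 miles needs {25 / per_degree:.4f} degrees, not {25 / 69:.4f}")
```

So an offset of 25/69 degrees of longitude reaches only about 20 miles at this latitude; reaching 25 miles takes roughly 0.46 degrees.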
A bounding box reaching 25 miles out in each direction from ZIP Code 95472 is approximated with:
Maximum latitude    38.393 + 25/69
Minimum latitude    38.393 - 25/69
Maximum longitude   -122.834 + 25/69
Minimum longitude   -122.834 - 25/69
And the SQL for this bounding box is:
select * from zipcodes
where latitude between 38.393 - 25/69 and 38.393 + 25/69
and longitude between -122.834 - 25/69 and -122.834 + 25/69;
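Rather than typing the arithmetic into the SQL by hand, a program can compute the four bounds and pass them as query parameters. This is a minimal sketch; `bounding_box` and `BOX_SQL` are hypothetical names, and the `%s` placeholders assume a DB-API driver with the "format" parameter style, such as MySQLdb (Python's sqlite3 uses `?` instead). With `correct_longitude=False` it reproduces the 25/69 arithmetic above.

```python
import math
from typing import Tuple

MILES_PER_DEGREE = 69.0  # the text's round figure


def bounding_box(lat: float, lon: float, miles: float,
                 correct_longitude: bool = True) -> Tuple[float, float, float, float]:
    """Return (min_lat, max_lat, min_lon, max_lon) reaching `miles` from (lat, lon).

    correct_longitude=False: use miles/69 for both axes (the quick version).
    correct_longitude=True: widen the longitude offset by 1/cos(latitude).
    cos(latitude) is far from zero everywhere in the U.S., so this is safe there.
    """
    dlat = miles / MILES_PER_DEGREE
    dlon = dlat
    if correct_longitude:
        dlon = miles / (MILES_PER_DEGREE * math.cos(math.radians(lat)))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


# Parameter order matches bounding_box(): low bound before high bound.
BOX_SQL = ("select * from zipcodes "
           "where latitude between %s and %s "
           "and longitude between %s and %s")
```

Usage, assuming an open MySQLdb cursor: `cursor.execute(BOX_SQL, bounding_box(38.393, -122.834, 25))`. Passing the numbers as parameters keeps user input out of the SQL text.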
Again, this is not totally correct: the corners of the box lie more than 25 miles from the center, and because the same 25/69 offset is used for longitude, the box reaches only about 20 miles east and west at this latitude. Still, it is sufficient to be useful, and this method has the advantage of being very, very fast. If you need the answer to be more precise, then you should make this a two-step process. First, select the records within your bounding box (as described above), dividing the longitude offset by 69 times the cosine of the latitude so the box is wide enough. Then calculate the distance for each record within the box, discard those ZIP Codes that are outside of your range, and sort the survivors by distance.
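The two-step process can be sketched in Python over an in-memory list of (zip, latitude, longitude) rows. This is a minimal illustration under stated assumptions, not a definitive implementation: step 1 uses the cosine-corrected box, and step 2 uses the haversine formula with an assumed mean Earth radius of 3958.8 miles. `zips_within` is a hypothetical helper name.

```python
import math
from typing import Iterable, List, Tuple

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius
MILES_PER_DEGREE = 69.0      # the text's round figure

Row = Tuple[str, float, float]  # (zip, latitude, longitude)


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))


def zips_within(lat: float, lon: float, miles: float,
                rows: Iterable[Row]) -> List[Tuple[float, str]]:
    """(distance, zip) for every row within `miles`, nearest first."""
    # Step 1: the cheap bounding-box test the SQL query performs,
    # with the longitude offset widened by 1/cos(latitude).
    dlat = miles / MILES_PER_DEGREE
    dlon = miles / (MILES_PER_DEGREE * math.cos(math.radians(lat)))
    in_box = [r for r in rows
              if lat - dlat <= r[1] <= lat + dlat and lon - dlon <= r[2] <= lon + dlon]
    # Step 2: exact distance for the survivors; drop anything beyond the
    # limit (such as points in the box's corners), then sort nearest first.
    hits = [(haversine_miles(lat, lon, r[1], r[2]), r[0]) for r in in_box]
    return sorted(h for h in hits if h[0] <= miles)
```

In a database setting, step 1 would be the `between` query and step 2 would run over the rows it returns; the box only needs to be cheap and never too small.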
A sample script to do this is available at http://mappinghacks.com/cgi-bin/ziprange.cgi.
You call the script with a ZIP Code and a distance, and it returns a list of the ZIP Codes whose centroids lie within that distance of your target ZIP Code. We could point the same approach at our own geocoded data, say a list of our customers or suppliers, instead of the ZIP Code table. Then we would be able to find our nearest customers based on ZIP Code.
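One way to answer "who is nearby?" is to join a customer list to the ZIP Code table and apply the bounding box to the joined rows. This minimal sketch uses Python's built-in sqlite3 module so it runs without a MySQL server; the `customers` table, its columns, the customer names, and ZIP Code 00001 with its coordinates are all hypothetical, while the 95472 row matches the MySQL output shown earlier.

```python
import math
import sqlite3
from typing import List, Tuple

MILES_PER_DEGREE = 69.0  # the text's round figure


def nearby_customers(conn: sqlite3.Connection, zip_code: str,
                     miles: float) -> List[Tuple[str, str]]:
    """(name, zip) of customers whose ZIP Code centroid lies in a box reaching
    `miles` from zip_code's centroid (longitude offset widened by cos)."""
    row = conn.execute(
        "select latitude, longitude from zipcodes where zip = ?", (zip_code,)
    ).fetchone()
    if row is None:
        raise ValueError(f"unknown ZIP Code {zip_code}")
    lat, lon = row
    dlat = miles / MILES_PER_DEGREE
    dlon = miles / (MILES_PER_DEGREE * math.cos(math.radians(lat)))
    return conn.execute(
        """select c.name, c.zip
             from customers c join zipcodes z on z.zip = c.zip
            where z.latitude between ? and ?
              and z.longitude between ? and ?
            order by c.name""",
        (lat - dlat, lat + dlat, lon - dlon, lon + dlon),
    ).fetchall()


def demo_connection() -> sqlite3.Connection:
    """In-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table zipcodes (zip text primary key, latitude real, longitude real);
        create table customers (name text, zip text);
    """)
    conn.executemany("insert into zipcodes values (?, ?, ?)", [
        ("95472", 38.393314, -122.836660),  # Sebastopol, as in the MySQL output
        ("00001", 39.5, -123.5),            # hypothetical ZIP well beyond 25 miles
    ])
    conn.executemany("insert into customers values (?, ?)", [
        ("Acme Feed", "95472"),      # hypothetical customers
        ("Far Away Inc", "00001"),
    ])
    return conn
```

A customer in one of the box's corners can still be farther than `miles` away, so for an invitation list you would follow this query with the exact distance check from the two-step process.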