Hack 87. Build a Spatially Indexed Data Store
Spatial extensions to the PostgreSQL database are useful in many applications.
A spatial database allows you to make lots of interesting statistical and geometrical queries. Find things within a bounding box, or within a distance; calculate the areas of shapes, and have data types for points, lines, and shapes in your database tables.
PostgreSQL, the popular free SQL database, has spatial extensions called "Geographic Objects For PostgreSQL." PostGIS, an open source project by Refractions Research, includes data types to handle geometrical "simple features"such as points, lines, and shapesmany kinds of spatial queries, and different geographical datums. PostGIS can be used as a backend for GRASS, Mapserver, GeoServer, and QGIS, among other open source GIS projects.
PostGIS is available from http://postgis.refractions.net/. Debian packages, RPMs, and SRPMS for Fedora and Mandrake Linux are provided at this site. You will probably find it safer to build the RPMs on your own system from the SRPMS using rpmbuild --rebuild; we had to do this to avoid library loading problems. (You can download our RPMs from http://mappinghacks.com/rpm.) For Windows users, there is a CygWin binary available for PostGIS. The rest of these instructions in this hack assume a POSIX-like shell environment.
Alternatively, you can build PostGIS from source. It requires a PostgreSQL source tree against which it must be built. There are good instructions for this process at http://postgis.refractions.net/.
8.2.1. Set Up the Database
If this is a new PostgreSQL installation, you will need to set up a directory under which PostgreSQL stores its databases. This may be supplied by default as /var/lib/pgsql, also the home directory of your system's postgres user:
#> initdb -D /var/pgsql
Now create a database for your spatial data. As the postgres user on your system, enter:
#> createdb tiger
where tiger is the name you give to your database. This next step is key for PostGIS to work properly, because it needs the PL/SQL language extensions:
#> createlang plpgsql tiger
This step loads the PostGIS schema and functions into your new database:
#> psql -d tiger - f postgis.sql
You may need to provide the full path to wherever postgis.sql was installed on your system. One final step is needed to add data to the spatial_ref_sys table, which contains metadata about the different spatial projections that the geometry can be in:
#> psql -d tiger - f spatial_ref_sys.sql
If everything's working, you should see the last line COMMIT (see also http://postgis.refractions.net/docs/x83.htm).
8.2.2. Getting Data into PostGIS
A spatial database is no fun without spatially meaningful data. There are several techniques for getting your information into PostGIS.
8.2.2.1 Import a shapefile into PostGIS with shp2pgsql
You can use shp2pgsql to import geospatial data in shapefile format into PostGIS. For example, you might have a shapefile that contains a set of political boundaries for which you want to ask, "Which voting area contains this address? Where is the nearest polling station? What was the voter turnout rate in the last election?"
The shp2pgsql script comes with PostGIS and will convert a shapefile into a set of insert statements suitable for loading into a PostGIS database. To import a shapefile, try the following command:
shp2pgsql -c shapefile [database_name.]table_name
The database name is optional. To see the other options of this simple conversion program, just type shp2pgsql into your shell.
8.2.2.2 Import many formats into PostGIS with ogr2ogr
The ogr2ogr program comes with Frank Warmerdam's wonderful OGR/GDAL utilities [Hack #68] .
Packages containing GDAL are also available at the PostGIS download area. The ogr2ogr format conversion utility will export directly to a PostGIS data store.
TIGER, the Topologically Integrated Geographic Encoding and Referencing system, is a topological model of the United States compiled by the U.S. Census Bureau, released on the Web (http://www.census.gov/geo/www/tiger/) and on CD-ROM for free. The database "defines the location and relationship of streets, rivers, railroads, and other features to each other and to the numerous geographic entities for which the Census Bureau tabulates data from its censuses."
The full TIGER/Line data set is nearly 4 GB compressed. You can download some or all of it at your preference (At the time of writing, the 2003 data set has been released.) These options ask wget to traverse directories recursively downward, but not upward:
#> wget -Azip -np -r http://www2.census.gov/geo/tiger/tiger2003/
Now to import the TIGER model into PostGIS:
#> ogr2ogr | less
Called without options, ogr2ogr shows you whether your version of GDAL has support for PostgreSQL built in. If not, you will have to rebuild GDAL at this stage. Source RPMs for GDAL are also available at the PostGIS address given earlier.
This quick shell script traverses the directory tree and uses the GDAL utility ogr2ogr to import the TIGER data straight into PostGIS:
#!/bin/bash DATABASE=tiger while read zip; do rm *.RT? *.MET unzip $zip ogr2ogr $APPEND -f PostgreSQL PG:dbname=$DATABASE *.RT1 -lco LAUNDER=yes -nln complete_chain CompleteChain APPEND=-append done
CompleteChain is the layer containing the complete line data in TIGER; the -nln option instructs it to translate its name to one PostgreSQL is happier with. Since PostgreSQL treats table and column names case-sensitively, converting all the names to lowercase here ensures that we won't have trouble later on with other applications that talk to PostgreSQL, such as MapServer.
Head to the base directory of the census data you downloaded, and use the following line to find all ZIP files and unpack the line layer from them with the shell script:
#> cd www2.census.gov #> find -name *.zip |time import_tiger_data.sh
You don't need the time command, but it's interesting to find out how long this takes to run! The TIGER data for all 50 states of the U.S., uncompressed, is over 24 GB in size. Whether you imported all or merely some of TIGER/Line's complete chains, you can now visualize the layer of polylines in applications that can connect to PostGIS databases, including Map Server, Quantum GIS, and GRASS. For further hacking the hack, visit http://mappinghacks.com/projects/tiger2pqsql/ to see how we got the TIGER/Line polygons imported into PostGIS, as well.