This repository has been archived by the owner on Jul 11, 2019. It is now read-only.

Clean Up the Data

atogle edited this page May 2, 2011 · 10 revisions

Now that we've cloned the census2pgsql project and downloaded the census data, we can start putting it into PostGIS. You'll notice that our census2pgsql/data directory is now full of files ending in .pl. These are the raw census files. Each state has three files that look like this, where XX is a state abbreviation:

  • XXgeo2010.pl
  • XX000012010.pl
  • XX000022010.pl

They're just text, so feel free to take a look at them in your favorite text editor or by typing something like less akgeo2010.pl.
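If you'd like to check that every state has all three files before going further, here is a minimal Ruby sketch. It is not part of census2pgsql; the helper names group_pl_files and incomplete_states are made up for illustration, and the file-name pattern is simply taken from the three names listed above.

```ruby
# Hypothetical helpers, not part of the census2pgsql project.

# Group the raw PL files in a directory by state abbreviation.
# Only names matching XXgeo2010.pl, XX000012010.pl or XX000022010.pl count.
def group_pl_files(dir)
  groups = Hash.new { |hash, key| hash[key] = [] }
  Dir.glob(File.join(dir, "*2010.pl")).sort.each do |path|
    name = File.basename(path)
    match = name.match(/\A([a-z]{2})(geo|00001|00002)2010\.pl\z/i)
    next unless match
    groups[match[1].downcase] << name
  end
  groups
end

# Return the state abbreviations that do not have exactly three files.
def incomplete_states(groups)
  groups.select { |_state, files| files.size != 3 }.keys
end
```

From inside the data directory, puts incomplete_states(group_pl_files(".")) would print the abbreviation of any state with a missing file, and nothing if the download is complete.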

Parse the PL Files

The first step is to parse these files into something a little more useable. We're going to merge the files for each state into a single tab-delimited file using a Ruby script.

  1. Change your working directory to ruby by typing cd ../ruby, assuming that you're still in the data directory from the previous instructions.
  2. Type ruby merge_flat_files.rb to start merging the census files. This is going to take the three files for each state and merge them into a single tab-delimited file called XX_merged.csv, where XX is the state abbreviation. These files will live in the data directory.
  3. Go on to the next section and get the database set up while the script runs. These are very big files, so parsing them will take a long time, up to a few hours depending on your hardware.
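To give a rough idea of what merging the three files involves, here is a minimal sketch. It is not the project's merge_flat_files.rb, and it rests on assumptions about the file layout: that the geo file is fixed-width with the logical record number (LOGRECNO) in columns 19 to 25, that the two data files are comma-delimited with LOGRECNO as the fifth field, and that LOGRECNO is the key shared by all three. The names merge_state and index_by_logrecno are hypothetical.

```ruby
# Hypothetical sketch, not the project's merge_flat_files.rb.
GEO_LOGRECNO = 18...25   # assumed byte range of LOGRECNO in the geo file
DATA_LOGRECNO_INDEX = 4  # assumed position of LOGRECNO in the data files
DATA_ID_FIELDS = 5       # assumed number of leading identifier fields

# Read a comma-delimited data file into a hash keyed by LOGRECNO.
def index_by_logrecno(path)
  rows = {}
  File.foreach(path, mode: "rb") do |line|
    fields = line.chomp.split(",", -1)
    rows[fields[DATA_LOGRECNO_INDEX]] = fields
  end
  rows
end

# Write one tab-delimited row per geo record: the LOGRECNO, the raw geo
# line, then the data fields from both data files (identifier fields dropped).
def merge_state(geo_path, part1_path, part2_path, out_path)
  part1 = index_by_logrecno(part1_path)
  part2 = index_by_logrecno(part2_path)
  File.open(out_path, "wb") do |out|
    # Files are read as raw bytes so the fixed-width offsets are byte offsets.
    File.foreach(geo_path, mode: "rb") do |line|
      logrecno = line[GEO_LOGRECNO]
      next if logrecno.nil? || logrecno.size < 7
      row = [logrecno, line.chomp.rstrip]
      row.concat(part1.fetch(logrecno, []).drop(DATA_ID_FIELDS))
      row.concat(part2.fetch(logrecno, []).drop(DATA_ID_FIELDS))
      out.puts(row.join("\t"))
    end
  end
end
```

A call might look like merge_state("../data/akgeo2010.pl", "../data/ak000012010.pl", "../data/ak000022010.pl", "../data/ak_sketch.csv"), where the output name is made up so it doesn't clash with the real XX_merged.csv files. The sketch holds both data files in memory for simplicity, which is fine for small states but gets heavy for the largest ones.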

Section Summary

cd ../ruby
ruby merge_flat_files.rb

Awesome, we have the data cleaned up so let's load the data into PostGIS!
