This repository has been archived by the owner on Jul 11, 2019. It is now read-only.
Clean Up the Data
atogle edited this page May 2, 2011 · 10 revisions
Now that we've cloned the census2pgsql project and downloaded the census data, we can start putting it into PostGIS. You'll notice that our census2pgsql/data directory is now full of files ending in .pl. These are the raw census files. Each state has three files that look like this, where XX is a state abbreviation:
- XXgeo2010.pl
- XX000012010.pl
- XX000022010.pl
They're just text, so feel free to take a look at them in your favorite text editor or by typing something like less akgeo2010.pl.
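If you want to confirm that every state has all three files before merging, a small Ruby sketch like the one below groups the file names by state abbreviation. It relies only on the naming pattern listed above; the method names group_state_files and incomplete_states are hypothetical and are not part of census2pgsql.

```ruby
# Hypothetical helper (not part of census2pgsql): group raw census file
# names by two-letter state abbreviation, following the naming pattern
# XXgeo2010.pl, XX000012010.pl and XX000022010.pl.
PATTERN = /\A([a-z]{2})(geo|00001|00002)2010\.pl\z/i

# Returns a hash like { "ak" => { "geo" => ..., "00001" => ..., "00002" => ... } }.
def group_state_files(filenames)
  groups = {}
  filenames.each do |name|
    match = PATTERN.match(File.basename(name))
    next unless match # skip anything that isn't a raw census file
    state = match[1].downcase
    (groups[state] ||= {})[match[2]] = name
  end
  groups
end

# States that are missing one or more of the three expected files.
def incomplete_states(groups)
  groups.reject { |_state, parts| parts.size == 3 }.keys
end

if __FILE__ == $PROGRAM_NAME
  # Run this from the census2pgsql/data directory.
  groups = group_state_files(Dir.glob("*.pl"))
  missing = incomplete_states(groups)
  if missing.empty?
    puts "Found all three files for #{groups.size} states."
  else
    puts "Incomplete states: #{missing.sort.join(', ')}"
  end
end
```

Any state it reports as incomplete probably needs to be downloaded again before you run the merge.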
The first step is to parse these files into something a little more usable. We're going to merge the files for each state into a single tab-delimited file using a Ruby script.
- Change your working directory to ruby by typing cd ../ruby (this assumes you're still in the data directory from the previous instructions).
- Type ruby merge_flat_files.rb to start merging the census files. This takes the three files for each state and merges them into a single tab-delimited file called XX_merged.csv, where XX is the state abbreviation. These files will live in the data directory.
- Go on to the next section and get the database set up. These are very big files, so parsing will take a long time, up to a few hours depending on your hardware.
cd ../ruby
ruby merge_flat_files.rb
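This page doesn't describe what merge_flat_files.rb does internally, so the following is only a rough illustration of the idea: a minimal Ruby sketch that joins one state's three files record by record and writes one tab-delimited row per record. It rests on assumptions this page does not confirm: that the geo file is fixed-width with the logical record number (LOGRECNO) in columns 19 to 25 and Latin-1 encoded, that the two data files are comma-delimited with LOGRECNO as the fifth field, and that all three files list records in the same order. The names merge_state and geo_logrecno are hypothetical, and the real script's output columns may differ.

```ruby
# Minimal sketch (hypothetical, NOT the actual merge_flat_files.rb).
# ASSUMPTIONS: geo file is fixed-width with LOGRECNO in columns 19-25;
# segment files are comma-delimited with LOGRECNO as the 5th field;
# all three files list records in the same LOGRECNO order.

# Extract LOGRECNO (1-based columns 19-25) from a fixed-width geo line.
def geo_logrecno(line)
  line[18, 7]
end

# Merge one state's files into out_path; returns the number of records.
def merge_state(geo_path, seg1_path, seg2_path, out_path)
  count = 0
  # ASSUMPTION: the geo file is Latin-1, so transcode it to UTF-8 on read.
  geo  = File.open(geo_path, "r:ISO-8859-1:UTF-8")
  seg1 = File.open(seg1_path)
  seg2 = File.open(seg2_path)
  File.open(out_path, "w") do |out|
    while (g = geo.gets)
      l1 = seg1.gets
      l2 = seg2.gets
      raise "segment file ended early at record #{count + 1}" unless l1 && l2
      key = geo_logrecno(g)
      f1 = l1.chomp.split(",", -1) # -1 keeps trailing empty fields
      f2 = l2.chomp.split(",", -1)
      # Compare numerically so zero-padding differences don't matter.
      unless f1[4].to_i == key.to_i && f2[4].to_i == key.to_i
        raise "LOGRECNO mismatch at record #{count + 1}"
      end
      # Row: LOGRECNO, the whole geo line as one field, then the data
      # cells that follow LOGRECNO in each segment file.
      out.puts([key, g.rstrip, *f1[5..], *f2[5..]].join("\t"))
      count += 1
    end
  end
  count
ensure
  [geo, seg1, seg2].each { |f| f&.close }
end

if __FILE__ == $PROGRAM_NAME && ARGV.size == 4
  puts "Merged #{merge_state(*ARGV)} records."
end
```

Saved under a hypothetical name such as merge_sketch.rb, it would be run as ruby merge_sketch.rb akgeo2010.pl ak000012010.pl ak000022010.pl ak_merged.csv. Reading the three files in lockstep keeps memory use flat, which matters for files this large; the LOGRECNO check fails loudly if the ordering assumption turns out to be wrong.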
Awesome, we have the data cleaned up, so let's load it into PostGIS!