Let's say you want to create Table B08406, "Sex of workers by means of transportation to work for workplace geography," for the state of Alaska from the files on the ftp site:
http://www2.census.gov/acs2006/Final_Summaryfile
Which files do you need? How do you read the files?
You will need three files:
- The data dictionary (merge_5_6_final.xls)
- The data file (e2006ak0001000.txt)
- The geography file (g20061ak.txt)
Start with the data dictionary, merge_5_6_final.xls. Under the "Tblid" column, look for the value "B08406". You will see that the "Sequence Number" is "0001". This means that the data you are looking for are in the file "e2006ak0001000.txt". How do we know this is the right file? The file name tells us: the "e" stands for estimate, 2006 is the year, "ak" is the state (Alaska), and "0001" is the sequence number (the sequence that contains the data for Table B08406). See the "File Naming Conventions" section in Chapter 2.
Now that we have the data dictionary and data file, it remains for us to obtain the geography file. This is much simpler: the file is g20061ak.txt, where "g" means "geography," 2006 is the year, 1 is the period estimate (in this case, 1-year estimate), and ak is the state.
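The naming rules above can be sketched in Python as a small parser. This is only an illustration, not part of the ACS tooling: the function name `parse_file_name` and the group names are hypothetical, the period digit is treated as optional so that both spellings of the data file name used in this section are accepted, and the final three digits are captured as an unnamed-purpose `suffix` because the text does not explain them.

```python
import re

# Hypothetical pattern for names such as "g20061ak.txt" or "e20061ak0001000.txt".
# Assumption: the period digit is optional, and the last three digits of a
# data file name are kept as an unexplained "suffix".
FILE_NAME = re.compile(
    r"^(?P<kind>[eg])"          # "e" = estimate, "g" = geography
    r"(?P<year>\d{4})"          # e.g. 2006
    r"(?P<period>\d?)"          # e.g. 1 for a 1-year estimate
    r"(?P<state>[a-z]{2})"      # e.g. ak
    r"(?:(?P<sequence>\d{4})(?P<suffix>\d{3}))?"  # data files only
    r"\.txt$"
)

def parse_file_name(name):
    """Split a summary file name into its labeled parts."""
    match = FILE_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized file name: {name!r}")
    return match.groupdict()

print(parse_file_name("g20061ak.txt"))
print(parse_file_name("e20061ak0001000.txt"))
```

For the geography file, `sequence` and `suffix` come back as `None`, since that name carries no sequence number.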
When you open the data file, e20061ak0001000.txt, you will see the following comma-delimited fields on the first line:
ACSSF,2006e1,ak,000,0001,0000001,322541,256332,215678,40654,31477,5795,3382,3709,3327,0,0,0,382,2728,29712,14006,16054,174492,134467,114565,199,14638,3200,2064,2275,1893,0,0,0,382,2146,17356,...
The first six fields - from "ACSSF" to "0000001" - are identifiers.
- The first field tells you that this is an ACS Summary File;
- The second tells you that these data are one-year estimates for the year 2006 (notice the "e" after "2006" and the "1" at the end);
- The third tells you the state ("ak" is Alaska);
- The fourth is an iteration number;
- The fifth is the all-important sequence number;
- The last is a logical record code (LOGRECNO). The LOGRECNO identifies the location within a state.
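As a minimal sketch of the layout just described, the following Python splits one record into its six identifier fields and the data values that follow. The function name `split_record` and the dictionary keys are illustrative choices, not names defined by the ACS files, and the sample line is limited to the first ten fields shown above.

```python
import csv

# Illustrative labels for the six identifier fields (names are assumptions).
ID_NAMES = ["file_id", "file_type", "state", "iteration", "sequence", "logrecno"]

def split_record(line):
    """Return (identifiers, values) for one comma-delimited record."""
    fields = next(csv.reader([line]))
    identifiers = dict(zip(ID_NAMES, fields[:6]))  # fields 1-6
    values = fields[6:]                            # field 7 onward
    return identifiers, values

# First ten fields of the sample line from e20061ak0001000.txt.
sample = "ACSSF,2006e1,ak,000,0001,0000001,322541,256332,215678,40654"
ids, values = split_record(sample)
print(ids["sequence"], ids["logrecno"], values[0])  # 0001 0000001 322541
```

The identifiers are kept as strings so that leading zeros in the sequence number and LOGRECNO are preserved.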
The geography file, g20061ak.txt, describes the LOGRECNO. Each LOGRECNO specifies a location pertaining to the state. For example, a LOGRECNO of "0000001" means the state of Alaska; a LOGRECNO of "0000002" means just the urban areas in Alaska; a LOGRECNO of "0000003" refers to just rural areas in Alaska. Notice that each state has its own geography file. (Here the "ak" in the file name means the state of Alaska.)
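The link between a data record and its location can be pictured as a lookup keyed on LOGRECNO. The sketch below is only a stand-in: the layout of g20061ak.txt is not described in this section, so the table is a hand-built dictionary holding the three examples above, and the names `LOGRECNO_EXAMPLES` and `describe_logrecno` are hypothetical.

```python
# Hand-built stand-in for the Alaska geography file; only the three
# LOGRECNO values mentioned in the text are included (an assumption made
# for illustration, not the real file contents).
LOGRECNO_EXAMPLES = {
    "0000001": "Alaska (entire state)",
    "0000002": "Alaska, urban areas only",
    "0000003": "Alaska, rural areas only",
}

def describe_logrecno(logrecno, lookup=LOGRECNO_EXAMPLES):
    """Return the location description for a LOGRECNO, if known."""
    return lookup.get(logrecno, "unknown LOGRECNO")

print(describe_logrecno("0000001"))  # Alaska (entire state)
```

Because each state has its own geography file, a real lookup would be built separately for each state.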
The other fields in the data file, from the seventh on, are data values. Each field corresponds to the value of the "line number" variable in the data dictionary. So field number seven (the 322,541 value, after the sixth comma) corresponds to line number one, which is "Total." Field number eight (the 256,332 value, after the seventh comma) refers to line number two, which is "Car, Truck, or Van." This continues all the way up to line number 51, at which point Table B08406 ends.
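The correspondence between field position and line number amounts to a fixed offset of six. A minimal Python sketch follows; the helper name `value_for_line` is hypothetical, and it assumes the table begins at the seventh field (as Table B08406 does here) and that the field holds a plain integer.

```python
ID_FIELD_COUNT = 6  # identifier fields that precede the data values

def value_for_line(fields, line_number):
    """Return the data value for a 1-based data dictionary line number.

    Assumes the table starts at the seventh field of the record.
    """
    if line_number < 1:
        raise ValueError("line numbers start at 1")
    # Line 1 is field 7, which is index 6 in a zero-based list.
    return int(fields[ID_FIELD_COUNT + line_number - 1])

fields = "ACSSF,2006e1,ak,000,0001,0000001,322541,256332,215678,40654".split(",")
print(value_for_line(fields, 1))  # 322541  (Total)
print(value_for_line(fields, 2))  # 256332  (Car, truck, or van)
```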
Were you to read this file into a program such as SAS, you could label the first ten fields of the first line of e20061ak0001000.txt as follows:
| File Identification | File Type | State/U.S.-Abbreviation (USPS) | Character Iteration | Sequence Number | Logical Record Number | Total: | Car, truck, or van: | Drove alone | Carpooled: |
|---|---|---|---|---|---|---|---|---|---|
| ACSSF | 2006e1 | AK | 000 | 0001 | 0000001 | 322541 | 256332 | 215678 | 40654 |
Figure 2.4 Linking Geographic Header File to the Data Files