As mentioned earlier, sample data are subject to nonsampling error. This component of error could introduce serious bias into the data, and the total error could increase dramatically over that which would result purely from sampling. While it is impossible to completely eliminate nonsampling error from a survey operation, the Census Bureau attempts to control the sources of such error during the collection and processing operations. Described below are the primary sources of nonsampling error and the programs instituted for control of this error. The success of these programs, however, is contingent upon how well the instructions were carried out during the survey.
- Coverage Error - It is possible for some sample housing units or persons to be missed entirely by the survey (undercoverage), but it is also possible for some sample housing units and persons to be counted more than once (overcoverage). Both undercoverage and overcoverage of persons and housing units can introduce biases into the data and increase respondent burden and survey costs.
A major way to avoid coverage error in a survey is to ensure that its sampling frame (for the ACS, an address list in each state) is as complete and accurate as possible. The source of addresses for the ACS is the MAF, which was created by combining the Delivery Sequence File of the United States Postal Service and the address list for Census 2000. An attempt is made to assign all appropriate geographic codes to each MAF address via an automated procedure using the Census Bureau TIGER (Topologically Integrated Geographic Encoding and Referencing) files. For addresses that could not be coded automatically, a manual coding operation is attempted in the appropriate regional offices. The MAF was used as the source of addresses for selecting sample housing units and mailing questionnaires, and TIGER produced the location maps for CAPI assignments. Sometimes the MAF contains an address that duplicates another address already on the MAF. This can occur when there is a slight difference between the two, such as 123 Main Street versus 123 Maine Street.
In the CATI and CAPI nonresponse follow-up phases, efforts were made to minimize the chance that housing units not in the sample were mistakenly interviewed in place of sample units. If a CATI interviewer called a mail nonresponse case and was not able to reach the exact address, no interview was conducted and the case was eligible for CAPI. During CAPI follow-up, the interviewer had to locate the exact address for each sample housing unit. If the interviewer could not locate the exact sample unit in a multi-unit structure, or found a different number of units than expected, the interviewer was instructed to list the units in the building and follow a specific procedure to select a replacement sample unit. Person overcoverage can occur when an individual is included as a member of a housing unit but does not meet ACS residency rules.
Coverage rates give a measure of undercoverage or overcoverage of persons or housing units in a given geographic area. Rates below 100 percent indicate undercoverage, while rates above 100 percent indicate overcoverage. Coverage rates are released concurrent with the release of estimates on American FactFinder in the B98 series of detailed tables. Further information about ACS coverage rates may be found at
.
- Nonresponse Error - Survey nonresponse is a well-known source of nonsampling error. There are two types of nonresponse error: unit nonresponse and item nonresponse. Nonresponse errors affect survey estimates to varying degrees, depending on the amount of nonresponse and the extent to which nonrespondents differ from respondents on the characteristics measured by the survey. The exact amount of nonresponse error or bias in an estimate is almost never known. Therefore, survey researchers generally rely on proxy measures, such as the nonresponse rate, to indicate the potential for nonresponse error.
-Unit Nonresponse - Unit nonresponse is the failure to obtain data from housing units in the sample. Unit nonresponse may occur because households are unwilling or unable to participate, or because an interviewer is unable to make contact with a housing unit. Unit nonresponse is problematic when there are systematic or variable differences between interviewed and noninterviewed housing units on the characteristics measured by the survey. Nonresponse bias is introduced into an estimate when the differences are systematic, while nonresponse error in an estimate arises from variable differences between interviewed and noninterviewed households.
The ACS makes every effort to minimize unit nonresponse and, thus, the potential for nonresponse error. First, the ACS used a combination of mail, CATI, and CAPI data collection modes to maximize response. The mail phase included a series of three to four mailings to encourage housing units to return the questionnaire. Subsequently, mail nonrespondents for whom telephone numbers were available were contacted by CATI for an interview. Finally, a subsample of the mail and telephone nonrespondents was contacted by a personal visit to attempt an interview. Combined, these three efforts resulted in a very high overall response rate for the ACS.
ACS response rates measure the percentage of units with a completed interview. The higher the response rate, and consequently the lower the nonresponse rate, the less likely it is that estimates are affected by nonresponse bias. Response and nonresponse rates, as well as rates for specific types of nonresponse, are released concurrent with the release of estimates on American FactFinder in the B98 series of detailed tables. Further information about response and nonresponse rates may be found at
.
-Item Nonresponse - Nonresponse to particular questions on the survey questionnaire and instrument allows for the introduction of error or bias into the data, since the characteristics of the nonrespondents have not been observed and may differ from those reported by respondents. As a result, any imputation procedure using respondent data may not completely reflect this difference either at the elemental level (individual person or housing unit) or on average.
Some protection against the introduction of large errors or biases is afforded by minimizing nonresponse. In the ACS, item nonresponse for the CATI and CAPI operations was minimized by the requirement that the automated instrument receive a response to each question before the next one could be asked. Questionnaires returned by mail were edited for completeness and acceptability. They were reviewed by computer for content omissions and population coverage. If necessary, a telephone follow-up was made to obtain missing information. Potential coverage errors were included in this follow-up.
Allocation tables provide the weighted estimate of persons or housing units for which a value was imputed, as well as the total estimate of persons or housing units that were eligible to answer the question. The smaller the number of imputed responses, the lower the chance that item nonresponse is contributing a bias to the estimates. Allocation tables are released concurrent with the release of estimates on American FactFinder in the B99 series of detailed tables, and the overall allocation rates across all person and housing unit characteristics are released in the B98 series of detailed tables. Additional information on item nonresponse and allocations can be found at
.
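As a sketch of how the two totals in an allocation table relate, the allocation rate for an item can be taken as the imputed total divided by the eligible total. This minimal example assumes both weighted totals have already been read from the B99 table for the item; the function name and the figures are hypothetical, not published values.

```python
def allocation_rate(imputed: float, eligible: float) -> float:
    """Percent of eligible persons or housing units whose value was imputed."""
    if eligible <= 0:
        raise ValueError("eligible total must be positive")
    return 100.0 * imputed / eligible


# Hypothetical weighted totals, not taken from any published B99 table.
imputed_estimate = 1_250
eligible_estimate = 50_000
print(f"{allocation_rate(imputed_estimate, eligible_estimate):.1f}%")  # 2.5%
```

A lower rate means fewer imputed responses and, as noted above, less chance that item nonresponse is contributing a bias.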
- Measurement and Processing Error - The person completing the questionnaire or responding to the questions posed by an interviewer could serve as a source of error, although the questions were cognitively tested for phrasing, and detailed instructions for completing the questionnaire were provided to each household.
-Interviewer monitoring - The interviewer may misinterpret or otherwise incorrectly enter information given by a respondent; may fail to collect some of the information for a person or household; or may collect data for households that were not designated as part of the sample. To control these problems, the work of interviewers was monitored carefully. Field staff were prepared for their tasks by using specially developed training packages that included hands-on experience in using survey materials. A sample of the households interviewed by CAPI interviewers was reinterviewed to control for the possibility that interviewers may have fabricated data.
-Processing Error - The many phases involved in processing the survey data represent potential sources for the introduction of nonsampling error. The processing of the survey questionnaires includes the keying of data from completed questionnaires, automated clerical review, follow-up by telephone, manual coding of write-in responses, and automated data processing. The various field, coding, and computer operations undergo a number of quality control checks to ensure their accurate application.
- Content Editing - After data collection was completed, any remaining incomplete or inconsistent information was imputed during the final content edit of the collected data. Imputations, or computer assignments of acceptable codes in place of unacceptable entries or blanks, were needed most often when an entry for a given item was missing or when the information reported for a person or housing unit on that item was inconsistent with other information for that same person or housing unit. As in other surveys and previous censuses, the general procedure for changing unacceptable entries was to allocate an entry for a person or housing unit that was consistent with entries for persons or housing units with similar characteristics. Imputing acceptable values in place of blanks or unacceptable entries enhances the usefulness of the data.
Issues With Approximating The Standard Error Of Linear Combinations Of Multiple Estimates
Several examples are provided here to demonstrate how much approximated standard errors of sums can differ from the standard errors derived from ACS microdata and published with the estimates. These examples use estimates from the 2005-2009 ACS 5-year data products.
A. With the release of the 5-year data, detailed tables down to the tract and block group levels will be available. At these geographic levels, many estimates may be zero. As mentioned in the 'Calculations of Standard Errors' section, a special procedure is used to estimate the MOE when an estimate is zero, and for a given geographic level the MOEs of all zero estimates are identical. When a sum includes many zero estimates, its approximated standard error and MOE generally become artificially inflated. Therefore, users are advised to include the MOE of only one of the zero estimates in the sum.
Suppose we wish to estimate the total number of people whose first reported ancestry was 'Subsaharan African' in Rutland County, Vermont.
Table A: 2005-2009 Ancestry Categories from Table B04001: First Ancestry Reported
First Ancestry Reported Category | Estimate | MOE
Subsaharan African: | 48 | 43
Cape Verdean | 9 | 15
Ethiopian | 0 | 93
Ghanian | 0 | 93
Kenyan | 0 | 93
Liberian | 0 | 93
Nigerian | 0 | 93
Senegalese | 0 | 93
Sierra Leonean | 0 | 93
Somalian | 0 | 93
South African | 10 | 16
Sudanese | 0 | 93
Ugandan | 0 | 93
Zimbabwean | 0 | 93
African | 20 | 33
Other Subsaharan African | 9 | 16
To estimate the total number of people, we add up all of the categories: Total number of people = 9 + 0 + ... + 0 + 10 + 0 + ... + 0 + 20 + 9 = 48.
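To approximate the standard error using all of the MOEs, we obtain:
$$SE(48) = \sqrt{\left(\frac{15}{1.645}\right)^2 + 11\left(\frac{93}{1.645}\right)^2 + \left(\frac{16}{1.645}\right)^2 + \left(\frac{33}{1.645}\right)^2 + \left(\frac{16}{1.645}\right)^2} \approx 189.3$$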
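Using only one of the MOEs from the zero estimates, we obtain:
$$SE(48) = \sqrt{\left(\frac{15}{1.645}\right)^2 + \left(\frac{93}{1.645}\right)^2 + \left(\frac{16}{1.645}\right)^2 + \left(\frac{33}{1.645}\right)^2 + \left(\frac{16}{1.645}\right)^2} \approx 62.2$$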
From the table, we know that the actual MOE is 43, giving a standard error of 43 / 1.645 = 26.1. The first method is roughly seven times larger than the actual standard error, while the second method is roughly 2.4 times larger.
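Leaving out all of the MOEs from zero estimates, we obtain:
$$SE(48) = \sqrt{\left(\frac{15}{1.645}\right)^2 + \left(\frac{16}{1.645}\right)^2 + \left(\frac{33}{1.645}\right)^2 + \left(\frac{16}{1.645}\right)^2} \approx 26.0$$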
In this case, the approximation is very close to the actual SE. This is not always so, as the examples below show.
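The three approximations in example A can be reproduced with a few lines of Python. This sketch assumes the usual approximation for a sum of ACS estimates: each SE is the published 90 percent MOE divided by 1.645, and the SE of the sum is the square root of the sum of the squared SEs. The helper name `approx_se` is illustrative only.

```python
import math

Z90 = 1.645  # published ACS MOEs are at the 90 percent confidence level


def approx_se(moes):
    """Approximate SE of a sum from the MOEs of its components."""
    return math.sqrt(sum((m / Z90) ** 2 for m in moes))


# (estimate, MOE) for the 15 subcategories in Table A, in table order
table_a = [
    (9, 15), (0, 93), (0, 93), (0, 93), (0, 93), (0, 93), (0, 93),
    (0, 93), (0, 93), (10, 16), (0, 93), (0, 93), (0, 93), (20, 33), (9, 16),
]

nonzero_moes = [m for est, m in table_a if est != 0]
zero_moes = [m for est, m in table_a if est == 0]

all_moes = approx_se([m for _, m in table_a])       # every MOE included
one_zero = approx_se(nonzero_moes + zero_moes[:1])  # one zero-estimate MOE
no_zeros = approx_se(nonzero_moes)                  # zero-estimate MOEs dropped
published = 43 / Z90                                # SE from the published MOE

print(sum(est for est, _ in table_a))  # 48
for label, se in [("all MOEs", all_moes), ("one zero MOE", one_zero),
                  ("no zero MOEs", no_zeros)]:
    print(f"{label}: SE = {se:.1f}, ratio to published SE = {se / published:.2f}")
```

Keeping or dropping zero-estimate MOEs only changes which MOEs are passed to `approx_se`; the estimate itself is unaffected.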
B. Suppose we wish to estimate the total number of males with income below the poverty level in the past 12 months using both state and PUMA level estimates for the state of Wyoming. Part of detailed table B17001 is displayed below, with margins of error in parentheses.
Table B: 2005-2009 ACS estimates of Males with Income Below Poverty from table B17001: Poverty Status in the Past 12 Months by Sex by Age
Characteristic | Wyoming | PUMA 00100 | PUMA 00200 | PUMA 00300 | PUMA 00400
Male | 21,769 (1,480) | 4,496 (713) | 5,891 (622) | 4,706 (665) | 6,676 (742)
Under 5 Years | 3,064 (422) | 550 (236) | 882 (222) | 746 (196) | 886 (237)
5 Years Old | 348 (106) | 113 (65) | 89 (57) | 82 (55) | 64 (44)
6 to 11 Years Old | 2,424 (421) | 737 (272) | 488 (157) | 562 (163) | 637 (196)
12 to 14 Years Old | 1,281 (282) | 419 (157) | 406 (141) | 229 (106) | 227 (111)
15 Years Old | 391 (128) | 51 (37) | 167 (101) | 132 (64) | 41 (38)
16 and 17 Years Old | 779 (258) | 309 (197) | 220 (91) | 112 (72) | 138 (112)
18 to 24 Years Old | 4,504 (581) | 488 (192) | 843 (224) | 521 (343) | 2,652 (481)
25 to 34 Years Old | 2,289 (366) | 516 (231) | 566 (158) | 542 (178) | 665 (207)
35 to 44 Years Old | 2,003 (311) | 441 (122) | 535 (160) | 492 (148) | 535 (169)
45 to 54 Years Old | 1,719 (264) | 326 (131) | 620 (181) | 475 (136) | 298 (113)
55 to 64 Years Old | 1,766 (323) | 343 (139) | 653 (180) | 420 (135) | 350 (125)
65 to 74 Years Old | 628 (142) | 109 (69) | 207 (77) | 217 (72) | 95 (55)
75 Years and Older | 573 (147) | 94 (53) | 215 (86) | 176 (72) | 88 (62)
The first way is to sum the thirteen age groups for Wyoming:
Estimate(Male) = 3,064 + 348 + ... + 573 = 21,769.
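The first approximation for the standard error in this case gives us:
$$SE(21{,}769) = \sqrt{\left(\frac{422}{1.645}\right)^2 + \left(\frac{106}{1.645}\right)^2 + \cdots + \left(\frac{147}{1.645}\right)^2} \approx 696.6$$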
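A second way is to sum the four PUMA estimates for Male: Estimate(Male) = 4,496 + 5,891 + 4,706 + 6,676 = 21,769, as before. The second approximation for the standard error yields:
$$SE(21{,}769) = \sqrt{\left(\frac{713}{1.645}\right)^2 + \left(\frac{622}{1.645}\right)^2 + \left(\frac{665}{1.645}\right)^2 + \left(\frac{742}{1.645}\right)^2} \approx 835.3$$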
Finally, we can sum up all thirteen age groups for all four PUMAs to obtain an estimate based on a total of 52 estimates:
Estimate (Male) = 550 + 113 + ... + 88 = 21,769
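And the third approximated standard error is:
$$SE(21{,}769) = \sqrt{\left(\frac{236}{1.645}\right)^2 + \left(\frac{65}{1.645}\right)^2 + \cdots + \left(\frac{62}{1.645}\right)^2} \approx 721.9$$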
However, we know that the standard error based on the published MOE is 1,480 / 1.645 = 899.7. In this instance, all three approximations underestimate the published-based standard error and should be used with caution.
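The three Wyoming approximations can be checked the same way. This is a sketch assuming the same square-root-of-summed-squared-SEs approximation, with each SE taken as the MOE divided by 1.645; the MOEs are copied from Table B and the helper name is illustrative.

```python
import math

Z90 = 1.645  # published ACS MOEs are at the 90 percent confidence level


def approx_se(moes):
    """Approximate SE of a sum from the MOEs of its components."""
    return math.sqrt(sum((m / Z90) ** 2 for m in moes))


# Wyoming state-level MOEs for the 13 age groups (Table B)
state_age_moes = [422, 106, 421, 282, 128, 258, 581, 366, 311, 264, 323, 142, 147]
# MOEs for Male in PUMAs 00100, 00200, 00300, 00400
puma_male_moes = [713, 622, 665, 742]
# MOEs for the 13 age groups within each PUMA
puma_age_moes = {
    "00100": [236, 65, 272, 157, 37, 197, 192, 231, 122, 131, 139, 69, 53],
    "00200": [222, 57, 157, 141, 101, 91, 224, 158, 160, 181, 180, 77, 86],
    "00300": [196, 55, 163, 106, 64, 72, 343, 178, 148, 136, 135, 72, 72],
    "00400": [237, 44, 196, 111, 38, 112, 481, 207, 169, 113, 125, 55, 62],
}
puma_age_flat = [m for moes in puma_age_moes.values() for m in moes]

published_se = 1480 / Z90  # SE from the published state MOE
approximations = {
    "13 state age groups": approx_se(state_age_moes),
    "4 PUMA totals": approx_se(puma_male_moes),
    "52 PUMA age groups": approx_se(puma_age_flat),
}
for label, se in approximations.items():
    print(f"{label}: SE = {se:.1f}, published SE = {published_se:.1f}")
```

Each approximation draws on a different set of published estimates, which is why the three results disagree with one another as well as with the published value.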
C. Suppose we wish to estimate the total number of males at the national level using age and citizenship status. The relevant data from table B05003 are displayed in Table C below.
Table C: 2005-2009 ACS estimates of males from B05003: Sex by Age by Citizenship Status
Characteristic | Estimate | MOE
Male | 148,535,646 | 6,574
Under 18 Years | 37,971,739 | 6,285
Native | 36,469,916 | 10,786
Foreign Born | 1,501,823 | 11,083
Naturalized U.S. Citizen | 282,744 | 4,284
Not a U.S. Citizen | 1,219,079 | 10,388
18 Years and Older | 110,563,907 | 6,908
Native | 93,306,609 | 57,285
Foreign Born | 17,257,298 | 52,916
Naturalized U.S. Citizen | 7,114,681 | 20,147
Not a U.S. Citizen | 10,142,617 | 53,041
The estimate and its MOE are actually published. However, if they were not available in the tables, one way of obtaining them would be to add together the number of males under 18 and the number aged 18 and older:
Estimate (Male) = 37,971,739 + 110,563,907 = 148,535,646
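And the first approximated standard error is:
$$SE(148{,}535{,}646) = \sqrt{\left(\frac{6{,}285}{1.645}\right)^2 + \left(\frac{6{,}908}{1.645}\right)^2} \approx 5{,}677.4$$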
Another way would be to add up the estimates for the three subcategories (Native, and the two subcategories for Foreign Born: Naturalized U.S. Citizen, and Not a U.S. Citizen), for males under and over 18 years of age. From these six estimates we obtain:
Estimate (Male) = 36,469,916 + 282,744 + 1,219,079 + 93,306,609 + 7,114,681 + 10,142,617 = 148,535,646
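With a second approximated standard error of:
$$SE(148{,}535{,}646) = \sqrt{\left(\frac{10{,}786}{1.645}\right)^2 + \left(\frac{4{,}284}{1.645}\right)^2 + \left(\frac{10{,}388}{1.645}\right)^2 + \left(\frac{57{,}285}{1.645}\right)^2 + \left(\frac{20{,}147}{1.645}\right)^2 + \left(\frac{53{,}041}{1.645}\right)^2} \approx 49{,}920.0$$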
We know that the standard error based on the published margin of error is 6,574 / 1.645 = 3,996.4. At a glance, the ratio of the standard error from the first method to the published-based standard error is 1.42, an overestimate of roughly 42%, whereas the second method yields a ratio of 12.49, an overestimate of 1,149%. This is an example of what can happen to the approximated SE when the sum involves a controlled estimate, in this case sex by age.
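A short check of the two ratios in example C, assuming the same approximation as in the earlier examples (each SE is the MOE divided by 1.645, and the SE of a sum is the square root of the sum of squared SEs). The MOEs come from Table C; the helper name is illustrative.

```python
import math

Z90 = 1.645  # published ACS MOEs are at the 90 percent confidence level


def approx_se(moes):
    """Approximate SE of a sum from the MOEs of its components."""
    return math.sqrt(sum((m / Z90) ** 2 for m in moes))


published_se = 6574 / Z90  # SE from the published MOE for Male

# Method 1: under 18 + 18 and older
first = approx_se([6285, 6908])
# Method 2: Native, Naturalized, Not a citizen, for both age groups
second = approx_se([10786, 4284, 10388, 57285, 20147, 53041])

for label, se in [("first", first), ("second", second)]:
    ratio = se / published_se
    print(f"{label}: SE = {se:,.1f}, ratio = {ratio:.2f}, overestimate = {ratio - 1:.0%}")
```

The overestimate is printed as `ratio - 1`, which is how the 42% and 1,149% figures above are obtained.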
D. Suppose we are interested in the total number of people aged 65 or older and its standard error. Table D shows some of the estimates for the national level from table B01001 (the estimates in gray were derived for the purpose of this example only).
Table D: Some Estimates from AFF Table B01001: Sex by Age for 2005-2009
Age Category | Estimate, Male | MOE, Male | Estimate, Female | MOE, Female | Total | Approximated MOE, Total
65 and 66 years old | 2,248,426 | 8,047 | 2,532,831 | 9,662 | 4,781,257 | 12,574
67 to 69 years old | 2,834,475 | 8,953 | 3,277,067 | 8,760 | 6,111,542 | 12,526
70 to 74 years old | 3,924,928 | 8,937 | 4,778,305 | 10,517 | 8,703,233 | 13,801
75 to 79 years old | 3,178,944 | 9,162 | 4,293,987 | 11,355 | 7,472,931 | 14,590
80 to 84 years old | 2,226,817 | 6,799 | 3,551,245 | 9,898 | 5,778,062 | 12,008
85 years and older | 1,613,740 | 7,058 | 3,540,105 | 10,920 | 5,153,845 | 13,002
Total | 16,027,330 | 20,119 | 21,973,540 | 25,037 | 38,000,870 | 32,119
To begin, we find the total number of people aged 65 and older by adding the totals for males and females: 16,027,330 + 21,973,540 = 38,000,870. One way to approximate the standard error of this total is to sum the male and female estimates for each age category, approximate an MOE for each of these six derived estimates, and then combine those MOEs.
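Using the six derived estimates, the number of people aged 65 or older is again 38,000,870, and the approximated standard error is:
$$SE(38{,}000{,}870) = \sqrt{\left(\frac{12{,}574}{1.645}\right)^2 + \left(\frac{12{,}526}{1.645}\right)^2 + \left(\frac{13{,}801}{1.645}\right)^2 + \left(\frac{14{,}590}{1.645}\right)^2 + \left(\frac{12{,}008}{1.645}\right)^2 + \left(\frac{13{,}002}{1.645}\right)^2} \approx 19{,}525$$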
For this example, the estimate and its MOE are published in table B09017. The total number of people aged 65 or older is 38,000,870 with a margin of error of 4,944. Therefore the published-based standard error is:
SE(38,000,870) = 4,944/1.645 = 3,005.
The approximated standard error using the six derived age group estimates, about 19,525, is roughly 6.5 times larger than the published-based standard error.
There are two other ways to approximate the standard error of the number of people aged 65 and older. The first is to find the MOEs for males aged 65 and older and for females aged 65 and older separately, and then combine them to approximate the standard error for the total. The second is to use all twelve published MOEs together, that is, the MOEs for every male and female age category. In this particular example, all three ways give the same approximation for the SE, because each one ultimately combines the same twelve published MOEs. This differs from example B, where each way used a different set of published estimates.
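The claim that all three ways agree can be checked numerically. This sketch assumes the same approximation as in the earlier examples; `combine_moes` (the root of the summed squared MOEs, kept on the MOE scale) is an illustrative helper used to build the derived MOEs, and the inputs are the twelve published MOEs from Table D.

```python
import math

Z90 = 1.645  # published ACS MOEs are at the 90 percent confidence level


def approx_se(moes):
    """Approximate SE of a sum from the MOEs of its components."""
    return math.sqrt(sum((m / Z90) ** 2 for m in moes))


def combine_moes(moes):
    """Approximate MOE of a sum, kept on the MOE scale."""
    return math.sqrt(sum(m ** 2 for m in moes))


# Published MOEs from Table D, ages 65 and 66 through 85 and older
male_moes = [8047, 8953, 8937, 9162, 6799, 7058]
female_moes = [9662, 8760, 10517, 11355, 9898, 10920]

# Way used in the text: one derived MOE per age group, then combine the six
way_age_totals = approx_se([combine_moes([m, f]) for m, f in zip(male_moes, female_moes)])
# First alternative: derived MOEs for all males 65+ and all females 65+
way_sex_totals = approx_se([combine_moes(male_moes), combine_moes(female_moes)])
# Second alternative: all twelve published MOEs at once
way_all_twelve = approx_se(male_moes + female_moes)

published_se = 4944 / Z90
print(f"{way_age_totals:.1f} {way_sex_totals:.1f} {way_all_twelve:.1f}")
print(f"ratio to published SE: {way_age_totals / published_se:.1f}")
```

Working from unrounded intermediate MOEs, the three values match to floating-point precision. The text's figure, computed from the rounded derived MOEs in Table D, differs only in the decimals.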
E. As an alternative to the approach in part D, we could find the estimate and its SE by summing all of the estimates for ages under 65 and subtracting that sum from the estimate for the total population. Because of the large number of estimates, Table E does not show all of the age groups. In addition, the estimates in the part of the table shaded gray were derived for the purposes of this example only and cannot be found in base table B01001.
Table E: Some Estimates from AFF Table B01001: Sex by Age for 2005-2009
Age Category | Estimate, Male | MOE, Male | Estimate, Female | MOE, Female | Total | Estimated MOE, Total
Total Population | 148,535,646 | 6,574 | 152,925,887 | 6,584 | 301,461,533 | 9,304
Under 5 years | 10,663,983 | 3,725 | 10,196,361 | 3,557 | 20,860,344 | 5,151
5 to 9 years old | 10,137,130 | 15,577 | 9,726,229 | 16,323 | 19,863,359 | 22,563
10 to 14 years old | 10,567,932 | 16,183 | 10,022,963 | 17,199 | 20,590,895 | 23,616
... | ... | ... | ... | ... | ... | ...
62 to 64 years old | 3,888,274 | 11,186 | 4,257,076 | 11,970 | 8,145,350 | 16,383
Total for Age 0 to 64 years old | 132,508,316 | 48,688 | 130,952,347 | 49,105 | 263,460,663 | 69,151
Total for Age 65 years and older | 16,027,330 | 49,130 | 21,973,540 | 49,544 | 38,000,870 | 69,774
An estimate for the number of people age 65 and older is equal to the total population minus the population between the ages of zero and 64 years old:
Number of people aged 65 and older: 301,461,533 - 263,460,663 = 38,000,870.
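The approach to approximating the SE is the same as in part D. First, we sum the male and female estimates for each age category and approximate the MOE of each sum. We then combine those MOEs to approximate the standard error of the total for ages 0 to 64:
$$SE(263{,}460{,}663) = \sqrt{\left(\frac{5{,}151}{1.645}\right)^2 + \left(\frac{22{,}563}{1.645}\right)^2 + \cdots + \left(\frac{16{,}383}{1.645}\right)^2} \approx \frac{69{,}151}{1.645} \approx 42{,}037$$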
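Because the estimate of interest is a difference, its approximated SE combines the SEs of the total population and of the population aged 0 to 64 in the same way as for a sum. The SE for the total number of people aged 65 and older is:
$$SE(38{,}000{,}870) = \sqrt{\left(\frac{9{,}304}{1.645}\right)^2 + \left(\frac{69{,}151}{1.645}\right)^2} \approx 42{,}416$$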
Again, as in Example D, the estimate and its MOE are published in B09017. The total number of people aged 65 or older is 38,000,870 with a margin of error of 4,944. Therefore the standard error is:
SE(38,000,870) = 4,944 / 1.645 = 3,005.
The approximated standard error using the seventeen derived age group estimates yields a standard error roughly 14.1 times larger than the actual SE.
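A sketch of the subtraction approach in example E. It assumes, as the text does, that the SE of a difference is approximated in the same way as the SE of a sum, which ignores any covariance between the two terms. Because Table E does not list all seventeen age groups, the sketch starts from the derived MOE of 69,151 for ages 0 to 64 rather than recomputing it.

```python
import math

Z90 = 1.645  # published ACS MOEs are at the 90 percent confidence level


def approx_se(moes):
    """Approximate SE of a sum or difference from the MOEs of its terms."""
    return math.sqrt(sum((m / Z90) ** 2 for m in moes))


# Total population: combine the published male and female MOEs (about 9,304)
moe_total_pop = math.sqrt(6574 ** 2 + 6584 ** 2)
# Ages 0 to 64: the derived MOE reported in Table E
moe_age_0_64 = 69151

estimate_65_plus = 301_461_533 - 263_460_663
se_65_plus = approx_se([moe_total_pop, moe_age_0_64])
published_se = 4944 / Z90

print(estimate_65_plus)  # 38000870
print(f"SE = {se_65_plus:,.0f}, {se_65_plus / published_se:.1f} times the published SE")
```

The approximation is dominated by the many-component 0 to 64 term, which is why subtracting from the total does not help here.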
Data users can mitigate the problems shown in examples A through E to some extent by using a collapsed version of a detailed table (if one is available), which reduces the number of estimates used in the approximation. These issues may also be avoided by creating estimates and SEs using the Public Use Microdata Sample (PUMS) or by requesting a custom tabulation, a fee-based service offered under certain conditions by the Census Bureau. More information regarding custom tabulations may be found at
.