Uniform Crime Reporting Program Data: County-Level Detailed Arrest and Offense Data, 2014: Codebook
3. ICPSR Data Collection Description
The UNIFORM CRIME REPORTING PROGRAM DATA: COUNTY-LEVEL DETAILED
ARREST AND OFFENSE DATA, 2014 reports counts of arrests and offenses
for the Uniform Crime Reports (UCR) Part I crimes: murder,
rape, robbery, aggravated assault, burglary, larceny, auto theft,
and arson. The UCR County-level Arrest files also report arrests for
additional (Part II) crimes such as forgery, fraud, vice offenses,
and drug possession or sale. The data were originally collected by
the Federal Bureau of Investigation (FBI) from reports submitted by
agencies and states participating in the UCR Program. Each agency or
jurisdiction is uniquely identified in the collection by the UCR
Originating Agency Identifier (ORI). In describing this data collection
the terms agency, jurisdiction, and ORI are used synonymously.
Detailed discussions of reporting procedures are found in the
UNIFORM CRIME REPORTING HANDBOOK (Washington, DC: U.S. Government
Printing Office, 1980), and in the codebooks for the ICPSR's
Agency-level UCR data collections (see ICPSR 9028).
It should be emphasized that, while UCR staff were consulted
in developing the new adjustment procedures, these UCR county-level
files are not official FBI UCR releases and are being provided for
research purposes only. Users with questions regarding these UCR
county-level data files can contact the National Archive of
Criminal Justice Data at the ICPSR.
Two major changes to the UCR county-level files were
implemented with the 1994 data and are continued through
the 2014 data. A new imputation algorithm to adjust for incomplete
reporting by individual law enforcement jurisdictions has been
adopted. Also, a new Coverage Indicator has been created to provide
users with a diagnostic measure of aggregated data quality in a
particular county. These developments are described in greater
detail below. The changes were instituted in response to comments
from a number of users and after almost a year of discussions by
UCR file users, the FBI, the Bureau of Justice Statistics, and the
Inter-university Consortium for Political and Social Research.
These changes result in a break in series from previous
UCR county-level files. Data from earlier year files should not be
compared to data from 1994 and subsequent years. Changes in procedures
used to adjust for incomplete reporting at the ORI or jurisdiction
level may be expected to have an impact on aggregates for counties
in which some ORIs have not reported for all 12 months. However,
the new adjustment procedures should result in county-level data
that are less sensitive to changes between years in the extent of
reporting by ORIs within a county. Consequently, data from 1994
forward should be more accurate estimates for longitudinal analysis.
Prior to 2009, the crimes reported data included the variables INDEX
(the sum of variables MURDERS through MVTHEFT) and MODINDX (the sum of
variables MURDERS through ARSON). From 2009 forward, these variables
were replaced with VIOL (the sum of variables MURDERS through AGALST)
and PROPERTY (the sum of variables BURGLRY through MVTHEFT). While
variable ARSON is listed separately in the crimes reported data, it is
included in property crimes in the arrest data.
3.2 Changes in Imputation Procedures for Incomplete Reporting
IMPUTATION PROCEDURES USED FOR 1977-1993 UCR COUNTY-LEVEL
FILES: The data for any ORI reporting 12 months were used for
county aggregation as submitted. Data for an ORI reporting 6 to
11 months were increased by a weight of [12/months reported]. Data
for any ORI reporting less than 6 months were deleted from the
county total, and the population served by that ORI was deleted
from the county population total to help control for differential
data quality across counties.
The aim of this procedure was to produce comparable data
cross-sectionally across all counties. However, if there were
major changes in the ORIs that reported in a county across years,
artifactual changes in the longitudinal data for a county could be
introduced because of potential variation in the type of ORIs used
to compute imputed county totals and rates each year.
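The 1977-1993 rule described above can be sketched in Python as
follows. This is an illustration only, not ICPSR's production code;
the function names and the (count, months, population) record layout
are assumptions made for the example.

```python
def adjust_pre1994(count, months_reported):
    """Adjusted count for one ORI under the 1977-1993 rule.

    Returns None when the ORI is dropped from the county total.
    """
    if not 0 <= months_reported <= 12:
        raise ValueError("months_reported must be between 0 and 12")
    if months_reported == 12:
        return float(count)                    # used as submitted
    if months_reported >= 6:
        return count * 12 / months_reported    # weighted up to a full year
    return None                                # under 6 months: ORI is dropped


def county_total_pre1994(oris):
    """Aggregate (count, months_reported, population) tuples for one county.

    Dropped ORIs contribute neither counts nor population.
    """
    total, population = 0.0, 0
    for count, months, ori_pop in oris:
        adjusted = adjust_pre1994(count, months)
        if adjusted is None:
            continue
        total += adjusted
        population += ori_pop
    return total, population
```

For instance, an ORI reporting 30 offenses over 6 months would
contribute 60 to the county total, while an ORI reporting only 2
months would be excluded along with its population.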
IMPUTATION PROCEDURES USED FOR 1994 UCR COUNTY-LEVEL FILES AND
ONWARD: The data for any ORI reporting 12 months were used for
county aggregation as submitted. Data for an ORI reporting 3 to
11 months were increased by a weight of [12/months reported]. Data
for ORIs reporting 0 to 2 months in the arrest files or 1 to 2 months
in the crimes reported files were set to zero and then estimated
using rates calculated from ORIs reporting 12 months of data in the
same geographic stratum, defined by UCR Population Group within the
ORI's state. UCR Population Groups are defined as follows:
Population Group
-----------------
- Cities 250,000 and over
- Cities 100,000 - 249,999
- Cities 50,000 - 99,999
- Cities 25,000 - 49,999
- Cities 10,000 - 24,999
- Cities 2,500 - 9,999
- Cities under 2,500
- Non-MSA counties and non-MSA State Police
- MSA counties and MSA State Police
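A minimal Python sketch of the 1994-onward adjustment is given below.
It is illustrative only. In particular, it assumes that the stratum
rate is the total count divided by the total population of the
12-month ORIs in the same state and population group, and that the
estimate is that rate multiplied by the ORI's own population; the
text above does not spell out these details. All names are
hypothetical.

```python
def stratum_rate(full_year_oris):
    """Per-person rate from (count, population) pairs of ORIs that
    reported all 12 months in one state and population group.

    Returns None when no rate can be formed (assumed behavior).
    """
    total_pop = sum(pop for _, pop in full_year_oris)
    if total_pop == 0:
        return None
    return sum(count for count, _ in full_year_oris) / total_pop


def adjust_1994_onward(count, months_reported, population, rate):
    """Adjusted count for one ORI under the 1994-onward rule."""
    if not 0 <= months_reported <= 12:
        raise ValueError("months_reported must be between 0 and 12")
    if months_reported == 12:
        return float(count)                    # used as submitted
    if months_reported >= 3:
        return count * 12 / months_reported    # weighted up to a full year
    # 0 to 2 months: the reported value is discarded (set to zero) and
    # replaced by an estimate when a rate and a population are available.
    if rate is None or not population:
        return 0.0
    return rate * population
```

Note that in the crimes reported files no record exists for an agency
reporting 0 months, so the last branch would only ever be reached
there for agencies reporting 1 or 2 months.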
There is an important distinction in estimation procedures for
the arrest versus the crimes reported files. For the arrest files
(Parts 1-3 and 5-7), data were estimated for agencies reporting 0-2
months based on the procedures mentioned above. However, due to the
structure of the original data received from the FBI, ICPSR was only
able to estimate data for agencies reporting 1-2 months in the crimes
reported files (Parts 4 and 8). No estimations are provided for
agencies reporting 0 months. To assist users, the county population
figures and number of agencies reporting for both the arrest and
crimes reported data files were added to the crimes reported files.
These variables provide users with information that can be used to
determine the proportion of the population or agencies not included
in the crime reported files due to the absence of records for agencies
reporting 0 months. Users should keep in mind these differences when
doing their analyses.
For releases of UCR county-level files before 1994, data from
jurisdictions reporting less than 6 months of data were not
included in county totals in an effort to ensure cross-sectional
data comparability and quality. With the new procedures to adjust
for incomplete reporting, data will be provided for each active
ORI that reports less than 12 months of data, whether through
weighting of partial year data or substitution of a value based on
population group and state. Instead of exercising an a priori
judgment that 6 months of data is the minimum threshold for
acceptable data quality, a new Coverage Indicator variable has
been created that will allow users to set their own threshold for
acceptable data quality and to include or exclude data based on
the standards they set themselves. The Coverage Indicator
variable represents the proportion of county data that is reported
for a given year. The indicator ranges from 0 to 100. A value of 0
indicates that no data for the county were reported and all data
have been imputed. A value of 100 indicates that all ORIs in the
county reported for all 12 months in the year.
Coverage Indicator is calculated as follows:
CI_x = 100 * ( 1 - SUM_i { [ORIPOP_i/COUNTYPOP] * [ (12 - MONTHSREPORTED_i)/12 ] } )
where CI = COVERAGE INDICATOR
x = county
i = ORI within county
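The formula can be computed directly, as in the following Python
sketch. The function name and the (population, months reported) input
layout are assumptions made for illustration, and the county
population is assumed to be greater than zero.

```python
def coverage_indicator(oris, county_pop):
    """Coverage Indicator for one county.

    oris: iterable of (ori_pop, months_reported) pairs, one per ORI.
    county_pop: total county population (must be positive here).
    """
    if county_pop <= 0:
        raise ValueError("county_pop must be positive")
    # Population-weighted share of the year that went unreported.
    shortfall = sum(
        (ori_pop / county_pop) * ((12 - months) / 12)
        for ori_pop, months in oris
    )
    return 100 * (1 - shortfall)
```

As a worked case, a county of 100,000 people with three ORIs serving
60,000 (12 months reported), 30,000 (6 months), and 10,000 (0 months)
has a shortfall of 0.30 * 0.5 + 0.10 * 1.0 = 0.25, giving a Coverage
Indicator of about 75. An ORI with no population adds nothing to the
sum regardless of the months it reported.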
Some ORIs do not have a population associated with their
jurisdiction. These ORIs report for jurisdictions such as national
parks, colleges and universities, toll bridges and tunnels, and
most state police departments. As the coverage indicator is based
on months of reporting and the population of each agency, this
variable will not show estimation that did occur for statewide
ORIs and for ORIs as listed above that do not have a population
but reported 3 to 11 months of data. Conversely, the coverage
indicator will indicate that estimation has occurred for ORIs with
a population that reported 3 to 11 months of data even if the ORIs
actually reported no crimes or arrests. Similarly, the coverage
indicator will indicate that estimation had occurred for ORIs with
a population that reported 0 to 2 months of arrest data and 1 to 2
months of crimes reported data, even though no rate was calculated to
estimate data because of the lack of ORIs in the agency's geographic
stratum reporting 12 months of data. Finally, since data for ORIs
that reported 0 to 2 months of arrest data or 1 to 2 months of crimes
reported data are set to zero, users should be aware that no estimation
was possible for ORIs in these categories that have no population. In the
crimes reported files, no estimation was possible for agencies reporting
0 months because population figures for these agencies were
not provided in the original files from the FBI.
3.4 Comparing "Crime in the United States 2014" To ICPSR 33523
"Crime in the United States 2014" is a publication prepared by the
Federal Bureau of Investigation and provides estimations of national
reported crime activity and arrest statistics from law enforcement
agencies in the UCR Program. Users of this data collection prepared
by ICPSR may not be able to match the statistics presented in "Crime
in the United States 2014" due to several factors:
- The UCR staff continue to update agency records when additions or
corrections are received by the UCR Program. The FBI statistics
presented in "Crime in the United States 2014" are based on the
data that the FBI has received prior to their established
publication deadlines. The data used by ICPSR to prepare the
data in ICPSR 33523 may contain additions or corrections to the
data submitted to the FBI after their publication deadline for
producing "Crime in the United States 2014."
- The new imputation algorithm that ICPSR implemented in 1994
approximates the estimation procedures that the FBI uses to adjust
for incomplete or unreported data from individual law enforcement
jurisdictions. Limited or no crime or arrest data were available
for some states in 2014. For some tables in "Crime in the United
States 2014" (for example, Section II, Table 2: Total Crime Counts
in the United States, and Section IV, Table 29: Estimated Arrests),
the FBI provided estimated crime counts and arrest totals for some
of these states (see Appendix I -- Methodology for the FBI's
estimation methods). The ICPSR algorithm estimates data for
agencies reporting less than three months of data by calculating
rates from agencies reporting 12 months of data located in the
agency's geographic stratum within that state. Since data for
these states were very limited or not present in the original
data, estimates for the entire state could not be produced by
ICPSR.
- Some tables in "Crime in the United States 2014" were prepared
using data only from agencies submitting complete reports for
the 12 months for 2014 prior to the publication deadline (for
example, Section IV, Table 69: Arrests by State). For this ICPSR
data collection, all law enforcement agency records present in the
original FBI data were used in the aggregation to the county
level. County records with a coverage indicator value of 100
contain only 12-month reporting agencies (see last paragraph of
the Coverage Indicator section in the Introduction for possible
exceptions). Agencies reporting 12 months of data within a county
with a coverage indicator value less than 100 cannot be separately
identified in this dataset aggregated at the county level.
3.5 Identifying Missing Data
In this data collection, zeroes may represent either true zeroes
or missing data, and it is possible to distinguish between the two.
In the arrest files (Parts 1-3 and 5-7), the coverage indicator alone
does not indicate whether zeroes are true or missing values. A
county can have a coverage indicator of zero, but still contain
arrest data. ICPSR can estimate arrest data for a county that did
not report any arrests based on data for other counties of comparable
population size in the same state that did report 12 months of data.
In order to distinguish true zeroes from missing values, the
user has to look at the coverage indicator for a county in conjunction
with the arrest variables. If an arrest variable for a particular
county has a value of zero and the coverage indicator is greater
than zero, then the arrest variable contains a true zero. This true
zero is not necessarily a reported zero as it may have been estimated.
If an arrest variable for a particular county has a value of zero and
the coverage indicator for that county is zero, then the user must
check the values of ALL of the arrest variables. If some arrest
variables contain zeroes and some contain other values, then the
zeroes for this county are true. If ALL arrest variables and the
coverage indicator for the county in question contain zeroes, then
these indicate missing data.
In the crimes reported file (Part 4), data cannot be estimated
for non-reporting agencies. Therefore, counties with no population for
reporting agencies will also have a coverage indicator of zero and
all crime variables will have a value of zero. These zeroes indicate
missing data. If a county has one or more offense variables with a
value of zero, but has a value of greater than zero for the
population and coverage indicator variables, then the zeroes for
these particular offense variables should be considered true zeroes
for the purposes of analysis.
SUMMARY OF HOW TO IDENTIFY MISSING DATA
ARREST FILES:
  TRUE ZERO
    Coverage indicator > 0
    OR (Coverage indicator = 0
        AND at least one arrest count variable > 0)
  MISSING DATA
    Coverage indicator = 0
    AND all arrest count variables = 0
------------------------------------------------------
CRIMES REPORTED:
  TRUE ZERO
    Coverage indicator > 0
  MISSING DATA
    Coverage indicator = 0
    (all crime count variables will necessarily = 0)
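The decision rules summarized above can be expressed as a short
Python function. This is a sketch for illustration; the function name
and the "arrest"/"crimes" labels are hypothetical and are not
variables in the data files.

```python
def zeros_are_missing(coverage, counts, file_type="arrest"):
    """Return True when zero counts in a county record indicate
    missing data rather than true zeroes.

    coverage: the county's coverage indicator (0 to 100).
    counts: all arrest (or crime) count variables for the county.
    file_type: "arrest" or "crimes" (labels assumed for this sketch).
    """
    if file_type not in ("arrest", "crimes"):
        raise ValueError("file_type must be 'arrest' or 'crimes'")
    if coverage > 0:
        return False          # zeroes are true (possibly estimated) zeroes
    if file_type == "crimes":
        return True           # coverage of zero always means missing
    # Arrest files: missing only if every count variable is also zero.
    return all(c == 0 for c in counts)
```

A user might apply this to each county record before computing rates,
setting the count variables to a missing value whenever the function
returns True.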
Cities designated by the Census Bureau as independent cities
are reported separately and have their own "county" codes (see
Appendix). Some jurisdictions, such as state parks and some state
police, provide data only on a statewide basis. In these cases,
data are allocated to counties proportionate to their share of the
total state population. Parts 5 to 8 contain the amount of
statewide data allocated to each county. To obtain the county
total without the statewide allocation, the statewide values can
be subtracted from the county data.
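The allocation and its reversal can be sketched in Python as shown
below. The names are hypothetical, and the sketch does not attempt to
reproduce any rounding that may have been applied in preparing the
files.

```python
def allocate_statewide(statewide_count, county_pops):
    """Split a statewide count across counties in proportion to
    each county's share of the total state population.

    county_pops: dict mapping county code -> county population.
    """
    state_pop = sum(county_pops.values())
    if state_pop <= 0:
        raise ValueError("state population must be positive")
    return {
        county: statewide_count * pop / state_pop
        for county, pop in county_pops.items()
    }


def without_statewide(county_total, allocated_amount):
    """County total with the statewide allocation removed."""
    return county_total - allocated_amount
```

For example, a statewide count of 200 split between two counties of
30,000 and 10,000 people allocates 150 and 50 respectively. A county
total of 500 that includes the 150 would be 350 once the statewide
share is subtracted.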
State Police data for Vermont were not reported within a county
and the State Police data for Alaska and Connecticut are not allocated
to the counties. These three State Police records are identified by
the county code 999. In the 1997 data and onward another 999 county
code was added to these data files. The New York/New Jersey Port
Authority reported data in 1997 through two agencies, one in New York
and one in New Jersey. The FBI included the New York Port Authority
agency as part of New York County. However, the FBI did not assign a
UCR county code for the New Jersey Port Authority and did not provide
a population figure for the area covered by this jurisdiction.
Therefore, the New Jersey/New York Port Authority (NJNYPOA) is a
separate record in the data files from 1997 onward with a county code
of 999. New York City Metro Transportation Authority (NY03083)
is a separate record from 2007 onward with a county code of 999. From
2008, tribal agencies began to report offense data. Data for those
agencies in each state will be in one record identified by the county
code of 777.
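Because records with county codes 999 and 777 do not correspond to
actual counties, users may wish to set them aside before county-level
analysis. The following Python sketch shows one way to do so; the
"county" key and the record layout are assumptions for illustration
and do not name variables in the data files.

```python
# County codes described above that do not identify a real county.
SPECIAL_COUNTY_CODES = {"999", "777"}


def split_special(records):
    """Separate ordinary county records from 999/777 records.

    records: iterable of dicts with a three-character "county" code.
    Returns (regular_records, special_records).
    """
    regular, special = [], []
    for rec in records:
        if rec["county"] in SPECIAL_COUNTY_CODES:
            special.append(rec)
        else:
            regular.append(rec)
    return regular, special
```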
ICPSR uses FIPS Publication 6-4 "Counties and Equivalent Entities
of the United States, Its Possessions, and Associated Areas" from the
National Institute of Standards and Technology (NIST), U.S. Department
of Commerce, as a reference for assigning FIPS codes for the counties
in this data collection. ICPSR consults this publication annually to
note changes in FIPS codes that may affect this collection.
The borough of Denali, Alaska (FIPS code 02068) was created from
part of the Yukon-Koyukuk Census Area (FIPS code 02290) and
an unpopulated part of the Southeast Fairbanks Census Area (FIPS code
02240) effective December 7, 1990. The Denali Borough is NOT a
record in the data. However, since no agency record from Denali
was present in the original FBI data for either the arrest or
crimes reported files, no data are missing from the ICPSR data.
In May 1995, NIST announced that the independent city of South
Boston, Virginia reverted to town status and became part of Halifax
County as of June 30, 1995. The Virginia FIPS county code of 780 no
longer exists and South Boston is included in FIPS county code 083.
ICPSR did not adjust this data collection to reflect this change so
that the data files would remain consistent with earlier years. Users
who wish to use the current FIPS codes can combine the data from
South Boston with Halifax County.
In July 1999, NIST announced that Yellowstone National Park, which
had been treated as a county equivalent, is legally part of Gallatin
County and Park County, Montana. Yellowstone National Park was left as
a county equivalent in this data collection to remain consistent with
earlier years.
In July 1999, NIST announced that as of November 13, 1993 Dade
County, Florida officially changed its name to Miami-Dade County.
The county was assigned a new FIPS code (086) to maintain the
alphanumeric sequence of counties. In the 1997 data collection and
onward, ICPSR also changed the Miami-Dade county code from 025 to 086
to match the change in the FIPS publication. However, the 1993-1996
UCR county-level data collections still used the old name and FIPS
county code.
In July 2001, NIST announced that the independent city (county-equivalent)
of Clifton Forge, Virginia, had reverted to town status, effective midnight,
July 1, 2001. Clifton Forge is now an incorporated place within Alleghany
County, rather than a separate county-equivalent surrounded by Alleghany County.
This action will reduce the number of Virginia independent cities to 39 and
the number of United States counties and equivalent areas to 3,141. The action
reduces the total number of independent cities in the United States to 42.
The FIPS county code of 560 for Clifton Forge, Virginia, is deleted. The
FIPS-55 class code will change from C7 to C1. The census place code of 0285
and FIPS-55 code of 17440 are unaffected. The FIPS county code of 005 for
Alleghany County remains unchanged. ICPSR did not adjust this data collection
to reflect this change so that the data files would remain consistent with
earlier years. Users who wish to use the current FIPS codes can combine the
data from Clifton Forge with Alleghany County.
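Users who prefer current FIPS codes could combine the South Boston
and Clifton Forge records with their surrounding counties along the
lines of the Python sketch below. The record layout and function name
are assumptions for illustration; 51 is the state FIPS code for
Virginia.

```python
from collections import defaultdict

# (state FIPS, old county FIPS) -> (state FIPS, current county FIPS)
RECODE = {
    ("51", "780"): ("51", "083"),   # South Boston -> Halifax County
    ("51", "560"): ("51", "005"),   # Clifton Forge -> Alleghany County
}


def combine_counties(records, recode=RECODE):
    """Sum counts after mapping superseded county codes to current ones.

    records: iterable of (state_fips, county_fips, count) tuples.
    Returns a dict keyed by (state_fips, county_fips).
    """
    totals = defaultdict(int)
    for state, county, count in records:
        key = recode.get((state, county), (state, county))
        totals[key] += count
    return dict(totals)
```

Counts and populations can be summed in this way. The coverage
indicator cannot simply be added across records, so a combined value
would need to be recomputed, for example by weighting each record's
indicator by its population.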
In January 2002, NIST announced that Broomfield County, Colorado,
had been created from parts of Adams (001), Boulder (013), Jefferson (059),
and Weld (123) counties effective November 15, 2001. The boundaries of
Broomfield County reflect the boundaries of Broomfield city legally in effect
on November 15, 2001. To maintain the alphanumeric sequence of counties,
Broomfield County will have a code of 014 for FIPS 6-4, Counties and
Equivalent Entities of the United States, Its Possessions and Associated Areas.
ICPSR created county code 014 for Broomfield County to match the change
in the FIPS publication.
In the UCR county-level arrest files, the population and data
for jurisdictions located in multiple counties are provided only
in the county containing the largest population component of the
jurisdiction. Counties containing smaller population components of
multiple-county jurisdictions will contain no population or arrest
data for these jurisdictions. Data in counties affected by one or
more multiple-county jurisdictions are indicated by a multi-county
jurisdiction flag variable. In the county-level crimes reported
files, the population and crime data for jurisdictions located in
multiple counties are provided by the UCR proportioned to each
county (maximum of three) in which the jurisdiction is located.
Drunkenness (FBI offense code 23) is not considered a crime
in some states. States that do not consider drunkenness a crime,
but which report data through the NIBRS system, such as North
Dakota, may show arrest data for the offense in this data collection.
This is because drunkenness may be listed as an incident in the NIBRS
reporting system, even though there was technically no arrest in
these states. Agencies in states where drunkenness is not a crime
may use their own discretion in reporting drunkenness as an incident.
Users should exercise caution when analyzing this variable because
of these differences in reporting.
The original data from the FBI contain one record for New York
City. Data from New York City are allocated into New York City's five
counties on the basis of the proportion of the population in each
county. For example, the population for Queens county is divided by the
total population of New York City and the resulting proportion is
multiplied by the data from each of New York City's arrest and offense
categories to apportion data to Queens county. The total population
for New York City and the population for each of the five counties from
the 2010 U.S. Census were used to calculate these proportions. The
population of Goodlettsville City, Tennessee is proportioned to both
Davidson County and Sumner County. Population from the 2010 U.S. Census
was used to calculate the population breakdown of Goodlettsville,
Tennessee.
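The apportionment described for New York City can be sketched in
Python as follows. The function name, category labels, and the
populations used in the example are hypothetical placeholders, not
2010 Census figures.

```python
def apportion_record(city_record, county_pops):
    """Apportion every category of a citywide record to counties
    by each county's share of the city population.

    city_record: dict mapping category -> citywide count.
    county_pops: dict mapping county name -> county population.
    Returns a dict of county -> {category -> apportioned count}.
    """
    city_pop = sum(county_pops.values())
    if city_pop <= 0:
        raise ValueError("city population must be positive")
    return {
        county: {cat: n * pop / city_pop for cat, n in city_record.items()}
        for county, pop in county_pops.items()
    }
```

With a placeholder city of two counties holding three quarters and
one quarter of the population, a citywide count of 100 would be
apportioned as 75 and 25.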
Seven dummy records were created for the arrest data files and 71
dummy records were created for the crimes reported data files for
counties not represented in the original FBI file. These county
records provide only the state and county FIPS codes with the rest of
the variables following county FIPS in the data file filled with
zeros. No arrest data were provided for Florida and Illinois.
Limited crimes reported data were available for Alaska, Mississippi,
and South Dakota.
One SPSS data definition statement file and one SAS data
definition statement file are provided for each of the four data
files: the arrest data file (all ages), the crimes reported data
file, the allocated statewide data for arrests file, and the
allocated statewide data for crimes reported file. The arrest data
file has 56 variables, the crimes reported data file has 21
variables, the allocated statewide data for arrests file has 57
variables, and the allocated statewide data for crimes reported file
has 22 variables. All data files contain 3,177 cases.