Examples of Secondary Analysis


Studies of the voting patterns of women after they gained the right to vote



Types of Available Data

Population, family, housing, social security and welfare, health and nutrition, crime and deviance, education and training, work, income and wealth, culture and leisure, social mobility and participation



Finding Datasets


Reference librarian

ICPSR (searchable archive of over 17,000 datasets)

Popular Secondary Datasets



Evaluating the Appropriateness of Secondary Data


1. Does the data shed light on your research question?

2. Are the concepts operationalized appropriately for your research question? 

Example: pornography study

Can an index be created from several items?  Example: pornography study

3. Does the data have the same unit of analysis as your research question?

4. Does the sample represent your population?

5. Can you specify an accurate model?  Are the independent variables you need in the data?
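
Point 2 above asks whether an index can be created when no single variable captures your concept. A minimal sketch of one common approach (averaging several related survey items) follows. The item names, the 1 to 5 coding, and the "at least two items answered" rule are all assumptions made for illustration; they do not come from any particular dataset or from the study mentioned above.

```python
from statistics import mean

# Hypothetical survey items, each coded 1 (strongly disagree) to 5 (strongly agree).
# The names are illustrative only and do not refer to a real dataset.
ITEMS = ["attitude_item_a", "attitude_item_b", "attitude_item_c"]

def build_index(respondent, items=ITEMS, min_answered=2):
    """Return the mean of the answered items, or None if too few were answered."""
    answered = [respondent[i] for i in items if respondent.get(i) is not None]
    if len(answered) < min_answered:
        return None  # too little information to score this respondent
    return mean(answered)

respondents = [
    {"attitude_item_a": 4, "attitude_item_b": 5, "attitude_item_c": 3},
    {"attitude_item_a": 2, "attitude_item_b": None, "attitude_item_c": 2},
    {"attitude_item_a": None, "attitude_item_b": None, "attitude_item_c": 1},
]

scores = [build_index(r) for r in respondents]
print(scores)
```

Before relying on an index like this, check that the items are coded in the same direction and that they plausibly measure the same underlying concept.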



Validity and Reliability in Secondary Data


Both have to do with operationalization of your key concepts


Validity: Does the variable in the dataset measure your concept?


Validity Examples:


Work Injury.  You define as minor cuts, bruises, sprains.  Dataset from U.S. Government defines as an injury that required physician or hospital visit.


Income:  You define as individual income.  Dataset defines it as household income.


Unemployment:  You define as anyone not working but who would work.  U.S. Government data counts as unemployed only people who are actively looking for work (leaves out discouraged workers, the underemployed, and some self-employed).  Economic vs. Social Policy goals.


Crime data:  U.S. government data includes crimes reported to police, not all crimes that occurred (victimization).


Premarital Pregnancy:  research question is the number of marriages that were caused by pregnancy.  Data used is birth certificates that include date of birth of baby and date of marriage of parents.  Problems:






Reliability: Is there a change in the conceptualization and operationalization of key phenomena over time?


Reliability Examples:


·       Unemployment.  Used to be measured as # of unemployed / # of people in civilian workforce (which omitted military employees)


·       GDP. The headline measure went from Gross National Product to Gross Domestic Product.  The two differ in how they treat income earned by foreigners here and by Americans outside of the country: GDP counts the former, GNP counted the latter.


·       Crime rates increase when police departments improve computer programs/systems


·       Units in national and international datasets are not equally reliable:


·       National: police departments, hospitals

·       International: unemployment, poverty (and many other concepts) measured differently across countries


·       Variation in interviewer reliability


·       Missing Data: Can be a huge problem in secondary data analysis. Since the data is already collected, there is nothing you can do to prevent it. Can use statistical procedures to adjust or correct the data (imputation).
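
To make the imputation point above concrete, here is a minimal sketch comparing two simple ways of handling missing values: dropping incomplete cases and filling each gap with the mean of the observed values. The income figures are invented for illustration, and mean imputation is only one of many procedures; it is shown because it is the simplest, not because it is the recommended one.

```python
from statistics import mean

# Hypothetical incomes from a secondary dataset; None marks a missing response.
incomes = [32000, None, 45000, 51000, None, 28000]

def listwise_delete(values):
    """Drop missing cases entirely (this shrinks the sample)."""
    return [v for v in values if v is not None]

def mean_impute(values):
    """Replace each missing value with the mean of the observed values."""
    fill = mean(listwise_delete(values))
    return [fill if v is None else v for v in values]

complete_cases = listwise_delete(incomes)
imputed = mean_impute(incomes)

print(len(complete_cases), len(imputed))    # sample size under each approach
print(mean(complete_cases), mean(imputed))  # the mean is unchanged by mean imputation
```

Mean imputation keeps the full sample size and leaves the mean unchanged, but it understates the variability of the variable, so results that depend on variance should be treated with caution.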





Ethical Issues in Secondary Data Analysis


Fewer concerns since you didn’t collect the data and have no control over collection processes.


Need to maintain confidentiality


Researcher integrity in conducting statistical analyses