Introduction to Multivariate Statistics

Multivariate analysis offers two main advantages: 1) the ability to perform statistical control, and 2) the ability to look at how several independent variables together influence a dependent variable.

Statistical Control

Statistically, we can hold the variation on a third variable (or, in the case of regression, a 4th, 5th, 6th or any number of additional variables) constant, so that it does not interfere with the relationship between the independent variable (IV) and the dependent variable (DV).

For example, say you want to know whether gender influences fear of crime.  You find that more women than men fear walking alone at night.  Are there any other variables you would want to "control" for in your analysis?  Say, how tall the person is?  Or whether they usually carry a gun?

By including these control variables, we rule out alternative explanations for the relationship, or lack thereof, between an IV and a DV.  So, if we control for how tall people are, and we find that more women than men still fear walking alone at night, then we have ruled out the alternative explanation that it is not gender, but how big or small someone is, that leads to fear.

Alternatively, if we control for whether or not people carry a gun, and we find that equal percentages of women and men fear walking alone at night, then we have found an alternative explanation for our relationship between gender and fear of crime -- it is not gender, perhaps, but your perceived ability to protect yourself.

Statistical control approximates the control achieved in experimental design (which is done via random assignment or matching).  However, to use statistical controls, we must collect data on all of the "control" variables.  For example, with a survey, you must anticipate a priori, when developing the survey itself, all of the alternative explanations for all of the relationships you plan to examine.  (This is often a problem with secondary data analysis, where someone else decided which questions to ask.)

Conceptually, when we use statistical control, we are examining the relationship between an IV and a DV at each value of the control variable (CV).  Returning to our previous example:

DV = whether or not you fear walking alone at night
IV = gender
CV = whether or not you carry a gun

For this analysis, we divide the sample into 2 groups: those who usually carry a gun, and those who don't.  Then we look at the relationship between gender and fear within each group.  If we find that there is a relationship between gender and fear among people who do not carry a gun, but no relationship between gender and fear among people who do carry a gun, then carrying a gun is an alternative explanation for the relationship between gender and fear.
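
The split-sample comparison described above can be sketched in a few lines of Python.  This is only an illustration: the respondent counts are invented, and the helper names (make_rows, percent_afraid) are hypothetical, not part of any statistical package.

```python
from collections import Counter

# Hypothetical respondents stored as (gender, carries_gun, fears_walking_alone).
# The counts below are invented purely to illustrate the mechanics.
def make_rows(gender, carries_gun, n_afraid, n_not_afraid):
    return ([(gender, carries_gun, True)] * n_afraid
            + [(gender, carries_gun, False)] * n_not_afraid)

respondents = (
    make_rows("woman", False, 60, 40)
    + make_rows("man", False, 30, 70)
    + make_rows("woman", True, 10, 40)
    + make_rows("man", True, 10, 40)
)

def percent_afraid(rows):
    """Return the percent who fear walking alone, separately for each gender."""
    totals = Counter()
    afraid = Counter()
    for gender, _carries_gun, fears in rows:
        totals[gender] += 1
        if fears:
            afraid[gender] += 1
    return {gender: 100.0 * afraid[gender] / totals[gender] for gender in totals}

# Relationship between IV (gender) and DV (fear) in the whole sample.
overall = percent_afraid(respondents)

# The same relationship within each value of the CV (carries a gun or not).
by_gun = {
    carries: percent_afraid([row for row in respondents if row[1] == carries])
    for carries in (False, True)
}

print("Everyone:            ", overall)
print("Does not carry a gun:", by_gun[False])
print("Carries a gun:       ", by_gun[True])
```

With these made-up counts, the gap between women and men shows up in the whole sample and among people who do not carry a gun, but disappears among people who do.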


Multiple Independent Variables

Multivariate analysis, with its ability to add control variables, is very useful in the social sciences.  When you are studying human beings or social organizations such as companies or counties, you need to be able to specify more than one independent variable because people and social organizations are complex.   One variable alone is not going to explain enough of the variation in the DV to say you understand the phenomenon.

For example, if you want to understand what causes crime rates to increase, would you want to be limited to specifying only one independent variable, say the number of police officers?  Crime is complex: why people commit crime, when, how, what deters it, and so on.  Much more than this one independent variable is needed to accurately explain why crime rates increase or decrease over time.

As another example, suppose you want to understand students' grades: what accounts for some students getting an A in a course, while others get B's, C's, D's and F's?  What would you want to include in your model?


Choosing Which Multivariate Analysis To Do

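The appropriate analysis depends on whether each variable is categorical or continuous, and on how many IVs and CVs you have.  Examples: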

DV = whether people oppose or favor requiring gun owners to have a permit (gunlaw)
IV = race
CV = educational degree (degree)

analysis = elaboration (all variables are categorical)

 

DV = years of education (educ)
IV = father's degree (padeg)
CV = father's SEI (pasei)

analysis = ANCOVA (multivariate means test: DV is continuous, IV is categorical, CV is continuous)

 

DV = number of children (childs)
IV = years of education (educ)
CV = age

analysis = partial correlation (all variables continuous and only one IV and one CV)
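
As a rough sketch of what a partial correlation does, the Python below computes the correlation between the DV and the IV after removing the part of each that is linearly related to the CV, using the standard first-order formula r_yx.z = (r_yx - r_yz * r_xz) / sqrt((1 - r_yz^2) * (1 - r_xz^2)).  The eight data points are invented for illustration (they are not from any real survey), and the function names pearson_r and partial_r are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Ordinary (zero-order) Pearson correlation between two lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

def partial_r(dv, iv, cv):
    """Correlation between dv and iv, holding cv constant (one control only)."""
    r_di = pearson_r(dv, iv)
    r_dc = pearson_r(dv, cv)
    r_ic = pearson_r(iv, cv)
    return (r_di - r_dc * r_ic) / math.sqrt((1 - r_dc ** 2) * (1 - r_ic ** 2))

# Invented values, one entry per respondent.
age    = [25, 30, 35, 40, 45, 50, 55, 60]
educ   = [16, 12, 14, 18, 12, 16, 10, 12]
childs = [0, 2, 1, 0, 3, 2, 4, 3]

print("educ vs. childs, ignoring age:   ", round(pearson_r(childs, educ), 3))
print("educ vs. childs, controlling age:", round(partial_r(childs, educ, age), 3))
```

Comparing the two printed values shows how much of the original relationship remains once the control variable is held constant.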

 

DV = TV hours
IV1 = ?
IV2 = ?
IV3 = ?
IV4 = ?

analysis = regression (all variables continuous and multiple IVs)
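
The four examples above follow a pattern that can be written out as a small decision rule.  The Python below is a sketch of that rule only, covering just the combinations shown in these notes; the function name choose_analysis and the labels "categorical" and "continuous" are hypothetical conventions chosen for the illustration.

```python
def choose_analysis(dv, ivs, cvs):
    """Suggest an analysis from the level of measurement of each variable.

    dv  -- "categorical" or "continuous"
    ivs -- list with one label per independent variable
    cvs -- list with one label per control variable (may be empty)
    """
    levels = [dv] + ivs + cvs

    # Every variable is categorical.
    if all(level == "categorical" for level in levels):
        return "elaboration"

    if all(level == "continuous" for level in levels):
        # Everything continuous, exactly one IV and one CV.
        if len(ivs) == 1 and len(cvs) == 1:
            return "partial correlation"
        # Everything continuous, more than one IV.
        if len(ivs) > 1:
            return "regression"

    # Continuous DV, one categorical IV, one continuous CV.
    if dv == "continuous" and ivs == ["categorical"] and cvs == ["continuous"]:
        return "ANCOVA"

    return "not covered by these examples"

# The four examples from these notes:
print(choose_analysis("categorical", ["categorical"], ["categorical"]))  # gunlaw
print(choose_analysis("continuous", ["categorical"], ["continuous"]))    # educ
print(choose_analysis("continuous", ["continuous"], ["continuous"]))     # childs
print(choose_analysis("continuous", ["continuous"] * 4, []))             # TV hours
```

Any combination not listed in the examples falls through to the last line rather than guessing at an analysis.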