Multiple Regression

Multiple regression (commonly estimated with ordinary least squares, or OLS) is a statistical analysis that allows you to examine the linear causal effect of several independent variables (IVs) on a dependent variable (DV).

The DV and all the IVs must be continuous and normally distributed, although OLS is robust to violations of this assumption in the IVs, and the IVs can be transformed to satisfy the assumption (for example, with dummy variables and logs).  We are not covering these transformations in this course.

If your DV is not continuous you can use another type of regression such as nominal or ordinal regression.  We are not covering these analyses in this course.

If the relationship between an IV and DV is not linear you can perform additional transformations (interactions) to enable OLS to be used.  However, we are not covering these transformations in this course.   


Statistics Calculated in Regression

Regression analysis calculates a coefficient called beta (b) for each independent variable. This coefficient estimates the change in the DV when the IV increases by one unit, while holding all of the other IVs constant (i.e., controlling for all of the other IVs). The one-unit increase in the IV should be interpreted in terms of how that variable is measured.  When b = 0 there is no causal, linear effect of the IV on the DV.  These beta (b) coefficients are sometimes referred to as unstandardized coefficients to distinguish them from standardized coefficients (B); see the next paragraph.
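As a rough illustration of "holding the other IVs constant", the sketch below plugs the unstandardized coefficients reported in Example 1 of these notes into a linear prediction equation. The intercept value and the function name are assumptions made only for illustration; the notes do not report an intercept.

```python
# Unstandardized coefficients (b) reported in Example 1 of these notes.
B_TV, B_EDUCATION, B_AGE = -0.10, 0.21, 0.003

# ASSUMPTION: the notes do not report an intercept; 5.0 is a made-up value.
# It cancels out when two predictions are subtracted.
INTERCEPT = 5.0


def predicted_income(tv_hours, education_years, age_years):
    """Predicted income category from the linear regression equation."""
    return (INTERCEPT
            + B_TV * tv_hours
            + B_EDUCATION * education_years
            + B_AGE * age_years)


# Two hypothetical people who differ only by one daily hour of TV.
baseline = predicted_income(tv_hours=2, education_years=12, age_years=40)
one_more_hour = predicted_income(tv_hours=3, education_years=12, age_years=40)

# The difference equals the TV coefficient, whatever the other values are.
change_in_income = one_more_hour - baseline
print(round(change_in_income, 2))
```

Because education and age are the same in both predictions, their terms cancel and only the TV coefficient remains. That is what "controlling for" the other IVs means in practice.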

Regression also calculates a standardized beta coefficient (usually denoted with a B), which enables you to interpret the change in the DV and IV in terms of standard deviations, which in turn enables you to compare the relative effect of each IV on the DV.  Use the standardized coefficients in your interpretation to identify which IVs have the largest impact on the DV.
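The link between the two kinds of coefficient can be sketched in a few lines. In OLS the standardized coefficient equals the unstandardized one multiplied by the ratio of the IV's standard deviation to the DV's standard deviation. The standard deviations below are made-up values, since these notes do not report them, and the function name is hypothetical.

```python
def standardized_beta(b, sd_iv, sd_dv):
    """Convert an unstandardized coefficient b into a standardized one (B)."""
    return b * sd_iv / sd_dv


# ASSUMPTION: illustrative values only; not taken from any example here.
b_education = 0.21    # change in the DV per one-unit change in the IV
sd_education = 3.0    # standard deviation of the IV
sd_income = 2.5       # standard deviation of the DV

beta_education = standardized_beta(b_education, sd_education, sd_income)
print(round(beta_education, 3))
```

With these made-up numbers, a one standard deviation increase in the IV goes with roughly a quarter of a standard deviation increase in the DV. Because every IV is put on this same scale, the coefficients can be compared with each other.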

As with correlations, regression also calculates an r2.  Since there are multiple variables in multiple regression we refer to this r2 as R2.  This statistic tells you how much of the variation in the DV the IVs collectively explain.  It is a measure of how well your model fits the data.
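A minimal sketch of how R2 is computed from a fitted model, assuming an OLS model with an intercept: one minus the ratio of unexplained (residual) variation to total variation in the DV. The observed and predicted values are invented for illustration.

```python
def r_squared(observed, predicted):
    """Share of the variation in the DV that the model's predictions explain."""
    mean_observed = sum(observed) / len(observed)
    # Variation the model fails to explain.
    ss_residual = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    # Total variation of the DV around its mean.
    ss_total = sum((y - mean_observed) ** 2 for y in observed)
    return 1 - ss_residual / ss_total


# ASSUMPTION: made-up DV values and made-up model predictions.
observed_dv = [1.0, 2.0, 3.0, 4.0]
predicted_dv = [1.1, 1.9, 3.2, 3.8]

model_r2 = r_squared(observed_dv, predicted_dv)
print(round(model_r2, 2))
```

Multiplying the result by 100 gives the percentage of variation explained, which is how R2 is reported in the examples in these notes.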


Hypothesis Tests in Regression

In multiple regression, there is an individual test for each independent variable (each b coefficient), and a test for the whole set of independent variables collectively (R2).


Unstandardized Beta Coefficients

Each beta coefficient is tested for significance with a t-test.  This test determines whether each IV has an effect on the DV, controlling for all other IVs.  The hypotheses are:

Ho: The IV has no influence on the DV, controlling for all of the other IVs.  b = 0

H1: The IV influences the DV, controlling for all of the other IVs.  b ≠ 0

*Fill in what the IV, DV, and CVs (the control variables, i.e., the other IVs) are in the above hypotheses.

You can also test whether b is > or < 0.  That would make it a one-tailed test rather than a two-tailed test, and make it easier to reject the null hypothesis.  Usually we simply perform two-tailed tests on each beta coefficient (b ≠ 0) because in the social sciences we often cannot predict, either based on theory or from previous data, what the multivariate relationship will be for each IV.  SPSS assumes a two-tailed test on each beta coefficient.  But you could easily perform a one-tailed test by dividing the reported p in half, provided the sign of b is in the direction you predicted.
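The halving rule can be written out as a small helper. This is only a sketch: the function name and its arguments are hypothetical, and it relies on the t distribution being symmetric. Halving applies when the estimated b has the sign you predicted; when b has the opposite sign, the one-tailed p is one minus half the two-tailed p.

```python
def one_tailed_p(two_tailed_p, b, predicted_direction):
    """One-tailed p-value from the two-tailed p reported for a coefficient.

    predicted_direction is "positive" (H1: b > 0) or "negative" (H1: b < 0).
    """
    if predicted_direction not in ("positive", "negative"):
        raise ValueError("predicted_direction must be 'positive' or 'negative'")
    if predicted_direction == "positive":
        matches_prediction = b > 0
    else:
        matches_prediction = b < 0
    if matches_prediction:
        return two_tailed_p / 2
    # b points the other way: the evidence counts against H1.
    return 1 - two_tailed_p / 2


# Hours worked in Example 2 of these notes: b = .03, two-tailed p = .15.
p_if_predicted_positive = one_tailed_p(0.15, 0.03, "positive")
p_if_predicted_negative = one_tailed_p(0.15, 0.03, "negative")
print(p_if_predicted_positive, p_if_predicted_negative)
```

The direction must be chosen before looking at the results. Picking it afterwards to match the sign of b would defeat the purpose of the test.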

If you reject the null and conclude that b is significant, you then interpret it as the change in the DV when the IV increases by one unit.  If you accept the null (that is, fail to reject it) and conclude that b is not significant, then you interpret that b as if it were 0: the IV has no effect on the DV.

**Remember you do a t-test on each b -- i.e., on each IV.


Standardized Beta Coefficients

We don't do separate hypothesis tests on the standardized coefficients. The hypothesis test on the unstandardized coefficient on each IV suffices.  Use the standardized coefficients in your interpretation only, to compare which IV has the biggest effect on the DV.

 

Explained Variation

The R2 is tested for significance with an F-test. This test determines whether the IVs collectively have a statistically significant impact on the DV.  The hypotheses are:

Ho: The IVs collectively do not have a significant impact on the DV.  R2 = 0

H1: The IVs collectively have a significant impact on the DV.  R2 ≠ 0

If you reject the null and conclude that R2 is significant, then you interpret it as the percentage of variation in the DV that the IVs explain.  If you accept the null and conclude that R2 is not statistically significant, you should interpret it as R2 = 0 and state that the IVs do not explain a statistically significant amount of variation in the DV.

 


Example 1:  Does the amount of TV that people watch, their education, and their age influence how much money they make?

Dependent Variable = income; measured in 12 categories, which we will treat as continuous

Independent Variable 1 = TV hours; measured in daily hours watched

Independent Variable 2 = education; measured in years

Independent Variable 3 = age in years

All variables are continuous and there are multiple IVs.  Analysis = regression.


Regression Analysis Results

              Beta (unstd)   Probability (p)   Standardized Beta
TV hours         -.10            .000               -.11
Education         .21            .000                .26
Age               .003           .38                 .02
R2                .09            .000


Hypotheses:

TV Hours

Null: The number of TV hours watched has no influence on income, controlling for education and age. 
b = 0

Research: The number of TV hours watched influences income, controlling for education and age. 
b ≠ 0

b = -.10, p = .000 

p is less than alpha. Reject null.  Controlling for education and age, TV hours has a negative effect on income.  As TV hours increases by one hour per day, income decreases by .10 units (income categories).

 

Education

Null: The number of years of education has no influence on income, controlling for the number of TV hours watched and age.  b = 0

Research: The number of years of education influences income, controlling for the number of TV hours watched and age.   b ≠ 0

b = .21, p = .000

p is less than alpha.  Reject null.  Controlling for TV hours and age, education has a positive effect on income.  As education increases one year, income increases by .21 units.

Age

Null: Age has no influence on income, controlling for the number of TV hours watched and years of education.  b = 0

Research: Age influences income, controlling for the number of TV hours watched and years of education.   b ≠ 0

b = .003, p =.38

p is greater than alpha.  Accept null.  Controlling for TV hours and education, age has no effect on income.
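The three coefficient decisions above follow one rule, which can be sketched as a loop over the table. The notes compare p to alpha without stating its value; alpha = .05 is assumed here. A p reported as .000 has been rounded and is not literally zero.

```python
ALPHA = 0.05  # ASSUMPTION: the notes do not state the alpha level used.

# Unstandardized b and two-tailed p for each IV, from the Example 1 table.
example_1 = {
    "TV hours": {"b": -0.10, "p": 0.000},   # p printed as .000 (rounded)
    "Education": {"b": 0.21, "p": 0.000},
    "Age": {"b": 0.003, "p": 0.38},
}


def decide(p_value, alpha=ALPHA):
    """Decision on the null hypothesis b = 0 for one coefficient."""
    return "reject null" if p_value < alpha else "accept null"


decisions = {name: decide(stats["p"]) for name, stats in example_1.items()}
for name, decision in decisions.items():
    print(f"{name}: {decision}")
```

Each IV gets its own test, so one model can contain both significant and non-significant coefficients, as it does here.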


Standardized Betas

Of the variables in the model, education has the largest impact on income (.26), followed by TV hours (-.11). 
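The ranking above can be reproduced by sorting the significant IVs on the absolute value of their standardized betas. The sign gives only the direction of the effect, not its size. As a sketch, with alpha = .05 assumed because the notes do not state it:

```python
ALPHA = 0.05  # ASSUMPTION: the notes do not state the alpha level used.

# (IV name, standardized beta, two-tailed p) from the Example 1 table.
example_1 = [
    ("TV hours", -0.11, 0.000),
    ("Education", 0.26, 0.000),
    ("Age", 0.02, 0.38),
]

# Keep only IVs whose unstandardized coefficient passed its t-test.
significant = [(name, beta) for name, beta, p in example_1 if p < ALPHA]

# Rank by the size of the effect, ignoring its sign.
ranked = sorted(significant, key=lambda item: abs(item[1]), reverse=True)
for name, beta in ranked:
    print(name, beta)
```

Age is dropped before ranking because its coefficient is treated as 0 once the null is accepted.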

 

R2

Null: TV hours, education and age have no collective impact on income.  R2 = 0

Research: TV hours, education and age have a collective impact on income.   R2 ≠ 0

p is less than alpha.  Reject null.  TV hours, education and age explain 9% of the variation in income.

 


Example 2:  Do education, number of siblings, number of hours worked, and age influence the number of children that people have?

DV = Number of children (continuous)

IV1 = Years of education (continuous)
IV2 = Number of siblings  (continuous)
IV3 = Number of hours worked last week (continuous)
IV4 = Age in years (continuous)

All variables are continuous and there are multiple IVs. Analysis = regression.

Regression Results 

                Beta (unstd)   Probability (p)   Standardized Beta
Education           .04            .67                .06
Siblings            .18            .02                .33
Hours Worked        .03            .15                .19
Age                 .04            .03                .29
R2                  .18            .045


Hypotheses

Education

Ho: Education does not influence the number of children that people have, holding the number of siblings, hours worked and age constant. b = 0

H1: Education does influence the number of children that people have, holding the number of siblings, hours worked and age constant.  b ≠ 0

b = .04, p=.67

p is greater than alpha.  Accept Null  

Education does not influence the number of children that people have, holding the number of siblings, hours worked and age constant. 

Siblings

Ho: The number of siblings that people have does not influence the number of children that people have, holding education, hours worked and age constant. b = 0

H1: The number of siblings that people have does influence the number of children that people have, holding education, hours worked and age constant.   b ≠ 0

b = .18, p = .02

p is less than alpha.  Reject null.

The number of siblings that people have does influence the number of children that people have, holding education, hours worked and age constant.  As the number of siblings that people have increases one sibling, the number of children that people have increases by .18 children, holding education, hours worked and age constant. 

Hours Worked

Ho: The number of hours that people work does not influence the number of children that people have, holding education, number of siblings and age constant. b = 0

H1: The number of hours that people work does influence the number of children that people have, holding education, number of siblings and age constant. b ≠ 0

b=.03, p=.15

p is greater than alpha.  Accept null.

The number of hours that people work does not influence the number of children that people have, holding education, number of siblings and age constant. 

Age

Ho: Age does not influence the number of children that people have, holding education, number of siblings and the number of hours that people work constant. b = 0

H1: Age does influence the number of children that people have, holding education, number of siblings and the number of hours that people work constant. b ≠ 0

b=.04, p=.03

p is less than alpha.  Reject null.

Age does influence the number of children that people have, holding education, number of siblings and the number of hours that people work constant.  As age increases one year, the number of children that people have increases by .04 children, holding education, number of siblings and the number of hours that people work constant. 

Standardized Betas

Of the variables in the model, the number of siblings has the largest impact on the number of children that people subsequently have (.33), followed by age (.29). 

R2

Null: Education, the number of siblings, the number of hours worked last week, and age have no collective impact on the number of children that people have.  R2 = 0

Research: Education, the number of siblings, the number of hours worked last week, and age collectively have a significant impact on the number of children that people have.   R2 ≠ 0

R2 = .18, p=.045

Reject null (it is right on the line though). 

Education, the number of siblings, the number of hours worked last week, and age explain 18% of the variation in the number of children that people have.  


Example 3: Take Home Exercise

Is socio-economic-status (measured on a scale of 0-100) influenced by a person's age (in years), the number of hours they usually work per week (measured in hours), their mother's SES (measured on a scale of 0-100), and their father's SES (measured on a scale of 0-100)?

Statistical Results 

                Beta (unstd)   Probability (p)   Standardized Beta
MASEI               .24            .17                .27
PASEI               .42            .14                .27
Age                -.19            .55               -.11
Hours Worked        .84            .006               .57
R2                  .43            .02
Adj. R2             .31

Analysis = Regression, more than 2 IVs, all variables continuous

Hypotheses

Mother's SEI

Ho: Mothers' SEI does not influence their adult children's SEI when controlling for the number of hours the adult child usually works per week, adult child's age, and their father's SEI.  b=0

H1: Mothers' SEI does influence their adult children's SEI when controlling for the number of hours the adult child usually works per week, adult child's age, and their father's SEI.  b ≠ 0

p is greater than alpha (p = .17). Accept null.  Mothers' SEI does not influence their adult children's SEI when controlling for the number of hours the adult child usually works per week, adult child's age, and their father's SEI.

Father's SEI

Ho: Fathers' SEI does not influence their adult children's SEI when controlling for the number of hours the adult child usually works per week, adult child's age, and their mother's SEI.  b=0

H1: Fathers' SEI does influence their adult children's SEI when controlling for the number of hours the adult child usually works per week, adult child's age, and their mother's SEI.  b ≠ 0

p is greater than alpha (p = .14). Accept null.  Fathers' SEI does not influence their adult children's SEI when controlling for the number of hours the adult child usually works per week, adult child's age, and their mother's SEI.

Age

Ho:  Age does not influence socio-economic-status when controlling for the number of hours they usually work per week, their mother's SEI, and their father's SEI.  b=0

H1: Age does influence socio-economic-status when controlling for the number of hours they usually work per week, their mother's SEI, and their father's SEI.  b ≠ 0

p is greater than alpha.  Accept null.   Age does not influence socio-economic-status when controlling for the number of hours they usually work per week, their mother's SEI, and their father's SEI.

Hours Worked

Ho:  Hours worked does not influence socio-economic-status when controlling for age, their mother's SEI, and their father's SEI.  b=0

H1: Hours worked does influence socio-economic-status when controlling for age, their mother's SEI, and their father's SEI.  b ≠ 0

p is less than alpha (p =.006). Reject null. 

For every additional hour that people work per week, their SEI increases by .84 (on a scale of 0-100), when controlling for age, their mother's SEI, and their father's SEI.

Standardized Betas

Only hours worked is statistically significant, so that variable has the biggest impact in the model.

R2

Ho:  Age, the number of hours that people usually work per week, mother's SEI, and father's SEI combined do not have a statistically significant impact on adult children's SEI. R2 = 0

H1: Age, the number of hours that people usually work per week, mother's SEI, and father's SEI combined have a statistically significant impact on adult children's SEI.  R2 ≠ 0

p is less than alpha (p=.02).  Reject null.  Age, the number of hours that people usually work per week, mother's SEI, and father's SEI combined have a statistically significant impact on adult children's SEI.  Combined, these variables explain 31% (adjusted R2) of the variation in adult children's SEI.  However, most of this effect is due to the number of hours that people work.
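The adjusted R2 used in this interpretation shrinks R2 to account for the number of IVs relative to the sample size. A minimal sketch of the standard formula follows. The sample size is not stated in these notes; n = 24 is an assumption chosen only because it reproduces the reported adjusted value of .31.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R2 for a model with k IVs estimated on n cases."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# R2 = .43 and k = 4 IVs come from the Example 3 table.
# ASSUMPTION: n = 24 is NOT given in the notes; it is a guess that happens
# to reproduce the reported adjusted R2.
example_3_adjusted = adjusted_r2(r2=0.43, n=24, k=4)
print(round(example_3_adjusted, 2))
```

The gap between R2 and adjusted R2 widens as the number of IVs grows relative to the sample size. With few cases and several IVs, the adjusted figure is the more cautious one to report.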


Example 4:  Take Home Exercise, See Handout

What influences how much money states spend on prisons, jails, and other forms of corrections? Does population size, rate of homeownership, number of hospitals, average income, welfare spending, the % of the population graduating from college, number of public libraries, tobacco sales, or the percent of people who vote in each state influence how much money states spend on corrections?

This data was collected on each state in the US.  n=50

See handout for statistical results.

Analysis = Regression, more than 2 IVs, all variables continuous

Hypotheses

Population Size

Ho:  Population size does not influence how much states spend on corrections, when controlling for all the other variables in the model. b = 0

H1: Population size does influence how much states spend on corrections, when controlling for all the other variables in the model. b ≠ 0

b = .000, p = .20. p is greater than alpha. Accept null.


Rate of homeownership

Ho:  Rate of homeownership does not influence how much states spend on corrections, when controlling for all the other variables in the model. b = 0

H1: Rate of homeownership does influence how much states spend on corrections, when controlling for all the other variables in the model. b ≠ 0

b = -3.44, p = .005, reject null, p is less than alpha. Rate of homeownership does influence how much states spend on corrections, when controlling for all the other variables in the model.

For every 1% increase in the rate of homeownership, states spend $3.44 less per person in the state on corrections, when controlling for population size, number of hospitals, average income, welfare spending, the % of the population graduating from college, number of public libraries, tobacco sales, and the percent of people who vote in the state.


Number of hospitals

Ho:  Number of hospitals does not influence how much states spend on corrections, when controlling for all the other variables in the model. b = 0

H1: Number of hospitals does influence how much states spend on corrections, when controlling for all the other variables in the model. b ≠ 0

p is greater than alpha. Accept null. Number of hospitals does not influence how much states spend on corrections, when controlling for all the other variables in the model.
 

Average income

Ho:  Average income does not influence how much states spend on corrections, when controlling for all the other variables in the model. b = 0

H1: Average income does influence how much states spend on corrections, when controlling for all the other variables in the model. b ≠ 0

p is less than alpha (p = .000). Reject null.  Average income does influence how much states spend on corrections, when controlling for all the other variables in the model.  For every dollar increase in income, states spend .01 dollars more per person in the state on corrections, when controlling for population size, number of hospitals, homeownership rate, welfare spending, the % of the population graduating from college, number of public libraries, tobacco sales, and the percent of people who vote in the state.


Welfare spending, the % of the population graduating from college, number of public libraries, tobacco sales, and the percent of people who vote

Ho:  These variables do not individually influence how much states spend on corrections, when controlling for all the other variables in the model. b1 = 0, b2 = 0, b3 = 0, b4 = 0, b5 = 0

H1: These variables do individually influence how much states spend on corrections, when controlling for all the other variables in the model. b1 ≠ 0, b2 ≠ 0, b3 ≠ 0, b4 ≠ 0, b5 ≠ 0

p is greater than alpha on each coefficient.  Accept null for each. Welfare spending, the % of the population graduating from college, number of public libraries, tobacco sales, and the percent of people who vote do not individually influence how much states spend on corrections, when controlling for all the other variables in the model.

 

Standardized Betas

Average annual income has the largest effect on how much states spend on corrections, followed by the homeownership rate.

R2

Ho: Population size, rate of homeownership, number of hospitals, average income, welfare spending, the % of the population graduating from college, number of public libraries, tobacco sales, and the percent of people who vote in each state combined do not have a significant impact on how much money states spend on corrections.  R2 = 0

H1:  Population size, rate of homeownership, number of hospitals, average income, welfare spending, the % of the population graduating from college, number of public libraries, tobacco sales, and the percent of people who vote in each state combined have a significant impact on how much money states spend on corrections.  R2 ≠ 0

p is less than alpha (p = .000).  Reject null.  Population size, rate of homeownership, number of hospitals, average income, welfare spending, the % of the population graduating from college, number of public libraries, tobacco sales, and the percent of people who vote in each state combined have a significant impact on how much money states spend on corrections.

Combined, these variables explain 71% of the variation in how much states spend on corrections.
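The F statistic behind the test on R2 can be recovered from R2, the number of cases, and the number of IVs. The sketch below uses the values given for this example (R2 = .71, n = 50, nine IVs). The F value on the handout may differ slightly, since R2 is rounded here. The function name is hypothetical.

```python
def f_statistic(r2, n, k):
    """F statistic for the null hypothesis R2 = 0, with k IVs and n cases.

    Degrees of freedom are k (numerator) and n - k - 1 (denominator).
    """
    explained_per_iv = r2 / k
    unexplained_per_df = (1 - r2) / (n - k - 1)
    return explained_per_iv / unexplained_per_df


# Values from this example: 50 states, 9 IVs, R2 = .71 (rounded).
example_4_f = f_statistic(r2=0.71, n=50, k=9)
print(round(example_4_f, 2))
```

A larger R2 or more cases pushes F up. More IVs pushes it down unless they add enough explained variation to compensate.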