**STATISTICAL ANALYSIS OF ECOLOGICAL DATA**

**I. Objectives:**

- Discuss why scientists employ statistics to understand ecological problems.
- Calculate the following descriptive statistics: mean, variance, standard deviation.
- Describe the concept of normal distribution.
- Understand and apply the t-test to test for differences between populations.
- Understand and apply the chi-square test to frequency or count data.

**II. Introduction:**

Ecologists are often concerned with numbers of organisms (density) and their patterns of distribution in nature. This makes ecology a quantitative science. However, ecologists cannot count and determine the location of every organism in a given area. Rather, ecologists must collect and analyze data from samples taken within the population. The quantitative data collected by ecologists can be (actually, must be) analyzed using statistics.

**A. Terminology:** Before we begin our work, some definitions are needed:

- $\Sigma$ = summation
- $x$ = single observation or data point
- $n$ = sample size (number of data points)
- $s^2$ = variance
- $s$ = standard deviation
- $SS$ = sum of squares
- $df$ = degrees of freedom, often $n - 1$

**B. Descriptive Statistics:**

Descriptive statistics summarize some aspect of the population. The most commonly used are the mean, median, mode, variance and standard deviation. For ecological studies, the mean, variance and standard deviation are most often used.

**B1. Mean:**

The mean is a measure of the central tendency (average) for a population.

Mean = $\Sigma x / n$

Example 1: In population #1, the following numbers of trees are counted in 5 quadrats: 1, 6, 11, 16, 21

Mean = 55/5 = 11

Example 2: In population #2, the following numbers of trees are counted in 5 quadrats: 10, 11, 11, 11, 12

Mean = 55/5 = 11
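The mean formula $\Sigma x / n$ can be checked directly. This is a minimal Python sketch; the function and variable names are illustrative, not part of the lab procedure.

```python
# Quadrat counts from Examples 1 and 2 (trees per quadrat)
population_1 = [1, 6, 11, 16, 21]
population_2 = [10, 11, 11, 11, 12]


def mean(values):
    """Arithmetic mean: the sum of the observations divided by the sample size n."""
    return sum(values) / len(values)


print(mean(population_1))  # 11.0
print(mean(population_2))  # 11.0
```

Both populations give the same mean, which is why the mean alone cannot describe how variable the counts are.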

**B2. Variance:**

As shown above, two populations with the same mean may have quite different variation in numbers. In the example given above, population #2 has a narrow range of abundances per quadrat (10-12), while population #1 has a relatively wide range (1-21). The variance measures this spread of the observations around the mean.

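$s^2 = SS/df = SS/(n - 1)$

SS can be calculated as $\Sigma(x - \text{mean})^2$ or, equivalently, with the computational formula:

$SS = \Sigma x^2 - (\Sigma x)^2/n$

Example 1: For population #1 described earlier, $\Sigma x^2 = 1^2 + 6^2 + 11^2 + 16^2 + 21^2 = 855$ and $(\Sigma x)^2 = 55^2 = 3025$:

$SS = 855 - [3025/5] = 855 - 605 = 250$

$s^2 = 250/4 = 62.5$

Example 2: For population #2 described earlier, $\Sigma x^2 = 607$ and $(\Sigma x)^2/n = 3025/5 = 605$:

$SS = 607 - 605 = 2$

$s^2 = 2/4 = 0.5$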
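The sketch below computes SS with both the definitional and the computational formula and then divides by $n - 1$ to get the variance. It is a minimal illustration in Python; the helper names are our own.

```python
population_1 = [1, 6, 11, 16, 21]
population_2 = [10, 11, 11, 11, 12]


def sum_of_squares(values):
    """Definitional formula: SS = sum of (x - mean)^2."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values)


def sum_of_squares_shortcut(values):
    """Computational formula: SS = sum(x^2) - (sum(x))^2 / n."""
    n = len(values)
    return sum(x * x for x in values) - sum(values) ** 2 / n


def sample_variance(values):
    """Sample variance: s^2 = SS / (n - 1), i.e. SS divided by the degrees of freedom."""
    return sum_of_squares(values) / (len(values) - 1)


for pop in (population_1, population_2):
    print(sum_of_squares(pop), sum_of_squares_shortcut(pop), sample_variance(pop))
# 250.0 250.0 62.5
# 2.0 2.0 0.5
```

The two SS formulas agree; the computational one is simply easier to do by hand because it avoids subtracting the mean from every observation.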

**B3. Standard Deviation:**

The standard deviation is the square root of the variance. For population #1, the standard deviation is $\sqrt{62.5} \approx 7.9$; for population #2, it is $\sqrt{0.5} \approx 0.7$. The standard deviation is important because it provides an easily visualized measure of the variation around the mean for normally distributed data.
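As a cross-check, the hand calculation can be compared with Python's standard-library `statistics.stdev`, which also divides by $n - 1$. This is a sketch; `sample_sd` is an illustrative helper name.

```python
import math
import statistics

population_1 = [1, 6, 11, 16, 21]
population_2 = [10, 11, 11, 11, 12]


def sample_sd(values):
    """Standard deviation = square root of the sample variance SS / (n - 1)."""
    m = sum(values) / len(values)
    ss = sum((x - m) ** 2 for x in values)
    return math.sqrt(ss / (len(values) - 1))


for pop in (population_1, population_2):
    # statistics.stdev uses the same n - 1 denominator, so the two should agree
    print(round(sample_sd(pop), 1), round(statistics.stdev(pop), 1))
# 7.9 7.9
# 0.7 0.7
```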

What is a normal distribution? A normal distribution is the familiar bell curve, with the peak of the curve corresponding to the mean. However, a bell curve can be narrow and tall or broad and short, depending on whether the data have a low or high variance. The standard deviation provides an easily understood estimate of this variability. For normally distributed data, about 95% of all possible observations (such as counts in quadrats) will lie within 2 standard deviations of the mean (more precisely, within 1.96 standard deviations). These bounds are sometimes loosely called the 95% confidence limits, although strictly they describe where individual observations fall, not the uncertainty in the mean itself. For example, population #2 has a standard deviation of 0.7. Two standard deviations would thus be 1.4. Therefore, the 95% limits for this population are 11 (the mean) ± 1.4 (or 9.6 to 12.4). This means that if we took further quadrat samples from this population, we would expect about 95% of these additional quadrats to have densities between 9.6 and 12.4 trees per quadrat.
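The ±2 standard deviation limits for population #2, and the share of a normal curve they cover, can be computed as below. This sketch rounds the standard deviation to 0.7 to match the text and uses `statistics.NormalDist` (available in Python 3.8 and later) for the normal curve.

```python
from statistics import NormalDist, mean, stdev

population_2 = [10, 11, 11, 11, 12]

m = mean(population_2)               # 11
s = round(stdev(population_2), 1)    # 0.7, rounded as in the text
lower, upper = m - 2 * s, m + 2 * s
print(f"{lower:.1f} to {upper:.1f}")  # 9.6 to 12.4

# Share of a normal distribution lying within 2 standard deviations of the mean
z = NormalDist()  # standard normal curve: mean 0, SD 1
within_2sd = z.cdf(2) - z.cdf(-2)
print(f"{within_2sd:.1%}")           # 95.4%

# Number of SDs that bounds exactly 95% (2.5% in each tail)
print(round(z.inv_cdf(0.975), 2))    # 1.96
```

The "2 standard deviations" rule is a convenient rounding of 1.96.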

**C. Comparative Statistics**

**C1. What are comparative statistics?**

Statistics can also be used to determine whether populations (or measurements of population characteristics) are similar or different. For example:

Is the density of pine trees in two areas similar or different?

Is the number of crabs in the Cape Fear estuary greater now than it was a decade ago?

This use of statistics is called significance testing. Under the scientific method, even with statistics, a scientist cannot prove anything. Statistics can only show that a result would be very unlikely if a particular hypothesis were true; nothing is ever proved in the process. Typically, the investigator states a hypothesis (the alternative hypothesis) and then supports it indirectly by showing that the opposing hypothesis of no difference or no effect (the null hypothesis) is unlikely. For example, one may hypothesize that densities of pine trees differ between 2 forests. Because statistics cannot prove a difference, the investigator instead tests whether the null hypothesis of no difference between the forests is unlikely to be true (confusing, isn't it!).

**C2. Example using the t-test.**

Let’s run through an example:

__Step 1__: A researcher develops the following hypothesis:

$H_a$ = There is a difference in the density of pine trees between a recently burned forest and a forest that has not been burned for 25 years.

__Step 2__: A null hypothesis is formed:

$H_o$ (the null hypothesis) = There is no difference in densities between the two forests.

__Step 3__: Data collection:

Next, the researcher collects data. In this case, quadrat sampling is appropriate, and the following counts of pine trees per 100-m² quadrat are recorded:

- Unburned forest (4 quadrats): 5, 2, 3, 8
- Burned forest (5 quadrats): 15, 25, 20, 11, 15

__Step 4__: Analysis of data:

Now the densities (no. per quadrat) of pines can be compared statistically to determine if there is a difference. Since these data represent replicate measurements from 2 groups, an appropriate test for comparing the groups is the t-test.

Calculation by Hand:

The t-test has the following formula:

$$t = \frac{|\text{mean}_1 - \text{mean}_2|}{S_{x_1 - x_2}}$$

where

$$S_{x_1 - x_2} = \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}} \quad \text{and} \quad s_p^2 = \frac{SS_1 + SS_2}{df_1 + df_2}$$

The following numbers are calculated to determine the t-statistic for the two populations (b = burned, u = unburned):

| | Unburned (u) | Burned (b) |
|---|---|---|
| Mean | 18/4 = 4.5 | 86/5 = 17.2 |
| n | 4 | 5 |
| SS | 102 - (324/4) = 21 | 1596 - (7396/5) = 116.8 |

$$s_p^2 = \frac{21 + 116.8}{3 + 4} = 19.7$$

$$S_{x_1 - x_2} = \sqrt{\frac{19.7}{4} + \frac{19.7}{5}} = \sqrt{8.87} = 2.98$$

$$t = \frac{|4.5 - 17.2|}{2.98} = 4.26$$

The degrees of freedom (df) for this test are $(n_u - 1) + (n_b - 1) = (4 - 1) + (5 - 1) = 7$.
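The whole pooled-variance t calculation above can be reproduced in a few lines. This is a minimal sketch using only the Python standard library; `pooled_t` and `sum_of_squares` are illustrative names, not a library API.

```python
import math

unburned = [5, 2, 3, 8]
burned = [15, 25, 20, 11, 15]


def sum_of_squares(values):
    """SS = sum(x^2) - (sum(x))^2 / n."""
    return sum(x * x for x in values) - sum(values) ** 2 / len(values)


def pooled_t(sample_1, sample_2):
    """Two-sample t-test with pooled variance; returns (t, df)."""
    n1, n2 = len(sample_1), len(sample_2)
    mean1, mean2 = sum(sample_1) / n1, sum(sample_2) / n2
    df = (n1 - 1) + (n2 - 1)
    sp2 = (sum_of_squares(sample_1) + sum_of_squares(sample_2)) / df  # pooled variance
    se = math.sqrt(sp2 / n1 + sp2 / n2)  # standard error of the difference in means
    return abs(mean1 - mean2) / se, df


t, df = pooled_t(unburned, burned)
print(round(t, 2), df)  # 4.27 7
```

Without intermediate rounding, t comes out as 4.27 rather than 4.26; the small difference comes from rounding $s_p^2$ and $S_{x_1 - x_2}$ during the hand calculation.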

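This t-value can be looked up in a t-table. If the calculated value is greater than the table value in the row for the test's df and the column for the 0.05 probability level, then you reject the null hypothesis and conclude that there is a significant difference between the burned and unburned forests. Because $H_a$ states only that there is a difference (not which forest is denser), the two-tailed table value is appropriate: for 7 df at the 0.05 significance level it is 2.365 (the one-tailed value is 1.895). Since 4.26 is greater than 2.365, we reject the null hypothesis. Because the burned forest has the higher mean, we can conclude that pine tree density is greater in the burned forest.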
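The table-lookup decision can be written as a tiny rule. In this sketch the critical values are copied from a standard two-tailed t-table at the 0.05 level and cover only a few df; `reject_null` is an illustrative name.

```python
# Two-tailed critical values of t at alpha = 0.05, copied from a standard t-table
T_CRIT_TWO_TAILED_05 = {5: 2.571, 6: 2.447, 7: 2.365, 8: 2.306}


def reject_null(t_calc, df, table=T_CRIT_TWO_TAILED_05):
    """Reject H_o when the calculated |t| exceeds the table value for this df."""
    return abs(t_calc) > table[df]


print(reject_null(4.26, 7))  # True: pine density differs between the forests
print(reject_null(2.0, 7))   # False: 2.0 would not be significant with 7 df
```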

Calculation using JMP IN:

In this course, we can also calculate a t-test using a standard commercial statistical package, JMP IN. To do this, follow these steps:

- Double click on the JMP icon
- A table will appear. If there is only 1 column (labeled column 1), you will need to create a second by double clicking in the right side space (to the right of "column 1").
- Click on the column 1 square and then click on the name. Type in "forest type". Do the same for column 2, typing in "density". For forest type, change the default data type (the letter in the small box above the name) from "c" to "n". Leave density as "c".
- Enter the data in the following format (u=unburned forest, b=burned forest):

| Row | Forest type | Density |
|---|---|---|
| 1 | U | 5 |
| 2 | U | 2 |
| 3 | U | 3 |
| 4 | U | 8 |
| 5 | B | 15 |
| 6 | B | 25 |
| 7 | B | 20 |
| 8 | B | 11 |
| 9 | B | 15 |

- From the menu bar, choose **Analyze – Fit Y by X**.
- Choose forest type as X and density as Y.
- Do the group means/one-way ANOVA comparison (which will be the default comparison for your data).
- You will get a graph of the data. Choose **means, ANOVA/t-test**.
- The results of the t-test will be displayed along with the results of several other tests. Note that the calculated t-value is the same as the one calculated by hand (apart from rounding), and a p-value of less than 0.05 is shown, indicating a significant difference.
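For readers without JMP IN, the same two-column layout (forest type as X, density as Y) can be grouped and analyzed in plain Python. This is a sketch using only the standard library; it is not JMP's output, and the variable names are our own. It uses the identity $(n - 1)s^2 = SS$ to build the pooled variance from each group's sample variance.

```python
import statistics
from collections import defaultdict

# Same layout as the JMP table: (forest type, density)
rows = [
    ("U", 5), ("U", 2), ("U", 3), ("U", 8),
    ("B", 15), ("B", 25), ("B", 20), ("B", 11), ("B", 15),
]

# Group the Y values (density) by the X value (forest type)
groups = defaultdict(list)
for forest_type, density in rows:
    groups[forest_type].append(density)

# Per-group summary: label, n, mean, standard deviation
for label, values in groups.items():
    print(label, len(values), statistics.mean(values), round(statistics.stdev(values), 2))

# Pooled-variance t: (n - 1) * s^2 equals SS for each group
u, b = groups["U"], groups["B"]
df = len(u) + len(b) - 2
sp2 = ((len(u) - 1) * statistics.variance(u) + (len(b) - 1) * statistics.variance(b)) / df
t = abs(statistics.mean(u) - statistics.mean(b)) / (sp2 / len(u) + sp2 / len(b)) ** 0.5
print(round(t, 2), df)  # 4.27 7
```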

**C3. Chi-square ($\chi^2$) test**

Another test that is useful for comparing totals, counts or frequencies is the $\chi^2$ test. Using the $\chi^2$ test, scientists can determine whether observed values differ significantly from the values expected for a given situation. For example, suppose you survey the number of crabs under "large" rocks and "small" rocks in a swift current to determine if there is a difference in the number of crabs under each rock type.

The total number of crabs under 20 rocks was:

| | Large rocks | Small rocks |
|---|---|---|
| Observed | 200 | 10 |
| Expected | 105 | 105 |

The expected number is established by determining the number of crabs expected if the null hypothesis were true. In this case there is a hypothesis ($H_a$) of a difference between the rock types and a null hypothesis ($H_o$) of no difference. So, if a total of 210 crabs were collected and there is no difference in the number found under each rock type, the expected number is 105 for both large and small rocks (105 + 105 = 210). This equal split assumes that equal numbers of large and small rocks were sampled.
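Under the null hypothesis of no difference, the expected counts are just the total divided equally among the categories. This minimal Python sketch assumes, as the text does, equal sampling effort for each rock type.

```python
observed = {"large": 200, "small": 10}

# Under H_o (no difference), split the total equally among the categories.
# Assumes the same number of large and small rocks were sampled.
total = sum(observed.values())
expected = {rock: total / len(observed) for rock in observed}
print(expected)  # {'large': 105.0, 'small': 105.0}
```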

The $\chi^2$ statistic is then calculated by:

$$\chi^2 = \Sigma \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$

In this example, $\chi^2 = (200 - 105)^2/105 + (10 - 105)^2/105 = 171.9$
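The $\chi^2$ sum can be computed category by category. This is a short Python sketch; `chi_square` is an illustrative name, not a library function.

```python
observed = [200, 10]   # large rocks, small rocks
expected = [105, 105]


def chi_square(obs, exp):
    """Chi-square statistic: sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))


print(round(chi_square(observed, expected), 1))  # 171.9
```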

For this case, the degrees of freedom (df) for the test are the number of groups minus 1 (2 - 1 = 1). For 1 degree of freedom at a 0.05 significance level, the critical table value is 3.84. Since the calculated value (171.9) is greater than the table value, the null hypothesis is rejected and you conclude that there is a difference in the number of crabs under large rocks versus small rocks.
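The degrees of freedom and the final decision can be sketched the same way. The critical values below are copied from a standard $\chi^2$ table at the 0.05 level and cover only small df; `chi2_reject_null` is an illustrative name.

```python
# Critical chi-square values at alpha = 0.05, copied from a standard chi-square table
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815}


def chi2_reject_null(chi2_calc, n_groups, table=CHI2_CRIT_05):
    """Reject H_o when the calculated chi-square exceeds the table value for df = groups - 1."""
    df = n_groups - 1
    return chi2_calc > table[df]


print(chi2_reject_null(171.9, n_groups=2))  # True: crab counts differ by rock size
print(chi2_reject_null(2.5, n_groups=2))    # False: 2.5 is below 3.841
```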
