STATISTICAL ANALYSIS OF ECOLOGICAL DATA
I. Objectives:
II. Introduction:
Ecologists are often concerned with numbers of organisms (density) and their patterns of distribution in nature. This makes ecology a quantitative science. However, ecologists cannot count and determine the location of every organism in a given area. Rather, ecologists must collect and analyze data from samples taken within the population. The quantitative data collected by ecologists can be (actually, must be) analyzed using statistics.
A. Terminology:
Before we begin our work, some definitions are needed:
Σ = summation
x = single observation or data point
n = sample size (number of data points)
s² = variance
s = standard deviation
SS = sum of squares
df = degrees of freedom, often = n-1
B. Descriptive Statistics:
Descriptive statistics summarize some aspect of the population. The most commonly used are
the mean, median, mode, variance and standard deviation. For ecological studies, the mean,
variance and standard deviation are most often used.
B1. Mean:
The mean is a measure of the central tendency (average) for a population.
Mean = Σx/n
Example 1: In population #1, the following numbers of trees are counted in 5 quadrats: 1,
6, 11, 16, 21
Mean = 55/5 = 11
Example 2: In population #2, the following numbers of trees are counted in 5 quadrats: 10,
11, 11, 11, 12
Mean = 55/5 = 11
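The two examples above can be checked with a short Python sketch. The function name `mean` and the variable names are illustrative choices, not part of the original handout.

```python
# Quadrat counts (trees per quadrat) from Examples 1 and 2.
population_1 = [1, 6, 11, 16, 21]
population_2 = [10, 11, 11, 11, 12]


def mean(values):
    """Return the sum of the observations divided by the sample size n."""
    return sum(values) / len(values)


print(mean(population_1))  # 11.0
print(mean(population_2))  # 11.0
```

Both populations give the same mean even though the individual counts are very different.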
B2. Variance:
As shown above, two populations with the same mean may have quite different
variation in numbers. In the example given above, population #2 has a narrow range of
abundances per quadrat (10-12) while population #1 has a relatively wide range of numbers
per quadrat (1-21). The variance measures this spread of the observations around the mean.
s² = SS/df
SS can be calculated as Σ(x - mean)², but this
is a cumbersome equation to use when there are large numbers of data. A simpler way of
calculating SS on most calculators is:
SS = Σx² - [(Σx)²/n]
Example 1: For population #1 described earlier:
SS = 855 - [3025/5] = 855 - 605 = 250
s² = 250/(5 - 1) = 62.5
Example 2: For population #2 described earlier:
SS = 607 - [3025/5] = 607 - 605 = 2
s² = 2/(5 - 1) = 0.5
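As a check on the arithmetic, the sketch below computes SS both ways (from the deviations and from the calculator shortcut) and then the variance. The function names are hypothetical and chosen only for this illustration.

```python
# Quadrat counts (trees per quadrat) from the two example populations.
population_1 = [1, 6, 11, 16, 21]
population_2 = [10, 11, 11, 11, 12]


def ss_from_deviations(values):
    """SS as the sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values)


def ss_shortcut(values):
    """SS as (sum of x squared) minus (sum of x) squared divided by n."""
    n = len(values)
    return sum(x * x for x in values) - (sum(values) ** 2) / n


def variance(values):
    """Sample variance: SS divided by the degrees of freedom (n - 1)."""
    return ss_shortcut(values) / (len(values) - 1)


print(ss_shortcut(population_1), variance(population_1))  # 250.0 62.5
print(ss_shortcut(population_2), variance(population_2))  # 2.0 0.5
```

The two SS functions agree for these data, which is the point of the shortcut: it gives the same result with less work on a calculator.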
B3. Standard Deviation:
The standard deviation is determined as the square root of the variance. For
population #1, the standard deviation would be 7.9. For population #2, the standard
deviation would be 0.7. The standard deviation is important because it provides an easily
visualized measure of the variation from the mean for normally distributed data.
What is normally distributed? A normal distribution is a typical bell curve, with the peak of the curve corresponding to the mean. However, a bell curve can be narrow and tall or broad and short, depending on whether the data have a low or high variance. The standard deviation provides an easily understood estimate of this variability. For normally distributed data, about 95% of all possible observations (such as counts in quadrats) will lie within 2 standard deviations of the mean. This range is often referred to as the 95% confidence limits. For example, population #2 has a standard deviation of 0.7. Two standard deviations would thus be 1.4. Therefore, the 95% confidence limits for this population are 11 (the mean) ± 1.4, or 9.6 to 12.4. This means that if we took further quadrat samples from this population, on average 95% of these additional quadrats would have densities between 9.6 and 12.4 trees per quadrat.
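The standard deviations and the range of the mean plus or minus 2 standard deviations can be reproduced with the following sketch. It assumes the same two example populations; the helper name `standard_deviation` is illustrative only.

```python
import math

# Quadrat counts (trees per quadrat) from the two example populations.
population_1 = [1, 6, 11, 16, 21]
population_2 = [10, 11, 11, 11, 12]


def standard_deviation(values):
    """Square root of the sample variance (SS divided by n - 1)."""
    n = len(values)
    ss = sum(x * x for x in values) - (sum(values) ** 2) / n
    return math.sqrt(ss / (n - 1))


sd_1 = standard_deviation(population_1)
sd_2 = standard_deviation(population_2)
print(round(sd_1, 1), round(sd_2, 1))  # 7.9 0.7

# Range covering roughly 95% of observations for population #2.
mean_2 = sum(population_2) / len(population_2)
lower = mean_2 - 2 * sd_2
upper = mean_2 + 2 * sd_2
print(round(lower, 1), round(upper, 1))  # 9.6 12.4
```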
C. Comparative Statistics
C1. What are comparative statistics?
Statistics can also be used to determine whether populations (or measurements of
population characteristics) are similar or different. For example:
Is the density of pine trees in two areas similar or different?
Is the number of crabs in the Cape Fear estuary greater now than a decade ago?
This use of statistics is called significance testing. Under the scientific method, even with statistics, the scientist cannot prove anything. Statistics can only demonstrate that an outcome is very unlikely; nothing is ever proved in the process. Typically, the investigator establishes a hypothesis and then tries to determine whether that hypothesis is likely by showing that the alternative (called the null hypothesis) is not likely. For example, one may have a hypothesis that densities of pine trees differ between 2 forests. However, because statistics do not prove differences, the investigator actually seeks to show that the null hypothesis of no difference between the forests is unlikely to be true (confusing, isn't it!).
C2. Example using the t-test.
Let's run through an example:
Step 1: A researcher develops the following hypothesis:
Ha = There is a difference in the density of pine trees between a recently
burned forest and a forest that has not been burned for 25 years.
Step 2: A null hypothesis is formed:
Ho (the null hypothesis) = There is no difference in densities between the two
forests.
Step 3: Data collection:
Next, the researcher collects data. In this case, quadrat sampling is appropriate and the
following counts of pine trees per 100 m² quadrat are recorded:
Unburned forest (4 quadrats): 5, 2, 3, 8
Burned forest (5 quadrats): 15, 25, 20, 11, 15
Step 4: Analysis of data:
Now the densities (no. per quadrat) of pines can be compared statistically to determine if
there is a difference. Since these data represent replicate measures from 2 groups, an
appropriate test for comparing the groups is the t-test.
Calculation by Hand:
The t-test has the following formula:
t = |mean1 - mean2| / Sx1-x2
where Sx1-x2 = sqrt[(sp²/n1) + (sp²/n2)] and sp² = (SS1 + SS2)/(df1 + df2)
The following numbers are calculated to determine the t-statistic for the two populations (b = burned, u = unburned):
meanu = 18/4 = 4.5
meanb = 86/5 = 17.2
nu = 4
nb = 5
SSu = 102 - (324/4) = 21
SSb = 1596 - (7396/5) = 116.8
sp² = (21 + 116.8)/(3 + 4) = 19.7
Sx1-x2 = sqrt[(19.7/4) + (19.7/5)] = sqrt(8.87) = 2.98
t = |4.5 - 17.2| / 2.98 = 4.26
The degrees of freedom (df) for this test are (nu-1) + (nb-1) = (4-1) + (5-1) = 7
This t-value can be looked up in a t-table. If the calculated value is greater than the value in the row for the appropriate df under the 0.05 probability level, then you reject the null hypothesis and conclude there is a significant difference between the burned and unburned forests. In this case, the table value for 7 df at the 0.05 significance level is 1.895 (one-tailed; the two-tailed value is 2.365). The calculated t of 4.26 exceeds either value, so we reject the null hypothesis. Because the mean is higher in the burned forest (17.2 versus 4.5), we can conclude that pine tree density is greater in the burned forest.
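The hand calculation can be sketched in Python as follows. This is a minimal illustration of the pooled-variance formula given above, not a replacement for a statistics package; the function names are hypothetical, and the critical value is simply the table value quoted in the text. Because the code does not round intermediate results, it gives t of about 4.27 rather than the hand-rounded 4.26.

```python
import math

# Pine counts per quadrat from Step 3.
unburned = [5, 2, 3, 8]
burned = [15, 25, 20, 11, 15]


def sum_of_squares(values):
    """SS as (sum of x squared) minus (sum of x) squared divided by n."""
    n = len(values)
    return sum(x * x for x in values) - (sum(values) ** 2) / n


def pooled_t(sample_1, sample_2):
    """Return (t, df) for a two-sample t-test with pooled variance."""
    n1, n2 = len(sample_1), len(sample_2)
    mean_1 = sum(sample_1) / n1
    mean_2 = sum(sample_2) / n2
    df = (n1 - 1) + (n2 - 1)
    pooled_variance = (sum_of_squares(sample_1) + sum_of_squares(sample_2)) / df
    standard_error = math.sqrt(pooled_variance / n1 + pooled_variance / n2)
    return abs(mean_1 - mean_2) / standard_error, df


t, df = pooled_t(unburned, burned)
TABLE_VALUE = 1.895  # table value quoted in the text for 7 df at the 0.05 level

print(round(t, 2), df)  # 4.27 7
print("reject null hypothesis" if t > TABLE_VALUE else "fail to reject")
```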
Calculation using JMP IN:
In this course, we can also calculate a t-test using a standard commercial statistical package, JMP IN. To do this, follow these steps:
Row | Forest type | Density |
1 | U | 5 |
2 | U | 2 |
3 | U | 3 |
4 | U | 8 |
5 | B | 15 |
6 | B | 25 |
7 | B | 20 |
8 | B | 11 |
9 | B | 15 |
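The table above uses one row per quadrat, with one column for the group label and one for the count. As a rough illustration of that layout (in Python, not JMP IN), the sketch below stores the same rows and splits them back into the two groups. The variable names are assumptions made for this example.

```python
# One (forest type, density) pair per quadrat, as in the data table.
# U = unburned, B = burned.
rows = [
    ("U", 5), ("U", 2), ("U", 3), ("U", 8),
    ("B", 15), ("B", 25), ("B", 20), ("B", 11), ("B", 15),
]

# Group the densities by forest type.
groups = {}
for forest_type, density in rows:
    groups.setdefault(forest_type, []).append(density)

for forest_type, densities in groups.items():
    print(forest_type, densities, sum(densities) / len(densities))
```

The grouped means (4.5 for U and 17.2 for B) match the values used in the hand calculation.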
C4. Chi-Square (χ²) test
Another test that is useful for comparing totals, counts or frequencies is the χ² test. Using the χ² test, scientists can determine if observed values are the same as values expected for a given situation. For example, you survey the number of crabs under "large" rocks and "small" rocks in a swift current to determine if there is a difference in the number of crabs under each rock type.
The total number of crabs under 20 rocks was:
Count | Large rocks | Small rocks |
Observed | 200 | 10 |
Expected | 105 | 105 |
The expected number is established by determining the number of crabs expected if the null hypothesis were true. In this case there is a hypothesis (Ha) of a difference between the rocks and a null hypothesis (Ho) of no difference. So, if there are a total of 210 crabs collected, with no difference in the number found under each rock type, there must be an expected number of 105 for both large and small rocks (105+105 = 210).
The χ² statistic is then calculated by:
χ² = Σ [(observed - expected)²/expected]
In this example, χ² = (200 - 105)²/105 + (10 - 105)²/105 = 171.9
For this case, the degrees of freedom (df) for the test are determined by the number of groups minus 1 (2 - 1 = 1). For 1 degree of freedom at a 0.05 significance level, the critical table value is 3.84. Since your calculated value is greater than the table value, the null hypothesis is rejected and you conclude there is a difference in the number of crabs under large rocks versus small rocks.
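The crab example can be worked through in a few lines of Python. This sketch assumes, as the text does, that the null hypothesis splits the total evenly between the two rock types; the variable names are illustrative, and the critical value is the table value quoted above.

```python
# Observed crab counts under each rock type.
observed = {"large": 200, "small": 10}

# Under the null hypothesis of no difference, the total is split evenly.
total = sum(observed.values())
expected = {rock: total / len(observed) for rock in observed}

# Chi-square: sum over groups of (observed - expected) squared / expected.
chi_square = sum(
    (observed[rock] - expected[rock]) ** 2 / expected[rock] for rock in observed
)
df = len(observed) - 1
TABLE_VALUE = 3.84  # critical value quoted in the text for 1 df at the 0.05 level

print(round(chi_square, 1), df)  # 171.9 1
print("reject null hypothesis" if chi_square > TABLE_VALUE else "fail to reject")
```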