*Diversity*? Comme au courant! Well, you know I like to cover all the bases.

⁂

There are (or were when I was in college) three types of ecological diversity: alpha, beta, and gamma. Let's say we're talking about a territory in which there are a number of separate forests, each of which contains a number of trees which may be classified into discrete species.

**Gamma diversity** is the total diversity of trees within the territory. If we select two of the territory's trees at random, what is the probability that they will be of different species? That number (a "diversity index") would be a quantification of the territory's gamma diversity. You can think of the *gamma* as standing for *global*; we're looking at the diversity of individuals (trees) in the entire territory, without considering any of the smaller subgroups (forests) among which those individuals are distributed.
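A minimal Python sketch of this index follows. It assumes the two trees are drawn with replacement, which gives the Gini-Simpson form 1 - Σpᵢ². For a 3:1 redwood/bluewood mix, that is 1 - (0.75² + 0.25²) = 37.5%. The function name and the example forests are illustrative only.

```python
from collections import Counter

def simpson_diversity(trees):
    """Probability that two trees drawn at random (with replacement)
    belong to different species: 1 - sum(p_i ** 2)."""
    counts = Counter(trees)
    n = sum(counts.values())
    return 1 - sum((c / n) ** 2 for c in counts.values())

# Gamma diversity: pool every tree in the territory, ignoring forests.
# Hypothetical forests, each 3 redwoods to 1 bluewood.
forest_1 = ["redwood"] * 3 + ["bluewood"] * 1
forest_2 = ["redwood"] * 3 + ["bluewood"] * 1
gamma = simpson_diversity(forest_1 + forest_2)
print(gamma)  # 0.375
```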

**Alpha diversity** is the diversity of trees within each forest. If we randomly select two trees from the same forest, what is the probability that they will be of different species? This is an index of alpha diversity. Think of *alpha* as representing the article *a* -- the internal diversity within *a* single forest. We can calculate a diversity index for each of the forests in the territory, and the mean of these numbers will be the alpha diversity of the territory as a whole.
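Here is the same index applied forest by forest and then averaged, next to the pooled gamma. This is a sketch under two assumptions: draws with replacement, and an unweighted mean across forests, as described above. The helper names and the two monoculture forests are hypothetical.

```python
from collections import Counter
from statistics import mean

def simpson_diversity(trees):
    """1 - sum(p_i ** 2): chance two trees drawn with replacement differ in species."""
    counts = Counter(trees)
    n = sum(counts.values())
    return 1 - sum((c / n) ** 2 for c in counts.values())

def alpha_diversity(forests):
    """Unweighted mean of each forest's own diversity index."""
    return mean(simpson_diversity(f) for f in forests)

def gamma_diversity(forests):
    """Diversity index of all trees pooled, ignoring forest boundaries."""
    return simpson_diversity([tree for forest in forests for tree in forest])

# Hypothetical territory: one all-redwood forest, one all-bluewood forest.
forests = [["redwood"] * 4, ["bluewood"] * 4]
print(alpha_diversity(forests), gamma_diversity(forests))  # 0.0 0.5
```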

For maximum simplicity, let's just look at territories that have only two forests (Forest 1 and Forest 2), which each have the same number of trees, and only two tree species (redwoods and bluewoods).

*different* species) is 1 minus that number, or 37.5%.

*forests* of the territory), that difference must be accounted for by **beta diversity**: diversity *between* forests. (Think of *beta* as standing for *between* -- though of course you really ought to say *among* if there are more than two.) The remainder of this post will discuss the relative merits of various ways of calculating beta diversity.

**Approach 1: Forests as units**

*could*, but that would mean treating entire *forests* the way we have been treating trees -- as unanalyzable units to be classified into a finite number of discrete "species." For example, if a territory had 10 spruce-fir forests, 5 oak-hickory forests, and 5 maple-beech-birch forests, we could calculate its beta diversity as 1 - (.5² + .25² + .25²) = .625.
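The arithmetic of Approach 1 can be checked by applying the tree-level index to whole forests labelled by type. The function name is a hypothetical choice, and the forest counts are the ones in the example.

```python
from collections import Counter

def forest_type_beta(forest_types):
    """Approach 1: treat each whole forest as one 'individual' labelled by
    its forest type, and apply the same 1 - sum(p_i ** 2) index to forests."""
    counts = Counter(forest_types)
    n = sum(counts.values())
    return 1 - sum((c / n) ** 2 for c in counts.values())

territory = (["spruce-fir"] * 10 + ["oak-hickory"] * 5
             + ["maple-beech-birch"] * 5)
print(forest_type_beta(territory))  # 0.625
```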

*aren't* unanalyzable units, and classifying them qualitatively seems the wrong way to go about things. Forests can be more or less similar in their species profile; it's not a binary same/different question. Imagine a spruce-fir forest that is pretty much just spruce and fir, and a maple-beech-birch forest that is also pretty much just what it says on the tin. Now imagine a different country where the spruce-fir forest also has significant numbers of maple and birch trees, and where both the spruce-fir and the maple-beech-birch forests have plenty of hemlocks. This latter country obviously has less beta diversity -- that is, its forests are more similar to one another -- but this approach can't see that.

**Approach 2: All the gamma that's not alpha**

$\beta = \gamma / \alpha$, which is obviously suboptimal. It would make 1 the *minimum* figure for beta diversity, when it is the hypothetical *maximum* for alpha and gamma, making it incommensurable with the other two types of diversity. It is also unable to deal with countries like Ablestan, which have 0 alpha diversity and thus cause a divide-by-zero error.

$\beta = \gamma - \alpha$. Let's look at our three territories again (reproduced here so you don't have to scroll up).

*between* the two forests -- be exactly the same as Ablestan's? In both territories, the trees in Forest 1 are 100% different from those in Forest 2. But the subtractive formula gives us a beta of only 37.5% for Easystan, lower than Ablestan's 50%. Obviously this formula is not capturing the intuitive meaning of beta diversity.
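The subtractive formula can be reproduced in a few lines. The forest compositions below are hypothetical. They were chosen only because they match the figures quoted in the text: Ablestan has alpha 0 and beta 50%; Easystan has alpha 25%, beta 37.5%, and no species shared between its forests. The post's original tables may differ.

```python
from collections import Counter
from statistics import mean

def simpson_diversity(trees):
    counts = Counter(trees)
    n = sum(counts.values())
    return 1 - sum((c / n) ** 2 for c in counts.values())

def subtractive_beta(forests):
    """Approach 2: beta = gamma - alpha (alpha as the unweighted mean)."""
    gamma = simpson_diversity([t for f in forests for t in f])
    alpha = mean(simpson_diversity(f) for f in forests)
    return gamma - alpha

# Hypothetical compositions matching the quoted figures.
ablestan = [["red"] * 4, ["blue"] * 4]
easystan = [["red"] * 2 + ["blue"] * 2, ["green"] * 4]

print(subtractive_beta(ablestan))  # 0.5
print(subtractive_beta(easystan))  # 0.375
# The ratio form gamma / alpha fails outright on Ablestan, whose alpha is 0:
# in Python, 0.5 / 0.0 raises ZeroDivisionError.
```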

*more* in Foxstan than in Ablestan, because they differ in size as well as in species profile. But if we use the formula $\beta = \gamma - \alpha$, and alpha is 0, each territory's beta is equal to its gamma, which means Foxstan has *less* beta diversity than Ablestan. This seems clearly wrong.
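To see the Foxstan problem numerically, take a hypothetical Foxstan-like territory: a 90-tree redwood forest and a 10-tree bluewood forest. Compare it with an Ablestan-like pair of 50-tree monocultures. Both have alpha 0, so the subtractive beta equals gamma. The tree counts are illustrative assumptions, not figures from the post.

```python
from collections import Counter

def simpson_diversity(trees):
    counts = Counter(trees)
    n = sum(counts.values())
    return 1 - sum((c / n) ** 2 for c in counts.values())

# Hypothetical Foxstan-like territory: both forests are monocultures
# (alpha = 0), but of different species AND different sizes.
foxstan = ["redwood"] * 90 + ["bluewood"] * 10
ablestan = ["redwood"] * 50 + ["bluewood"] * 50

# With alpha = 0, the subtractive formula reduces to beta = gamma.
print(round(simpson_diversity(foxstan), 4))   # 0.18
print(round(simpson_diversity(ablestan), 4))  # 0.5
```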

**Approach 3: An outgroup diversity index**

*not* in the same forest?

*because* the two forests are identical -- comparing two random trees from different forests is the same as comparing two from the same forest, or from the territory as a whole, so $\beta = \alpha = \gamma$. This method gives Bakerstan a beta of 50%, when it ought to be 0. That's a pretty serious error!
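The outgroup index (Approach 3) has a closed form: 1 - Σ pᵢqᵢ, where p and q are the species proportions of the two forests. In the sketch below, Bakerstan's forests are taken to be identical 50/50 mixes, the only two-species makeup for which this index comes out at 50%. The function name is hypothetical.

```python
def outgroup_diversity(p, q):
    """Approach 3 (xi): probability that one tree drawn from each of two
    forests belongs to different species, 1 - sum_i p_i * q_i.
    p and q map species -> proportion within each forest."""
    shared = set(p) & set(q)
    return 1 - sum(p[s] * q[s] for s in shared)

# Bakerstan-like forests: identical 50/50 mixes.
forest_1 = {"redwood": 0.5, "bluewood": 0.5}
forest_2 = {"redwood": 0.5, "bluewood": 0.5}
print(outgroup_diversity(forest_1, forest_2))  # 0.5, though the forests are identical
```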

*xi*) minus ingroup diversity (alpha): $\beta = \xi - \alpha$. That would give us the desired 0 beta value for Bakerstan and Charliestan. Does it work more generally? No. It fails the Easystan test.

*also* maximally different from one another -- not a single tree in Forest 1 is the same species as any tree in Forest 2 -- so its xi is 1, and its beta ought to be 1 as well. But because it has an alpha of 25%, its beta is only 75%.
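A quick check of β = ξ - α for two equal-sized forests is below. Bakerstan is modelled as two identical 50/50 forests. Easystan is given a hypothetical makeup that matches the quoted figures (alpha 25%, no shared species): a 50/50 red/blue forest and an all-green forest. The helper names are illustrative.

```python
def simpson_from_proportions(p):
    """1 - sum(p_i ** 2) for a forest given as species -> proportion."""
    return 1 - sum(x * x for x in p.values())

def outgroup_diversity(p, q):
    """xi: 1 - sum_i p_i * q_i over species present in both forests."""
    return 1 - sum(p[s] * q[s] for s in set(p) & set(q))

def xi_minus_alpha(p, q):
    """beta = xi - alpha for two equal-sized forests."""
    alpha = (simpson_from_proportions(p) + simpson_from_proportions(q)) / 2
    return outgroup_diversity(p, q) - alpha

bakerstan = ({"red": 0.5, "blue": 0.5}, {"red": 0.5, "blue": 0.5})
easystan = ({"red": 0.5, "blue": 0.5}, {"green": 1.0})  # hypothetical makeup

print(xi_minus_alpha(*bakerstan))  # 0.0
print(xi_minus_alpha(*easystan))   # 0.75, though the forests share nothing
```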

**Approach 4: Slice-matching**

*sad*. (Alas, my experience with astronomy does not fill me with optimism.)

*every* forest? No, that clearly won't work. Imagine a territory with 4 all-redwood forests and 1 all-bluewood forest; no slices could be removed, and thus the beta would be 1 -- maximal beta diversity, despite the fact that four of the five forests are identical. No, slice-matching can only be done between a *pair* of forests, and the beta diversity of the whole territory is calculated by taking the mean beta of all possible pairs of forests. In our example, there are 5 forests and thus 10 possible pairs of forests. Of these, 6 are red-red pairs with beta of 0, and 4 are red-blue pairs with beta of 1. The mean beta diversity for the whole territory would thus be 40%.
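Here is a sketch of pairwise slice-matching. It assumes, following Luther's reading in the comments, that the beta for one pair equals the share left over after removing every matched slice: 1 - Σ min(pᵢ, qᵢ). For proportion vectors this is the same as half the L1 distance. The territory beta is the mean over all pairs. The function names are hypothetical.

```python
from itertools import combinations
from statistics import mean

def slice_match_beta(p, q):
    """Share of a forest left after removing every 'slice' the two forests
    share: 1 - sum_i min(p_i, q_i), i.e. half the L1 distance."""
    species = set(p) | set(q)
    return 1 - sum(min(p.get(s, 0.0), q.get(s, 0.0)) for s in species)

def territory_beta(forests):
    """Mean slice-matching beta over all pairs of forests."""
    return mean(slice_match_beta(p, q) for p, q in combinations(forests, 2))

red = {"redwood": 1.0}
blue = {"bluewood": 1.0}
print(territory_beta([red, red, red, red, blue]))  # 0.4
```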

- Georgestan
  - gamma = 75%
  - alpha = 72.7%
  - beta = 18.8%
- Howstan
  - gamma = 75%
  - alpha = 57.8%
  - beta = 52.1%
- Itemstan
  - gamma = 75%
  - alpha = 75%
  - beta = 0%
- Jigstan
  - gamma = 75%
  - alpha = 0%
  - beta = 100%
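Two of these rows can be checked directly. The check assumes four equal-sized forests and four species (labelled A to D here, hypothetically). The listed alpha and gamma then force Itemstan's forests to be uniform 25/25/25/25 mixes and Jigstan's to be four monocultures of different species. Georgestan and Howstan can't be reconstructed from the figures alone, so they are left out.

```python
from itertools import combinations
from statistics import mean

SPECIES = ["A", "B", "C", "D"]  # hypothetical labels for four species

def simpson(p):
    return 1 - sum(x * x for x in p.values())

def half_l1(p, q):
    return 0.5 * sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in set(p) | set(q))

def summarize(forests):
    """(gamma, alpha, beta) for equal-sized forests given as proportion dicts."""
    pooled = {s: mean(f.get(s, 0.0) for f in forests) for s in SPECIES}
    gamma = simpson(pooled)
    alpha = mean(simpson(f) for f in forests)
    beta = mean(half_l1(p, q) for p, q in combinations(forests, 2))
    return gamma, alpha, beta

itemstan = [{s: 0.25 for s in SPECIES} for _ in range(4)]
jigstan = [{s: 1.0} for s in SPECIES]
print(summarize(itemstan))  # (0.75, 0.75, 0.0)
print(summarize(jigstan))   # (0.75, 0.0, 1.0)
```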

**Is there a formula?**

$\beta = \gamma / \alpha$ -- which we have found inadequate. Can the slice-matching approach to beta diversity also be reduced to a formula? This much seems intuitively obvious:

*Seems.* I haven't fully thought this through yet.)

*cannot* derive gamma if we know alpha and beta. Jigstan and Ablestan both have an alpha of 0 and a beta of 1, but their gamma is different. This is only possible because Ablestan has two forests but Jigstan has four, so perhaps a fourth variable -- the number of forests -- has to be included in the formula. My hunch (just a hunch) is that any one of those variables should be derivable from the other three, hopefully in a tolerably elegant manner.

## 6 comments:

This is a great post. Your method of comparisons is a creative idea; it somewhat reminds me of the Condorcet method of pairwise comparisons for voting. I don't have a formula for the three, but here are some preliminary thoughts:

Alpha and beta do not depend on the size of the forests because we are only concerned with the proportions. But gamma does depend on the size of the forests because we are adding all the trees together. So that is another variable we need to consider. If all the forests are the same size, it's not a concern, but in the Foxstan example, it would be.

Another method for beta diversity would be the probability of picking two different trees given that we pick one tree from forest one and one from forest two.

So for Dogstan, we have four possibilities:

RR, RB, BR, BB. The probabilities are:

RR: 0.75*0.25 = 0.1875

RB: 0.75*0.75 = 0.5625

BR: 0.25*0.25 = 0.0625

BB: 0.25*0.75 = 0.1875

So, then the two that involve picking different trees are RB and BR, so beta = 0.5625+0.0625 = 0.625
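The enumeration above can be reproduced mechanically. The forest proportions are the ones implied by the comment's products: Forest 1 is 75% red and 25% blue, Forest 2 the reverse. The variable names are illustrative.

```python
from itertools import product

# Proportions implied by the comment's products for Dogstan.
forest_1 = {"R": 0.75, "B": 0.25}
forest_2 = {"R": 0.25, "B": 0.75}

# Probability of each (tree from Forest 1, tree from Forest 2) outcome.
outcomes = {a + b: forest_1[a] * forest_2[b] for a, b in product(forest_1, forest_2)}
# Beta here = total probability of the outcomes whose two trees differ.
beta = sum(prob for pair, prob in outcomes.items() if pair[0] != pair[1])
print(outcomes)  # {'RR': 0.1875, 'RB': 0.5625, 'BR': 0.0625, 'BB': 0.1875}
print(beta)      # 0.625
```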

Doing the same method with Georgestan is more work, but you get six values, one for each of the pairs and then average these so the beta is 0.7890625.

Some of the details of this are a little unclear to me.

The definition of alpha is clear when we are talking about one forest, because it's the same as gamma diversity applied to that forest. But how is alpha typically defined when you have a territory with, say, two forests of different size? My understanding from your post is that alpha diversity is defined as the average gamma diversity of the two forests, separately.

But then I am a bit confused about the statement that alpha diversity is necessarily lower than gamma diversity, because if you have one forest with a million red trees and another with 1 green tree and 1 red tree, the gamma diversity is close to zero, but the alpha diversity is some average of the gamma diversity of the first forest (zero) and the second (50%). For your claim to be true, the average would have to be weighted by the number of trees in the forest, right?

I'm interested in the problem you posed at the end, but the basic problem is still not well formed in my mind.

Good catch, John. Yes, alpha can exceed gamma when the forests are unequal in size.

@NLR

"Another method for beta diversity would be the probability of picking two different trees given that we pick one tree from forest one and one from forest two."

Isn't that my Approach 3? If there's a difference, I'm missing it.

A comment from my much more mathematical brother Luther:

I'm assuming you have, or can easily gain, some familiarity with P-norms (https://en.wikipedia.org/wiki/Norm_(mathematics)#p-norm and https://en.wikipedia.org/wiki/Lp_space#The_p-norm_in_finite_dimensions).

Your task in beta is to measure the distance between population vectors, that is, vectors whose entries sum to one. Ergo, the question is: what is the best measure of distance for such vectors?

Your final beta for two populations is (a normalizing ½ times) the L-1 distance between the population vectors. That is, given x = (x1, x2, ... xn) where sum(xi) = 1 and y = (y1, y2, ... yn) where sum(yi) = 1 you report ½ sum(|xi-yi|). Given the base-line assumption that tree species are all equally far apart, L-1 is a reasonable choice, though not the only option. Do you want the difference between (1, 0, 0) and (0, 1, 0) to be equal to the difference between (1, 0, 0) and (0, 0.5, 0.5)? If so, use L-1. If you want the second to be more different, use a fractional norm instead (like L-⅔).
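A small sketch makes Luther's (1, 0, 0) / (0, 1, 0) / (0, 0.5, 0.5) comparison concrete for a few values of p. One caveat: for p < 1 the formula is not a true norm, because the triangle inequality fails, but it can still be computed as a dissimilarity. The function name is hypothetical.

```python
def lp_distance(x, y, p):
    """(sum |x_i - y_i| ** p) ** (1/p); for p < 1 this is not a true norm."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

a = (1, 0, 0)
b = (0, 1, 0)
c = (0, 0.5, 0.5)

# p = 1: both distances are 2.0, so b and c are equally far from a.
# p = 2/3: c is farther from a than b is.
# p = 2: c is closer to a than b is.
for p in (1, 2 / 3, 2):
    print(p, round(lp_distance(a, b, p), 3), round(lp_distance(a, c, p), 3))
```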

If you wanted to handle the "all pines are similar" case you mentioned in the introduction, you'd instead want to have a weighted feature vector (I'd probably implement the weights as a matrix to make the whole a well-defined inner-product space). Once you add weights, it is likely that L-2 will be what you want rather than L-1, as L-2 is better at measuring comparable distances and outliers are not a problem when you have just two vectors.

For 3+ populations, you are computing all-pairs L-1 distances and averaging them. I suppose that's a fine approach, though expensive to compute if there are many forests. It's also a bit tricky to weight if you want big forests to matter more than little forests. I'd probably have found the weighted average population vector, then taken the weighted average difference from that average population to each other population instead of simply averaging all pairwise differences; but to know if that approach is actually better or not would require knowing the intended application of the measurement.
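Here is one possible reading of the size-weighted alternative in the paragraph above. It takes "difference" to mean the same half-L1 distance used for slice-matching, which is an assumption, since the comment doesn't specify. On the post's 4-redwood, 1-bluewood territory with equal-sized forests, it gives 0.32 rather than the pairwise mean of 40%. That illustrates how the two definitions diverge, not which is better. The function names are hypothetical.

```python
def half_l1(p, q):
    species = set(p) | set(q)
    return 0.5 * sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in species)

def centroid_beta(forests, sizes):
    """Size-weighted mean half-L1 distance from each forest's composition
    to the size-weighted average composition of the whole territory."""
    total = sum(sizes)
    species = set().union(*forests)
    centroid = {s: sum(n * f.get(s, 0.0) for n, f in zip(sizes, forests)) / total
                for s in species}
    return sum(n * half_l1(f, centroid) for n, f in zip(sizes, forests)) / total

red, blue = {"redwood": 1.0}, {"bluewood": 1.0}
print(round(centroid_beta([red] * 4 + [blue], [1] * 5), 4))  # 0.32
```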

Yes, you're right. That is Approach 3.
