I want to test the relationship between two variables. Specifically, I want to know whether the slope of the relationship is different from some theoretical expectation. For example, *is it 2/3 or 3/4*? That should be simple enough, I tell myself... I fit a regression model. Done, right?

**Not so fast.** This deceptively simple problem of learning about slopes has been fodder for some great debates in ecology (e.g., Kolokotrones et al. 2010).

**Why the debate?** We have a theoretical expectation (or competing expectations), so we turn to our data to see whether it matches. The numbers will tell us! Well, what if there are different ways to estimate a slope? And what if our statistical workhorse, the ordinary least squares (OLS) regression, is biased in some situations?? Then we might be in some quantitative trouble.

*(NOTE: mathematical details to follow. If you don't care, you can skip the next two paragraphs.)*
**Why do we use OLS regression?** OLS regression is very powerful. It allows us to test for an association between two variables (does Y change systematically with X?). It also gives us a model for predicting Y from (new) observations of X. It works by fitting the line that makes the fitted Y values as close as possible to the observed Y values (minimizing the sum of squared residuals). The OLS slope equals cov(x,y)/var(x).

*This is where the problem arises*, because the model conditions on the X observations. If there is uncertainty in the X values, OLS regression underestimates the slope of the actual relationship (see below).

**How about an alternative?** Yes, it is called standardized major axis (SMA) regression (also called reduced major axis or model II regression by some). Instead of minimizing the vertical distance between the fitted and the observed Y values, SMA regression minimizes deviations in both the X and Y directions (at an angle, not parallel to the Y axis). The SMA slope equals sign(cov(x,y))·sqrt(var(y)/var(x)). It treats the X and Y values symmetrically, much like the first principal components axis. SMA regression doesn't work for prediction: our observations are (X+ɛX) and (Y+ɛY), and our intention is to predict Y with new observations of X (i.e., X+ɛX), which means that our model should include any X errors that might exist. What SMA regression does do is identify the (symmetric) line that best describes the relationship between X and Y.

In applications like allometry, where the ecologist is interested in the slope of the line between X and Y and whether it is different from some expectation, this can be a big deal!

**Can't I always use SMA for estimating slopes?** This seems to be what many researchers do. When the method is reported, allometric slopes are frequently estimated with SMA regression (e.g., Niklas and Enquist 2002). And it is all good (even with OLS slopes) as long as the overall variability is low (the correlation is high). With some variability, though, one needs to think about which method to use.

**Which regression slope?** To examine our problem, I simulated some data (like the figure above), introduced different amounts of error, calculated the OLS and SMA regression slopes, and compared them to the actual (simulated) slope. I bootstrapped the simulation 10,000 times to generate 95% confidence intervals (dotted lines).

Here are some findings:

1. The OLS expectation (blue line) matches the actual slope when there is no uncertainty in the X variable.
2. As uncertainty is added to the X values (moving left to right in the figure), the OLS expectation significantly underestimates the actual slope.
3. The SMA expectation (pink line) matches the actual slope when the y-x variance ratio equals the actual slope.
4. When uncertainty in the X variable is low compared to the uncertainty in the Y variable, the SMA expectation significantly overestimates the slope.
5. As overall uncertainty increases, the biases are amplified.

Point 4 demonstrates that SMA slopes are also biased, depending on the x-y variance ratio (Smith 2009). SMA regression is appropriate in many cases for testing the value of a slope (Warton et al. 2007). But OLS works just as well, if not better, when the variability in the X variable is much lower than the variability in the Y variable, since the x-y variance ratio influences the SMA estimate.

**An example from the literature**
Fifteen years ago, Brian Enquist and colleagues (1998) examined how tree water use depends on tree body size (see figure below), with a theoretically expected slope of 2. The authors conclude that the relationship is "nearly indistinguishable from [the] predicted" slope. The correlation is pretty darn good (r² = 0.912), and the OLS and SMA slopes don't look too different. Not enough to be meaningful, right?

*Wrong.* The SMA slope (1.83, 95% CI: 1.70-1.97) is weakly significantly different from 2 (P = 0.0180) while the OLS slope (1.75, 95% CI: 1.62-1.88) is clearly significantly different (P = 0.0003).

The estimated slopes are much closer to a more recent theoretical expectation of 1.85 (Savage et al. 2010) (OLS P = 0.14; SMA P = 0.79), but the difference between the two methods is still pretty apparent, with the SMA slope being much steeper than the OLS slope and the OLS slope being close to significantly different from the expectation.

This all means that the water is still muddy, even when we throw data at it, because there are different ways to fit the data. In this example, I think that the uncertainty in stem diameter measurements is probably much lower than the uncertainty in xylem water transport, which means that I would probably use OLS regression (I think that is also what the original authors used). As the variance in the X variable gets closer to the variance in the Y variable, however, SMA slopes become more informative.
