### Plants drive global water loss to the atmosphere

Water use by plants is extravagant. Not because they are wasteful, but because they must open their stomata to take up carbon dioxide, and water escapes from their leaves through the same openings (transpiration). And the air pulls substantial amounts of water from leaves, especially when vapor pressure deficits are high.

Researchers have long been interested in the role of plants in the water cycle (e.g., Zon 1927). But realistic values are difficult to constrain at large scales because scaling up from individual trees to forest stands and entire regions is challenging. This week, though, Scott Jasechko and colleagues report their estimates of the plant contribution to the evapotranspiration flux. And it is huge.

They estimate that 80-90% of global terrestrial evapotranspiration passes through plants, based on stable isotope measurements in lakes around the world. This range is higher and more tightly constrained than previous estimates (which are often based on scaled-up models and spatially restricted observations of surface water fluxes). They attribute the reduced uncertainty to their water isotope approach.

What do lake water isotopes tell us about plant water use? Lake water integrates all of the inputs from precipitation and the losses to evaporation, transpiration, canopy interception, and groundwater discharge. Most water molecules have a molecular mass of 18 u, but some are a little heavier because they contain heavier stable isotopes of hydrogen or oxygen (19-22 u). During surface evaporation, the lighter water molecules tend to escape to the atmosphere first while the heavier molecules stay behind, so lake water becomes enriched in heavy molecules relative to the incoming precipitation. Plants, however, don't usually discriminate among isotopes when they take up water from the soil (Lambers et al. 2008; but see Ellsworth and Williams 2007), so transpiration removes water without changing the isotopic composition of what is left. That means the relative contributions of evaporation and transpiration can be calculated from the difference between the precipitation and lake water isotopes.
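The partitioning logic can be sketched with a textbook steady-state lake mass balance. This is a toy two-step version, not the authors' method: the published analysis is considerably more involved, and here the isotopic composition of the evaporated vapor (`delta_evap`) is simply assumed rather than modeled. The function names and all input values are hypothetical.

```python
def evaporation_to_inflow(delta_in, delta_lake, delta_evap):
    """Evaporation/inflow ratio (E/I) from a steady-state lake isotope balance.

    Assumes a well-mixed lake at steady state with inflow I = E + Q, where the
    liquid outflow Q carries the lake's composition (delta_lake) and the
    evaporated vapor has composition delta_evap. Rearranging
        I * delta_in = E * delta_evap + Q * delta_lake
    gives E / I = (delta_in - delta_lake) / (delta_evap - delta_lake).
    All deltas are in per mil (e.g., delta-18O).
    """
    return (delta_in - delta_lake) / (delta_evap - delta_lake)


def transpiration_share(total_et, evaporation):
    """T/ET if interception is ignored: whatever ET is not evaporation."""
    if not 0 <= evaporation <= total_et:
        raise ValueError("evaporation must lie between 0 and total ET")
    return (total_et - evaporation) / total_et


# Hypothetical inputs, chosen only to show the arithmetic.
ratio = evaporation_to_inflow(delta_in=-10.0, delta_lake=-4.0, delta_evap=-25.0)
print(round(ratio, 3))  # 0.286: evaporation removes ~29% of lake inflow
print(transpiration_share(total_et=500.0, evaporation=60.0))  # 0.88
```

Because transpired water leaves no isotopic signature, only the evaporative loss is "visible" in the lake; the transpiration share falls out as the remainder of the total water loss.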

There are still major uncertainties, especially statistical ones (e.g., some East African transpiration estimates range from 10% to 200% of total evapotranspiration, and values above 100% are physically impossible). But this is a cool study that will likely push the research community (from biogeochemical modelers to plant physiologists) to test its conclusions. It is definitely worth a close look.

### The BMI of forest trees

Forest trees follow some basic physical rules of body size allometry. For example, as trees grow taller, they must also grow wider to maintain mechanical stability. In the Duke Forest, this shows up as a (fuzzy) relationship between diameter and height.

The residual variation likely reflects other anatomical properties of the trees. We can estimate aboveground biomass from tree height, trunk diameter, and species-specific wood density:

$\large B(kg)=H(m)\times (D(m))^2\times WD(kg/m^{3})$

The allometric fit (from OLS regression, since I am more certain about the independent variable than I am about the dependent variable) shows a strong relationship between diameter and aboveground biomass.
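The post doesn't state the model form for this fit. A common choice in allometry, assumed here, is a power law B = a·D^b fitted by OLS on log-log axes, so that X (log diameter) is treated as error-free. A minimal sketch with a hypothetical function name and made-up data:

```python
import math


def fit_power_law_ols(x, y):
    """Fit y = a * x**b by OLS on log-transformed data; returns (a, b).

    Note: exp(intercept) estimates the median, not the mean, of y at a given x;
    a log-bias correction is often applied before predicting biomass.
    """
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    sxx = sum((u - mx) ** 2 for u in lx)
    b = sxy / sxx  # OLS slope = cov(log y, log x) / var(log x)
    a = math.exp(my - b * mx)
    return a, b


# Hypothetical diameters (m) and biomass (kg) lying exactly on B = 2000 * D**2.5.
diameters = [0.1, 0.2, 0.4, 0.8]
biomass = [2000 * d ** 2.5 for d in diameters]
a, b = fit_power_law_ols(diameters, biomass)
print(round(a, 6), round(b, 6))  # 2000.0 2.5
```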

What do these patterns mean for tree health? We might expect some relationship between health and size. Given the strong relationship between trunk diameter and biomass, we might be comfortable assuming that trunk diameter alone captures size. But the relationship between diameter and height is noisy and nonlinear (the slope is steeper at small diameters, even more so than a simple power law would predict), so diameter is a poor predictor of height. And height is an important determinant of competitive ability.

Alternatives include metrics that tell us about how tall a tree is given its biomass, like the body mass index (BMI), which is heavily used in human health sciences:

$\large BMI=B(kg)/(H(m))^2$
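Substituting the biomass formula into the BMI gives BMI = H·D²·WD / H² = D²·WD / H, so for a given diameter and wood density, taller trees have lower BMIs. A minimal sketch of both formulas (function names and the example tree are hypothetical):

```python
def biomass_kg(height_m, diameter_m, wood_density_kg_m3):
    """Aboveground biomass as in the post: B = H * D**2 * WD."""
    return height_m * diameter_m ** 2 * wood_density_kg_m3


def tree_bmi(height_m, diameter_m, wood_density_kg_m3):
    """Tree BMI = B / H**2, which simplifies to D**2 * WD / H."""
    return biomass_kg(height_m, diameter_m, wood_density_kg_m3) / height_m ** 2


# Hypothetical tree: 20 m tall, 0.3 m diameter, wood density 600 kg/m^3.
print(biomass_kg(20.0, 0.3, 600.0))  # about 1080 kg
print(tree_bmi(20.0, 0.3, 600.0))    # about 2.7 kg/m^2
# Same diameter and density, but 25 m tall: a lower BMI.
print(tree_bmi(25.0, 0.3, 600.0))    # about 2.16 kg/m^2
```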

In general, trees are very tall for their weight (especially compared to humans, for whom a BMI between 18.5 and 25 is considered 'normal'). But there is quite a bit of variation (the coefficient of variation is 66%). The distribution of tree BMIs follows a gamma probability density pretty nicely (pink line in the figure). The distribution is right-skewed: most trees have low BMIs, with a long tail toward higher values, which makes sense if it is advantageous to be taller than neighboring trees of the same biomass.
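For a gamma distribution, both the coefficient of variation and the skewness depend only on the shape parameter (CV = 1/√shape, skewness = 2/√shape = 2·CV), so the reported CV of 66% alone implies a shape near 2.3 and a clearly positive skew. The post doesn't say how its gamma curve was fitted; the method-of-moments fit below is a simple stand-in (maximum likelihood is the other common choice), and the function names are hypothetical.

```python
import statistics


def gamma_shape_from_cv(cv):
    """For a gamma distribution, CV = 1/sqrt(shape), so shape = 1/CV**2."""
    return 1.0 / cv ** 2


def gamma_skewness_from_cv(cv):
    """Gamma skewness is 2/sqrt(shape), which equals 2 * CV."""
    return 2.0 * cv


def gamma_method_of_moments(values):
    """Method-of-moments gamma fit: shape = mean^2/var, scale = var/mean."""
    mean = statistics.fmean(values)
    var = statistics.variance(values)  # sample variance (n - 1 denominator)
    return mean ** 2 / var, var / mean


print(round(gamma_shape_from_cv(0.66), 2))     # 2.3
print(round(gamma_skewness_from_cv(0.66), 2))  # 1.32
```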

BMI is of limited use for human populations because it is an imperfect proxy for body fat percentage (Romero-Corral et al. 2008). For trees, though, it might be a nice index of allocation to height.

### Learning from the MOOCs

Amid the wide discussion about MOOCs, Nature magazine published an article yesterday considering how online courses may challenge and change higher education. Regardless of what eventually happens with the MOOC 'revolution' (dwindle, thrive, coexist with brick-and-mortar institutions?), it is probably the biggest and fastest trend in modern higher education... and the pedagogical consequences will be substantial.

The article discusses how online course platforms could contribute to research in higher education:

> Instead of looking at aggregate data about students on average, for example, researchers can finally — with appropriate permissions and privacy safeguards — follow individual students throughout their university careers, measuring exactly how specific experiences and interactions affect their learning.

These edu-analytics could provide important insight into how students learn, allowing academics to quantitatively demonstrate how brick-and-mortar institutions contribute to learning outcomes (spoiler: online lectures cannot substitute for active learning).

But will the major online education players (Coursera, edX, Udacity) make their data available? The 'open source' ethos of these institutions suggests that they might, but edX is the only platform that explicitly states in its privacy policy that data might be used for "purposes of scientific research, particularly, for example, in the areas of cognitive science and education." edX is also the only platform that partners with particular institutions, which makes me think that it may make these data available only to individuals at those institutions.

If these organizations are committed to advancing education, I think that they should make their (privacy-protected) data publicly available. This step would not only keep the institutions accountable in their records of enrollment, completion, etc., but would also allow researchers to take full advantage of the rich educational information that performance data could offer. I am envisioning a public database with privacy safeguards that are similar to the U.S. Forest Inventory and Analysis program, which makes its information publicly available without specific identifiers (for the forest plots they are GPS coordinates on private land, for the MOOCs they might be traceable unique identifiers like names, IP addresses, etc.). Longitudinal information about student performance and learning could be the greatest contribution of the MOOCs to higher education — especially if it is open.

### The rise of quantitative ecology

Ecologists and environmental scientists (EES) are increasingly quantitative (Hastings et al. 2005, Jones et al. 2006). During my lifetime, ecology has gone from a field that was rarely quantitative to a field that frequently uses modern computational, mathematical, and statistical techniques.

To visualize this change, I searched the Web of Science for scholarly articles in "Ecology" or "Environmental Science" that contained (quantitative OR "computer programming" OR model OR simulation OR likelihood OR Bayes).

Articles with quantitative content have increased by 1% per year on average. Today, 1/3 of articles in these two subject categories discuss quantitative concepts. Quite the transformation!
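One way to get a figure like "1% per year" is to compute each year's share of quantitative articles and fit a straight line through those shares. A minimal sketch, assuming the figure means percentage points of the share (the post doesn't say) and using made-up counts, not the Web of Science results; the function name is hypothetical.

```python
def share_trend(years, quantitative, total):
    """Yearly share of quantitative articles and its OLS trend.

    Returns (shares, slope), with the slope in percentage points per year.
    """
    shares = [q / t for q, t in zip(quantitative, total)]
    n = len(years)
    m_year = sum(years) / n
    m_share = sum(shares) / n
    sxy = sum((y - m_year) * (s - m_share) for y, s in zip(years, shares))
    sxx = sum((y - m_year) ** 2 for y in years)
    return shares, 100 * sxy / sxx


# Made-up counts for illustration only.
years = [2000, 2001, 2002, 2003]
quantitative = [200, 210, 220, 230]
total = [1000, 1000, 1000, 1000]
shares, slope = share_trend(years, quantitative, total)
print(shares)
print(slope)  # roughly 1.0 percentage point per year
```

A step change like the 1991 jump will pull a single linear trend around, so it is worth plotting the yearly shares before summarizing them with one slope.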

I don't know what explains the jump in 1991... There was no change in the total number of publications. Any ideas?

### The grass is always greener: job prospects in academia

#### The job market
The National Science Foundation reports that the percentage of new science Ph.D.s with employment or postdoctoral study after graduation has declined over the last decade, from a high of 73% to a low of 66% in 2011 (source). These data and other findings have set off a wave of discussion about the merits of pursuing a scientific doctoral degree (in blogs, news outlets, and journals). These analyses don't provide much comparative information from other work sectors, which makes me concerned that we've been asking the wrong questions. Here I suggest that the "Ph.D. problem" might not be specific to science at all.

Compared to other professions that require higher degrees, the scientific Ph.D. statistic is not unique. In 2011, just 63% of J.D. graduates were employed in a job that either required or preferred a law degree within nine months of graduation (source). The report adds that only 55% of graduates had "full-time, long-term jobs that required a law degree" within nine months. Similarly, from U.S. News & World Report statistics, 78% of M.B.A. graduates were employed within three months of getting their degrees (source). Also in 2011, only 73% of M.D. graduates who applied for a first-year residency position were placed (source).

#### The hours and compensation
Another component of the "Ph.D. problem" is the amount of work required to "make it," usually framed around horror stories of long hours working in the lab, preparing for classes, or grading exams. The NSF reports that scientists in education settings work 50.6 hours per week on average (source), with 82% of "scientists" working between 40 and 70 hours per week (source). While that figure is pretty jarring, it is not too different from other figures for professional fields in the US. For example, medical doctors across specialties average 53.9 hours per week (source). The Bureau of Labor Statistics reports that, across industries, 22% of all full-time employed Americans work over 48 hours a week (and 33% work over 40 hours, source). That demographic is concentrated among college graduates (source). Why do some people work so much? I'd argue that it is related to the prospect of earnings. Across industries in the US, workers who have longer work weeks bring in higher incomes on average (r = 0.6) (source). And people who do "research and development in the physical, engineering, and life sciences" for a living are relatively high up in both hours and earnings.

#### The grass is always greener
In sum, I want to emphasize that working is hard, no matter what one does. Job prospects are bleak across industries. Academia and scientific research are "extremistan," just like many other professional fields. I find that some of the self-perceived difficulty in academia comes from either ignorance of what other industries are like or intra-industry rhetoric. Luckily, at the population level, hard work leads to rewards, monetary and otherwise. This doesn't mean that it is acceptable or desirable to work 50 hours a week, but it may help dispel concerns that academic science is unique as a profession.

Back to work...

### Sell your science in two minutes

Nature magazine reports on an "Elevator Speech Contest" hosted by the American Society for Cell Biology. Condensing research into a short, easy-to-follow summary is challenging for many scientists, but it can help us communicate our ideas to a broader audience.

From the article, Nancy Baron, a science communicator from Santa Barbara, California, "suggests thinking about four key topics: the problem, why it matters, potential solutions and the benefits of fixing it." The article also discusses focusing on the broader impacts and avoiding jargon and caveats.

Can we have a two-minute Elevator Pitch Contest at ESA this year?

### Which slope is correct?

I want to test the relationship between two variables. Specifically, I want to know whether the slope of the relationship is different from some theoretical expectation. For example, is it 2/3 or 3/4? That should be simple enough, I tell myself... I fit a regression model. Done, right? Not so fast. This deceptively simple problem of learning about slopes has been fodder for some great debates in ecology (e.g. Kolokotrones et al. 2010).

Why the debate? We have a theoretical expectation (or competing expectations), so we turn to our data to see whether it matches. The numbers will tell us! Well, what if there are different ways to estimate a slope? And what if our statistical workhorse, the ordinary least squares (OLS) regression, is biased in some situations?? Then we might be in some quantitative trouble.

(NOTE: mathematical details to follow. If you don't care, you can skip the next two paragraphs.)
Why do we use OLS regression? OLS regression is very powerful. It allows us to test for an association between two variables (does Y change systematically with X?). It also gives us a model for predicting Y from (new) observations of X. It works by fitting the line that makes the predicted Y values as close as possible to the observed Y values (minimizing the sum of squared residuals). The OLS slope equals cov(y,x)/var(x). This is where the problem arises, because the model conditions on the observed X values. If there is uncertainty in the X values, OLS regression underestimates the magnitude of the actual slope (the estimate is biased toward zero; see below).

How about an alternative? There is one, called standardized major axis (SMA) regression (also called reduced major axis, or model II, by some). Instead of minimizing only the vertical distances between the fitted and the observed Y values, SMA regression accounts for deviations in both X and Y (formally, it minimizes the summed areas of the triangles formed by each point's vertical and horizontal deviations from the line). The SMA slope equals sign(cov(y,x)) × sqrt(var(y)/var(x)). It treats X and Y symmetrically, which makes it similar to the first principal component axis of the standardized variables. SMA regression doesn't work for prediction because our observations are (X+εX) and (Y+εY) and our intention is to predict Y from new observations of X (i.e., X+εX), which means that our model should include any X errors that might exist. SMA regression does work for identifying the (symmetric) line that best describes the relationship between X and Y.
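Both slope formulas are easy to compute directly. A minimal sketch in plain Python (function names and simulated values are hypothetical): it shows the OLS slope attenuating toward β·var(X)/(var(X) + var(εX)) once X carries measurement error, and it checks the identity SMA = OLS/|r|, which follows from OLS = r·sd(y)/sd(x). That identity means the SMA slope is never shallower than the OLS slope.

```python
import math
import random


def _centered_sums(x, y):
    """Sums of squares and cross-products about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxx, syy, sxy


def ols_slope(x, y):
    """OLS slope: cov(y, x) / var(x)."""
    sxx, _, sxy = _centered_sums(x, y)
    return sxy / sxx


def sma_slope(x, y):
    """SMA slope: sign(cov(y, x)) * sqrt(var(y) / var(x)) (+ if cov is 0)."""
    sxx, syy, sxy = _centered_sums(x, y)
    return math.copysign(math.sqrt(syy / sxx), sxy)


def correlation(x, y):
    sxx, syy, sxy = _centered_sums(x, y)
    return sxy / math.sqrt(sxx * syy)


# Simulated example (hypothetical values): true slope 0.75, var(X) = 1,
# Y error SD 0.5, and X measured with error of variance 1.
rng = random.Random(42)
beta = 0.75
x_true = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
y_obs = [beta * v + rng.gauss(0.0, 0.5) for v in x_true]
x_obs = [v + rng.gauss(0.0, 1.0) for v in x_true]

print(ols_slope(x_true, y_obs))  # close to 0.75: no error in X
print(ols_slope(x_obs, y_obs))   # close to 0.75 * 1 / (1 + 1) = 0.375
# SMA = OLS / |r|, so the two agree only when |r| = 1.
r = correlation(x_obs, y_obs)
print(sma_slope(x_obs, y_obs), ols_slope(x_obs, y_obs) / abs(r))
```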

In applications like allometry, where the ecologist is interested in the slope of the line between X and Y and whether it is different from some expectation, this can be a big deal! Can't I always use SMA for estimating slopes? This seems to be what many researchers do. When it is reported, allometric slopes are frequently estimated with SMA regression (e.g., Niklas and Enquist 2002). And it is all good (even with OLS slopes) as long as the overall variability is low (correlation is high). With some variability, though, one needs to think about which method to use.

Which regression slope? To examine our problem, I simulated some data (like the figure above), introduced different amounts of error, calculated the OLS and SMA regression slopes, and compared them to the actual (simulated) slope. I bootstrapped the simulation 10,000 times to generate 95% confidence intervals (dotted lines).

Here are some findings:
1. The OLS expectation (blue line) matches the actual slope when there is no uncertainty in the X variable.
2. As uncertainty is added to the X value (moving left to right in the figure), the OLS expectation significantly underestimates the actual slope.
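3. The SMA expectation (pink line) matches the actual slope when the ratio of the Y-error variance to the X-error variance equals the square of the actual slope (equivalently, when the ratio of their standard deviations equals the slope).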
4. When uncertainty in the X variable is low compared to the uncertainty in the Y variable, the SMA expectation significantly overestimates the slope.
5. As overall uncertainty increases, the biases are amplified.
Point 4 demonstrates that SMA slopes are also biased, depending on the x-y variance ratio (Smith 2009). SMA regression is appropriate in many cases for testing the value of a slope (Warton et al. 2007). But OLS works just as well, if not better, when the variability in the X variable is much lower than the variability in the Y variable, since the x-y variance ratio influences the SMA estimate.
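A simulation in the same spirit can be sketched in a few lines. This is not the original simulation code: it assumes true X ~ N(0, 1), a positive true slope, independent normal errors on both variables, and far fewer replicates than 10,000 to keep it quick. In large samples the OLS slope tends to β/(1 + var(εX)) and the SMA slope to sqrt((β² + var(εY))/(1 + var(εX))), which equals β exactly when var(εY)/var(εX) = β². All names are hypothetical.

```python
import math
import random
import statistics


def ols_and_sma(x, y):
    """Return (OLS slope, SMA slope) for paired observations."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx, math.copysign(math.sqrt(syy / sxx), sxy)


def simulate(beta, sd_err_x, sd_err_y, n=200, reps=300, seed=1):
    """Mean and central 95% range of OLS and SMA slopes over repeated samples.

    True X ~ N(0, 1); both variables get independent normal measurement error.
    """
    rng = random.Random(seed)
    ols, sma = [], []
    for _ in range(reps):
        x_true = [rng.gauss(0.0, 1.0) for _ in range(n)]
        x_obs = [v + rng.gauss(0.0, sd_err_x) for v in x_true]
        y_obs = [beta * v + rng.gauss(0.0, sd_err_y) for v in x_true]
        b_ols, b_sma = ols_and_sma(x_obs, y_obs)
        ols.append(b_ols)
        sma.append(b_sma)

    def summary(values):
        q = statistics.quantiles(values, n=40)  # q[0] = 2.5%, q[-1] = 97.5%
        return statistics.fmean(values), q[0], q[-1]

    return summary(ols), summary(sma)


def expected(beta, sd_err_x, sd_err_y):
    """Large-sample OLS and SMA slopes for the set-up above (beta > 0)."""
    ols = beta / (1 + sd_err_x ** 2)
    sma = math.sqrt((beta ** 2 + sd_err_y ** 2) / (1 + sd_err_x ** 2))
    return ols, sma


beta = 0.75
for sd_x, sd_y in [(0.0, 0.5), (0.5, 0.5), (0.5, 0.375)]:
    (ols_mean, _, _), (sma_mean, _, _) = simulate(beta, sd_x, sd_y)
    print(sd_x, sd_y, round(ols_mean, 3), round(sma_mean, 3), expected(beta, sd_x, sd_y))
```

The three scenarios mirror points 1, 2/4, and 3 above: no X error (OLS on target, SMA too steep), equal error SDs (OLS too shallow, SMA too steep), and an error-SD ratio equal to the slope (SMA on target).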

#### An example from the literature
Fifteen years ago, Brian Enquist and colleagues (1998) examined how tree water use depends on tree body size (see figure below), with a theoretically expected slope of 2. The authors conclude that the relationship is "nearly indistinguishable from [the] predicted" slope. The correlation is pretty darn good (r² = 0.912), and the OLS and SMA slopes don't look too different. Not enough to be meaningful, right? Wrong. The SMA slope (1.83, 95% CI: 1.70-1.97) is weakly significantly different from 2 (P = 0.0180), while the OLS slope (1.75, 95% CI: 1.62-1.88) is clearly significantly different (P = 0.0003).

The estimated slopes are much closer to a more recent theoretical expectation of 1.85 (Savage et al. 2010) (OLS P = 0.14; SMA P = 0.79), but the difference between the methods is still apparent: the SMA slope is steeper than the OLS slope, and the OLS slope is much closer to being significantly different from the expectation.
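The reported numbers are internally consistent: since SMA = OLS/|r|, an OLS slope of 1.75 with r² = 0.912 implies an SMA slope of 1.75/√0.912 ≈ 1.83. Testing a slope against a hypothesized value can be sketched as below. I don't have the Enquist et al. (1998) data, so this uses made-up numbers, and it assumes the slope is estimated on log-transformed data. The OLS test is the usual t test; the SMA test uses the fact that y − β₀x and y + β₀x are uncorrelated exactly when var(y) = β₀²·var(x). Both statistics have n − 2 degrees of freedom, and two-sided P-values can be obtained from a t distribution (e.g., `2 * scipy.stats.t.sf(abs(t), n - 2)`). Function names are hypothetical.

```python
import math


def _centered_sums(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return n, sxx, syy, sxy


def ols_slope_test(x, y, beta0):
    """OLS slope and t statistic for H0: slope == beta0 (df = n - 2)."""
    n, sxx, syy, sxy = _centered_sums(x, y)
    b = sxy / sxx
    sse = syy - b * sxy                  # residual sum of squares
    se = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    return b, (b - beta0) / se


def sma_slope_test(x, y, beta0):
    """SMA slope and t statistic for H0: SMA slope == beta0 (df = n - 2).

    cov(y - beta0*x, y + beta0*x) = var(y) - beta0**2 * var(x), which is zero
    exactly when the SMA slope equals beta0 in magnitude, so we test that
    correlation.
    """
    n, sxx, syy, sxy = _centered_sums(x, y)
    b = math.copysign(math.sqrt(syy / sxx), sxy)
    resid = [yi - beta0 * xi for xi, yi in zip(x, y)]
    axis = [yi + beta0 * xi for xi, yi in zip(x, y)]
    _, s_rr, s_aa, s_ra = _centered_sums(resid, axis)
    r = s_ra / math.sqrt(s_rr * s_aa)
    return b, r * math.sqrt((n - 2) / (1 - r * r))


# Made-up log-scale data, for illustration only.
log_d = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
log_q = [2.3, 3.8, 6.4, 7.7, 10.4, 11.6]
for beta0 in (2.0, 1.85):
    print(beta0, ols_slope_test(log_d, log_q, beta0), sma_slope_test(log_d, log_q, beta0))
```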

This all means that the water is still muddy, even when we throw data at it, because there are different ways to fit the data. In this example, I think that the uncertainty in stem diameter measurements is probably much lower than the uncertainty in xylem water transport, which means that I would probably use OLS regression (I think that is also what the original authors used). As the variance in the X variable gets closer to the variance in the Y variable, however, SMA slopes become more informative.

### A GEM of an idea

> "...In that Empire, the Art of Cartography attained such Perfection that the map of a single Province occupied the entirety of a City, and the map of the Empire, the entirety of a Province. In time, those Unconscionable Maps no longer satisfied, and the Cartographers Guilds struck a Map of the Empire whose size was that of the Empire, and which coincided point for point with it. The following Generations, who were not so fond of the Study of Cartography as their Forebears had been, saw that that vast Map was Useless, and not without some Pitilessness was it, that they delivered it up to the Inclemencies of Sun and Winters." - Jorge Luis Borges (1946)

Simulate whole ecosystems, from elements to communities at the scale of the biosphere? That sounds like an idea that will ruffle some feathers... but it is what Drew Purves and colleagues propose in this week's issue of Nature. They suggest that ecologists should work to create general ecosystem models (GEMs), analogous to the general circulation models (GCMs) built and employed by atmospheric scientists, to advance our understanding of how Earth's ecosystems work and improve ecological policy recommendations.

In some ways, this idea is radical. Purves and others advocate for extreme abstraction and generality, which have conflicted histories in ecology. (As Purves and others note, "most researchers have resisted abstraction because ecological complexity is so obvious in nature. ... Many in the field also emphasize that findings in one ecosystem do not generalize to others, and that randomness and history could be as important in affecting some particular measurement as any deterministic rules.")

But in another sense, it seems old. The authors state that "Ecologists could apply a GEM to African savannas, for instance, to model the total biomass of all the plants, the grazers that feed on the plants, the carnivores that feed on the grazers and so on." This sounds strikingly like what ecologists have been doing, or attempting to do, for quite some time. The International Biological Program was a project initiated in the 1960s that was primarily focused on producing large-scale, mechanistic models of how Earth's major ecosystems function. It was highly influential in the development of ecosystem ecology and continues to influence the new (spatio-temporal) global carbon cycle models. Similarly, building on the development and expansion of food web theory in the early 20th century, researchers are now quantifying energy flows through trophic levels with high-level representations of ecosystem structure and function (e.g., Thompson et al. 2012). The result might resemble a dynamic global vegetation model (DGVM) with animals.

So, to me, a GEM sounds like a meta-food web model (food webs linked with dispersal!) connected to a global carbon cycle model. Not necessarily novel, but not trivial by any means. Throughout the history of ecology we have relied on zero-, one-, and two-dimensional (space, latitude, time... take your pick) representations of Earth's ecosystems. Perhaps the main development from GEMs will be the move from traditional site-based views to large-scale spatio-temporal simulation of multiple processes (which has been a general push in the community for at least the last decade). Then the main question is whether we can obtain coherent output from an analytically tractable representation of our 'general ecology.' I can't say that I would want to take this project on (it could be a colossal flop), but I also can't wait to see what comes of it.

### How will science be communicated in the future?

The NextGen VOICES series at Science Magazine asks "how will scientists share their results with each other and the public in 50 years?"

Journals? OA journals? Blogs? Databases? What do you think? They're taking 250-word replies until April 5, 2013.

### How to write an ecological research article (a rough start, at least)

Thomas Basbøll, a research writing coach, states that scholarly research articles 'consist of roughly 40 paragraphs.' I am excited by his suggestions for getting my writing into shape (nerdy New Year's resolutions, anyone?), and so I wanted to investigate whether this axiom holds in ecology. I looked at the number of paragraphs across journals, and at how these paragraphs were distributed among the common sections (introduction, methods, results, and discussion).

I counted the number and distribution of paragraphs in the 20 most recent research articles in three diverse ecological journals: Ecology Letters, Global Change Biology, and Ecology (data link). There were no differences among journals in the average number of paragraphs per article (F(2, 57) = 0.8, P = 0.46). The median number of paragraphs was 30.5, but there was substantial variability across articles (range: 20 to 51), so to investigate how the average ecology article was structured, I also calculated the fraction of each category in each article.

The average ecology article has 5±2 introduction paragraphs, 11±3 methods paragraphs, 6±2 results paragraphs, and 8±3 discussion paragraphs (median ± sd). These distributions were consistent across journals, which is pretty cool! Of course, this assumes that all paragraphs are the same length. Maybe people who write fewer paragraphs write longer ones?

I was surprised by the extent of the methods sections... Anything else interesting to anyone? It might be cool to investigate this by number of words, and expand it to include other journals. That would benefit from some sort of scraper program to automate it....