Fixing the bridge between biologists and statisticians

All models are wrong... but some are useful (G. Box)!


Pairwise comparisons in nonlinear regression

Published on February 23, 2024 · 8 min read

Pairwise comparisons are one of the most debated topics in agricultural research: they are very often used and, sometimes, abused in the literature. I have nothing against the appropriate use of this very useful technique and, for those who are interested, a few years ago some colleagues and I gave a number of (hopefully) useful suggestions in a paper (follow this link here). Judging from the emails I often receive, there seems to be some interest in making pairwise comparisons in linear/nonlinear regression models....

Regression analyses with common checks in pesticide research

Published on December 15, 2023 · 4 min read

In pesticide research or, more generally, in agricultural research, we very commonly encounter experiments with, e.g., several herbicides tested at different doses and in different conditions. In these experiments, an untreated control is always included and, of course, it is common to all herbicides. For example, in another post (see here) we considered an experiment with two herbicides (rimsulfuron and dicamba) at two rates (40 and 60 g/ha for rimsulfuron and 0....

Factorial designs with check in pesticide research

Published on December 15, 2023 · 6 min read

In pesticide research or, more generally, in agricultural research, we very commonly encounter experiments with two or three crossed factors plus some other treatments that are not included in the factorial structure. For example, let’s consider an experiment with two herbicides (rimsulfuron and dicamba) at two rates (40 and 60 g/ha for rimsulfuron and 0.6 and 1 kg/ha for dicamba) and with four adjuvant treatments (surfactant, frigate, mineral oil and no adjuvant). Apart from this fully crossed structure, we need to include at least an untreated control and a hand-weeded control....

Repeated measures and subsampling with perennial crops

Published on December 4, 2023 · 5 min read

In a recent post, I talked about repeated measures, for a case where measurements were taken repeatedly in the same plots across years (see here). Previously, in another post, I had talked about subsampling, for a case where several random samples were taken from the same plot (see here). Repeated measures and subsampling are vastly different: in the first case, I am specifically interested in the ‘evolution’ of the response over time (or, sometimes, over space)....

Back-transformations with emmeans()

Published on November 30, 2023 · 5 min read

I am one of those old guys who still use stabilising transformations when the data do not conform to the basic assumptions for ANOVA. Indeed, apart from counts and proportions, where GLMs can be very useful, I have not yet found a simple way to deal with heteroscedasticity for continuous variables, such as yield, weight, height and so on. Yes, I know, Generalised Least Squares (GLS) can be useful to fit heteroscedastic models, but I would argue that stabilising transformations are conceptually much simpler and can be easily taught to PhD students and practitioners with only a basic knowledge of statistics....
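
As a minimal sketch of what a back-transformation involves, assuming a log transformation (the notation is ours, and the post may also deal with other transformations): if the model is fitted to $z = \log(y)$ and $\hat{\mu}_z$ is an estimated marginal mean on the log scale with standard error $\mathrm{SE}(\hat{\mu}_z)$, the back-transformed mean and its approximate delta-method standard error are

$$
\hat{\mu}_y = \exp(\hat{\mu}_z), \qquad \mathrm{SE}(\hat{\mu}_y) \approx \exp(\hat{\mu}_z)\,\mathrm{SE}(\hat{\mu}_z)
$$

where the multiplier $\exp(\hat{\mu}_z)$ is simply the derivative of $\exp(\cdot)$ evaluated at $\hat{\mu}_z$. Note that $\exp(\hat{\mu}_z)$ corresponds to a geometric mean of the original data (and estimates the median of $y$ if the log-scale errors are symmetric), not to the arithmetic mean.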

Designed experiments with replicates: Principal components or Canonical Variates?

Published on November 2, 2023 · 16 min read

A few days ago, a colleague of mine wanted to hear my opinion about which multivariate method would be best for a randomised field experiment with replicates. We had a nice discussion and I thought that such a case study might be of general interest in the agricultural sciences; thus, I decided to take my Apple MacBook Pro, sit down, relax and write a new post on this matter. My colleague’s research study was similar to this one: a randomised block field experiment (three replicates) with 16 durum wheat genotypes, which was repeated over four years....

GGE analyses for multi-environment studies

Published on May 31, 2023 · 12 min read

In a recent post, we saw that Principal Component Analysis (PCA) can be used to elucidate the ‘genotype by environment’ relationship (see this post). When the starting point for PCA is the doubly-centered matrix (centered by rows and by columns) of yields across environments, we talk about AMMI analysis, which is often used to gain insight into the stability of genotype yields across environments. By changing the starting matrix, we can obtain a different perspective and focus on the definition of macroenvironments and on the selection of winning genotypes....
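
To make the two starting matrices concrete, here is a sketch in our own notation (not necessarily the one used in the post): let $y_{ij}$ be the mean yield of genotype $i$ in environment $j$, and let $\bar{y}_{i.}$, $\bar{y}_{.j}$ and $\bar{y}_{..}$ be the genotype, environment and grand means. AMMI applies PCA to the doubly-centered matrix, while GGE analysis, in its most common form, applies it to the matrix centered on environment means only:

$$
z^{\mathrm{AMMI}}_{ij} = y_{ij} - \bar{y}_{i.} - \bar{y}_{.j} + \bar{y}_{..}, \qquad z^{\mathrm{GGE}}_{ij} = y_{ij} - \bar{y}_{.j} = (\bar{y}_{i.} - \bar{y}_{..}) + z^{\mathrm{AMMI}}_{ij}
$$

The identity on the right shows why the GGE matrix gives a different perspective: it retains the genotype main effect together with the genotype by environment interaction, whereas the AMMI matrix contains the interaction alone.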

AMMI analyses for multi-environment studies

Published on May 26, 2023 · 19 min read

Here we are again with a subject that is rather important for most agronomists, i.e., the selection of crop varieties. All farmers are perfectly aware that crop performance is affected both by the genotype and by the environment. These two effects are not purely additive and they often show a significant interaction. By ‘interaction’, we mean that a genotype can perform particularly well (or badly) in some specific environmental conditions, which we might not expect from its average behaviour in other environments....
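
In model terms, the interaction described above can be sketched with a two-way model (our notation, ignoring block effects for simplicity), where $y_{ijr}$ is the yield of genotype $i$ in environment $j$ and replicate $r$, $g_i$ and $e_j$ are the genotype and environment main effects and $ge_{ij}$ is their interaction. AMMI approximates the interaction with a few multiplicative terms, where $\lambda_k$ is the singular value of the $k$-th component and $\alpha_{ik}$ and $\gamma_{jk}$ are the genotype and environment scores:

$$
y_{ijr} = \mu + g_i + e_j + ge_{ij} + \varepsilon_{ijr}, \qquad ge_{ij} \approx \sum_{k=1}^{K} \lambda_k\, \alpha_{ik}\, \gamma_{jk}
$$

If all $ge_{ij}$ were zero, the difference between any two genotypes would be the same in every environment; a non-zero interaction is exactly what makes a genotype rank differently from one environment to another.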

Repeated measures with perennial crops

Published on March 30, 2023 · 8 min read

In this post, I want to discuss a concept that some of my colleagues often misunderstand. With all crops, we are used to repeating experiments across years to obtain multi-year data; the structure of the resulting dataset is always the same, as exemplified in the box below, which refers to a multi-year genotype experiment with winter wheat.
rm(list = ls())
library(tidyverse)
library(nlme)
library(emmeans)
filePath <- "https://www....

Subsampling in field experiments

Published on March 29, 2023 · 11 min read

Subsampling is very common in field experiments in agriculture. It happens when we collect several random samples from each plot and subject each of them to some measurement process. Some examples? Let’s imagine that we have a randomised field experiment with three replicates and that we either: (1) harvest the whole grain yield in each plot, select four subsamples and measure, in each subsample, the oil content or some other relevant chemical property; (2) collect four plants from each plot and measure their heights; or (3) collect a representative soil sample from each plot and perform chemical analyses in triplicate....
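
A common way to write a model for subsampling in a randomised block design is sketched below (our notation; the post may parameterise it differently), where $\tau_i$ is the effect of treatment $i$, $\beta_j$ is the effect of block $j$, $\varepsilon_{ij}$ is the plot (experimental) error and $\delta_{ijk}$ is the error of the $k$-th subsample within that plot:

$$
y_{ijk} = \mu + \tau_i + \beta_j + \varepsilon_{ij} + \delta_{ijk}, \qquad \varepsilon_{ij} \sim N(0, \sigma^2_e), \quad \delta_{ijk} \sim N(0, \sigma^2_s)
$$

The key point is that treatments must be tested against the plot-to-plot variability ($\sigma^2_e$), not against the subsample-to-subsample variability ($\sigma^2_s$); with balanced data, this gives the same treatment F test as an ANOVA on the plot means of the subsamples.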