Fixing the bridge between biologists and statisticians

Published at June 6, 2024 · 7 min read

Very often, we agronomists have to deal with weather data, e.g., to evaluate and explain the behaviour of genotypes in different environments. We are very much used to representing temperature and rainfall data in one single graph with two y-axis, which gives a good immediate insight on the weather pattern at a certain location. Unfortunately, I had to discover that doing such graphs with ggplot() is not a straightforward task....

Here is why I don't care about the Levene's test

Published at March 15, 2024 · 5 min read

During my stat courses, I never give my students any information about the Levene’s test (Levene and Howard, 1960), or other similar tests for homoscedasticity, unless I am specifically prompted to do so. It is not that I intend to underrate the tremendous importance of checking for the basic assumptions of linear model! On the contrary, I always show my students several methods for the graphical inspection of model residuals, but I do not share the same aching desire for a P-value, that most of my colleagues seem to possess....

Pairwise comparisons in nonlinear regression

Published at February 23, 2024 · 8 min read

Pairwise comparisons are one of the most debated topic in agricultural research: they are very often used and, sometimes, abused, in literature. I have nothing against the appropriate use of this very useful technique and, for those who are interested, some colleagues and I have given a bunch of (hopefully) useful suggestions in a paper, a few years ago (follow this link here). According to the emails I often receive, there might be some interest in making pairwise comparisons in linear/nonlinear regression models....

Factorial designs with check in pesticide research

Published at December 15, 2023 · 6 min read

In pesticide research or, in general, agriculture research, we very commonly encouter experiments with two/three crossed factors and some other treatment that is not included in the factorial structure. For example, let’s consider an experiment with two herbicides (rimsulfuron and dicamba) at two rates (40 and 60 g/ha for rimsulfuron and 0.6 and 1 kg/ha for dicamba) and with four adjuvant treatments (surfactant, frigate, mineral oil and no adjuvant). Apart from this fully crossed structure, we need to introduce, at least, an untreated control and a hand-weeded control....

Regression analyses with common checks in pesticide research

Published at December 15, 2023 · 4 min read

In pesticide research or, in general, agriculture research, we very commonly encounter experiments with, e.g., several herbicides tested at different doses and in different conditions. For these experiments, the untreated control is always added and, of course, such control is common to all herbicides. For example, in another post (see here) we have considered an experiment with two herbicides (rimsulfuron and dicamba) at two rates (40 and 60 g/ha for rimsulfuron and 0....

Repeated measures and subsampling with perennial crops

Published at December 4, 2023 · 5 min read

In a recent post, I have talked about repeated measures, for a case where measurements were taken repeatedly in the same plots across years see here. Previously, in another post, I had talked about subsampling, for a case where several random samples were taken from the same plot see here. Repeated measures and subsampling are vastly different: in the first case I am specifically interested in the ‘evolution’ of the response over time (or space, sometimes)....

Back-transformations with emmeans()

Published at November 30, 2023 · 5 min read

I am one of those old guys who still uses the stabilising transformations, when the data do not conform to the basic assumptions for ANOVA. Indeed, apart from counts and proportions, where GLMs can be very useful, I have not yet found a simple way to deal with heteroscedasticity for continuous variables, such as yield, weight, height and so on. Yes, I know, Generalised Least Squares (GLS) can be useful to fit heteroscedastic models, but I would argue that stabilising transformations are, conceptually, very much simpler and they can be easily thought to PhD students and practitioners, with only a basic level of knowledge about statistics....

Designed experiments with replicates: Principal components or Canonical Variates?

Published at November 2, 2023 · 16 min read

A few days ago, a colleague of mine wanted to hear my opinion about what multivariate method would be the best for a randomised field experiment with replicates. We had a nice discussion and I thought that such a case-study might be generally interesting for the agricultural sciences; thus, I decided to take my Apple Mac-Book PRO, sit down, relax and write a new post on this matter. My colleague’s research study was similar to this one: a randomised block field experiment (three replicates) with 16 durum wheat genotypes, which was repeated in four years....

GGE analyses for multi-environment studies

Published at May 31, 2023 · 12 min read

In a recent post we have seen that we can use Principal Component Analyses (PCA) to elucidate the ‘genotype by environment’ relationship (see this post). Whenever the starting point for PCA is the doubly-centered (centered by rows and columns) matrix of yields across environments, we talk about AMMI analysis, which is often used to get insight into the stability of genotype yields across environments. By changing the starting matrix, we can obtain a different perspective and put focus on the definition of macroenvironments and on the selection of winning genotypes....

AMMI analyses for multi-environment studies

Published at May 26, 2023 · 19 min read

Again into a subject that is rather important for most agronomists, i.e. the selection of crop varieties. All farmers are perfectly aware that crop performances are affected both by the genotype and by the environment. These two effects are not purely additive and they often show a significant interaction. By this word, we mean that a genotype can give particularly good/bad performances in some specific environmental situations, which we may not expect, considering its average behaviour in other environmental conditions....

Repeated measures with perennial crops

Published at March 30, 2023 · 8 min read

In this post, I want to discuss a concept that is often mistaken by some of my collegues. With all crops, we are used to repeating experiments across years to obtain multi-year data; the structure of the resulting dataset is always the same and it is exemplified in the box below, that refers to a multi-year genotype experiment with winter wheat. rm(list = ls()) library(tidyverse) library(nlme) library(emmeans) filePath <- "https://www....

Subsampling in field experiments

Published at March 29, 2023 · 11 min read

Subsampling is very common in field experiments in agriculture. It happens when we collect several random samples from each plot and we submit them to some sort of measurement process. Some examples? Let’s imagine that we have randomised field experiments with three replicates and, either,: we collect the whole grain yield in each plot, select four subsamples and measure, in each subsample, the oil content or some other relevant chemical property, or we collect, from each plot, four plants and measure their heights, or we collect a representative soil sample from each plot and perform chemical analyses in triplicate....

Fitting threshold models to seed germination data

Published at March 13, 2023 · 19 min read

In previous posts we have shown that we can use time-to-event curves to describe the germination pattern of a seed population (see here). We have also shown that these curves can be modified to include the effects of external/internal factors/covariates, such as the genotype, the species, the humidity content and temperature in the substrate (see here and here). These modified time-to-event curves can be fitted in ‘one-step’, i.e., we start from the germination data with the appropriate shape (see here), fit the model and retrieve the estimates of model parameters ( go to here for an example )....

Fitting hydro-thermal-time-models to seed germination data

Published at February 10, 2023 · 15 min read

This is a follow-up post. If you are interested in other posts of this series, please go to: https://www.statforbiology.com/tags/drcte/. All these posts exapand on a paper that we have recently published in the Journal ‘Weed Science’; please follow this link to the paper. Germination assay This dataset was obtained from previously published work (Mesgaran et al., 2017) with Hordeum spontaneum [C. Koch] Thell. The germination assay was conducted using four replicates of 20 seeds tested at six different water potential levels (0, −0....

Fitting thermal-time-models to seed germination data

Published at February 10, 2023 · 7 min read

This is a follow-up post. If you are interested in other posts of this series, please go to: https://www.statforbiology.com/tags/drcte/. All these posts exapand on a paper that we have recently published in the Journal ‘Weed Science’; please follow this link to the paper. A motivating examples In recent times, we wanted to model the effect of temperature on seed germination for Hordeum vulgare and we made an assay with three replicated Petri dishes (50 seeds each) at 9 constant temperature levels (1, 3, 7, 10, 15, 20, 25, 30, 35, 40 °C)....

The coefficient of determination: is it the R-squared or r-squared?

Published at November 26, 2022 · 9 min read

We often use the coefficient of determination as a swift ‘measure’ of goodness of fit for our regression models. Unfortunately, there is no unique symbol for such a coefficient and both $R^2$ and $r^2$ are used in literature, almost interchangeably. Such an interchangeability is also endorsed by the Wikipedia (see at: https://en.wikipedia.org/wiki/Coefficient_of_determination ), where both symbols are reported as the abbreviations for this statistical index. As an editor of several International Journals, I should not agree with such an approach; indeed, the two symbols $R^2$ and $r^2$ mean two different things, and they are not necessarily interchangeable, because, depending on the setting, either of the two may be wrong or ambiguous....

Multi-environment split-plot experiments

Published at September 13, 2022 · 7 min read

Have you made a split-plot field experiment? Have you repeated such an experiment in two (or more) years/locations? Have you run into troubles, because the reviewer told you that your ANOVA model was invalid? If so, please, stop for awhile and read: this post might help you understand what was wrong with your analyses. Motivating example Let’s think of a field experiment, where 6 genotypes of faba bean were compared under two different sowing times (autumn and spring)....

Meta-analysis for a single study. Is it possible?

Published at July 21, 2022 · 12 min read

We all know that the word meta-analysis encompasses a body of statistical techniques to combine quantitative evidence from several independent studies. However, I have recently discovered that meta-analytic methods can also be used to analyse the results of a single research project. That happened a few months ago, when I was reading a paper from Damesa et al. (2017), where the authors describe some interesting methods of data analyses for multi-environment genotype experiments....

Should I say ''there is no difference'' or ''the difference is not significant''?

Published at June 1, 2022 · 5 min read

In a recent manuscript we wrote a sentence similar to the following: “On average, the genotype A gave a yield of 12.4 tons per hectare, while the genotype B gave 10.6 tons per hectare and such a difference was not significant (P = 0.20)”. Perhaps I should point out that we were talking about maize yields… One of the reviewers complained that “This is an example of expression having no place in a scientific paper” and that we should write: “… no difference in yield was found between A and B (P = 0....

Analysing seed germination and emergence data with R (a tutorial). Part 6

Published at January 18, 2022 · 13 min read

This is a follow-up post. If you are interested in other posts of this series, please go to: https://www.statforbiology.com/tags/drcte/. All these posts exapand on a paper that we have recently published in the Journal ‘Weed Science’; please follow this link to the paper. Fitting time-to-event models with environmental covariates In the previous post we have shown that time-to-event curves (e.g., germination or emergence curves) can be used to describe the time course of germinations/emergences for a seed lot (this post)....

Analysing seed germination and emergence data with R (a tutorial). Part 7

Published at January 18, 2022 · 4 min read

This is a follow-up post. If you are interested in other posts of this series, please go to: https://www.statforbiology.com/tags/drcte/. All these posts expand on a paper that we have recently published in the Journal ‘Weed Science’; please follow this link to the paper. Exploring the results of a time-to-event fit: model parameters In the previous post we have shown that time-to-event curves (e.g., germination or emergence curves) can be used to describe the time course of germinations/emergences for a seed lot (this post)....

Analysing seed germination and emergence data with R (a tutorial). Part 8

Published at January 18, 2022 · 8 min read

This is a follow-up post. If you are interested in other posts of this series, please go to: https://www.statforbiology.com/tags/drcte/. All these posts expand on a paper that we have recently published in the Journal ‘Weed Science’; please follow this link to the paper. Predictions from a parametric time-to-event model In previous posts we have shown that time-to-event models (e.g., germination or emergence models) can be used to describe the time course of germinations/emergences for a seed lot (this post) or for several seed lots, submitted to different experimental treatments (this post)....

Analysing seed germination and emergence data with R (a tutorial). Part 9

Published at January 18, 2022 · 10 min read

This is a follow-up post. If you are interested in other posts of this series, please go to: https://www.statforbiology.com/tags/drcte/. All these posts expand on a manuscript that we have recently published in the Journal ‘Weed Science’; please follow this link to the paper. In order to work throughout this post, you need to install the ‘drcte’ and ‘drcSeedGerm’ packages, by using the code provided in this page. Quantiles from time-to-event models We have previously shown that time-to-event models (e....

Analysing seed germination and emergence data with R: a tutorial. Part 5

Published at December 23, 2021 · 14 min read

This is a follow-up post. If you are interested in other posts of this series, please go to: https://www.statforbiology.com/tags/drcte/. All these posts exapand on a paper that we have recently published in the Journal ‘Weed Science’; please follow this link to the paper. Comparing germination/emergence for several seed lots Very often, seed scientists need to compare the germination behavior of different seed populations, e.g., different plant species, or one single plant species submitted to different temperatures, light conditions, priming treatments and so on....

Analysing seed germination and emergence data with R: a tutorial. Part 4

Published at December 6, 2021 · 9 min read

This is a follow-up post. If you are interested in other posts of this series, please go to: https://www.statforbiology.com/tags/drcte/. All these posts exapand on a paper that we have recently published in the Journal ‘Weed Science’; please follow this link to the paper. Time-to-event models for seed germination/emergence The individual seeds within a population do not germinate/emerge altogether at the same moment; this is an undisputed fact, resulting from seed-to-seed variability in germination/emergence time....

Biplots are everywhere: where do they come from?

Published at November 24, 2021 · 25 min read

Principal Component Analysis (PCA) is perhaps the most widespread multivariate technique in biology and it is used to summarise the results of experiments in a wide range of disciplines, from agronomy to botany, from entomology to plant pathology. Whenever possible, the results are presented by way of a biplot, an ubiquitous type of graph with a formidable descriptive value. Indeed, carefully drawn biplots can be used to represent, altogether, the experimental subjects, the experimental variables and their reciprocal relationships (distances and correlations)....

Principal Component Analysis: a brief intro for biologists

Published at November 23, 2021 · 24 min read

In this post I am revisiting the concept of Principal Component Analysis (PCA). You might say that there is no need for that, as the Internet is full with posts relating to such a rather old technique. However, I feel that, in those posts, the theoretical aspects are either too deeply rooted in maths or they are skipped altogether, so that the main emphasis is on interpreting the output of an R function....

Analysing seed germination and emergence data with R: a tutorial. Part 3

Published at October 19, 2021 · 12 min read

This is a follow-up post. If you are interested in other posts of this series, please go to: https://www.statforbiology.com/tags/drcte/. All these posts exapand on a paper that we have recently published in the Journal ‘Weed Science’; please follow this link to the paper. Reshaping time-to-event data The first thing we should consider before working through this tutorial is the structure of germination/emergence data. To our experience, seed scientists are used to storing their datasets in several formats, that may not be immediately usable with the ‘drcte’ and ‘drc’ packages, which this tutorial is built upon....

Analysing seed germination and emergence data with R (a tutorial). Part 2

Published at October 9, 2021 · 17 min read

This is a follow-up post: if you are interested in other posts of this series, please go to: https://www.statforbiology.com/tags/drcte/. All these posts exapand on a paper that we have recently published in the Journal ‘Weed Science’; please follow this link to the paper. Survival analysis and germination/emergence data: an overlooked connection Seed germination and emergence data describe the time until the event of interest occurs and, therefore, they can be put together in the wide group of time-to-event data....

Analysing seed germination and emergence data with R (a tutorial). Part 1

Published at October 7, 2021 · 4 min read

Introduction to the tutorial Germination/emergence assays are relatively easy to perform, by following standardised procedures, as described, e.g., by the International Seed Testing Association (see here ). In short, we take a sample of seeds and we put them in an appropriate container. We put the container in the right environmental conditions (e.g., relating to humidity content and temperature) and we inspect the seeds according to a regular schedule (e.g., daily)....

Why are derivatives important in life? A case-study with nonlinear regression

Published at June 9, 2021 · 7 min read

In general, undergraduate students in biology/ecology courses tend to consider the derivatives as a very abstract entity, with no real usefulness in the everyday life. In my work as a teacher, I have often tried to fight against such an attitude, by providing convincing examples on how we can use the derivatives to get a better understanding about the changes on a given system. In this post I’ll tell you about a recent situation where I was involved with derivatives....

Other useful functions for nonlinear regression: threshold models and all that

Published at May 1, 2021 · 13 min read

In a recent post I presented several equations and just as many self-starting functions for nonlinear regression analyses in R. Today, I would like to build upon that post and present some further equations, relating to the so-called threshold models. But, … what are threshold models? In some instances, we need to describe relationships where the response variable changes abruptly, following a small change in the predictor. A typical threshold model looks like that in the Figure below, where we see three threshold levels:...

The R-squared and nonlinear regression: a difficult marriage?

Published at March 25, 2021 · 4 min read

Making sure that a fitted model gives a good description of the observed data is a fundamental step of every nonlinear regression analysis. To this aim we can (and should) use several techniques, either graphical or based on formal hypothesis testing methods. However, in the end, I must admit that I often feel the need of displaying a simple index, based on a single and largely understood value, that reassures the readers about the goodness of fit of my models....

lmDiallel: a new R package to fit diallel models. Multienvironment diallel experiments

Published at March 5, 2021 · 7 min read

In recent times, a few colleagues at my Department and I have devoted some research effort to data management for diallel mating experiments, which we have summarised in a paper (Onofri et al., 2020) and a series of five blog posts (see here). A final topic that remains to be covered relates to the frequent possibility that these diallel experiments are repeated across years and/or locations. How should the resulting dataset be analysed?...

lmDiallel: a new R package to fit diallel models. The Gardner-Eberhart models

Published at February 22, 2021 · 15 min read

Another post for this series about diallel mating experiments. So far, we have published a paper in Plant Breeding (Onofri et al., 2020), where we presented lmDiallel, a new R package to fit diallel models. We followed up this paper with a series of four blog posts, giving more detail about the package (see here), about the Hayman’s models type 1 (see here) and type 2 (see here) and about the Griffing’s family of models (see here)....

Split-plot designs: the transition to mixed models for a dinosaur

Published at February 11, 2021 · 15 min read

Those who long ago took courses in ‘analysis of variance’ or ‘experimental design’ … would have learned methods … based on observed and expected mean squares and methods of testing based on ‘error strata’ (if you weren’t forced to learn this, consider yourself lucky). (Douglas Bates, 2006). In a previous post, I already mentioned that, due to my age, I see myself as a dinosaur within the R-users community. I already mentioned how difficult it is, for a dinosaur, to adjust to new concepts and paradigms in data analysis, after having done things differently for a long time ( see this post here )....

Pairwise comparisons in nonlinear regression

Published at January 19, 2021 · 6 min read

Pairwise comparisons are one of the most debated topic in agricultural research: they are very often used and, sometimes, abused, in literature. I have nothing against the appropriate use of this very useful technique and, for those who are interested, some colleagues and I have given a bunch of (hopefully) useful suggestions in a paper, a few years ago (follow this link here). Pairwise comparisons usually follow the application of some sort of linear or generalised linear model; in this setting, the ‘emmeans’ package (Lenth, 2020) is very handy, as it uses a very logical approach....

lmDiallel: a new R package to fit diallel models. The Griffing's models (1956)

Published at January 12, 2021 · 10 min read

Diallel mating designs are often used by plant breeders to compare the possible crosses between a set of genotypes. In spite of such widespread usage, the process of data analysis in R is not yet strightforward and it is not clear which tool should be routinely used. We recently gave a small contribution by publishing a paper in Plant Breeding (Onofri et al., 2020 ), where we advocated the idea that models for diallel crosses are just a class of general linear models, that should be fit by Ordinary Least Squares (OLS) or REstricted Maximum Likelihood methods (REML)....

lmDiallel: a new R package to fit diallel models. The Hayman's model (type 2)

Published at January 5, 2021 · 9 min read

This posts follows two other previously published posts, where we presented our new ‘lmDiallel’ package (see here) and showed how we can use it to fit the Hayman’s model type 1, as proposed in Hayman (1954) (see here). In this post, we will give a further example relating to another very widespread model from the same author, the Hayman’s model type 2. We apologise for some overlapping with previous posts: we think this is necessary so that each post can be read on its own....

General code to fit ANOVA models with JAGS and 'rjags'

Published at December 23, 2020 · 15 min read

One of the reasons why I like BUGS and all related dialects has been put nicely in a very good book, i.e. “Introduction to WinBUGS for ecologists” (Kery, 2010); at page 11, the author says: “WinBUGS helps free the modeler in you”. Ultimately, that statement is true: when I have fully understood a model with all its components (and thus I have become a modeler), I can very logically translate it to BUGS code....

From ''for()'' loops to the ''split-apply-combine'' paradigm for column-wise tasks: the transition for a dinosaur

Published at December 11, 2020 · 9 min read

I have been involved with data crunching for 30 years, and, due to my age, I see myself as a dinosaur within the R-users community. I must admit, I’m rather slow to incorporate new paradigms in my programming workflow … I’m pretty busy and the time I save today is often more important than the time I could save in the future, by picking up new techniques. However, resisting to progress is not necessarily a good idea and, from time to time, also a dinosaur feels like living more dangerously and exploring new ideas and views....

Accounting for the experimental design in linear/nonlinear regression analyses

Published at December 4, 2020 · 11 min read

In this post, I am going to talk about an issue that is often overlooked by agronomists and biologists. The point is that field experiments are very often laid down in blocks, using split-plot designs, strip-plot designs or other types of designs with grouping factors (blocks, main-plots, sub-plots). We know that these grouping factors should be appropriately accounted for in data analyses: ‘analyze them as you have randomized them’ is a common saying attributed to Ronald Fisher....

lmDiallel: a new R package to fit diallel models. The Hayman's model (type 1)

Published at November 26, 2020 · 15 min read

In a previous post we have presented our new ‘lmDiallel’ package (see this link here and see also the original paper in Theoretical and Applied Genetics). This package provides an extensions to fit a class of linear models of interest for plant breeders or geneticists, the so-called diallel models. In this post and other future posts we would like to present some examples of how to use this package: please, sit back and relax and, if you have comments, let us know, using the email link at the bottom of this post....

lmDiallel: a new R package to fit diallel models. Introduction

Published at November 11, 2020 · 7 min read

Together with some colleagues from the plant breeding group, we have just published a new paper, where we presented a bunch of R functions to analyse the data from diallel experiments. The paper is titled ‘Linear models for diallel crosses: a review with R functions’ and it is published in the ‘Theoretical and Applied Genetics’ Journal. If you are interested, you can take a look here at this link....

QQ-plots and Box-Whisker plots: where do they come from?

Published at October 15, 2020 · 7 min read

For the most curious students QQ-plots and Box-Whisker plots usually become part of the statistical toolbox for the students attending my course of ‘Experimental methods in agriculture’. Most of them learn that the QQ-plot can be used to check for the basic assumption of gaussian residuals in linear models and that the Box-Whisker plot can be used to describe the experimental groups, when their size is big enough and we do not want to assume a gaussian distribution....

Building ANOVA-models for long-term experiments in agriculture

Published at August 20, 2020 · 29 min read

This is the follow-up of a manuscript that we (some colleagues and I) have published in 2016 in the European Journal of Agronomy (Onofri et al., 2016). I thought that it might be a good idea to rework some concepts to make them less formal, simpler to follow and more closely related to the implementation with R. Please, be patient: this lesson may be longer than usual. What are long-term experiments?...

Fitting complex mixed models with nlme. Example #5

Published at June 5, 2020 · 14 min read

A Joint Regression model Let’s talk about a very old, but, nonetheless, useful technique. It is widely known that the yield of a genotype in different environments depends on environmental covariates, such as the amount of rainfall in some critical periods of time. Apart from rain, also temperature, wind, solar radiation, air humidity and soil characteristics may concur to characterise a certain environment as good or bad and, ultimately, to determine yield potential....

Seed germination: fitting hydro-time models with R

Published at March 23, 2020 · 16 min read

THE CODE IN THIS POST WAS UPDATED ON JANUARY 2022 I am locked at home, due to the COVID-19 emergency in Italy. Luckily I am healthy, but there is not much to do, inside. I thought it might be nice to spend some time to talk about seed germination models and the connections with survival analysis. We all know that seeds need water to germinate. Indeed, the absorption of water activates the hydrolytic enzymes, which break down food resources stored in seeds and provide energy for germination....

A collection of self-starters for nonlinear regression in R

Published at February 26, 2020 · 29 min read

Usually, the first step of every nonlinear regression analysis is to select the function $f$, which best describes the phenomenon under study. The next step is to fit this function to the observed data, possibly by using some sort of nonlinear least squares algorithms. These algorithms are iterative, in the sense that they start from some initial values of model parameters and repeat a sequence of operations, which continuously improve the initial guesses, until the least squares solution is approximately reached....

Self-starting routines for nonlinear regression models

Published at February 14, 2020 · 8 min read

(Post updated on 17/07/2023) In R, the drc package represents one of the main solutions for nonlinear regression and dose-response analyses (Ritz et al., 2015). It comes with a lot of nonlinear models, which are useful to describe several biological processes, from plant growth to bioassays, from herbicide degradation to seed germination. These models are provided with self-starting functions, which free the user from the hassle of providing initial guesses for model parameters....

Some everyday data tasks: a few hints with R (revisited)

Published at January 28, 2020 · 12 min read

One year ago, I published a post titled ‘Some everyday data tasks: a few hints with R’. In that post, I considered four data tasks, that we all need to accomplish daily, i.e. subsetting sorting casting melting In that post, I used the methods I was more familiar with. And, as a long-time R user, I have mainly incorporated in my workflow all the functions from the base R implementation....

Nonlinear combinations of model parameters in regression

Published at January 9, 2020 · 11 min read

Nonlinear regression plays an important role in my research and teaching activities. While I often use the ‘drm()’ function in the ‘drc’ package for my research work, I tend to prefer the ‘nls()’ function for teaching purposes, mainly because, in my opinion, the transition from linear models to nonlinear models is smoother, for beginners. One problem with ‘nls()’ is that, in contrast to ‘drm()’, it is not specifically tailored to the needs of biologists or students in biology....

Fitting 'complex' mixed models with 'nlme': Example #2

Published at September 13, 2019 · 9 min read

A repeated split-plot experiment with heteroscedastic errors Let’s imagine a field experiment, where different genotypes of khorasan wheat are to be compared under different nitrogen (N) fertilisation systems. Genotypes require bigger plots, with respect to fertilisation treatments and, therefore, the most convenient choice would be to lay-out the experiment as a split-plot, in a randomised complete block design. Genotypes would be randomly allocated to main plots, while fertilisation systems would be randomly allocated to sub-plots....

Testing for interactions in nonlinear regression

Published at September 13, 2019 · 11 min read

Fitting ‘complex’ mixed models with ‘nlme’: Example #4 Factorial experiments are very common in agriculture and they are usually laid down to test for the significance of interactions between experimental factors. For example, genotype assessments may be performed at two different nitrogen fertilisation levels (e.g. high and low) to understand whether the ranking of genotypes depends on nutrient availability. For those of you who are not very much into agriculture, I will only say that such an assessment is relevant, because we need to know whether we can recommend the same genotypes, e....

Fitting 'complex' mixed models with 'nlme'. Example #1

Published at August 20, 2019 · 9 min read

The environmental variance model Fitting mixed models has become very common in biology and recent developments involve the manipulation of the variance-covariance matrix for random effects and residuals. To the best of my knowledge, within the frame of frequentist methods, the only freeware solution in R should be based on the ‘nlme’ package, as the ‘lmer’ package does not easily permit such manipulations. The ‘nlme’ package is fully described in Pinheiro and Bates (2000)....

Germination data and time-to-event methods: comparing germination curves

Published at July 20, 2019 · 11 min read

Very often, seed scientists need to compare the germination behaviour of different seed populations, e.g., different plant species, or one single plant species submitted to different temperatures, light conditions, priming treatments and so on. How should such a comparison be performed? Let’s take a practical approach and start from an appropriate example: a few years ago, some collegues studied the germination behaviour for seeds of a plant species (Verbascum arcturus, BTW…), in different conditions....

Survival analysis and germination data: an overlooked connection

Published at July 2, 2019 · 16 min read

The background Seed germination data describe the time until an event of interest occurs. In this sense, they are very similar to survival data, apart from the fact that we deal with a different (and less sad) event: germination instead of death. But, seed germination data are also similar to failure-time data, phenological data, time-to-remission data… the first point is: germination data are time-to-event data. You may wonder: what’s the matter with time-to-event data?...

Stabilising transformations: how do I present my results?

Published at June 15, 2019 · 5 min read

ANOVA is routinely used in applied biology for data analyses, although, in some instances, the basic assumptions of normality and homoscedasticity of residuals do not hold. In those instances, most biologists would be inclined to adopt some sort of stabilising transformations (logarithm, square root, arcsin square root…), prior to ANOVA. Yes, there might be more advanced and elegant solutions, but stabilising transformations are suggested in most traditional biometry books, they are very straightforward to apply and they do not require any specific statistical software....

Genotype experiments: fitting a stability variance model with R

Published at June 6, 2019 · 8 min read

Yield stability is a fundamental aspect for the selection of crop genotypes. The definition of stability is rather complex (see, for example, Annichiarico, 2002); in simple terms, the yield is stable when it does not change much from one environment to the other. It is an important trait, that helps farmers to maintain a good income in most years. Agronomists and plant breeders are continuosly concerned with the assessment of genotype stability; this is accomplished by planning genotype experiments, where a number of genotypes is compared on randomised complete block designs, with three to five replicates....

How do we combine errors, in biology? The delta method

Published at May 25, 2019 · 7 min read

In a recent post I have shown that we can build linear combinations of model parameters (see here ). For example, if we have two parameter estimates, say Q and W, with standard errors respectively equal to $\sigma_Q$ and $\sigma_W$, we can build a linear combination as follows: \[Z = AQ + BW + C\] where A, B and C are three coefficients. The standard error for this combination can be obtained as:...

Dealing with correlation in designed field experiments: part II

Published at May 10, 2019 · 12 min read

With field experiments, studying the correlation between the observed traits may not be an easy task. Indeed, in these experiments, subjects are not independent, but they are grouped by treatment factors (e.g., genotypes or weed control methods) or by blocking factors (e.g., blocks, plots, main-plots). I have dealt with this problem in a previous post and I gave a solution based on traditional methods of data analyses. In a recent paper, Piepho (2018) proposed a more advanced solution based on mixed models....

Dealing with correlation in designed field experiments: part I

Published at April 30, 2019 · 7 min read

Observations are grouped When we have recorded two traits in different subjects, we can be interested in describing their joint variability, by using the Pearson’s correlation coefficient. That’s ok, altough we have to respect some basic assumptions (e.g. linearity) that have been detailed elsewhere (see here). Problems may arise when we need to test the hypothesis that the correlation coefficient is equal to 0. In this case, we need to make sure that all the couples of observations are taken on independent subjects....

How do we combine errors? The linear case

Published at April 15, 2019 · 7 min read

In our research work, we usually fit models to experimental data. Our aim is to estimate some biologically relevant parameters, together with their standard errors. Very often, these parameters are interesting in themselves, as they represent means, differences, rates or other important descriptors. In other cases, we use those estimates to derive further indices, by way of some appropriate calculations. For example, think that we have two parameter estimates, say Q and W, with standard errors respectively equal to $\sigma_Q$ and $\sigma_W$: it might be relevant to calculate the amount:...

Some everyday data tasks: a few hints with R

Published at March 27, 2019 · 9 min read

We all work with data frames and it is important that we know how we can reshape them, as necessary to meet our needs. I think that there are, at least, four routine tasks that we need to be able to accomplish: subsetting sorting casting melting Obviously, there is a wide array of possibilities; I’ll just mention a few, which I regularly use. Subsetting the data Subsetting means selecting the records (rows) or the variables (columns) which satisfy certain criteria....

Drowning in a glass of water: variance-covariance and correlation matrices

Published at February 19, 2019 · 3 min read

One of the easiest tasks in R is to get correlations between each pair of variables in a dataset. As an example, let’s take the first four columns in the ‘mtcars’ dataset, that is available within R. Getting the variances-covariances and the correlations is straightforward. data(mtcars) matr <- mtcars[,1:4] #Covariances cov(matr) ## mpg cyl disp hp ## mpg 36.324103 -9.172379 -633.0972 -320.7321 ## cyl -9.172379 3.189516 199.6603 101.9315 ## disp -633....

Going back to the basics: the correlation coefficient

Published at February 7, 2019 · 7 min read

A measure of joint variability In statistics, dependence or association is any statistical relationship, whether causal or not, between two random variables or bivariate data. It is often measured by the Pearson correlation coefficient: $$\rho _{X,Y} =\textrm{corr} (X,Y) = \frac {\textrm{cov}(X,Y) }{ \sigma_X \sigma_Y } = \frac{ \sum_{1 = 1}^n [(X - \mu_X)(Y - \mu_Y)] }{ \sigma_X \sigma_Y }$$ Other measures of correlation can be thought of, such as the Spearman $\rho$ rank correlation coefficient or Kendall $\tau$ rank correlation coefficient....

My first experience with blogdown

Published at November 15, 2018 · 1 min read

This is my first day at work with blogdown. I must admit it is pretty overwhelming at the beginning … I thought that it might be useful to write down a few notes, to summarise my steps ahead, during the learning process. I do not work with blogdown everyday and I tend to forget things quite easily. Therefore, these notes may help me recap how far I have come. And they might also help other beginners, to speed up their initial steps with such a powerful blogging platform....

Is R dangerous? Side effects of free software for biologists

Published at June 8, 2014 · 3 min read

When I started my career in the biological field (it’s already 25 years ago), only the luckiest of us had access to very advanced statistical software. Licenses were very expensive and it was not easy to convince the boss that they were really necessary: “Why do you need to spend so much money to perform an ANOVA?”. Indeed, simple one-way or two-ways ANOVAs were quite easy to perform and one of the people in my group had already built the appropriate routines for several designs, by using the GW-BASIC language....

#R

Tags

Recent posts

Plotting weather data with ggplot()

Here is why I don't care about the Levene's test

Pairwise comparisons in nonlinear regression

Regression analyses with common checks in pesticide research

Factorial designs with check in pesticide research

Archives