Models are wrong

...but, some are useful (G. Box)!


From ''for()'' loops to the ''split-apply-combine'' paradigm for column-wise tasks: the transition for a dinosaur

Published at December 11, 2020 ·  9 min read

I have been involved with data crunching for 30 years and, due to my age, I see myself as a dinosaur within the R-users community. I must admit, I’m rather slow to incorporate new paradigms into my programming workflow … I’m pretty busy, and the time I save today is often more important than the time I could save in the future by picking up new techniques. However, resisting progress is not necessarily a good idea and, from time to time, even a dinosaur feels like living more dangerously and exploring new ideas and views....

Accounting for the experimental design in linear/nonlinear regression analyses

Published at December 4, 2020 ·  11 min read

In this post, I am going to talk about an issue that is often overlooked by agronomists and biologists. The point is that field experiments are very often laid out in blocks, using split-plot designs, strip-plot designs or other types of designs with grouping factors (blocks, main-plots, sub-plots). We know that these grouping factors should be appropriately accounted for in data analyses: ‘analyze them as you have randomized them’ is a common saying attributed to Ronald Fisher....

lmDiallel: a new R package to fit diallel models. The Hayman's model (type 1)

Published at November 26, 2020 ·  15 min read

In a previous post, we presented our new ‘lmDiallel’ package (see this link here and see also the original paper in Theoretical and Applied Genetics). This package provides extensions to fit a class of linear models of interest to plant breeders and geneticists, the so-called diallel models. In this post and in future posts, we would like to present some examples of how to use this package: please sit back and relax and, if you have comments, let us know, using the email link at the bottom of this post....

lmDiallel: a new R package to fit diallel models. Introduction

Published at November 11, 2020 ·  7 min read

Together with some colleagues from the plant breeding group, we have just published a new paper, in which we present a set of R functions to analyse the data from diallel experiments. The paper is titled ‘Linear models for diallel crosses: a review with R functions’ and it is published in the journal ‘Theoretical and Applied Genetics’. If you are interested, you can take a look here at this link....

QQ-plots and Box-Whisker plots: where do they come from?

Published at October 15, 2020 ·  7 min read

For the most curious students QQ-plots and Box-Whisker plots usually become part of the statistical toolbox for the students attending my course on ‘Experimental methods in agriculture’. Most of them learn that the QQ-plot can be used to check the basic assumption of Gaussian residuals in linear models and that the Box-Whisker plot can be used to describe the experimental groups, when their size is big enough and we do not want to assume a Gaussian distribution....

Building ANOVA-models for long-term experiments in agriculture

Published at August 20, 2020 ·  29 min read

This is the follow-up to a manuscript that we (some colleagues and I) published in 2016 in the European Journal of Agronomy (Onofri et al., 2016). I thought that it might be a good idea to rework some concepts to make them less formal, simpler to follow and more closely related to the implementation in R. Please be patient: this lesson may be longer than usual. What are long-term experiments?...

Fitting complex mixed models with nlme. Example #5

Published at June 5, 2020 ·  14 min read

A Joint Regression model. Let’s talk about a very old but nonetheless useful technique. It is widely known that the yield of a genotype in different environments depends on environmental covariates, such as the amount of rainfall in some critical periods of time. Apart from rain, temperature, wind, solar radiation, air humidity and soil characteristics may also contribute to characterising a certain environment as good or bad and, ultimately, to determining yield potential....

AMMI analyses for GE interactions

Published at May 12, 2020 ·  19 min read

The CoViD-19 situation in Italy is improving little by little and I feel a bit more optimistic. It’s time for a new post! I will go back to a subject that is rather important for most agronomists, i.e. the selection of crop varieties. All farmers are perfectly aware that crop performance is affected both by the genotype and by the environment. These two effects are not purely additive and they often show a significant interaction....

Seed germination: fitting hydro-time models with R

Published at March 23, 2020 ·  17 min read

I am locked in at home, due to the COVID-19 emergency in Italy. Luckily I am healthy, but there is not much to do inside. I thought it might be nice to spend some time talking about seed germination models and their connections with survival analysis. We all know that seeds need water to germinate. Indeed, the absorption of water activates the hydrolytic enzymes, which break down food resources stored in seeds and provide energy for germination....

A collection of self-starters for nonlinear regression in R

Published at February 26, 2020 ·  29 min read

Usually, the first step of every nonlinear regression analysis is to select the function \(f\) that best describes the phenomenon under study. The next step is to fit this function to the observed data, possibly by using some sort of nonlinear least squares algorithm. These algorithms are iterative, in the sense that they start from some initial values of the model parameters and repeat a sequence of operations that progressively improve the initial guesses, until the least squares solution is approximately reached....