#Correlation

The coefficient of determination: is it the R-squared or r-squared?

Published on November 26, 2022 · 9 min read

We often use the coefficient of determination as a quick ‘measure’ of goodness of fit for our regression models. Unfortunately, there is no unique symbol for this coefficient, and both \(R^2\) and \(r^2\) are used in the literature, almost interchangeably. This interchangeability is also endorsed by Wikipedia (see https://en.wikipedia.org/wiki/Coefficient_of_determination), where both symbols are reported as abbreviations for this statistical index.

As an editor of several international journals, I cannot agree with this approach: the two symbols \(R^2\) and \(r^2\) mean two different things and are not necessarily interchangeable because, depending on the setting, either of the two may be wrong or ambiguous. Let’s take a closer look at this issue.
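As a minimal sketch of why the setting matters, the snippet below compares the coefficient of determination of a fitted model with the squared Pearson correlation. The 'mtcars' dataset and the chosen variables are assumptions made only for illustration. With a single predictor and an intercept the two quantities coincide; with several predictors, the model's coefficient of determination equals the squared correlation between observed and fitted values, not the squared correlation with any single predictor.

```r
# Simple linear regression on the built-in 'mtcars' data (illustrative choice)
fit1 <- lm(mpg ~ disp, data = mtcars)

# Coefficient of determination reported for the fitted model
R2_simple <- summary(fit1)$r.squared

# Squared Pearson correlation between response and predictor
r2_pair <- cor(mtcars$mpg, mtcars$disp)^2

# One predictor plus intercept: the two values coincide
all.equal(R2_simple, r2_pair)

# Multiple regression: compare with the squared correlation
# between observed and fitted values instead
fit2 <- lm(mpg ~ disp + hp, data = mtcars)
R2_multi <- summary(fit2)$r.squared
r2_fitted <- cor(mtcars$mpg, fitted(fit2))^2
all.equal(R2_multi, r2_fitted)
```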

...


Dealing with correlation in designed field experiments: part II

Published on May 10, 2019 · 12 min read

With field experiments, studying the correlation between the observed traits may not be an easy task. Indeed, in these experiments, subjects are not independent: they are grouped by treatment factors (e.g., genotypes or weed control methods) or by blocking factors (e.g., blocks, plots, main-plots). I dealt with this problem in a previous post, where I gave a solution based on traditional methods of data analysis.

In a recent paper, Piepho (2018) proposed a more advanced solution based on mixed models. He presented four exemplary datasets and gave SAS code to analyse them, based on PROC MIXED. I was very interested in those examples, but I do not use SAS. Therefore, I tried to ‘transport’ the models to R, which turned out to be a difficult task. After struggling for a while with several mixed model packages, I came to an acceptable solution, which I would like to share.

...


Dealing with correlation in designed field experiments: part I

Published on April 30, 2019 · 7 min read

Observations are grouped

When we have recorded two traits in different subjects, we may be interested in describing their joint variability by using Pearson’s correlation coefficient. That’s fine, although we have to respect some basic assumptions (e.g. linearity) that have been detailed elsewhere (see here). Problems may arise when we need to test the hypothesis that the correlation coefficient is equal to 0. In this case, we need to make sure that all the pairs of observations are taken on independent subjects.
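For reference, a test of this hypothesis can be sketched in base R with `cor.test()`. The 'mtcars' dataset and the two variables are assumptions chosen only for illustration, and the 32 cars are treated here as independent subjects, which is exactly the assumption that grouped field data may violate.

```r
# Pearson correlation and test of H0: rho = 0 on the built-in 'mtcars' data
# (illustrative choice; rows are assumed to be independent subjects)
res <- cor.test(mtcars$mpg, mtcars$disp, method = "pearson")

res$estimate   # sample correlation coefficient
res$statistic  # t statistic
res$parameter  # degrees of freedom (n - 2)
res$p.value    # p-value for the null hypothesis of zero correlation
```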

...


Drowning in a glass of water: variance-covariance and correlation matrices

Published on February 19, 2019 · 3 min read

One of the easiest tasks in R is to get correlations between each pair of variables in a dataset. As an example, let’s take the first four columns of the ‘mtcars’ dataset, which is available within R. Getting the variances, covariances and correlations is straightforward.

data(mtcars)
matr <- mtcars[,1:4]

#Covariances
cov(matr)
##              mpg        cyl       disp        hp
## mpg    36.324103  -9.172379  -633.0972 -320.7321
## cyl    -9.172379   3.189516   199.6603  101.9315
## disp -633.097208 199.660282 15360.7998 6721.1587
## hp   -320.732056 101.931452  6721.1587 4700.8669
#Correlations
cor(matr)
##             mpg        cyl       disp         hp
## mpg   1.0000000 -0.8521620 -0.8475514 -0.7761684
## cyl  -0.8521620  1.0000000  0.9020329  0.8324475
## disp -0.8475514  0.9020329  1.0000000  0.7909486
## hp   -0.7761684  0.8324475  0.7909486  1.0000000

It’s really a piece of cake! Unfortunately, a few days ago I had a covariance matrix without the original dataset and I wanted the corresponding correlation matrix. Although this is an easy task as well, at first I was stuck, because I could not find an immediate solution… So I started wondering how I could do it.
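One possible way to go from a covariance matrix to a correlation matrix is sketched below; it is not necessarily the route taken in the full post. It uses the base R function `cov2cor()` and, as a check, the equivalent scaling by the inverse standard deviations. The object names (`V`, `R_builtin`, `R_manual`) are hypothetical, and the covariance matrix is rebuilt from 'mtcars' only to have something to work on.

```r
# Pretend that only the covariance matrix is available
V <- cov(mtcars[, 1:4])

# Base R helper that rescales a covariance matrix to a correlation matrix
R_builtin <- cov2cor(V)

# Equivalent manual computation: D^(-1) V D^(-1),
# where D is the diagonal matrix of standard deviations
D_inv <- diag(1 / sqrt(diag(V)))
R_manual <- D_inv %*% V %*% D_inv
dimnames(R_manual) <- dimnames(V)

# Both agree with each other and with cor() on the original data
all.equal(R_builtin, R_manual)
all.equal(R_builtin, cor(mtcars[, 1:4]))
```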

...


Going back to the basics: the correlation coefficient

Published on February 7, 2019 · 7 min read

A measure of joint variability

In statistics, dependence or association is any statistical relationship, whether causal or not, between two random variables or bivariate data. It is often measured by the Pearson correlation coefficient:

\[\rho _{X,Y} =\textrm{corr} (X,Y) = \frac {\textrm{cov}(X,Y) }{ \sigma_X \sigma_Y } = \frac{ \frac{1}{n} \sum_{i = 1}^n [(X_i - \mu_X)(Y_i - \mu_Y)] }{ \sigma_X \sigma_Y }\]
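As a rough check, the definition can be applied by hand to a sample and compared with R's `cor()`. The two 'mtcars' variables are assumptions chosen only for illustration. The normalising constants of the covariance and of the two standard deviations cancel out, so only sums of deviations are needed.

```r
# Two illustrative variables from the built-in 'mtcars' data
x <- mtcars$mpg
y <- mtcars$disp

# Deviations from the means
dx <- x - mean(x)
dy <- y - mean(y)

# Sum of cross-products divided by the square root of the product
# of the sums of squares (normalising constants cancel out)
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# Same value as the built-in function
all.equal(r_manual, cor(x, y))
```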

Other measures of correlation are available, such as Spearman’s \(\rho\) rank correlation coefficient or Kendall’s \(\tau\) rank correlation coefficient.
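In R, all three coefficients can be obtained with the `method` argument of `cor()`. The sketch below uses two 'mtcars' variables, chosen only for illustration, and also shows that Spearman's coefficient is Pearson's coefficient computed on the ranks.

```r
# Two illustrative variables from the built-in 'mtcars' data
x <- mtcars$mpg
y <- mtcars$disp

r_pearson  <- cor(x, y, method = "pearson")
r_spearman <- cor(x, y, method = "spearman")
r_kendall  <- cor(x, y, method = "kendall")

# Spearman's coefficient is Pearson's coefficient on the ranks
# (ties receive average ranks)
all.equal(r_spearman, cor(rank(x), rank(y)))
```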

...