Chapter 15 Exercises

This book was not intended to build a solid theoretical foundation in biometry, but it was mainly intended to give you the tools to organise experiments and analyse their results. Therefore, we propose a list of exercises and case studies, so that you can build some practical experience an this matter and evaluate how clear are the concepts exposed earlier in this book. The exercises are organised in sections and each section corresponds to one or more book chapters. In some cases you will need to enter small datasets in R, while, for the bigger datasets, we usually provide the related file in an external repository, so that you can load them in R, by using the appropriate function.

15.1 Designing experiments (ch. 1 to 2)

15.1.1 Exercise 1

You have been requested to lay-out a breeding experiment, with 16 wheat genotypes, coded by using letters of the Roman alphabet. The aim is to determine which genotype is the best in a given environment.

Write the experimental protocol, where you specify all the main elements of your project (subjects, variables, replicates, experimental design) and draw the field map.

15.1.2 Exercise 2

Describe the protocol of an experiment to determine the effect of sowing date (autumn and spring) on seven faba bean genotypes. Include all possible elements to assess whether the experiment is valid, describe the type of design and include the field map, showing all relevant information (including plot sizes and orientation in space). What type of check would you add (if any)? Motivate all your choices.

15.1.3 Exercise 3

Describe the protocol of an experiment to determine the effect of nitrogen dose on several wheat genotypes. Include all possible elements to assess whether the experiment is valid, describe the type of design and include the field map, showing all relevant information (including plot sizes and orientation in space). Motivate all your choices.


15.2 Describing the observations (ch. 3)

15.2.1 Exercise 1

A chemical analysis was performed in triplicate, with the following results: 125, 169 and 142 ng/g. Calculate mean, sum of squares, mean square, standard deviation and coefficient of variation. What is a correct way to display the result?

15.2.2 Exercise 2

Consider the Excel file ‘rimsulfuron.csv’ from https://www.casaonofri.it/_datasets/rimsulfuron.csv (you can either download it, or read it directly from the web repository). This is a dataset relating to a field experiment to compare 14 herbicides and two untreated checks, with 4 replicates per treatment. The response variables are maize yield and weed coverage. Describe the dataset and show the results on a barplot, including some measure of variability. Check whether yield correlates to weed coverage and comment on the results.

15.2.3 Exercise 3

Load the csv file ‘students.csv’ from https://www.casaonofri.it/_datasets/students.csv. This dataset relates to a number of students, their votes in several undergraduate exams and information on high school. Determine: (i) the absolute and relative frequencies for the different subjects; (ii) the frequency distribution of votes in three classes (bins): <24, 24-27, >27; (iii) whether the votes depend on the exam subject and (iv) whether the votes depend on the high school type.


15.3 Modeling the experimental data (ch. 4)

15.3.1 Exercise 1

A xenobiotic substance degrades in soil following a first-order kinetic, which is described by the following equation:

\[Y = 100 \, e^{-0.07 \, t}\]

where Y is the concentration at time \(t\). After spraying this substance in soil, what is the probability that 50 days later we observe a concentration below the toxicity threshold for mammalians (2 ng/g)? Please, consider that all the unknown sources of experimental error can be regarded as gaussian, with a coefficient of variability equal to 20%.

15.3.2 Exercise 2

Crop yield is a function of its density, according to the following function:

\[ Y = 0.8 + 0.8 \, X - 0.07 \, X^2\]

Draw the graph and find the required density to obtain the highest yield (use a simple graphical method). What is the probability of obtaining a yield level between 2.5 and 3 t/ha, by using the optimal density? Consider that random variability is 12%.

15.3.3 Exercise 3

The toxicity of a compound changes with the dose, according to the following expression:

\[ Y = \frac{1}{1 + exp\left\{ -2 \, \left[log(X) - log(15)\right] \right\}}\]

where \(Y\) is the proportion of dead animals and \(X\) is the dose. If we treat 150 animals with a dose of 35 g, what is the probability of finding more than 120 dead animals? The individual variability can be approximated by using a gaussian distribution, with a standard error equal to 10.

15.3.4 Exercise 4

Consider the sample C = [140 - 170 - 155], which was drawn by a gaussian distribution. Calculate the probability of drawing an individual value from the same pupulation in the following intervals:

  1. higher than 170
  2. lower than 140
  3. within the range from 170 and 140

15.3.5 Exercise 5

Reproduce the possible results of a genotype experiment, with five maize genotypes (A, B, C, D and E) and expected values of, respectively, 12, 13, 12.5, 14 and 11 tons per hectare. Assume that the experimental (random) variability can be described by a gaussian distribution, with mean equal to 0 and standard deviation equal to 1.25 (common value for all genotypes). The experiment is designed as completely randomised, with four replicates.

15.3.6 Exercise 6

Consider the relationship between crop yield and density, as shown in Exercise 2 (\(Y = 0.8 + 0.8 \, X - 0.07 \, X^2\)). Reproduce the results of a completely randomised (four replicates) sowing density experiment, with five densities (2, 4, 6, 8 and 10 plants per square meter), considering that the experimental (random) variability can be described by a gaussian distribution, with mean equal to 0 and standard deviation equal to 0.25 (common value for all densities).


15.4 Interval estimation of model parameters (ch. 5)

15.4.1 Exercise 1

A chemical analysis was repeated three times, with the following results: 125, 169 and 142 ng/g. Calculate mean, deviance, variance, standard deviation, standard error and confidence intervals (P = 0.95 and P = 0.99).

15.4.2 Exercise 2

An experiment was carried out, comparing the yield of four wheat genotypes (in tons per hectar). The results are as follows:

Genotype Rep-1 Rep-2 Rep-3 Rep4
A 4.72 5.45 5.13 5.19
B 6.29 6.79 7.55 5.86
C 5.54 4.44 5.16 5.92
D 6.68 6.30 6.70 7.77

For each genotype, calculate the mean, deviance, variance, standard deviation, standard error and confidence interval (P = 0.95).

15.4.3 Exercise 3

We have measured the length of 30 maize seedlings, treated with selenium in water solution. The observed lengths are:

length <- c(2.07, 2.23, 2.04, 2.16, 2.12, 2.33, 2.21, 2.22, 2.29, 2.28, 
2.44, 2.04, 2.02, 1.49, 2.12, 2.38, 2.51, 2.27, 2.55, 2.44, 2.28, 
2.2, 2.03, 2.35, 2.34, 2.34, 1.99, 2.44, 2.44, 1.91)

For the above sample, calculate the mean, deviance, variance, standard deviation, standard error and confidence interval (P = 0.95).

15.4.4 Exercise 4

A sample of 400 insects was sprayed with an insecticide and 136 individuals survived the treatment. Determine the efficacy of the insecticide, in terms of proportion of dead insects, together with 95% confidence limits.


15.5 Making decisions under uncertainty (ch. 6)

15.5.1 Exercise 1

We have compared two herbicides for weed control in maize. With the first herbicide (A), we observed the following weed coverings: 9.3, 10.2, 9.7 %. With the second herbicide, we observed: 12.6, 12.3 and 12.5 %. Are the means for the two herbicides significantly different (\(\alpha = 0.05\))?

15.5.2 Exercise 2

We have made an experiment to compare two fungicides A and B. The first fungicide was used to treat 200 fungi colonies and the number of surviving colonies was 180. B was used to treat 100 colonies and 50 of those survived. Is there a significant difference between the efficiacies of A and B (\(\alpha = 0.05\))?

15.5.3 Exercise 3

A plant pathologist studied the crop performances with (A) and without (NT) a fungicide treatment. The results (yield in tons per hectar) are as follows:

A NT
65 54
71 51
68 59

Was the treatment effect significant (\(\alpha = 0.05\))?

15.5.4 Exercise 4

In this year, an assay showed that 600 olive drupes out of 750 were attacked by Daucus olee. In a close field, under the same environmental conditions, the count of attacked drupes was 120 on 750. Is the the observed difference statistically significant (\(\alpha = 0.05\)) or is it just due to random fluctuation?

15.5.5 Exercise 5

In a hospital, blood cholesterol level was measured for eight patients, before and after a three months terapy. The observed values were:

Patient Before After
1 167.3 126.7
2 186.7 154.2
3 105.0 107.9
4 214.5 209.3
5 148.5 138.5
6 171.5 121.3
7 161.5 112.4
8 243.6 190.5

Can we say that this terapy is effective, or (\(\alpha = 0.05\))?

15.5.6 Exercise 6

A plant breeder organised an experiment to compare three wheat genotypes, i.e. GUERCINO, ARNOVA and BOLOGNA, according to a completely randomised design with 10 replicates. The observed yields are:

guercino arnova bologna
53.2 53.1 43.5
59.1 51.0 41.0
62.3 51.9 41.2
48.6 55.3 44.8
59.7 58.8 40.2
60.0 54.6 37.2
55.7 53.0 45.3
55.8 51.4 38.9
55.7 51.7 42.9
54.4 64.7 39.3
  1. Describe the three samples, by using the appropriate statistics of central tendency and spread
  2. Infere the means of the pupulations from where the samples were drawn
  3. For each of the three possible couples (GUERCINO vs ARNOVA, GUERCINO vs BOLOGNA and ARNOVA vs BOLOGNA), test the hypothesis that the two means are significantly different (\(\alpha = 0.05\)).

15.5.7 Exercise 7

A botanist counted the number of germinated seeds for oilseed rape at two different temperatures (15 and 25°C). At 15°C, 358 germinations were counted out of 400 seeds. At 25°C, 286 germinations were counted out of 380 seeds.

  1. Describe the proportions of germination for the three samples
  2. Infere the proportion of germinated seeds in the two populations, from where the samples of seeds were extracted (remember that the variance for a proportion is calculated as \(p \times (1- p)\)).
  3. Test the hypothesis that temperature had a significant effect on the germinability of oilseed rape seeds.

15.6 Fitting models to data from agriculture experiments

In the following sections we include several case studies that imply a process of model fitting. In most of the cases, these datasets are taken from real experiments and your main aim should be to learn something from those datasets, by asking the right questions. Therefore, do not limit yourself to producing the right statistics and writing the correct R coding, but try to use the statistical tools you have built up to obtain the answers for your questions.

Please, follow the workplan outlined below.

  1. Load the data and make the necessary transformations.
  2. Describe the data, by calculating, at least, the means and standard deviations for the experimental groups. This is usually called ‘Initial Data Analysis’ (IDA) and it is meant to get an idea about the main traits of the data at hand.
  3. Specify the model, explain its components and fit the model into the data.
  4. Check the model for the basic assumptions and, if necessary, adopt the appropriate correcting measures and re-fit the model
  5. Test the significance of all effects, by using the appropriate variance partitioning.
  6. If it is appropriate, compare the means for the most relevant effects.
  7. Present the results and comment on them

The datasets for the following cases studies are bigger than the ones you met so far and you may not like to enter all the data in R. Therefore, we put all the datasets at your disposal in an Excel file, which you can download from the following link https://www.casaonofri.it/_datasets/BookExercises.xlsx. Each dataset is in a different sheet and the sheet names are given in each exercise, so that you can load them by using the ‘readxl()’ function.

In order to ease you mind, we provide a summary table with the models described in this book and the R coding to fit them.

Table 15.1: Summary of main models to describe the results of experiments in agriculture
Model Design R.function Specification
One-way ANOVA CRD lm() Y ~ F1
One-way ANOVA CRBD lm() Y ~ F1 + BL
Two-way ANOVA CRD lm() Y ~ F1 * F2
Two-way ANOVA CRBD lm() Y ~ F1 * F2 + BL
Two-way ANOVA Split-plot CRD lmer() Y ~ F1 * F2 + (1|MAIN)
Two-way ANOVA Split-plot CRBD lmer() Y ~ F1 * F2 + BL + (1|MAIN)
Two-way ANOVA Strip-plot CRD lmer() Y ~ F1 * F2 + (1|ROW) + (1|COL)
Two-way ANOVA Strip-plot CRBD lmer() Y ~ F1 * F2 + BL + (1|ROW + (1|COL)
One-way ANOVA One-way CRD, two environments lm()/lmer() Y ~ F1 * ENV
One-way ANOVA One-way CRBD, two environments lm()/lmer() Y ~ F1 * ENV + BL|ENV
Simple Linear Regression CRD lm() Y ~ X1
Simple Linear Regression CRBD lm() Y ~ BL + X1

In the table above, Y is the response variable, that is always continuous/discrete, F1 and F2 are the names of two experimental factors (nominal variables), while X1 is the name of a covariate (continuous variable). BL is the block variable (factor), ENV is the environment variable (factor) and MAIN, ROW, COL are, respectively, the variables (factors) that represent the main plots in a split-plot design and the rows/columns in a strip-plot design.

15.7 One-way ANOVA models (ch. 7 to 9)

15.7.1 Exercise 1

An experiment was conducted with a completely randomised design to compare the yield of 5 wheat genotypes. The results (in bushels per acre) are as follows:

Variety 1 2 3
A 32.4 34.3 37.3
B 20.2 27.5 25.9
C 29.2 27.8 30.2
D 12.8 12.3 14.8
E 21.7 24.5 23.4

The example is taken from: Le Clerg et al. (1962).

[Sheet: 7.1]

15.7.2 Exercise 2

Cell cultures of tomato were grown by using three types of media, based on glucose, fructose and sucrose. The experiment was conducted with a completely randomised design with 5 replicates and a control was also added to the design. Cell growths are reported in the table below:

Control Glucose Fructose Sucrose
45 25 28 31
39 28 31 37
40 30 24 35
45 29 28 33
42 33 27 34

[Sheet: 7.2]

15.7.3 Exercise 3

The failure time for a heating system was assessed, to discover the effect of the operating temperature. Four temperatures were tested with 6 replicates, according to a completely randomised design and the number of hours before failure were measured. The results are as follows:

Temp. Hours to failure
1520 1953
1520 2135
1520 2471
1520 4727
1520 6134
1520 6314
1620 1190
1620 1286
1620 1550
1620 2125
1620 2557
1620 2845
1660 651
1660 837
1660 848
1660 1038
1660 1361
1660 1543
1708 511
1708 651
1708 651
1708 652
1708 688
1708 729

Regard the temperature as a factor and determine the best operating temperature, in order to delay failure.

[Sheet: 7.3]

15.7.4 Exercise 4

An entomologist counted the number of eggs laid from a lepidopter on three tobacco genotypes. 15 females were tested for each genotype and the results are as follows:

Female Field Resistant USDA
1 211 0 448
2 276 9 906
3 415 143 28
4 787 1 277
5 18 26 634
6 118 127 48
7 1 161 369
8 151 294 137
9 0 0 29
10 253 348 522
11 61 0 319
12 0 14 242
13 275 21 261
14 0 0 566
15 153 218 734

Which is the most resistant genotype?
[Sheet: 7.4]


15.8 Multi-way ANOVA models (ch. 10)

15.8.1 Exercise 1

Data were collected about 5 types of irrigation on orange trees in Spain. The experiment was laid down as complete randomised blocks with 5 replicates and the results are as follows:

Method 1 2 3 4 5
Localised 438 413 375 127 320
Surface 413 398 348 112 297
Sprinkler 346 334 281 43 231
Sprinkler + localised 335 321 267 33 219
Submersion 403 380 336 101 293

[Sheet: 10.1]

15.8.2 Exercise 2

A fertilisation trial was conducted according to a randomised complete block design with five replicates. One value is missing for the second treatment in the fifth block. The observed data are percentage contents in P2 O5 in leaf samples:

Treatment 1 2 3 4 5
Unfertilised 5.6 6.1 5.3 5.9 9.4
50 lb N 7.3 6.0 7.7 7.7 NA
100 lb N 6.9 6.0 5.6 7.4 8.2
50 lb N + 75 lb P2O5 10.8 11.2 8.8 10.4 12.9
100 lb N + 75 lb P205 9.6 9.3 12 10.6 11.6

Is the addition of P2 O5 a convenient practice, in terms of agronomic effect?

[Sheet: 10.2]

15.8.3 Exercise 3

A latin square experiment was planned to assess effect of four different fertilisers on lettuce yield. The observed data are as follows:

Fertiliser Row Column Yield
A 1 1 104
B 1 2 114
C 1 3 90
D 1 4 140
A 2 4 134
B 2 3 130
C 2 1 144
D 2 2 174
A 3 3 146
B 3 4 142
C 3 2 152
D 3 1 156
A 4 2 147
B 4 1 160
C 4 4 160
D 4 3 163

What is the best fertiliser?

[Sheet: 10.3]


15.9 Multi-way ANOVA models with interactions (ch. 11 and 13)

Some of the following datasets were obtained by experiments designed as split-plots or strip-plots (see Chapter 2); please, note that, in practice, disregarding the experimental design during data analysis is not admissible! If you have not yet read Chapter 13, you can still analyse these datasets by paying attention to the following issues.

For split-plot and strip-plot designs, we need to use the ‘lmer()’ function and, thus, we need to install and load the ‘lme4’ and ‘lmerTest’ packages.

Before fitting the models, we need to uniquely identify the main-plots (for split-plot designs) and the rows/columns (for strip-plot designs). The main plots can be uniquely identified by crossing the block and main plot factor variables, as in the example below, with the ‘Tillage’ and ‘Block’ variables, in the ‘dataset’ data frame.

dataset$mainPlot <- with(dataset, factor(Block:Tillage))

For strip-plot designs, the rows and columns can be uniquely identified by crossing the block and each factor variables, as in the example below, with the ‘Crop/Herbicide’ and ‘Block’ variables, in the ‘dataset’ data frame.

dataset$Rows <- factor(dataset$Crop:dataset$Block)
dataset$Columns <- factor(dataset$Herbicide:dataset$Block)

The code for fitting the models is reported in the table 15.1. The ‘plot()’ method only returns the plot of ‘residuals against expected values’ and the ‘which’ argument does not work. Thus, do not perform the check for the normality of residuals.

15.9.1 Exercise 1

A pot experiment was planned to evaluate the best timing for herbicide application against rhizome Sorghum halepense. Five timings were compared (2-3, 4-5, 6-7 and 8-9 leaves), including a splitted treatment in two timings (3-4/8-9 leaves) and the untreated control. In order to understand whether the application is effective against plants coming from rhizomes of different sizes, a second factor was included in the experiment, i.e. rhizome size (2, 4, six nodes). The design was a fully crossed two-way factorial, laid down as completely randomised with four replicates. The results (plant weights three weeks after the herbicide application) are as follows:

Sizes / Timings 2-3 4-5 6-7 8-9 3-4/8-9 Untreated
2-nodes 34.03 0.10 30.91 33.21 2.89 41.63
22.31 6.08 35.34 43.44 19.06 22.96
21.70 3.73 24.23 44.06 0.10 52.14
14.90 9.15 28.27 35.34 0.68 59.81
4-nodes 42.19 14.86 52.34 39.06 8.62 68.15
51.06 36.03 43.17 61.59 0.05 42.75
43.77 21.85 57.28 48.89 0.10 57.77
31.74 8.71 29.71 49.14 9.65 44.85
6-nodes 20.84 11.37 55.00 41.77 9.80 43.20
26.12 2.24 28.46 37.38 0.10 40.68
35.24 14.17 21.81 39.55 1.42 34.11
13.32 23.93 60.72 48.37 6.83 32.21

In which timing the herbicide is most effective?

[Sheet: 11.1]

15.9.2 Exercise 2

Six faba bean genotypes were tested in two sowing times, according to a split-plot design in 4 complete blocks. Sowing times were randomised to main-plots within blocks and genotypes were randomised to sub-plots within main-plots and blocks. Results are:

Sowing Time Genotype 1 2 3 4
Autum Chiaro 4.36 4.00 4.23 3.83
Collameno 3.01 3.32 3.27 3.40
Palombino 3.85 3.85 3.68 3.98
Scuro 4.97 3.98 4.39 4.14
Sicania 4.38 4.01 3.94 2.99
Vesuvio 3.94 4.47 3.93 4.21
Spring Chiaro 2.76 2.64 2.25 2.38
Collameno 2.50 1.79 1.57 1.77
Palombino 2.24 2.21 2.50 2.05
Scuro 3.45 2.94 3.12 2.69
Sicania 3.24 3.60 3.16 3.08
Vesuvio 2.34 2.44 1.71 2.00

What is the best genotype for autumn sowing?

[Sheet: 11.2]

15.9.3 Exercise 3

Four crops were sown in soil 20 days after the application of three herbicide treatments, in order to evaluate possible carry-over effects of residuals. The untreated control was also added for comparison and the weight of plants was assessed four weeks after sowing. The experiment was laid down as strip-plot and, within each block, the herbicide were randomised to rows and crops to columns. The weight of plants is reported below:

Herbidicide Block sorghum rape soyabean sunflower
Untreated 1 180 157 199 201
2 236 111 257 358
3 287 217 346 435
4 350 170 211 327
Imazethapyr 1 47 10 193 51
2 43 1 113 4
3 0 20 187 13
4 3 21 122 15
primisulfuron 1 271 8 335 379
2 182 0 201 201
3 283 22 206 307
4 147 24 240 337
rimsulfuron 1 403 238 226 290
2 227 169 195 494
3 400 364 257 397
4 171 134 137 180

What crops could be safely sown 20 days after the application of imazethapyr, primisulfuron and rimsulfuron?

[Sheet: 11.3]

15.9.4 Exercise 4

A field experiment was conducted to evaluate the effect of fertilisation timing (early, medium, late) on two genotypes. The experiment was designed as a randomised complete block design and the data represent the amount of absorbed nitrogen by the plant:

Genotype Block Early Med Late
A 1 21.4 50.8 53.2
2 11.3 42.7 44.8
3 34.9 61.8 57.8
B 1 54.8 56.9 57.7
2 47.9 46.8 54.0
3 40.1 57.9 62.0

What is the best genotype? What is the best fertilisation timing? Do these two factors interact and how?

[Sheet: 11.4]

15.9.5 Exercise 5

A study was carried out to evaluate the effect of washing temperature on the reduction of length for four types of fabric. Results are expressed as percentage reduction and the experiment was completely randomised, with two replicates:

Fabric 210 °F 215 °F 220 °F 225 °F
A 1.8 2.0 4.6 7.5
2.1 2.1 5.0 7.9
B 2.2 4.2 5.4 9.8
2.4 4.0 5.6 9.2
C 2.8 4.4 8.7 13.2
3.2 4.8 8.4 13.0
D 3.2 3.3 5.7 10.9
3.6 3.5 5.8 11.1

Consider the temperature as a factor and answer the following questions:

  1. What is the best fabric, in terms of tolerance to high temperatures? What is the highest safe temperature, for each fabric?

[Sheet: 11.5]

15.9.6 Exercise 6

A chemical process requires one alcohol and one base. A study is organised to evaluate the factorial combinations of three alcohols and two bases on the efficiency of the process, expressed as a percentage. The experiment is designed as completely randomised.

Base Alcohol 1 Alcohol 2 Alcohol 3
A 91.3 89.9 89.3
88.1 89.5 87.6
90.7 91.4 90.4
91.4 88.3 90.3
B 87.3 89.4 92.3
91.5 93.1 90.7
91.5 88.3 90.6
94.7 91.5 89.8

What is the combination that gives the highest efficiency?

[Sheet: 11.6]


15.10 Simple linear regression (ch. 12)

When the predictor is a quantitative variable, we may be interested to check for the existence of a possible functional relationship, in the form of some sort of dose-response curve. The simplest regression model is the ‘straight line’, with equation \(Y = b_0 + b_1 \, X\); fitting this model implies finding maximum likelihood values for \(b_0\) and \(b_1\), so that the resulting line passes as close as possible to the observed values. When we fit a regression model, the workplan at Section 15.6 is changed at step 4 (model checking); indeed, apart from the usual check for the homoscedasticity and normality of residuals, we have to check for possible lack of fit (i.e., the model sistematically deviates from the observed responses). This is done by plotting the observed values against the predictor and adding the regression line. This may be done by using the following code:

model <- lm(Y ~ X1, data = dataset) # fits the model
summary(model) # get model parameters and use them below
b0 <- 10 # value obtained from the previous step
b1 <- 0.5 # value obtained from the previous step
plot(Y ~ X1, data = dataset) # plot the observed values
curve(B0 + B1 * x, add = T) # 'x' must be used in small letter

Obviously, when we fit a regression model we do not need to do any multiple comparison testing: if we can prove that the response follows a certain continuous curve, every predictor value produces a more or less different response from any other predictor value.

Regression models can be fitted also to unreplicated data or to the group means. However, when we have the replicates, we have two possible models to fit. The first possibility is to disregard the quantitative nature of the independent variable, regard it as a factor (qualitative variable) and fit an ANOVA model, as we have seen in the previous sections. This first model is the best fitting one, because we do not put any constraint on the ‘shape’ of the response, matching exactly the mean responses for all treatment groups. The second possibility is to recognise that the predictor is a quantity (e.g., a dose) and fit a simple linear regression model; in this case, we constrain the response to be linear and continuously increase/decrease as the predictor increases/decreases. This second model is worse than the first one, because it will never match the mean responses for all treatment group, unless these means are perfectly aligned, which is almost impossible. However, it is also simpler because, regardless of the number of levels for the predictor, we always fit two parameters: an intercept and a slope. We can test the hypothesis of no lack of fit, by comparing the regression model and the ANOVA model by using the following code:

anova(model1, model2)

The null hypothesis is that there is no lack of fit and, thus, the fit of the two models is more or less similar. This null can be accepted if the P-level is higher than 0.05 and, in such a case, we favour the regression model over the ANOVA model, for its higher simplicity (Occam’s razor principle).

In the case of replicated experiments designed in blocks, we also have to add the block effect to the model, regarding the block variable as a factor (as usual). In this case the model is:

model <- lm(Y ~ BL + X1, data = dataset) # fits the model
summary(model)

and the ‘summary()’ method produces several intercepts (one per each block) and one slope value. In order to plot the model, we need to calculate the average intercept value, which can be done by using the ‘emmeans()’ function, as shown below:

emmeans(model, ~1, at = (X1 = 0))

15.10.1 Exercise 1

A study was conducted to evaluate the effect of nitrogen fertilisation in lettuce. The experiment is completely randomised with 4 replicates and the yield results are as follows:

N level B1 B2 B3 B4
0 124 114 109 124
50 134 120 114 134
100 146 132 122 146
150 157 150 140 163
200 163 156 156 171

What yield might be obtained by using 120 kg N ha-1?

[Sheet: 12.1]

15.10.2 Exercise 2

A study was conducted to evaluate the effect of increasing densities of a weed (Sinapis arvensis) on sunflower yield. The experiment was completely randomised and the observed results are:

density Rep yield
0 1 36.63
14 1 29.73
19 1 32.12
28 1 30.61
32 1 27.7
38 1 27.43
54 1 24.79
0 2 36.11
14 2 34.72
19 2 30.12
28 2 30.8
32 2 26.53
38 2 27.6
54 2 23.31
0 3 38.35
14 3 32.16
19 3 31.72
28 3 28.69
32 3 25.88
38 3 28.43
54 3 30.26
0 4 36.74
14 4 32.566
19 4 29.57
28 4 33.663
32 4 28.751
38 4 27.114
54 4 24.664

Assuming that the yield response is linear, parameterise the model, check the goodness of fit and find the economical threshold level of weed density, considering that the yield worths 150 Euros per ton and the herbicide treatment costs 40 Euros per hectar.

[Sheet: 12.2]


15.11 Nonlinear regression (ch. 14)

15.11.1 Exercise 1

Two soil samples were treated with two herbicides and put in a climatic chamber at 20°C. Sub-samples were collected from both samples in different times and the concentration of herbicide residues was measured. The results are as follows:

Time Herbicide A Herbicide B
0 100.00 100.00
10 50.00 60.00
20 25.00 40.00
30 15.00 23.00
40 7.00 19.00
50 3.50 11.00
60 2.00 5.10
70 1.00 3.00

Assuming that the degradation follows an exponential decay trend, determine the half-life for both herbicides.
[Sheet: 14.1]

15.11.2 Exercise 2

A microbial population grows exponentially over time. Considering the following data, determine the relative rate of growth, by fitting the exponential growth model.

Time Cells
0 2
10 3
20 5
30 9
40 17
50 39
60 94
70 201

How long does it take before we reach 100 cells?

[Sheet: 14.2]

15.11.3 Exercise 3

An experiment was conducted to determine the absorption of nitrogen by roots of Lemna minor in hydroponic colture. Results (N content) are the following:

Conc Rate
2.86 14.58
5.00 24.74
7.52 31.34
22.10 72.97
27.77 77.50
39.20 96.09
45.48 96.97
203.78 108.88

Use nonlinear least squares to estimate the parameters for the rectangular hyperbola (Michaelis-Menten model):

\[Y = \frac{a X} {b + X}\]

and make sure that model fit is good enough.

[Sheet: 14.3]

15.11.4 Exercise 4

An experiment was conducted to determine the yield of sunflower at increasing densities of a weed (Ammi majus). Based on the following results, parameterise a rectangular hyperbola (\(Y = (a \, X)/(b + X)\) and test for possible lack of fit. The results are:

Weed density Yield Loss (%)
0 0
23 17.9
31 21.6
39 26.9
61 29.5

[Sheet: 14.4]

15.11.5 Exercise 5

An experiment was conducted in a pasture, to determine the effect of sampling area on the number of plant species (in general, the higher the sampling area and the higher the number of sampled species). The results are as follows:.

Area N. of species
1 4
2 5
4 7
8 8
16 10
32 14
64 19
128 22
256 22

By using the above data, parameterise a power curve \(Y = a \, X^b\) and test for lack of fit.

[Sheet: 14.5]

15.11.6 Exercise 6

Crop growth can be often described by using a Gompertz model. The data below refer to an experiment were sugarbeet was grown either weed free, or weed infested; the weight of the crop per unit area was measured after six different numbers of Days After Emergence (DAE). The experiment was conducted by using a completely randomised design with three replicates and the results are reported below:

DAE Infested Weed Free
21 0.06 0.07
21 0.06 0.07
21 0.11 0.07
27 0.20 0.34
27 0.20 0.40
27 0.21 0.25
38 2.13 2.32
38 3.03 1.72
38 1.27 1.22
49 6.13 11.78
49 5.76 13.62
49 7.78 12.15
65 17.05 33.11
65 22.48 24.96
65 12.66 34.66
186 21.51 38.83
186 26.26 27.84
186 27.68 37.72

Parameterise two Gompertz growth models (one for the weed-free crop and one for the infested crop) and evalaute which of the parameters are most influenced by the competition. The Gompertz growth model is:

\[Y = d \cdot exp\left\{- exp \left[ - b (X - e)\right] \right\}\]

[Sheet: 14.6]

15.11.7 Exercise 7

Plants of Tripleuspermum inodorum in pots were treated with a sulphonylurea herbicide (tribenuron-methyl) at increasing rates. Three weeks after the treatment the weight per pot was recorded, with the following results:

Dose (g a.i. ha\(^{-1}\)) Fresh weight (g pot \(^{-1}\))
0 115.83
0 102.90
0 114.35
0.25 91.60
0.25 103.23
0.25 133.97
0.5 98.66
0.5 92.51
0.5 124.19
1 93.92
1 49.21
1 49.24
2 21.85
2 23.77
2 22.46

Assuming that the dose-response relationship can be described by using the following log-logistic model:

\[Y = c + \frac{d - c}{1 + exp \left\{ - b \left[ log (X) - log (e) \right] \right\}}\]

Parameterise the model and evaluate the goodnes of fit.

[Sheet: 14.7]