Chapter 8 Updated list of datasets
Apart from those listed in the book, the package statforbiology contains many other datasets, and the list is continuously updated. This section gives the full list of available datasets.
adjuvantsLS
An experiment to compare three adjuvants (ammonium sulfate, mineral oil, and a non-ionic surfactant), plus a control with no adjuvant, for rimsulfuron, a herbicide used for weed control in maize. The experiment was designed as a Latin square with five replicates (Onofri, unpublished data), and the resulting dataset was slightly modified to make it more suitable for teaching purposes.
- Herbicide (character): herbicide name
- Adjuvant (character): adjuvant name
- Dose (character): dose of the herbicide + adjuvant
- Code (character): code for the adjuvants
- Plot (numerical coding): code for the plots
- Column (numerical coding): code for the column in the Latin square grid
- Row (numerical coding): code for the row in the Latin square grid
- Columns 8 to 23 (character): each column is a weed (identified by the respective Bayer code), and the values represent the abundances (Braun-Blanquet codes)
- Yield (numeric): crop yield (100 kg per hectare)
- Height (numeric): crop height (cm)
alfalfa3years
A genotype experiment in alfalfa (Ligabue, Onofri, and Ruozzi 2009) to compare 20 genotypes in central Italy (medium Tiber Valley). The experiment was laid out as an RCBD with four replicates, and the total yearly forage yield (sum of 3-4 cuts per year) was measured in each plot over 3 years (from 2006 to 2008).
- Plot (numeric): code for the plot
- Block (numeric): code for the block
- Genotype (character): name of the genotype
- Year (numeric): year when yield was measured
- Yield (numeric): total yearly forage yield (sum of 3-4 cuts per year; in tons per hectare of dry matter)
Ammi94
A field experiment to evaluate the effect of different densities of Ammi majus on the achene yield of sunflower (A. Onofri and Tei 1994a). The values represent the means of 4 replicates.
- Density (numeric): number of weed plants per square meter
- Yield (numeric): sunflower yield in tons per hectare
beet
A split-plot tillage experiment in sugarbeet, where three types of tillage (minimum tillage = MIN; shallow plowing = SP; deep plowing = DP) and two types of chemical weed control methods (broadcast = TOT; in-furrow = PART) were compared in four complete blocks with three main-plots per block, split into two sub-plots per main-plot; the three types of tillage were randomly allocated to the main-plots in each block, while the two weed control treatments were randomly allocated to the sub-plots within each main-plot (Bianchi, 1992; unpublished data).
- Tillage (character): tillage method
- WeedControl (character): weed control method
- Block (numeric): code for the block
- Yield (numeric): sugarbeet root yield in tons per hectare
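The two-stage randomization described for this split-plot design can be sketched as follows. This is only an illustration of the layout, not the procedure actually used in the trial: the function name `split_plot_layout`, the plot numbering, and the seed are assumptions made for the example.

```python
import random

# Factor levels as described for the 'beet' dataset.
TILLAGE = ["MIN", "SP", "DP"]    # main-plot factor
WEED_CONTROL = ["TOT", "PART"]   # sub-plot factor
N_BLOCKS = 4


def split_plot_layout(seed=1):
    """Return one randomized split-plot layout as a list of dicts.

    The seed is arbitrary and only makes the sketch reproducible.
    """
    rng = random.Random(seed)
    layout = []
    for block in range(1, N_BLOCKS + 1):
        # Stage 1: randomize tillage to the three main-plots of this block.
        main_plots = rng.sample(TILLAGE, k=len(TILLAGE))
        for main_plot, tillage in enumerate(main_plots, start=1):
            # Stage 2: randomize weed control to the two sub-plots
            # within this main-plot.
            sub_plots = rng.sample(WEED_CONTROL, k=len(WEED_CONTROL))
            for sub_plot, weed_control in enumerate(sub_plots, start=1):
                layout.append({
                    "Block": block,
                    "MainPlot": main_plot,
                    "SubPlot": sub_plot,
                    "Tillage": tillage,
                    "WeedControl": weed_control,
                })
    return layout


layout = split_plot_layout()
print(len(layout))  # 4 blocks x 3 main-plots x 2 sub-plots = 24
```

Each tillage method occupies exactly one main-plot per block, and both weed control methods appear once inside every main-plot, which is what distinguishes this layout from a fully randomized factorial.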
beetGrowth
An experiment in which sugarbeet was grown either weed-free, or weed-infested (Covarelli and Onofri 1998). Crop weight per unit area (in kg/ha) was measured at six different timings (in Days After Emergence; DAE), using destructive methods from independent plots. The experiment was conducted using a completely randomized design with three replicates.
- DAE (numeric): Days After Emergence
- Infested (numeric): Crop Dry Weight in weed-infested plots (kg/ha)
- WeedFree (numeric): Crop Dry Weight in weed-free plots (kg/ha)
citrusGrove or speciesArea
An experiment was conducted in a citrus grove in Sicily (Southern Italy) to determine the species-area relationship for the local weed community. A nested-plot survey was employed (Mueller-Dombois and Ellenberg 1974): the number of weed species was counted in a plot of 1 m\(^2\), and the sampling area was then progressively doubled in size. At each step, the number of new weed species was counted. The dataset is taken from Cristaudo et al. (2015).
- Area (numeric): sampled area (in square meters)
- numSpecies (numeric): count of the number of species
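The nested-plot scheme can be illustrated with a short sketch. The counts of new species below are hypothetical (they are not the values in the dataset), and the sketch assumes that the species count reported for each area is the cumulative total up to that step.

```python
from itertools import accumulate

# Hypothetical numbers of NEW species found at each step
# (illustrative values only, not taken from the dataset).
new_species = [5, 3, 2, 2, 1, 1, 0]

# The sampled area starts at 1 square meter and doubles at each step.
areas = [2 ** step for step in range(len(new_species))]

# Assumption: the species count at each area is the running total
# of the new species recorded so far.
num_species = list(accumulate(new_species))

for area, n in zip(areas, num_species):
    print(f"Area = {area} m2, species = {n}")
```

Because the area doubles at each step, a few steps are enough to span two orders of magnitude, which is convenient when the species-area curve is later fitted on a logarithmic scale.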
crosses
An experiment to compare nine maize hybrids, obtained from three pollinating inbred lines (A1, A2, and A3), each one crossed with three different female inbred lines (A1 was crossed with B1, B2, and B3; A2 was crossed with B4, B5, and B6; A3 was crossed with B7, B8, and B9). The experiment was laid out as complete blocks with four replicates (36 subjects in total). The dataset was generated through Monte Carlo simulation.
- Male (character): pollinating line
- Female (character): female line
- Block (numeric): code for the blocks
- Yield (numeric): maize yield (in tons per hectare)
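The nested structure of the crosses (each female line is used with one pollinating line only) can be made explicit by enumerating the plots. This is a minimal sketch of the layout; the variable names are chosen for illustration and no yield values are generated.

```python
# Each pollinating line is crossed with its own three female lines,
# so 'Female' is nested within 'Male' rather than crossed with it.
CROSSES = {
    "A1": ["B1", "B2", "B3"],
    "A2": ["B4", "B5", "B6"],
    "A3": ["B7", "B8", "B9"],
}
N_BLOCKS = 4

plots = [
    {"Male": male, "Female": female, "Block": block}
    for male, females in CROSSES.items()
    for female in females
    for block in range(1, N_BLOCKS + 1)
]

hybrids = {(p["Male"], p["Female"]) for p in plots}
print(len(hybrids), len(plots))  # 9 hybrids, 36 plots
```

A fully crossed design with the same lines would give 3 x 9 = 27 hybrids; here only 9 exist, which is why the female effect has to be modeled as nested within the male effect.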
failureTimes
An experiment to record the failure time of heating systems as affected by four different operating temperatures. The design was completely randomized with six replicates, and the number of hours before failure was measured. The dataset is taken from W. (1972).
- Temp (numeric): testing temperature (in \(^{\circ}\)F)
- Hours_to_failure (numeric): number of hours before failure
FertilizationTiming
A field experiment to evaluate the effect of fertilization timing (early, optimal, late) on two genotypes (A and B). The experiment was laid out as an RCBD, and the response represents the amount of absorbed nitrogen by the plant (simulated data).
- Timing (character): timing of fertilization (early, optimal, late)
- Genotype (character): genotype
- Block (numeric): code for the block
- Nabs (numeric): amount of absorbed nitrogen
FGP_rape
An experiment in which the germination of three genotypes of oilseed rape was assessed under controlled conditions, at 20 \(^{\circ}\)C, according to an RCBD with six replicates. One replicate per genotype, consisting of a Petri dish with 50 seeds, was placed on each of six shelves inside the oven. The number of germinated seeds was counted 15 days after the start of the assay and expressed as the Final Proportion of Germinated seeds (FGP). The assay was repeated twice in different and independent ovens with the same experimental design. The dataset is taken from Pace et al. (2012).
- Dish (numeric): code for the Petri dish
- Run (numeric coding): code for the run
- Shelf (numeric): code for the shelf
- Species (character): plant species
- Genotype (character): genotype
- FGP (numeric): Final Germinated Proportion in each dish
floristicData1
An experiment in which the weed flora, composed of 8 weed species, was recorded in 120 plots belonging to a three-factor factorial experiment laid out as a randomised block design. The counts of weeds are reported, and the dataset was modified from Scavo et al. (2020).
- F1 to F3 (character): the three experimental factors, respectively with 5, 2 and 3 levels
- Rep (character): the blocks
- SP1 to SP8 (numeric): the counts for the eight weed species
heights
Highly unbalanced dataset, containing the height (cm) and yield (t/ha) of 20 maize plants belonging to 4 different genotypes. The dataset was generated through Monte Carlo simulation.
- Id (numeric): code for the plot
- var (character): code for the genotype
- height (numeric): crop height (cm)
- yield (numeric): crop yield (t/ha)
insects
An experiment in which tobacco plants were treated with three different insecticides was conducted using a completely randomized design with five replicates (resulting in a total of fifteen plants). The number of insects over the surface of leaves in each plant was counted three weeks after the treatments.
- Insecticide (character): code for the insecticide
- Rep (numeric): code for the replicate (not blocks)
- Count (numeric): count of insects
johnsongrass
A pot experiment to evaluate the best timing for herbicide application against Sorghum halepense originating from rhizomes. Five timings were compared, including a split treatment (2-3, 4-5, 6-7, 8-9, and 3-4/8-9 leaves), and an untreated check was added for comparison. Treatments were repeated on plants originating from rhizomes of different lengths (2, 4, and 6 nodes). The design was a fully crossed two-way factorial, laid out as completely randomized with four replicates. The dataset is taken from A. Onofri and Tei (1994b).
- Length (character): length of rhizomes at the beginning of the experiment
- Timing (character): crop stage at spraying
- RizomeWeight (numeric): weight of rhizomes four weeks after spraying (in grams per pot)
LeClerg
A genotype experiment in a completely randomized design, with five genotypes and four replicates. The dataset is modified from Le Clerg, Leonard, and Clark (1962).
- Variety (character): code for the genotype
- Yield (numeric): crop yield in bushels per acre
LepidopteraEggs
An entomologist counted the number of eggs laid by a species of moth on three different substrates, using a completely randomized design with 15 replicates. This dataset was modified from Kuehl (2000).
- Substrate (character): code for the substrate
- No_of_eggs (numeric): count of the number of eggs
lettuceLS
An experiment to assess the effect of four different fertilizers on lettuce yield. The experiment was laid out as a Latin square with four replicates. The dataset was generated through Monte Carlo simulation.
- Fertilizer (character): code for the fertilizer
- Row (numeric): code for the row in the Latin square grid
- Column (numeric): code for the column in the Latin square grid
- Yield (numeric): lettuce yield (in kg per hectare \(\times\) 100)
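For readers unfamiliar with Latin squares, the defining property (each treatment appears exactly once in every row and every column) can be sketched as follows. The fertilizer labels, the function name `latin_square`, and the seed are hypothetical, and this is one simple way to obtain a randomized square, not necessarily how the layout of this dataset was produced.

```python
import random


def latin_square(treatments, seed=1):
    """Build a randomized Latin square as a list of rows.

    Starts from a cyclic square and then permutes rows and columns,
    which preserves the Latin property.
    """
    rng = random.Random(seed)
    n = len(treatments)
    # Cyclic square: the cell in row r, column c gets treatment (r + c) mod n.
    square = [[treatments[(r + c) % n] for c in range(n)] for r in range(n)]
    # Permute the rows, then the columns.
    rng.shuffle(square)
    col_order = rng.sample(range(n), k=n)
    return [[row[c] for c in col_order] for row in square]


# Hypothetical fertilizer codes, used only for illustration.
fertilizers = ["A", "B", "C", "D"]
square = latin_square(fertilizers)
for row in square:
    print(" ".join(row))
```

With four treatments the square has 4 x 4 = 16 plots, so every fertilizer is replicated four times while row and column effects are both balanced out.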
maizeMET
A multi-environment experiment in maize, with four genotypes and 14 environments. The data represent the average yields for each genotype in each environment. This dataset is presented in Piepho (1998) and can be used to reproduce the analyses presented there.
- Env (numeric coding): code for the environment
- Gen (character): code for the genotype
- Yield (numeric): maize yield in tons per hectare
metamitron
An experiment to study the degradation of the herbicide metamitron (M) in soil, either alone or in the presence of two co-applied herbicides, i.e. phenmedipham (P) and chloridazon (C). Ninety-six independent soil samples were treated with four herbicide combinations (i.e., M, M+P, M+C, and M+P+C, 24 samples per combination) and stored in a climatic chamber at 20 \(^{\circ}\)C. Three random soil samples were collected for each herbicide combination at eight different time points (0, 7, 14, 21, 32, 42, 55 and 67 days after treatment). They were stored in a refrigerator until chemical analyses. At the end of the experiment, all soil samples were analyzed to determine the residual concentration of metamitron. This dataset is taken from Vischetti et al. (1996), and it has been modified through Monte Carlo simulation to make it more suitable for teaching purposes.
- Time (numeric): the time at sampling (in Days after the treatment)
- Herbicide (character): code for the herbicide combination
- Rep (numeric): code for the replicate (not blocks; the experiment was fully randomized)
- Conc (numeric): the concentration of metamitron in % of the initial value
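The sampling scheme (three soil samples per herbicide combination at each of eight time points) can be enumerated to check that it accounts for all ninety-six samples. This is only a sketch of the structure: the herbicide codes are written as in the description above and may differ from the codes stored in the dataset, and no concentrations are generated.

```python
from itertools import product

# Herbicide combinations and sampling times as given in the description.
HERBICIDES = ["M", "M+P", "M+C", "M+P+C"]
TIMES = [0, 7, 14, 21, 32, 42, 55, 67]  # days after treatment
N_REPS = 3  # independent soil samples per combination and time

samples = [
    {"Time": time, "Herbicide": herbicide, "Rep": rep}
    for herbicide, time, rep in product(HERBICIDES, TIMES, range(1, N_REPS + 1))
]

# Samples per herbicide combination: 8 time points x 3 replicates.
per_combination = len(samples) // len(HERBICIDES)
print(len(samples), per_combination)
```

Since every sample is analyzed only once (sampling is destructive), the replicates at different times are independent, which is why 'Rep' is a label for the replicate and not a blocking factor.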
missingVal
A fertilization trial was conducted according to a randomized complete block design with five replicates; however, due to unforeseen circumstances, one value is missing for the ‘50N’ treatment in the second block.
- Fert (character): fertilization treatment
- Block (numeric): code for the block
- P2O5 (numeric): content of P2O5 in leaves (%)
mixture
A pot experiment to compare the weed control efficacy of two herbicides used alone and in a mixture. A control was also added as a reference; thus, the four treatments were (i) metribuzin, (ii) rimsulfuron, (iii) metribuzin + rimsulfuron, and (iv) the untreated control. Sixteen uniform pots were prepared and sown with Solanum nigrum. When the plants reached the 4-true-leaf stage, the pots were sprayed with the above herbicide solutions following a completely randomized design with four replicates. Three weeks after the treatment, the plants in each pot were harvested and weighed: the lower the weight, the higher the efficacy of the herbicides. The dataset is taken from A. Onofri, Covarelli, and Tei (1995).
- Treat (character): herbicide treatment
- Weight (numeric): dry weight of weed plants (grams per pot)
NGenotype and NGenotypeFull
Fifteen genotypes of durum wheat were compared under two different N fertilization strategies (in the dataset ‘NGenotype’ the number of genotypes has been reduced to 5). The dataset was generated through Monte Carlo simulation that started from the original data in Fabio Stagnari et al. (2013).
- Block (numeric): code for the block
- Genotype (character): code for the genotype
- Nitrogen (numeric): code for the fertilization strategy
- Yield (numeric): wheat yield in tons per hectare
NWheat
A N fertilization experiment was conducted in wheat using a randomized complete block design, with four N doses and four replicates. The dataset was generated through Monte Carlo simulation.
- Dose (numeric): N fertilization rate (kg/ha)
- Block (numeric): code for the block
- Yield (numeric): crop yield in tons per hectare
Oat1L
A field experiment was laid out as a CRD with three replicates to compare five genotypes of oat in one location. The data was generated through Monte Carlo simulation.
- Genotype (character): code for the genotype
- Yield (numeric): oat yield in tons per hectare
Oat2L
The same experiment as for ‘Oat1L’ was repeated in a second location, also with a completely randomized layout. The data for the second location were also generated through Monte Carlo simulation.
- Location (character): code for the location
- Genotype (character): code for the genotype
- Yield (numeric): oat yield in tons per hectare
orangeIrrigation
An irrigation experiment was conducted in an orange grove in Southern Italy, comprising five irrigation systems in five complete blocks. The dataset was simulated by Monte Carlo methods.
- Method (character): irrigation systems
- Block (numeric): code for the block
- Yield (numeric): orange yield in tons per hectare
pea_MultiLoc
A multi-location genotype experiment in field pea, to compare 17 genotypes in 12 locations. In each location, the experiment was laid out as an RCBD, with four blocks. The dataset was generated through Monte Carlo simulation, starting from the data in Monotti et al. (2009).
- Id (numeric): code for the plot
- Loc (character): code for the location
- Var (character): code for the genotype
- Block (character): code for the block
- Yield (numeric): pea yield in tons per hectare
Rainfall2022
Daily rainfall amounts (mm) in central Italy in 2022 from a local meteorological station.
- DOY (numeric): Day Of the Year
- PRP (numeric): amount of daily rainfall (mm per day)
recropS
Rimsulfuron was applied at the recommended rate on bare soil, and untreated plots were included for comparison. Forty days after the treatment, sugarbeet, rape and soybean were sown on treated and untreated plots, and the weight of the plants was determined four weeks after sowing. The experiment was laid out as a strip-plot in complete blocks: in each block, the herbicide treatments (rimsulfuron and the untreated control) were randomly allocated to each of two columns, while the three crops were randomly allocated to each of three rows (Onofri, unpublished data).
- Herbicide (character): name of the herbicide treatment
- Crop (character): name of the crop
- Block (numeric): code for the block
- CropBiomass (numeric): weight of crop biomass four weeks after sowing
rimsulfuron
A herbicide experiment where rimsulfuron was used with different timings and in different mixtures and compared to several other herbicide solutions for weed control in maize. At the end of the crop cycle, herbicide efficacy was assessed by measuring the coverage of weeds (%) and the yield of maize (in 100 kg per hectare).
- Herbicide (character): name of the herbicide treatment
- Plot (numeric): code for the plot
- Code (numeric): code for the herbicide
- Block (numeric): code for the block
- Column (numeric): code for the column within each block
- WeedCover (numeric): coverage of weeds (in %)
- Yield (numeric): maize yield (in 100 kg per hectare)
sadocchi_man
A simple example dataset for multivariate analyses, such as MANOVA. It refers to a completely randomised, two-factor factorial experiment with two numeric responses. The dataset was taken from Sadocchi (1987).
- a (numerical coding): levels for the experimental factor ‘a’
- b (numerical coding): levels for the experimental factor ‘b’
- y1 (numeric): 1st quantitative response variable
- y2 (numeric): 2nd quantitative response variable
Sinapis
A study was conducted to evaluate the effect of increasing densities of a weed (Sinapis arvensis) on sunflower yield (tons per hectare). The dataset was taken from A. Onofri and Tei (1994a).
- density (numeric): weed density in plants per square meter
- block (numeric): code for the block
- yield (numeric): sunflower grain yield in tons per hectare
SowingTime
Six faba bean genotypes were tested at two sowing times using a split-plot design in four complete blocks. Sowing times were randomized to main-plots within blocks, and genotypes were randomized to sub-plots within main-plots and blocks. The dataset was taken from F. Stagnari et al. (2007).
- Plot (numeric): code for the plot
- SowingTime (character): sowing season
- Genotype (character): name of the genotype
- Block (numeric): code for the block
- Yield (numeric): crop grain yield in tons per hectare
starchGrain_g
This dataset refers to an experiment that aimed to compare the diameters of starch grains from the tubers of two potato producers. Starch grains were sampled from tubers collected from the production fields of the producers. The dataset shows the counts of starch grains assigned to each of five diameter classes (<4, [4−8[, [8−12[, [12−16[, and ≥16 \(\mu m\)). For each producer, the diameters were measured from twelve photos taken with a microscope. The original dataset is in grouped form, and one record represents one photo (12 photos per producer). The dataset was taken from Andrea Onofri, Piepho, and Kozak (2019).
- Group (character): the producer
- Photo (numeric): code for each photo
- c1-c5 (numeric): the counts of starch grains in each class
starchGrain_u
Same dataset as ‘starchGrain_g’, but in ungrouped form (one row for each starch grain).
- Photo (numeric): code for each photo
- Group (character): the producer
- Class (character): the diameter class to which each grain belongs (c1 to c5)
starchGrain_s
Same dataset as ‘starchGrain_g’, but in censored ‘long’ form (one row for each starch grain), where each class is represented by the lower and upper diameter limits within which the true diameter lies.
- Photo (numeric): code for each photo
- Group (character): the producer
- sizeLow (numeric): the lower limit for the diameter class
- sizeUp (numeric): the upper limit for the diameter class
- Class (character): the diameter class to which each grain belongs (c1 to c5)
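The relationship between the three forms of the starch grain data (grouped, ungrouped, and censored) can be sketched with a single hypothetical record. The counts and the producer label below are invented for illustration. The outer limits used for the two open-ended classes (0 for the first class, infinity for the last) are also assumptions, and the package dataset may code them differently.

```python
import math

# Class limits in micrometers, following the class definitions above.
# Assumption: the open-ended classes use 0 and infinity as outer limits.
CLASS_LIMITS = {
    "c1": (0.0, 4.0),
    "c2": (4.0, 8.0),
    "c3": (8.0, 12.0),
    "c4": (12.0, 16.0),
    "c5": (16.0, math.inf),
}

# One hypothetical grouped record (one photo), with invented counts.
grouped = [
    {"Group": "P1", "Photo": 1, "c1": 2, "c2": 3, "c3": 1, "c4": 0, "c5": 1},
]


def ungroup(records):
    """Expand grouped counts into one row per starch grain."""
    rows = []
    for rec in records:
        for cls in CLASS_LIMITS:
            for _ in range(rec[cls]):
                rows.append({"Photo": rec["Photo"], "Group": rec["Group"], "Class": cls})
    return rows


def add_limits(rows):
    """Attach the lower and upper class limits to each ungrouped row."""
    out = []
    for row in rows:
        low, up = CLASS_LIMITS[row["Class"]]
        out.append({**row, "sizeLow": low, "sizeUp": up})
    return out


ungrouped = ungroup(grouped)
censored = add_limits(ungrouped)
print(len(ungrouped))  # 2 + 3 + 1 + 0 + 1 = 7 grains
```

The grouped form is compact, while the censored form carries the interval limits explicitly, which is what interval-censored models usually expect as input.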
sugarsMedia
Plant tissues of tomato were grown in tissue cultures using three types of media, each based on the addition of a specific amount of either glucose, fructose, or sucrose to the control. The experiment was conducted using a completely randomized design with five replicates, and the growth of cells was recorded for each subject. The dataset was generated through Monte Carlo simulation, starting from the original data in Kuehl (2000).
- Sugar (character): type of substrate
- Growth (numeric): tissue growth in mm
Sunflower
Water extracts of two sunflower varieties were compared in terms of phytotoxicity to the same test-plant (Sinapis alba). Several lots of 50 seeds of S. alba were put in Petri dishes and moistened with water extracts at increasing concentrations (three replicated Petri dishes for each of four concentration levels, including the untreated control). Radicle lengths were determined seven days after the treatment. The experiment was planned with a CRD.
- Dose (numeric): concentration of extracts (g d.m. in 100 mL of water)
- Rep (numeric): code for the replicate
- Var (character): name of sunflower varieties
- Length (numeric): root lengths in mm
Timings_77490
A field experiment in which rimsulfuron was applied post-emergence to maize at 4 different timings and compared to an untreated, unweeded control. The aim was to assess the selectivity of the herbicide for the crop, depending on the timing of application. One value is missing due to an unforeseen event. The dataset is taken from Onofri (unpublished data).
- Plot (numeric): code for the plot
- Timing (character): timing of weed control as the number of maize leaves
- Block (numeric): code for the block
- Height_30 (numeric): height of maize (cm) on 30 June
- Weight_30 (numeric): average weight of maize plants (g per plant) on 30 June
- FinalYield (numeric): grain yield of maize in kg/ha \(\times\) 100
TKW
An experiment with 30 genotypes in three blocks, where the Thousand Kernel Weight (TKW) was recorded in three sub-samples per plot (Ciriciofolo, unpublished data).
- Plot (numeric): code for the plot
- Block (numeric): code for the block
- Genotype (character): name of the genotype
- Sample (numeric): code for the subsample in each plot
- TKW (numeric): weight of 1000 wheat kernels (g)
Tripleuspermum
Plants of Tripleurospermum inodorum were treated with a sulphonylurea herbicide (tribenuron-methyl) at increasing doses, and the fresh weight of the treated plants per pot was recorded 3 weeks after treatment. The experiment was completely randomized with three replicates and conducted at the Department of Integrated Pest Management, University of Aarhus, Denmark (Pannacci, Pettorossi, and Tei 2013).
- Dose (numeric): Dose of tribenuron-methyl (in g/ha)
- FreshWeight (numeric): weight of plants 3 weeks after the treatment (in grams per pot)
WeedCounts
Counts of three weed species (CHEAL: Chenopodium album, CHEHY: C. hybridum, and CHEPO: C. polyspermum) in twenty random quadrats in a field of central Italy (Onofri, unpublished data).
- Species (character): the weed species, identified by its Bayer code
- Rep (numeric): code for the quadrat
- Count (numeric): count of the number of plants per quadrat
WeedCover
Results of a survey in which the yield of maize was determined in relation to the ground cover of weeds, as assessed early in the season. The dataset was simulated by Monte Carlo methods.
- WeedCover (numeric): ground covering of weeds (%)
- Yield (numeric): yield of maize in tons per hectare
WinterWheat and WinterWheat2002
A multi-year experiment was conducted to compare eight winter wheat genotypes in an RCBD with three replicates in the hills of central Italy; the experiment was repeated for seven years (A. Onofri and Ciriciofolo 2007). The dataset ‘WinterWheat2002’ only contains the data from the 2002 season.
- Plot (numeric): code for the plot
- Block (numeric): code for the block within each year
- Genotype (character): name of the genotypes
- Yield (numeric): grain yield in tons per hectare
- Year (numeric): the year in which the yield was measured