Statistical Analysis of Genetic and Phenotypic Data for Breeders


Who should attend? CGIAR center researchers and staff, National Program researchers, CIMMYT’s partners, postgraduate students, molecular marker specialists, bioinformaticians, biometricians/statisticians, and visitors interested in the analysis of phenotypic and genotypic information in breeding programs. Participants should be familiar with basic theory and methods in plant breeding, plant genetics, and statistics, as well as with R.

Primary lecturers: Lectures will be delivered by Juan Burgueño, José Crossa, Fernando Toledo, Paulino Pérez, Ángela Pacheco, and Gregorio Alvarado.

Costs: There will be a charge of 400 USD for the course to cover indirect costs (classroom, audiovisual services). Each participant must cover his/her own travel and accommodation expenses (including lunch and dinner).

Contact details: Please address any enquiries to Víctor Hidalgo Velázquez, Tel: +52 (55) 58042004 ext. 2292, or by e-mail:
For scientific or academic questions, please contact Dr. Juan Burgueño, e-mail:

Detailed Program

Day 1 - Introduction to Generalized Linear Mixed Models (GLMM)

  • Definition and concepts of GLMM.
  • Analysis of experimental designs with GLMM.

Day 2 - Experimental Design

  • Concepts in Experimental Design.
  • Incomplete Block Designs.
  • Augmented designs.

Day 3 - Statistical analysis of multi-environment trials (META-R)

  • Individual analysis of trials.
  • Combined analysis.
  • Spatial analysis.

- Selection Indices (RINDSEL)

  • Multi-trait and multi-environment models.
  • Phenotypic selection indices (Smith, ESIM).
  • Restricted selection indices (Kempthorne and Nordskog, RESIM).
  • Molecular selection index based on Marker-Assisted Selection (MAS).

Day 4 - Analysis of genotype by environment interaction (GEA-R)

  • Descriptive analysis.
  • Additive Main Effects and Multiplicative Interaction Model (AMMI).
  • Site Regression Model (SREG).
  • Partial Least Squares analysis.
  • Stability analysis (Eberhart & Russell, Francis, Shukla, Tai, Perkins & Jinks, Finlay & Wilkinson, Wricke, superiority measure, Nassar & Huehn).
  • Factorial regression in multi-environment experiments.

Day 5 - Basic concepts of Quantitative Genetics

  • Basics of population and quantitative genetics.
  • Concepts of identity by descent and inbreeding.
  • Definition of breeding values.
  • Brief overview of mixed models (BLUE, BLUP, variance components, covariance structures).
  • Linear models for predicting breeding values from the pedigree relationship matrix (A) and the genomic relationship matrix (G).

Day 6 - Introduction to R

  • Data management in R.
  • Basic algebraic and matrix operations.
  • Basic R packages for data analysis.
  • Functions.

Day 7 - Management and analysis of genomic data

  • Data formats and missing values, data quality.
  • Allele and genotype frequencies, Hardy-Weinberg equilibrium, rareness and specificity.
  • Theoretical aspects of Diversity analysis.
  • Principal component analysis, multidimensional scaling, and cluster analysis.

Day 8 - Basics of Genomic Selection

  • Why and how?
  • Parent average and Mendelian sampling.
  • GS models for G×E (reaction norm models).
  • Estimation of GEBVs from marker effects.
  • RKHS and GBLUP.

Day 9 - Genomic Selection with BGLR

  • Computation of genomic relationships and genetic distances from markers.
  • Linear regression using BGLR (from shrinkage to variable selection methods).

- Review of methods for genome-enabled regression and prediction

  • GBLUP and ridge regression.
  • RKHS, GBLUP, and the Bayesian alphabet (BL, BA, BB).
  • Estimation of marker effects.

Day 10 - Genome-Wide Association Mapping 

  • Basic concepts of GWAS.
  • Populations for GWAS.
  • Statistical models.

Day 11 - Practical laboratory in Diversity Analysis and Genome-Wide Association

  • Genetic Diversity Analysis using BIO-R.
  • Case study and pipeline for GWAS using maize data.

Through lectures, practical sessions, and discussions, you will learn:

  • Basic concepts of generalized linear mixed models, and the use of computing tools to analyze genetic experiments, multi-environment trials (with and without spatial analysis), and genotype by environment interaction.
  • Basic quantitative genetics concepts applied to genomic selection and plant breeding, predictive models and methods for genomic selection, and the use of software for genomic selection.
  • Genetic data management and analysis, diversity analysis, and genome-wide association studies.