A Bayesian genomic prediction model combining pedigree and incompletely genotyped individuals

Description of the topic

Quantitative trait variation may reflect a complex genetic architecture, the genetic basis underlying a phenotype, in which a few genes have large effects and several genes have small effects. These genes can show additive, dominance and/or epistatic effects, as well as different types of interaction with the environment. Advances in plant genomics and, more recently, in DNA sequencing of many species, coupled with the availability of statistical and biometric methods implemented in user-friendly software for analyzing genetic and phenotypic data, have made it feasible to map and dissect complex quantitative trait variation. Genomic selection (GS) and prediction based on genome-wide single nucleotide polymorphism (SNP) genotyping, pedigree and phenotypic data are very powerful tools for capturing small genetic effects dispersed over the genome, which makes it possible to predict an individual’s phenotype. New methods and tools are continuously being developed to integrate GS into genetics research.

One of the key issues with GS is that usually many more individuals have phenotypic and pedigree data than have marker data. When making predictions, only lines with marker, pedigree and phenotypic data can be used, which leaves out lines that have pedigree and phenotypic data but no marker data. This project aims to develop a novel approach that uses Bayesian statistics to combine genotypic data with pedigree and phenotypic data, and therefore to use all available data. The project will lead to new algorithms and software for GS based on advanced statistical methods, with the potential to cause a paradigm shift in this field.

Work expectations

The phenotypic variation of a plant population measured across locations, seasons or years can be attributed to its genetics, the environment where it grows, and genotype × environment interaction. Biometric models are used to explain traits with continuous variation because algebraic equations facilitate the understanding of quantitative genetics, which is the study of complex traits affected by the action of multiple genes. Quantitative genetic models include genes with major or small effects and the non-genetic factors affecting a complex trait. Genomic prediction (GP), the basis of genomic selection (GS), uses genome-wide single nucleotide polymorphism, pedigree and phenotypic data and is a very powerful tool for capturing small genetic effects dispersed over the genome, which makes it possible to predict an individual’s phenotype. New methods and tools are continuously being developed to integrate GS into genetics research. One of the key issues with GS is missing genotypic data: the numbers of lines with pedigree, marker and phenotypic data differ, which makes it difficult to develop GS models.

The overall aim of this project is to develop a theoretical model that includes pedigree and missing genotypic data in a unified approach, and to test it with real data available from international multi-environment wheat breeding nursery trials. The specific objectives are: (1) to combine the relationship matrix A obtained from the pedigree with the relationship matrix G obtained from the markers into one unified matrix H, using a Bayesian approach, and to compare this approach with other models used as references in GP; and (2) to compare the prediction accuracy obtained with our H matrix with the prediction accuracy obtained with other methods used to compute matrix H.
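To make the idea of a unified H matrix concrete, the sketch below builds H with the widely used single-step construction from the genomic prediction literature. In that construction, relationships among non-genotyped lines are adjusted through their pedigree links to the genotyped lines. It is only an illustration of one possible reference method, not the Bayesian approach this project will develop. The function name `build_h_matrix`, the three-line pedigree and the values in `G` are all hypothetical and chosen for illustration. Python with NumPy is used here for brevity; the same steps carry over directly to R.

```python
import numpy as np


def build_h_matrix(A, G, genotyped):
    """Combine pedigree matrix A and genomic matrix G into one matrix H.

    Uses the standard single-step partition, where block 1 holds the
    non-genotyped lines and block 2 the genotyped lines:
        H11 = A11 + A12 A22^-1 (G - A22) A22^-1 A21
        H12 = A12 A22^-1 G
        H22 = G
    `genotyped` lists the row indices of A that have marker data, in the
    same order as the rows and columns of G.
    """
    A = np.asarray(A, dtype=float)
    G = np.asarray(G, dtype=float)
    g_idx = np.asarray(genotyped, dtype=int)
    n_idx = np.setdiff1d(np.arange(A.shape[0]), g_idx)  # non-genotyped lines

    A12 = A[np.ix_(n_idx, g_idx)]
    A22 = A[np.ix_(g_idx, g_idx)]

    # P = A12 A22^-1, computed with a linear solve instead of an explicit
    # inverse (A22 is symmetric, so solve(A22, A12.T).T equals A12 A22^-1).
    P = np.linalg.solve(A22, A12.T).T

    H = A.copy()
    H[np.ix_(g_idx, g_idx)] = G
    H[np.ix_(n_idx, g_idx)] = P @ G
    H[np.ix_(g_idx, n_idx)] = (P @ G).T
    H[np.ix_(n_idx, n_idx)] += P @ (G - A22) @ P.T
    return H


# Toy pedigree (illustrative only): lines 0 and 1 are unrelated parents,
# line 2 is their offspring. Lines 0 and 2 are genotyped, line 1 is not.
A = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5],
              [0.5, 0.5, 1.0]])
G = np.array([[1.0, 0.6],
              [0.6, 1.0]])  # made-up genomic relationships for lines 0 and 2
genotyped = [0, 2]

H = build_h_matrix(A, G, genotyped)
print(np.round(H, 4))
```

A useful sanity check on any such construction: when G equals the pedigree block A22, the markers add no information beyond the pedigree and H reduces to A.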

Required skills

Bayesian analysis and computer programming in R.