Design and implementation of a processing scheme for phenotypic data within the context of physiological breeding

Description of the topic

Field phenotyping is probably one of the most important bottlenecks to be eliminated in order to optimize the exploration, evaluation and selection of the available genetic resources in wheat. Technological advances are helping to overcome this problem by providing tools that enable high-throughput measurements of plant traits. At the same time, however, huge amounts of data are being generated, and most breeding programs lack suitable schemes and tools to make adequate use of this information. Every year, CIMMYT’s wheat physiology group evaluates large numbers of genotypes for several physiological traits in different environments. Fast and efficient analysis of such data is essential for proper data management and storage. Implementing a data management scheme is not a trivial task, because it requires a methodological approach to find the most operational and reliable design. The research topic suggested in this proposal consists of developing a data processing scheme that integrates all the steps from data collection to final compilation into a database. The main focus of this project will be the development of a tool for statistical analysis, for which the candidate will work with a number of current data sets. The final goal is a platform that allows quick access to and analysis of phenotypic data, to improve the outcomes of other research projects and support decision-making in our pre-breeding program.

Work expectations

Become familiar with the data collected by the wheat physiology group.

Make a list of the statistical analyses required by the group.

Develop a computer program using R or Python for data curation and statistical analyses (main objective).

Design a processing scheme for defining each step, from data collection up to the final database (based on the requirements of the wheat physiology group).

Propose and implement a workflow for effective integration of the steps identified in the previous point.
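
As a rough illustration of how the steps listed above could fit together, the following minimal Python sketch chains a curation step, a simple summary statistic and storage in a database. It is only an assumption-laden example, not the group's actual design: the record layout, the trait name, the function names (curate, summarize, store) and the table name trait_means are all hypothetical, and the analyses actually required by the group would replace the plain mean used here.

```python
import sqlite3
from statistics import mean

# Hypothetical plot-level records: (genotype, environment, trait, raw value).
# The values are made up purely for illustration.
RAW_RECORDS = [
    ("G1", "E1", "canopy_temp", "24.1"),
    ("G1", "E1", "canopy_temp", "23.9"),
    ("G2", "E1", "canopy_temp", "25.3"),
    ("G2", "E1", "canopy_temp", ""),     # missing measurement
    ("G2", "E1", "canopy_temp", "n/a"),  # non-numeric entry
]


def curate(records):
    """Curation step: keep only records whose value parses as a number."""
    clean = []
    for genotype, environment, trait, value in records:
        try:
            clean.append((genotype, environment, trait, float(value)))
        except ValueError:
            continue  # skip empty or non-numeric values
    return clean


def summarize(records):
    """Analysis step: mean and observation count per genotype, environment and trait."""
    groups = {}
    for genotype, environment, trait, value in records:
        groups.setdefault((genotype, environment, trait), []).append(value)
    return [
        (genotype, environment, trait, mean(values), len(values))
        for (genotype, environment, trait), values in sorted(groups.items())
    ]


def store(summary, connection):
    """Storage step: write the summary rows into a (hypothetical) table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS trait_means ("
        "genotype TEXT, environment TEXT, trait TEXT, "
        "mean_value REAL, n_obs INTEGER)"
    )
    connection.executemany(
        "INSERT INTO trait_means VALUES (?, ?, ?, ?, ?)", summary
    )
    connection.commit()


# Run the three steps end to end against an in-memory database.
connection = sqlite3.connect(":memory:")
store(summarize(curate(RAW_RECORDS)), connection)
rows = connection.execute(
    "SELECT genotype, mean_value, n_obs FROM trait_means ORDER BY genotype"
).fetchall()
for row in rows:
    print(row)
```

Keeping each step in its own function is one possible way to let the group swap in different curation rules or statistical analyses without changing the rest of the workflow.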

Required skills

Advanced knowledge of R, Python or a similar open-source programming language.

Advanced knowledge of statistics.

Functional English.