GenoFish

GenoFish - Evolution of genes and genomes after whole genome duplication

National program coordinated by Yann Guiguen and Julien Bobe (selected in 2016)

This four-year project (2017-2020) is carried out in collaboration with partners from:

  • CNRS: H. Roest Crollius, CNRS-IBENS
  • INRA: C. Donnadieu, GETPLAGE, & C. Klopp, SIGENAE
  • University of Lausanne: M. Robinson-Rechavi, Switzerland
  • University of Oregon: J.H. Postlethwait, USA

Genome duplication is a recurring theme in vertebrate evolution; for instance, the human genome retains numerous gene family members that arose from one or two rounds of whole genome duplication (WGD) at the origin of vertebrates. These ‘extra’ genes are, in principle, available for the evolution of new functions that could drive the origin of novelties and thus contribute to the diversification of life on Earth. How genes evolve after genome duplication is thus a crucial question for understanding the mechanisms by which genomes evolve and drive vertebrate development and physiology. Unfortunately, we do not yet understand vertebrate genomes well enough to fully answer this question, and the first rounds of vertebrate WGD are so ancient that they are difficult to study. To improve our understanding of evolution after WGD, this project will take advantage of the teleost-specific WGD (TGD) that occurred 320-350 million years ago, after the split between Holostei and the lineage leading to Teleostei. The TGD is of special interest for the study of gene evolution because teleosts radiated shortly after it, resulting in many extant lineages of great diversity. In addition, teleost genomes preserved a substantial number of duplicate genes while others reverted to single copy, and this additional complexity has been hypothesized to explain the extraordinary radiation of this group, which now makes up about half of all vertebrate species. With the advent of next-generation sequencing technologies, the number of publicly available whole genome sequences in fish has increased dramatically in recent years. However, these fish genome resources still miss many important nodes in teleost diversity and evolution because, for instance, more than 80 % of species with sequenced genomes lie within the Euteleostei lineage.
In addition to this paucity of evolutionarily representative whole genome resources, many of these recent sequences are highly fragmented. They therefore cannot be used to correctly infer synteny relationships over long genome fragments, or to determine whether an individual gene is truly missing from the fish genome or merely absent from the genome assembly. To solve this problem, our project will first use the cutting-edge approach of Single Molecule, Real-Time (SMRT) DNA sequencing to fill these knowledge and resource gaps. We will provide high-quality genomes for fish species carefully selected to fill key taxonomic positions with regard to teleost fish evolution. It must be noted that this objective became feasible only in the last few months, at a reasonable cost and with a quality high enough to allow comparative whole genome analysis. Results of this project should provide genome-wide answers on how often different gene copies are lost independently in different fish lineages, and on whether lineage-specific changes in duplicate gene content, gene regulation, or gene expression patterns are important for the evolution of the remarkable diversity among teleosts. In addition, because these gene duplications also have a major impact on the quality of gene annotation in teleosts, this project will propose a refinement of teleost gene nomenclature, supported by the results of our evolutionary analysis. Reforming nomenclature will link gene information across many vertebrate species, thereby bridging functional information from current major fish model species (zebrafish, medaka) to other biomedically or economically relevant fish species.