With the genomes of many microorganisms completely sequenced, and new ones emerging almost every month, science is faced with the challenge of understanding the function of all the newly discovered genes. The sequence of the entire genome of Saccharomyces cerevisiae has revealed almost 6,000 protein-encoding genes, of which the function of fewer than half is known with any confidence, indicating the enormous task that is ahead. Interestingly, there is a growing number of open reading-frames (ORFs) in yeast that show sequence similarities to genes of other species, but in none of these is the function of the gene known. It appears that these genes have escaped the classical ("function-first") genetic approach, and their presence is revealed only by systematic sequencing. The classical genetic approach is characterized by its qualitative nature, i.e. by results that (in principle) give yes-or-no answers. However, the genes that have escaped such classical analyses will probably most commonly have quantitative, rather than qualitative, phenotypes. The field which is emerging to establish their role is known as functional genomics, and it differs from classical genetics both in the comprehensive and integrative nature of its analytical approach and the fact that it does not rely on a one-to-one relationship between gene and phenotype. Our approach to the functional genomics challenge is, like that of many others, to develop large-scale analyses of gene expression of mutants of defined genotype. To this end, data are being gathered at the level of the transcriptome, the proteome and, unusually, the metabolome. 
Other large-scale phenotype tests (with which we are also involved) include the assessment of developmental and morphological characteristics and the use of sensitive and quantitative growth rate tests, particularly in microbial systems, where both medium composition and sublethal concentrations of a large number of inhibitors are varied to create a physiological pattern. A related approach, somewhat equivalent to the analysis of knockout strains, has proved particularly beneficial in predicting the site of action of metabolic inhibitors.
The implicit functional genomics agenda, then, is that by comparing the large-scale (co)expression of orphan genes with that of "known" genes, in different genetic backgrounds and under different environmental conditions, one may acquire clues as to the function of the orphans. At all events, the inevitable result of these types of approach is the generation of large amounts of raw phenotypic data which alone are meaningless and which must be analysed properly to obtain a proper understanding of the biological problem of interest. This is true for all organisms, especially those whose genome sequence is available. With one or two exceptions, only the most rudimentary unsupervised methods are being applied to this problem, which at root involves the analysis of input data of very high dimensionality with a view to obtaining information of very much lower dimensionality. Consequently, we are developing powerful chemometric tools for the optimal assignment of gene function based on the very high-dimensional phenotypic data which may be acquired.
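The idea of comparing the expression of an orphan gene with that of "known" genes can be illustrated with a minimal sketch. It is not the chemometric method under development here; it simply assumes that each gene has a profile of expression levels measured across the same set of conditions, and proposes for the orphan the annotation of the known gene whose profile correlates best with it (Pearson correlation). The gene names, annotations and numbers below are invented purely for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-expression information
    return cov / sqrt(var_x * var_y)


def suggest_function(orphan_profile, known_profiles, annotations):
    """Return (gene, annotation, r) for the best-correlated known gene."""
    best_gene = max(
        known_profiles,
        key=lambda g: pearson(orphan_profile, known_profiles[g]),
    )
    r = pearson(orphan_profile, known_profiles[best_gene])
    return best_gene, annotations[best_gene], r


# Hypothetical data: expression levels across five conditions (made up).
known_profiles = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_B": [5.0, 4.0, 3.0, 2.0, 1.0],
    "GENE_C": [2.0, 2.0, 5.0, 2.0, 2.0],
}
# Hypothetical annotations, not taken from any real database.
annotations = {
    "GENE_A": "amino acid biosynthesis",
    "GENE_B": "stress response",
    "GENE_C": "cell wall maintenance",
}
orphan = [1.1, 2.3, 2.9, 4.2, 4.8]

gene, label, r = suggest_function(orphan, known_profiles, annotations)
print(gene, label, round(r, 3))
```

A single nearest neighbour is about the most rudimentary rule one could use. With thousands of genes and many conditions, a high correlation can arise by chance, which is one reason why more careful multivariate analysis is needed.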
Organisms currently being studied here include Streptomyces coelicolor A3(2), Escherichia coli and Saccharomyces cerevisiae, while we are also applying our technology to Mycobacterium tuberculosis and Arabidopsis thaliana. Note that the functional classification of unknown genes into classes requires that these classes are themselves homogeneous. This is presently not the case!
Last updated: March 1, 2013 at 12:14 pm