The main aim of my current fellowship is to explore multiobjective approaches to protein structure prediction. One research question I am particularly interested in is whether, and why, such an approach is advantageous compared to single-objective optimization. In my opinion, this has not yet been convincingly analyzed or demonstrated in previous work. I am attempting to address this question from both a theoretical and an empirical perspective.
A lay summary of the protein structure prediction (PSP) problem and a slightly more detailed research abstract are given below.
Proteins are at the heart of almost all cellular processes, and the categorization of the function of unknown proteins and the design of novel proteins are therefore of fundamental importance in the development of new drugs and therapies. Knowledge of the three-dimensional structure of a protein is crucial in this context, as the function of a protein derives directly from its structural properties. While the number of known protein sequences is increasing quickly as a result of ongoing sequencing efforts, the number of known protein structures is growing much more slowly. This is due to the complex and time-consuming experimental methods required to determine the structure of a given protein. Methods of de novo protein structure prediction aim to reduce this bottleneck by predicting the structure of a protein from its sequence using computational methods. This is a computationally challenging problem and, despite recent progress in the field, protein structure prediction remains one of the 'grand challenges' of computational biology. This project will use state-of-the-art optimization techniques from the field of computational intelligence and explore their potential in the context of de novo protein structure prediction.
Proteins are (macro)molecules that consist of chains of amino acids and are at the heart of almost all cellular processes. The functional properties of a protein derive from its three-dimensional shape (tertiary structure), which, in turn, is predominantly determined by its sequence of amino acids (primary structure). Specifically, the native tertiary structure of a protein is assumed to correspond to its lowest free-energy conformation. In theory, this direct relation between the sequence and structure of a protein, which has been known for several decades, allows for in silico structure/function prediction (for a given sequence), as well as the design of new proteins (for a given structure/function). In the context of protein structure prediction (PSP), two fundamentally different approaches can be identified: those based on comparative modeling and those based on de novo modeling. Comparative modeling approaches predict structure based on that of homologous proteins; they are therefore only applicable when proteins of known structure with a high sequence similarity are available. In contrast, de novo modeling approaches can be applied to any protein sequence, but are currently less effective. The inherent difficulty of de novo protein structure prediction arises from two different issues: (1) the intricacy of formulating an energy function that realistically models the different local and global interactions contributing to protein folding, and (2) the size of the space of possible conformations, which cannot be explored exhaustively. Progress in de novo protein structure prediction therefore relies crucially on progress both in the design of more accurate energy functions and in the development of specialized, efficient sampling methods. The proposed research relates to both of these issues.
Traditionally, the PSP problem has been cast as a single-objective optimization problem, in which a single energy function, integrating a number of different types of interactions, is optimized subject to a number of knowledge-based constraints. However, several of the components within these single-objective energy functions are generally conflicting, and their integration within one objective function is not the most natural choice. A more desirable approach is to treat the conflicting terms as individual objectives and to identify the set of efficient trade-offs for the resulting multiobjective optimization problem. Such a multiobjective formulation of the PSP problem is expected to provide improved guidance through the search space and, additionally, offers a mechanism for the flexible handling of constraints.
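To make the distinction between the two formulations concrete, the following is a minimal sketch in Python and not the method used in this project. It assumes that each candidate conformation has already been scored on two conflicting energy terms; the term names, the numerical values, and the helper functions `dominates`, `pareto_front` and `weighted_sum_best` are all hypothetical and chosen purely for illustration. The single-objective view collapses the terms into one weighted sum and returns one conformation, whereas the multiobjective view returns the whole set of efficient (non-dominated) trade-offs.

```python
from typing import List, Sequence, Tuple

# One tuple of objective values per candidate conformation (all minimized).
Objectives = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is no worse than b in every objective
    and strictly better in at least one (minimization)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Objectives]) -> List[Objectives]:
    """Multiobjective view: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def weighted_sum_best(points: List[Objectives],
                      weights: Sequence[float]) -> Objectives:
    """Single-objective view: collapse all terms into one weighted sum
    and return the single point that minimizes it."""
    return min(points, key=lambda p: sum(w * x for w, x in zip(weights, p)))


# Hypothetical (bonded, non-bonded) energy values for five candidate
# conformations; the numbers are illustrative only, not real data.
candidates: List[Objectives] = [
    (1.0, 9.0),
    (2.0, 4.0),
    (4.0, 3.0),
    (5.0, 5.0),
    (8.0, 1.0),
]

if __name__ == "__main__":
    print("Efficient trade-offs:", pareto_front(candidates))
    print("Weighted-sum optimum:", weighted_sum_best(candidates, (0.5, 0.5)))
```

In this toy example the weighted sum commits to one compromise that depends on the chosen weights, while the non-dominated set retains several alternatives and discards only the candidate that is worse in both terms. The quadratic-time filter is adequate for a sketch; a practical multiobjective optimizer would maintain the non-dominated set incrementally during the search.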