Recent Advances in Machine Learning Variant Effect Prediction Tools for Protein Engineering

Year
2022
Author(s)
Jesse R. Horne and Diwakar Shukla
Source
Industrial & Engineering Chemistry Research, Volume 61, Issue 19, Pages 6235–6245, 2022.

Proteins are nature’s molecular machinery and fulfill a diverse set of roles while consisting of chemically similar building blocks. In recent years, protein engineering and design have become important research areas, with many applications in the pharmaceutical, energy, and biocatalysis fields, among others, where the aim is ultimately to create a protein with desired structural and functional properties. To assist in such protein engineering pursuits, it is often critical to model the relationship between a protein’s sequence, folded structure, and biological function. Large challenges remain, however, in concretely mapping an amino acid sequence to specific protein properties and biological activities. Mutations may enhance or diminish molecular protein function, and the epistatic interactions between mutations result in an inherently complex mapping between genetic modifications and protein function. Therefore, estimating the quantitative effects of mutations on protein function(s) remains a grand challenge of biology, bioinformatics, and many related fields, and success would rapidly accelerate protein engineering tasks. Such estimation is often known as variant effect prediction (VEP). Progress has, however, been demonstrated in recent years with the development of machine learning (ML) methods for modeling the relationship between mutations and protein function. In this review, recent advances in VEP are discussed as tools for protein engineering, focusing on techniques incorporating gains from the broader ML community and challenges in estimating biomolecular functional differences. Primary developments highlighted include convolutional neural networks, graph neural networks, and natural language embeddings for protein sequences.
