Products

Artificial Intelligence Based Methods

LassoESM: A tailored large language model for lasso peptides

We developed LassoESM, a lasso peptide-specific language model that builds on pre-trained protein language models (PLMs) to improve the prediction of lasso peptide properties.

Xuenan Mi, Susanna L. Barrett, Douglas A. Mitchell* and Diwakar Shukla*, LassoESM: A tailored language model to enhance lasso peptide property prediction. Nature Communications, In press, 2025.

An open-source implementation of LassoESM is available at https://github.com/ShuklaGroup/LassoESM

 

PeptideESM: A tailored large language model for peptides

We developed PeptideESM, a peptide-specific language model that builds on pre-trained protein language models (PLMs) to improve the prediction of peptide properties.

Joseph D. Clark, Xuenan Mi, Douglas A. Mitchell and Diwakar Shukla*, Substrate prediction for RiPP biosynthetic enzymes via masked language modeling and transfer learning. Digital Discovery, Volume 4, Pages 343-354, 2025.

An open-source implementation of PeptideESM is available at https://github.com/ShuklaGroup/LazBFDEF

 

TLmutation: Protein Variant Effect Predictor based on transfer learning

TLmutation leverages deep mutational scanning datasets to predict the functional consequences of mutations in homologous proteins. 

Zahra Shamsi, Matthew Chan and Diwakar Shukla*, TLmutation: Predicting the Effects of Mutations Using Transfer Learning. The Journal of Physical Chemistry B, Volume 124, Issue 19, Pages 3845-3854, 2020.

An open-source implementation of TLmutation is available at https://github.com/ShuklaGroup/TLMutation

 

Molecular Dynamics Simulation Based Methods

Ensemble Adaptive Sampling Scheme (EASE)

We developed a framework that identifies the optimal sampling policy through metric-driven ranking. Given an ensemble of adaptive sampling policies, it evaluates each policy on the data collected so far, ranks the policies by how effectively they explore the conformational space, and selects the top-ranked policy for the next round of sampling.

Hassan Nadeem and Diwakar Shukla*, Ensemble Adaptive Sampling Scheme: Identifying an Optimal Sampling Strategy via Policy Ranking. Journal of Chemical Theory and Computation, Volume 21, Issue 9, Pages 4626–4639, 2025.

An open-source implementation of the policy ranking framework and EASE is available at https://github.com/ShuklaGroup/EASE
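The ranking idea described for EASE can be sketched in a few lines. This is only an illustration under stated assumptions, not the published implementation: the three toy policies, the visit-count data, and the `exploration_metric` function are all hypothetical stand-ins chosen to keep the example self-contained.

```python
def least_counts_policy(counts, n):
    """Toy policy: restart from the n least-visited states."""
    return sorted(counts, key=counts.get)[:n]


def most_counts_policy(counts, n):
    """Toy policy: restart from the n most-visited states."""
    return sorted(counts, key=counts.get, reverse=True)[:n]


def first_states_policy(counts, n):
    """Toy baseline: restart from the first n states in name order."""
    return sorted(counts)[:n]


def exploration_metric(selected, counts):
    """Hypothetical metric: larger when the selected states are rarely visited."""
    return sum(1.0 / counts[state] for state in selected) / len(selected)


def rank_policies(policies, counts, n, metric):
    """Score every policy on the current data and return names, best first."""
    scores = {
        name: metric(policy(counts, n), counts)
        for name, policy in policies.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical visit counts per discretized state after one sampling round.
counts = {"A": 50, "B": 3, "C": 20, "D": 1}
policies = {
    "least_counts": least_counts_policy,
    "first_states": first_states_policy,
    "most_counts": most_counts_policy,
}
ranking = rank_policies(policies, counts, n=2, metric=exploration_metric)
print(ranking)  # the first entry would be used for the next round
```

In this sketch the policy at the head of `ranking` is the one that would seed the next round; a real workflow would recompute the ranking after every round as new data arrive.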

 

Maximum Entropy VAMPNets (MaxEnt VAMPNets)

Inspired by uncertainty-based sampling in active learning, MaxEnt VAMPNet restarts simulations from the microstates that maximize the Shannon entropy of a VAMPNet trained to perform a soft discretization of metastable states.

Diego E. Kleiman and Diwakar Shukla*, Active Learning of the Conformational Ensemble of Proteins Using Maximum Entropy VAMPNets. Journal of Chemical Theory and Computation, Volume 19, Issue 14, Pages 4377–4388, 2023.

An open-source implementation of MaxEnt VAMPNets is available on GitHub at https://github.com/ShuklaGroup/MaxEntVAMPNet
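The selection step described for MaxEnt VAMPNet can be illustrated with a minimal sketch. It assumes the trained network has already produced, for each candidate microstate, a vector of soft membership probabilities over the metastable states; the probabilities below are hard-coded, hypothetical values, and the function names are not taken from the released code.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy (in nats) of one soft state assignment."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_restart_states(soft_assignments, n_restarts):
    """Return indices of the n_restarts entries with the highest entropy."""
    entropies = [shannon_entropy(p) for p in soft_assignments]
    ranked = sorted(
        range(len(entropies)), key=lambda i: entropies[i], reverse=True
    )
    return ranked[:n_restarts]


# Hypothetical soft assignments over three metastable states.
soft_assignments = [
    [0.98, 0.01, 0.01],  # confidently assigned: low entropy
    [0.34, 0.33, 0.33],  # nearly uniform: high entropy
    [0.70, 0.20, 0.10],
    [0.50, 0.50, 0.00],
]
restart_states = select_restart_states(soft_assignments, n_restarts=2)
print(restart_states)
```

Microstates whose assignment is closest to uniform are the ones the model is least certain about, so they are chosen as restart points.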

 

Multiagent Reinforcement Learning-Based Adaptive Sampling (MA-REAP)

We developed two algorithms inspired by multiagent reinforcement learning (RL) that extend closely related techniques (REAP and TSLC) to situations where sampling can be accelerated by coordinated agents that learn from different regions of the energy landscape.

Diego E. Kleiman and Diwakar Shukla*, Multiagent Reinforcement Learning-Based Adaptive Sampling for Conformational Dynamics of Proteins. Journal of Chemical Theory and Computation, Volume 18, Issue 9, Pages 5422–5434, 2022.

An open-source implementation of multiagent REAP is available on GitHub at https://github.com/ShuklaGroup/MA_REAP

 

Reinforcement Learning-Based Adaptive Sampling (REAP)

The REAP algorithm uses concepts from reinforcement learning, a subfield of machine learning, to reward sampling along important degrees of freedom and to disregard those that do not facilitate exploration or exploitation.

Zahra Shamsi, Kevin J. Cheng and Diwakar Shukla*, Reinforcement Learning Based Adaptive Sampling: REAPing Rewards by Exploring Protein Conformational Landscapes. Journal of Physical Chemistry B, Volume 122, Issue 35, Pages 8386–8395, 2018.

An open-source implementation of REAP is available on GitHub at https://github.com/ShuklaGroup/REAP-ReinforcementLearningBasedAdaptiveSampling
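To make the idea of rewarding some degrees of freedom and disregarding others concrete, here is a small sketch. It assumes a reward of the form "weighted, standardized deviation of a cluster from the mean along each order parameter"; that form, the fixed weights, and the toy cluster centers are illustrative assumptions, and the weight-learning step of the actual algorithm is omitted.

```python
import statistics


def reap_style_rewards(cluster_centers, weights):
    """Score each cluster by its weighted, standardized deviation from the mean."""
    n_dims = len(weights)
    columns = [[center[i] for center in cluster_centers] for i in range(n_dims)]
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    rewards = []
    for center in cluster_centers:
        reward = sum(
            w * abs(x - m) / s
            for w, x, m, s in zip(weights, center, means, stds)
            if s > 0  # skip order parameters with no spread
        )
        rewards.append(reward)
    return rewards


def best_cluster(cluster_centers, weights):
    """Index of the cluster with the highest reward (the restart candidate)."""
    rewards = reap_style_rewards(cluster_centers, weights)
    return max(range(len(rewards)), key=rewards.__getitem__)


# Hypothetical cluster centers along two order parameters.
centers = [(0.0, 5.0), (1.0, 0.0), (2.0, 0.0), (9.0, 0.0)]

# A weight of zero makes the reward ignore that order parameter entirely.
pick_first = best_cluster(centers, weights=(1.0, 0.0))
pick_second = best_cluster(centers, weights=(0.0, 1.0))
print(pick_first, pick_second)
```

Changing the weights changes which cluster is favored for restarting, which is the lever a learning scheme would adjust between rounds.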

 

Optimal Probes

Optimal Probes is a software package to predict the smallest set of residue pairs for site-directed spin labeling for DEER/EPR experiments that best capture the slowest dynamics in a protein of interest. It uses molecular dynamics (MD) simulation datasets and a hyperparameter optimization strategy for a Markov state model (MSM).

Shriyaa Mittal and Diwakar Shukla*, Predicting Optimal DEER Label Positions to Study Protein Conformational Heterogeneity. Journal of Physical Chemistry B, Volume 121, Issue 42, Pages 9761–9770, 2017.

An open-source implementation of Optimal Probes is available on GitHub at https://github.com/ShuklaGroup/optimalProbes/wiki

 

Evolutionary Coupling Guided Adaptive Sampling (ECAS) 

This method assesses distances between evolutionarily coupled residues as natural choices for reaction coordinates. These coordinates can be incorporated into Markov state model-based adaptive sampling schemes and potentially used to predict not only functional conformations but also pathways of conformational change, protein folding, and protein-protein association.

Zahra Shamsi, Alexander S. Moffett and Diwakar Shukla*, Enhanced unbiased sampling of protein dynamics using evolutionary coupling information. Scientific Reports, Volume 7, Article number: 12700, 2017.
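As a rough illustration of using coupled-residue distances as reaction coordinates, the sketch below turns one simulation frame into a feature vector of pair distances. The residue indices, coordinates, and list of coupled pairs are hypothetical; in practice the pairs would come from an evolutionary coupling analysis and the coordinates from a trajectory.

```python
import math


def coupled_pair_distances(frame, coupled_pairs):
    """Distance between each evolutionarily coupled residue pair in one frame."""
    return [math.dist(frame[i], frame[j]) for i, j in coupled_pairs]


# Hypothetical residue positions (one representative atom per residue).
frame = {
    10: (0.0, 0.0, 0.0),
    42: (3.0, 4.0, 0.0),
    77: (0.0, 0.0, 2.0),
}
# Hypothetical coupled pairs, given as residue indices.
coupled_pairs = [(10, 42), (10, 77)]

features = coupled_pair_distances(frame, coupled_pairs)
print(features)
```

Feature vectors like this one could then be clustered and fed into whichever adaptive sampling scheme is in use.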