The SKMT Algorithm: A method for assessing and comparing underlying protein entanglement

PLOS Computational Biology (2023)

Abstract
We present fast and simple-to-implement measures of the entanglement of protein tertiary structures which are appropriate for highly flexible structure comparison. These are computed using the SKMT algorithm, a novel method of smoothing the Cα backbone to achieve a minimal-complexity curve representation of the manner in which the protein's secondary structure elements fold to form its tertiary structure. The complexity of the smoothed curve is then characterised using measures based on the writhe and crossing number, quantities heavily utilised in DNA topology studies which have recently shown promising results when applied to proteins. The SKMT smoothing is used to derive empirical bounds on a protein's entanglement relative to its number of secondary structure elements. We show that large-scale helical geometries predominantly account for the maximum growth in entanglement of protein monomers, and further that this large-scale helical geometry is present in a large array of proteins, consistently across a number of different protein structure types and sequences. We also show how these bounds can be used to constrain the search space of protein structure prediction from small-angle X-ray scattering experiments, a method highly suited to determining the likely structure of proteins in solution, where crystal structures or machine-learning-based predictions often fail to match experimental data. Finally, we develop a structural comparison metric based on the SKMT smoothing, which is used in one specific case to demonstrate significant structural similarity between Rossmann fold and TIM barrel proteins, a link which is potentially significant as attempts to engineer the latter have in the past produced the former. We provide the SWRITHE interactive Python notebook to calculate these metrics.

There is much interest in the development of quantitative methods to compare different protein structures or identify common substructures across protein families. As our understanding of the flexible and dynamic nature of protein structures advances, it will be necessary to develop methods for comparing protein structures that account for this flexibility. This can be achieved by assessing and comparing the underlying shapes of protein structures, unobscured by the small-scale (primary and secondary) complexity of the structure and focused instead on their large-scale (tertiary) entanglement. Here we present such a novel set of quantitative measures, obtained by smoothing and simplifying the amino-acid backbone into a minimal representation of its true flexibility. We demonstrate that these measures of a protein chain's self-entanglement have a number of critical properties which make them potentially impactful. First, by studying the distribution of entanglement across a wide sample of proteins, we show that there exists a minimum expected amount (a lower bound) of entanglement given the protein's length. This bound is shown to be useful in ensuring realistic predictions from experimental structural determination methods. Second, using fundamental properties of this entanglement measure, we identify the presence of helical structures across various length scales in proteins, which provide stability to the structure. Third, we show that these measures can be used to highlight significant structural similarity between two families of proteins currently classed as distinct, but which have been shown to share a surprising experimental link. Finally, we provide an interactive Python notebook to compute these measures for a given protein.
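To make the writhe and crossing-number quantities mentioned in the abstract concrete, the following is a minimal sketch of how they can be evaluated for an open piecewise-linear curve, such as a smoothed backbone. It is not the SKMT algorithm and not the SWRITHE notebook: it uses the standard Gauss-integral solid-angle formula for pairs of straight segments, and the function names (`segment_pair_gauss`, `writhe_and_acn`) and the assumption that the curve is supplied as an (N, 3) coordinate array are illustrative choices only.

```python
import numpy as np


def _unit(v, eps=1e-12):
    """Return v normalised to unit length, or None if v is (near) zero."""
    n = np.linalg.norm(v)
    return v / n if n > eps else None


def segment_pair_gauss(p1, p2, p3, p4):
    """Signed Gauss-integral contribution of segments p1->p2 and p3->p4.

    The magnitude is the solid angle of view directions from which the two
    segments appear to cross, divided by 4*pi; the sign is the crossing sign.
    """
    r12, r34 = p2 - p1, p4 - p3
    r13, r14, r23, r24 = p3 - p1, p4 - p1, p3 - p2, p4 - p2
    normals = [
        _unit(np.cross(r13, r14)),
        _unit(np.cross(r14, r24)),
        _unit(np.cross(r24, r23)),
        _unit(np.cross(r23, r13)),
    ]
    # A vanishing normal means three of the four points are collinear, so the
    # segments are coplanar and contribute nothing.
    if any(n is None for n in normals):
        return 0.0
    omega = sum(
        np.arcsin(np.clip(np.dot(normals[k], normals[(k + 1) % 4]), -1.0, 1.0))
        for k in range(4)
    )
    sign = np.sign(np.dot(np.cross(r34, r12), r13))
    return float(sign * omega / (4.0 * np.pi))


def writhe_and_acn(points):
    """Writhe and average crossing number of an open polygonal curve.

    points: array-like of shape (N, 3) listing the curve's vertices in order.
    """
    pts = np.asarray(points, dtype=float)
    n_seg = len(pts) - 1
    writhe = 0.0
    acn = 0.0
    for i in range(n_seg):
        # Adjacent segments share a vertex and contribute zero, so skip them.
        for j in range(i + 2, n_seg):
            g = segment_pair_gauss(pts[i], pts[i + 1], pts[j], pts[j + 1])
            writhe += 2.0 * g       # each unordered pair counts twice
            acn += 2.0 * abs(g)     # unsigned version of the same sum
    return writhe, acn


if __name__ == "__main__":
    # Illustrative input only: a four-turn helix, not real protein data.
    t = np.linspace(0.0, 8.0 * np.pi, 80)
    helix = np.column_stack([np.cos(t), np.sin(t), 0.3 * t])
    print(writhe_and_acn(helix))
```

Two properties are useful as sanity checks when adapting this sketch: a planar curve has zero writhe, and mirroring a curve flips the sign of its writhe while leaving the average crossing number unchanged. The double loop is quadratic in the number of segments, which is one practical reason to evaluate such measures on a simplified curve rather than on every backbone atom.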