Improved Python Package for DNA Sequence Encoding using Frequency Chaos Game Representation

Abhishek Halder, Piyush, Bernadette Mathew,Debarka Sengupta

biorxiv(2024)

引用 0|浏览0
暂无评分
摘要
Frequency Chaos Game Representation (FCGR), an extended version of Chaos Game Representation (CGR), emerges as a robust strategy for DNA sequence encoding. The core principle of the CGR algorithm involves mapping a one-dimensional sequence representation into a higher-dimensional space, typically in the two-dimensional spatial domain. This paper introduces a use case wherein FCGR serves as a kmer frequency-based encoding method for motif classification using a publicly available dataset. Availability and implementation: The FCGR python package, use case, along with additional functionalities, is available in the GitHub. Our FCGR package demonstrates superior accuracy and computational efficiency compared to a leading R-based FCGR library {lochel2020deep}, which is designed for versatile tasks, including proteins, letters, and amino acids with user-defined resolution. Nevertheless, it is important to note that our Python package is specifically designed for DNA sequence encoding, where the resolution is predetermined based on the kmer length. It is a drawback of our current package compared to the state-of-the-art R-based kaos package. ### Competing Interest Statement The authors have declared no competing interest.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要