Acyclic Identification of Aptamer from Over-Represented Libraries Using Hash Functions

Bioengineering Conference (2013)

Abstract
In recent years, the advent of fast sequencing technology has caused genomic databases to grow rapidly. Researchers in bioinformatics need faster and more accurate tools to analyze these very large data sets effectively. In aptamer search, the goal is to find DNA sequences that are over-represented compared with random background libraries on the same chip. Hash functions are widely used in substring comparison, sequence alignment, and clustering tools. We have developed a lightweight tool that uses hash functions to reduce the size of the genomic data and conducts k-neighbor searches on the centroid sequence. This greatly improves search efficiency compared with the existing tool. Furthermore, calculating k-neighbor hash values decreases the overhead of searching for mutants. On a dataset of 1 million sequences, the program accurately counted the frequency of the human alpha-thrombin sequence and found mutant versions of the target sequence in less than 40 seconds, whereas the existing method takes 8280 seconds (more than 2 hours).
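The abstract does not give the tool's hash function or its exact definition of a k-neighbor, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that "k-neighbor" means a sequence within Hamming distance k of the centroid (substitutions only), and it uses Python's built-in hash table (`Counter`) in place of a custom hash function. The names `count_sequences`, `hamming_neighbors`, and `find_mutants` are hypothetical.

```python
from collections import Counter
from itertools import combinations, product

ALPHABET = "ACGT"


def count_sequences(reads):
    """Collapse a list of reads into a hash table of sequence -> frequency."""
    return Counter(reads)


def hamming_neighbors(centroid, k):
    """Yield every sequence that differs from `centroid` in 1..k positions.

    Assumption: neighbors are substitution-only mutants (Hamming distance).
    """
    for distance in range(1, k + 1):
        for positions in combinations(range(len(centroid)), distance):
            # For each chosen position, list the bases other than the original.
            choices = [[b for b in ALPHABET if b != centroid[p]] for p in positions]
            for substitution in product(*choices):
                mutant = list(centroid)
                for p, base in zip(positions, substitution):
                    mutant[p] = base
                yield "".join(mutant)


def find_mutants(counts, centroid, k):
    """Return {mutant: frequency} for each k-neighbor present in the library."""
    return {s: counts[s] for s in hamming_neighbors(centroid, k) if s in counts}


# Toy library; a real run would read sequences from a sequencing output file.
reads = ["ACGT", "ACGT", "ACGA", "TCGT", "GGGG"]
counts = count_sequences(reads)
mutants = find_mutants(counts, "ACGT", 1)
```

In this sketch, each mutant lookup is a single hash-table probe, so the cost of the mutant search depends on the number of neighbors of the centroid rather than on the size of the library.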
Keywords
dna, bioinformatics, genomics, molecular configurations, organic compounds, dna sequences, acyclic aptamer identification, aptamer search, bioinformatic field, centroid sequence, clustering tools, fast sequencing technology, genomic data size, genomic database, gigantic data sets, hash functions, human alpha-thrombin sequence, k-neighbor hash values, k-neighbor searches, light-weighted tool, random background libraries, sequence alignment, aptamer, hash, overrepresented library, sequential analysis, biomedical engineering