Relative Scalability of NoSQL Databases for Genotype Data Manipulation.

Arthur Lorenzi Almeida, Vinícius Junqueira Schettino, Thiago Jesus Rodrigues Barbosa, Pedro Fernandes Freitas, Pedro Gabriel Silva Guimarães, Wagner Arbex

RITA (2018)

Abstract
Genotype data manipulation is one of the greatest challenges in research fields such as population genetics, bioinformatics and genomics, mainly because of the data's high dimensionality and unbalanced nature. These peculiarities explain why relational database management systems (RDBMS), the de facto standard storage solution, are not regarded as the best tools for this kind of data. However, the advent of Big Data has been pushing the development of modern database systems that might be able to overcome the deficiencies of RDBMS. In this context, we extended our previous work on evaluating the relative performance of NoSQL engines from different families, adapting the schema design based on that work's conclusions to achieve better performance and thus store more SNP markers for each individual. Using the Yahoo! Cloud Serving Benchmark (YCSB) framework, we assessed each database system over hypothetical genotype data (SNP markers). Results indicate that Tarantool is approximately 21.8% more efficient than MongoDB when storing 770,000 SNP markers, but MongoDB is less affected by an increase in the number of SNP markers per individual.