Are Multidimensional Boolean Patterns Dominating Microbiome and Microbial Genome Data?

Research Square (2022)

Abstract

Background: Virtually every biological system is governed by complex relations among its components. Identifying such relations requires a rigorous or heuristics-based search for patterns among the variables (features) of a system. A number of algorithms have been developed to identify two-dimensional patterns (involving two variables) using correlation, covariation, mutual information, and similar measures. However, comprehensive descriptions of complex biological systems may also include more complicated multidimensional relations, which can only be described by patterns that simultaneously involve three, four, or more variables. The main challenges in searching for such multidimensional patterns are: (a) the computational complexity of the search; (b) distinguishing statistically significant patterns from spurious ones that appear in large datasets purely by chance; and (c) integrating heterogeneous data types (numerical, Boolean, categorical, etc.) in a single pattern.

Results: This manuscript presents an attempt to address some of these challenges by defining multidimensional Boolean patterns in a way that makes it possible to (a) accommodate heterogeneous multi-omics data, (b) formulate criteria for separating trivial from non-trivial patterns, and (c) identify the conditions required for a given pattern to predict the values of selected feature(s). Additionally, the proposed definitions of a pattern's strength (the pattern's score) and of a minimal population threshold permit estimation of the statistical significance of detected patterns, using the score distributions of artificial datasets created by randomizing the original data.

Conclusion: To test the proposed approach, we searched for all possible 2-, 3-, and 4-dimensional patterns in historical data from the Human Microbiome Project (15 body sites) and in a collection of H. pylori genomes associated with gastric ulcers, gastritis, and duodenal ulcers.
In all datasets under consideration, we identified hundreds of statistically significant multidimensional patterns. These results suggest that such patterns may dominate the landscape of microbial genomics and microbiomics systems.
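The abstract does not define the pattern score, so the following is only a minimal sketch under loudly stated assumptions, meant to illustrate how significance can be estimated by randomizing the original data. Here a k-dimensional Boolean pattern is assumed to be a specific combination of values of k binary features; its "population" is the number of samples showing that combination; and its score is a hypothetical lift (observed population divided by the count expected if the features were independent). Randomization is assumed to mean shuffling each feature column independently, which preserves every feature's frequency but destroys relations among features. The functions `pattern_scores`, `randomize`, and `empirical_p_values` and all parameters are illustrative names, not the authors' method.

```python
import itertools
import random

# ASSUMPTION: a pattern is (feature indices, 0/1 values); its population is the
# number of samples matching all values; its score is a hypothetical lift
# (observed / expected under independence). Not the paper's exact definition.


def pattern_scores(data, k, min_population):
    """Score every k-feature pattern whose population reaches min_population (>= 1)."""
    n, m = len(data), len(data[0])
    # Marginal frequency of value 1 for each feature.
    p1 = [sum(row[j] for row in data) / n for j in range(m)]
    scores = {}
    for cols in itertools.combinations(range(m), k):
        for vals in itertools.product((0, 1), repeat=k):
            population = sum(
                all(row[c] == v for c, v in zip(cols, vals)) for row in data
            )
            if population < min_population:
                continue  # too rare to be assessed
            # Count expected if the k features were mutually independent;
            # positive because every value in the pattern occurs at least once.
            expected = n
            for c, v in zip(cols, vals):
                expected *= p1[c] if v == 1 else 1 - p1[c]
            scores[(cols, vals)] = population / expected
    return scores


def randomize(data, rng):
    """Shuffle each feature column independently (keeps marginals, breaks relations)."""
    columns = [list(col) for col in zip(*data)]
    for col in columns:
        rng.shuffle(col)
    return [list(row) for row in zip(*columns)]


def empirical_p_values(data, k, min_population, n_random=200, seed=0):
    """p-value of each observed pattern against the best score of each randomized dataset."""
    rng = random.Random(seed)
    observed = pattern_scores(data, k, min_population)
    null_max = []
    for _ in range(n_random):
        null = pattern_scores(randomize(data, rng), k, min_population)
        null_max.append(max(null.values(), default=0.0))
    # Add-one correction keeps p-values strictly positive.
    return {
        pattern: (1 + sum(s >= score for s in null_max)) / (1 + n_random)
        for pattern, score in observed.items()
    }


if __name__ == "__main__":
    gen = random.Random(1)
    rows = []
    for _ in range(60):
        a, b = gen.randint(0, 1), gen.randint(0, 1)
        # Feature 2 is the AND of features 0 and 1; feature 3 is independent noise.
        rows.append([a, b, a & b, gen.randint(0, 1)])
    p_values = empirical_p_values(rows, k=3, min_population=5, n_random=100)
    for pattern, p in sorted(p_values.items(), key=lambda item: item[1])[:3]:
        print(pattern, p)
```

Comparing each observed score with the maximum score found in each randomized dataset, rather than with the score of the same pattern, is one conservative way to account for the very large number of patterns tested (challenge (b)). The nested loops also make challenge (a) concrete: the work grows roughly as C(m, k) · 2^k · n per dataset, multiplied by the number of randomizations.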

Keywords: microbial genome data, microbiome, patterns