VariantSpark, A Random Forest Machine Learning Implementation for Ultra High Dimensional Data

bioRxiv (2019)

Abstract
The demands on machine learning methods to cater for ultra-high-dimensional datasets, that is, datasets with millions of features, have been increasing in domains like the life sciences and the Internet of Things (IoT). While Random Forests are suitable for “wide” datasets, current implementations such as […] lack the ability to scale to such dimensions. Recent improvements by […] begin to address these limitations but do not extend to […]. This paper introduces […], a novel Random Forest implementation on top of […] and part of the VariantSpark platform, which parallelises processing of all nodes over the entire forest. […] is 9 and up to 89 times faster than […] and […], respectively, and is the first method capable of scaling to millions of features.