pyRforest: A comprehensive R package for genomic data analysis featuring scikit-learn Random Forests in R

bioRxiv (2024)

Abstract
Random forest models are widely used in genomic data analysis and can offer insights into complex biological mechanisms, particularly where features influence the target in interactive, non-linear, or non-additive ways. Currently, some of the most computationally efficient random forest implementations are written in Python. However, many biologists use R for genomic data analysis, as R offers a unified platform for performing additional statistical analysis and visualization. Here we present an R package, pyRforest, which integrates the Python scikit-learn `RandomForestClassifier` algorithm into the R environment. pyRforest inherits the efficient memory management and parallelization of Python, and is optimized for classification tasks on large genomic datasets, such as those from RNA-seq. pyRforest offers several additional capabilities, including a novel rank-based permutation method for biomarker identification. This method can be used to estimate and visualize p-values for individual features, allowing the researcher to identify a subset of features for which there is robust statistical evidence of an effect. In addition, pyRforest includes methods for the calculation and visualization of SHapley Additive exPlanations (SHAP) values. Finally, pyRforest includes support for comprehensive downstream analysis for gene ontology and pathway enrichment. pyRforest thus improves the implementation and interpretability of random forest models for genomic data analysis by merging the strengths of Python with R. pyRforest is available for download, together with an associated vignette.

Competing Interest Statement
The authors have declared no competing interest.
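
The abstract does not spell out how the rank-based permutation method works, so the following is only a rough Python sketch of one way a permutation test on ranked feature importances could be set up with scikit-learn's `RandomForestClassifier`. It is not pyRforest's implementation. The helper names (`ranked_importances`, `rank_permutation_pvalues`), the synthetic data, the number of permutations, and the choice to compare importances rank by rank against label-shuffled fits are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def ranked_importances(X, y, seed):
    """Fit a random forest and return its feature importances, sorted high to low."""
    model = RandomForestClassifier(n_estimators=50, n_jobs=-1, random_state=seed)
    model.fit(X, y)
    return np.sort(model.feature_importances_)[::-1]


def rank_permutation_pvalues(X, y, n_permutations=20, seed=0):
    """Empirical p-value per importance rank under shuffled labels (illustrative only)."""
    rng = np.random.default_rng(seed)
    observed = ranked_importances(X, y, seed)

    # Null distribution: importances at each rank when labels carry no signal.
    null = np.empty((n_permutations, X.shape[1]))
    for i in range(n_permutations):
        y_shuffled = rng.permutation(y)
        null[i] = ranked_importances(X, y_shuffled, seed + i + 1)

    # Count null fits whose rank-k importance reaches the observed rank-k value.
    # The add-one correction keeps every p-value strictly positive.
    exceed = (null >= observed).sum(axis=0)
    return observed, (exceed + 1) / (n_permutations + 1)


# Synthetic stand-in for an expression matrix: 200 samples, 20 features,
# of which 3 are informative (hypothetical data, not from the paper).
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=3,
    n_redundant=0, shuffle=False, random_state=0,
)
observed, pvalues = rank_permutation_pvalues(X, y)
```

With only 20 permutations the smallest attainable p-value is 1/21, so a real analysis would use many more permutations; the small number here just keeps the sketch quick to run.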