A single-cell RNA-seq Training and Analysis Suite using the Galaxy Framework
GigaScience (2020)
Abstract
Background The vast ecosystem of single-cell RNA-seq tools has until recently been plagued by an excess of diverging analysis strategies, inconsistent file formats, and compatibility issues between different software suites. The uptake of 10x Genomics datasets has begun to calm this diversity, and the bioinformatics community is turning its attention once more to the large computing requirements and the statistically driven methods needed to process and understand these ever-growing datasets.
Results Here we outline several Galaxy workflows and learning resources for scRNA-seq, with the aim of providing a comprehensive analysis environment paired with a thorough user learning experience that bridges the knowledge gap between the computational methods and the underlying cell biology. The Galaxy reproducible bioinformatics framework provides tools, workflows and training materials that not only enable users to perform one-click 10x preprocessing, but also empower them to demultiplex raw sequencing data from custom-tagged and full-length sequencing protocols. The downstream analysis supports a wide range of high-quality interoperable suites separated into common stages of analysis: inspection, filtering, normalization, confounder removal and clustering. The teaching resources cover an assortment of concepts from computer science to cell biology. Access to all resources is provided at the [singlecell.usegalaxy.eu][1] portal.
Conclusions The reproducible and training-oriented Galaxy framework provides a sustainable HPC environment for users to run flexible analyses on both 10x and alternative platforms. The tutorials from the Galaxy Training Network along with the frequent training workshops hosted by the Galaxy Community provide a means for users to learn, publish and teach scRNA-seq analysis.
### Competing Interest Statement
The authors have declared no competing interest.
### List of abbreviations
DOI
: Digital Object Identifier
GTN
: Galaxy Training Network
HDF5
: Hierarchical Data Format 5
HPC
: High Performance Computing
PAGA
: Partition-based Graph Abstraction
PCA
: Principal Component Analysis
scRNA
: Single-Cell RNA
tSNE
: t-distributed Stochastic Neighbor Embedding
UMAP
: Uniform Manifold Approximation and Projection
UMI
: Unique Molecular Identifier
[1]: http://singlecell.usegalaxy.eu