Biclustering data analysis: a comprehensive survey.

Eduardo N Castanho, Helena Aidos, Sara C Madeira

Briefings in bioinformatics(2024)

Cited 0|Views0
No score
Abstract
Biclustering, the simultaneous clustering of rows and columns of a data matrix, has proved its effectiveness in bioinformatics due to its capacity to produce local instead of global models, evolving from a key technique used in gene expression data analysis into one of the most used approaches for pattern discovery and identification of biological modules, used in both descriptive and predictive learning tasks. This survey presents a comprehensive overview of biclustering. It proposes an updated taxonomy for its fundamental components (bicluster, biclustering solution, biclustering algorithms, and evaluation measures) and applications. We unify scattered concepts in the literature with new definitions to accommodate the diversity of data types (such as tabular, network, and time series data) and the specificities of biological and biomedical data domains. We further propose a pipeline for biclustering data analysis and discuss practical aspects of incorporating biclustering in real-world applications. We highlight prominent application domains, particularly in bioinformatics, and identify typical biclusters to illustrate the analysis output. Moreover, we discuss important aspects to consider when choosing, applying, and evaluating a biclustering algorithm. We also relate biclustering with other data mining tasks (clustering, pattern mining, classification, triclustering, N-way clustering, and graph mining). Thus, it provides theoretical and practical guidance on biclustering data analysis, demonstrating its potential to uncover actionable insights from complex datasets.
More
Translated text
AI Read Science
Must-Reading Tree
Example
Generate MRT to find the research sequence of this paper
Chat Paper
Summary is being generated by the instructions you defined