Developing Biceps to completely compute in subquadratic time a new generic type of bicluster in dense and sparse matrices

Data Mining and Knowledge Discovery (2022)

Abstract
Given an m-by-n real matrix, biclustering aims to discover relevant submatrices. This article defines a new type of bicluster. In each of its columns, the values in the rows of the bicluster must all be strictly greater than those in the rows absent from it; the bicluster thus yields a binary clustering of the rows in the restricted context of its columns. To keep only the best bicluster among those carrying redundant information, its rows must not be a subset or a superset of the rows of another bicluster of greater or equal quality. Any computable function can be chosen to assign qualities to the biclusters; in that respect, the proposed definition is generic. Dynamic programming and appropriate data structures make it possible to exhaustively list the biclusters satisfying the definition within O(m^2 n + m n^2) time, plus the time to compute O(mn) qualities. After some adaptations, the proposed algorithm, Biceps, remains subquadratic when its complexity is expressed as a function of m_non-min · n, where m_non-min is the maximal number of non-minimal values in a column, i.e., for sparse matrices. Experiments on three real-world datasets demonstrate the effectiveness of the proposal in different application contexts. They also show that its good theoretical efficiency carries over to practice: two minutes and 5.3 GB of RAM are enough to list the desired biclusters in a dense 801-by-20,531 matrix; 3.5 s and 192 MB of RAM suffice for a sparse 631,532-by-174,559 matrix with 2,575,425 nonzero values.
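The column condition stated in the abstract can be illustrated with a small check. This is only a sketch of the definition as worded above, not the Biceps algorithm: the function name `satisfies_column_condition`, the list-of-lists matrix layout, and the treatment of empty row sets are assumptions made for illustration.

```python
from typing import Iterable, Sequence


def satisfies_column_condition(
    matrix: Sequence[Sequence[float]],
    rows: Iterable[int],
    cols: Iterable[int],
) -> bool:
    """Check the abstract's condition for a candidate bicluster (rows, cols).

    In every selected column, each value in a selected row must be strictly
    greater than each value in a row that is not selected.
    """
    inside = set(rows)
    outside = [i for i in range(len(matrix)) if i not in inside]
    if not inside or not outside:
        # Assumption: with no rows on one side, the condition holds vacuously.
        # The abstract does not say how such edge cases are handled.
        return True
    for j in cols:
        smallest_inside = min(matrix[i][j] for i in inside)
        largest_outside = max(matrix[i][j] for i in outside)
        if smallest_inside <= largest_outside:
            return False
    return True


# Hypothetical 3-by-3 example matrix.
example = [
    [5.0, 1.0, 9.0],
    [6.0, 2.0, 0.0],
    [1.0, 0.0, 3.0],
]
print(satisfies_column_condition(example, rows=[0, 1], cols=[0, 1]))
print(satisfies_column_condition(example, rows=[0, 1], cols=[0, 1, 2]))
```

In this example, rows 0 and 1 dominate row 2 in columns 0 and 1, so the first candidate satisfies the condition; adding column 2 breaks it, because row 1 holds 0 there while the excluded row 2 holds 3. The check compares only the smallest selected value with the largest excluded value per column, which is equivalent to comparing all pairs.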
Keywords
Exclusive-columns biclustering, Exhaustive enumeration, Dynamic programming, Subquadratic complexity