Algorithms for Graph Similarity and Subgraph Matching

mag(2011)

Abstract
We address two independent but related problems, graph similarity and subgraph matching, both of which are important in practice across several fields of science, engineering and data analysis. For graph similarity, we develop and test a new framework based on belief propagation and related ideas. For subgraph matching, we develop a new algorithm, building on existing techniques from the bioinformatics and data mining literature, that uncovers periodic or infrequent matchings. We make substantial progress over existing methods on both problems.

1 Problem Definitions and Statement of Contributions

1.1 Graph Similarity

Problem 1
Given: two graphs G1(n1, e1) and G2(n2, e2), with possibly different numbers of nodes and edges, and the mapping between the graphs' nodes.
Find: (a) an algorithm to calculate the similarity of the two graphs, which returns (b) a measure of similarity (a real number between 0 and 1) that captures intuition well.
Innovations:
a) We develop a method based on belief propagation, not previously seen in the literature, to solve this problem.
b) The method (and its fast linearized approximate version) gives results that agree very well with intuition.
c) Except for scalability, we know of no shortcomings of this method.

1.2 Subgraph Matching

Problem 2
Given: a graph time series consisting of T graphs.
Find: (a) an algorithm to find approximate subgraphs that occur in a subset of the T graphs, (b) where the approximate subgraphs may occur not in the majority of the time points but only in local sections of the time series.
Innovations:
a) We develop a principled approach to selecting the important time components from which subgraphs should be mined. Our method is also tailored to the problem of selecting subgraphs in biological networks. For this, we use sparse PCA, which has not been used for this application domain.
b) Scalability: our method is both fast and scalable to real biological data (1000s of nodes). However, it has not been demonstrated whether it can scale to extremely large networks of more than 10 000 nodes.
c) The method gives results that are easy to interpret and biologically sensible.

Disclaimer of interests intersecting with course project
Aaditya may use the PhoneCall dataset for his DAP. Danai is interested in graph similarity and belief propagation for research. Ankur has used tensors for his research, but in a different context. Jing has used CODENSE before, and is interested in improving it for research purposes. None of the authors have other course projects this term.

The following are the papers read for this course (refer to the numbering in the references section): Jing [21], [28], [22]; Ankur [20], [18], [9]; Aaditya [5], [26], [27]; Danai [10], [14], [15].
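
To make the graph similarity problem of Section 1.1 concrete, the following is a minimal sketch of one way a score in the range 0 to 1 could be computed from two graphs with a known node mapping. The abstract does not give the formulas, so everything specific here is an assumption: the affinity matrix S = (I + eps^2 D - eps A)^-1 is a commonly used linearized form of belief propagation, the root Euclidean distance and the 1 / (1 + d) conversion are illustrative choices, and the function names `affinity_matrix` and `graph_similarity` are hypothetical. Both graphs are assumed to be undirected, free of self-loops, and given as adjacency matrices over the same node ordering (a node absent from one graph can be represented as an isolated node). Plain Python is used to keep the sketch dependency-free.

```python
import math


def affinity_matrix(adj, eps):
    """Return S = (I + eps^2 * D - eps * A)^-1 by Gauss-Jordan elimination.

    Assumed linearized belief propagation form; not taken from the source.
    adj is a square 0/1 adjacency matrix (list of lists) without self-loops.
    """
    n = len(adj)
    deg = [sum(row) for row in adj]
    # Build the augmented matrix [M | I] with M = I + eps^2 * D - eps * A.
    aug = []
    for i in range(n):
        row = [-eps * adj[i][j] for j in range(n)]
        row[i] += 1.0 + eps * eps * deg[i]
        aug.append(row + [1.0 if j == i else 0.0 for j in range(n)])
    # M is strictly diagonally dominant when eps * max_degree < 1,
    # so elimination without row pivoting is safe here.
    for col in range(n):
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def graph_similarity(adj1, adj2):
    """Return a similarity in (0, 1]; 1.0 means identical affinity matrices."""
    if len(adj1) != len(adj2):
        raise ValueError("both graphs must be indexed by the same node set")
    # Use one shared eps so the two affinity matrices are comparable.
    max_deg = max(sum(row) for row in adj1 + adj2)
    eps = 1.0 / (1.0 + max_deg)
    s1 = affinity_matrix(adj1, eps)
    s2 = affinity_matrix(adj2, eps)
    # Root Euclidean distance; clamp at 0 to guard against rounding noise.
    dist = math.sqrt(sum(
        (math.sqrt(max(a, 0.0)) - math.sqrt(max(b, 0.0))) ** 2
        for r1, r2 in zip(s1, s2)
        for a, b in zip(r1, r2)
    ))
    return 1.0 / (1.0 + dist)


# Example graphs on three mapped nodes: a path and a triangle.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(graph_similarity(path, path))
print(graph_similarity(path, triangle))
```

Comparing node affinities rather than raw edge sets lets a single edge change influence the score according to how much it alters connectivity between all node pairs, which is one plausible reading of a measure that "captures intuition well". The dense inverse costs cubic time in the number of nodes, so this sketch is only suitable for small graphs.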