Nonparametric Combinatorial Sequence Models

Lecture Notes in Computer Science, Research in Computational Molecular Biology (2011)

Abstract
This work considers biological sequences that exhibit combinatorial structures in their composition: groups of positions of the aligned sequences are “linked” and covary as one unit across sequences. If multiple such groups exist, complex interactions can emerge between them. Sequences of this kind arise frequently in biology, but methodologies for analyzing them are still being developed. This paper presents a nonparametric prior on sequences which allows combinatorial structures to emerge and which induces a posterior distribution over factorized sequence representations. We carry out experiments on three sequence datasets which indicate that combinatorial structures are indeed present and that combinatorial sequence models can describe them more succinctly than simpler mixture models. We conclude with an application to MHC binding prediction which highlights the utility of the posterior distribution induced by the prior. By integrating out the posterior, our method compares favorably to leading binding predictors.
Keywords
Sequence models, Chinese restaurant process, Chinese restaurant franchise, MHC binding, mixture models