Binary Coding in Stream.

arXiv: Data Structures and Algorithms(2015)

引用 23|浏览13
暂无评分
摘要
Big data is becoming ever more ubiquitous, ranging over massive video repositories, document corpuses, image sets and Internet routing history. Proximity search and clustering are two algorithmic primitives fundamental to data analysis, but suffer from the curse of dimensionality on these gigantic datasets. A popular attack for this problem is to convert object representations into short binary codewords, while approximately preserving near neighbor structure. However, there has been limited research on constructing codewords in the or settings often applicable to this scale of data, where one may only make a single pass over data too massive to fit in local memory. In this paper, we apply recent advances in matrix sketching techniques to construct binary codewords in both streaming and online setting. Our experimental results compete outperform several of the most popularly used algorithms, and we prove theoretical guarantees on performance in the streaming setting under mild assumptions on the data and randomness of the training set.
更多
查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要