LAORAM: A Look Ahead ORAM Architecture for Training Large Embedding Tables

Proceedings of the 50th Annual International Symposium on Computer Architecture (ISCA 2023), 2023

Abstract
Memory access patterns have been demonstrated to leak critical information such as security keys and a program's spatial and temporal information. This information leak poses a significant privacy challenge in machine learning models with embedding tables. Embedding tables are used to learn categorical features from training data. The address of an embedding table entry carries privacy-sensitive information, since it discloses features associated with a user. Oblivious RAM (ORAM) and its enhanced variants, such as PathORAM, have emerged as viable solutions to hide leakage from memory access streams. PathORAM fetches an entire path of memory blocks for every memory fetch request, which leads to substantial bandwidth and performance overheads. In this work, we present Look Ahead ORAM (LAORAM), an ORAM framework designed to protect user privacy during embedding table training. LAORAM exploits a unique property of ML training: the training samples that will be used in the future are known beforehand. LAORAM preprocesses the training samples to identify the memory blocks that are accessed together in the near future. It combines multiple blocks accessed together into superblocks and tries to assign all blocks in a superblock to a few paths. Future accesses to a collection of blocks can thus be satisfied from a few paths, effectively reducing the number of reads and writes to the ORAM. To further increase performance, LAORAM uses a fat-tree structure for PathORAM, i.e., a tree with variable bucket size, which reduces the number of background evictions required and improves stash usage. We evaluated LAORAM using embedding table configurations from both a recommendation model (DLRM) and an NLP model (XLM-R). LAORAM performs 5 times faster than PathORAM on a recommendation dataset (Kaggle) and 5.4 times faster on an NLP dataset (XNLI) while providing the same security guarantees as the original PathORAM.
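The superblock idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`build_superblocks`, `assign_paths`, `paths_needed`), the fixed superblock size, and the choice of one uniformly random leaf per superblock are all assumptions made for illustration. The sketch also ignores the stash, evictions, and the remapping that a real PathORAM performs after every access.

```python
import random
from typing import Dict, Iterable, List


def build_superblocks(access_stream: List[int], superblock_size: int) -> List[List[int]]:
    """Group block IDs, in order of first future use, into fixed-size superblocks.

    Hypothetical preprocessing step: the future access stream is known
    because the training samples are known ahead of time.
    """
    # dict.fromkeys de-duplicates while preserving first-occurrence order.
    first_use_order = list(dict.fromkeys(access_stream))
    return [
        first_use_order[i:i + superblock_size]
        for i in range(0, len(first_use_order), superblock_size)
    ]


def assign_paths(
    superblocks: List[List[int]], num_leaves: int, rng: random.Random
) -> Dict[int, int]:
    """Map every block of a superblock to the same randomly chosen leaf (path).

    Assumption: one uniformly random leaf per superblock. The abstract only
    says blocks of a superblock are assigned to "a few paths".
    """
    position_map: Dict[int, int] = {}
    for superblock in superblocks:
        leaf = rng.randrange(num_leaves)
        for block_id in superblock:
            position_map[block_id] = leaf
    return position_map


def paths_needed(block_ids: Iterable[int], position_map: Dict[int, int]) -> int:
    """Count the distinct paths that must be fetched to serve the given blocks."""
    return len({position_map[block_id] for block_id in block_ids})


# Toy future access stream of embedding-table block IDs (made-up values).
stream = [3, 7, 3, 1, 9, 7, 4, 2, 8, 5]
superblocks = build_superblocks(stream, superblock_size=4)
position_map = assign_paths(superblocks, num_leaves=16, rng=random.Random(0))

print(superblocks)
print(paths_needed(stream, position_map))
```

In this toy setting, the eight distinct blocks fall into two superblocks, so serving the whole stream touches at most two paths, whereas a baseline that maps each block to its own random leaf could touch up to eight.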
Keywords
Memory, ORAM, Security, Recommendation Systems, Embedding Tables