Modifying Transactional Databases to Hide Sensitive Association Rules

INFORMATION SYSTEMS RESEARCH(2022)

引用 3|浏览28
暂无评分
摘要
Firms have been sharing transactional data with business partners ever since electronic data interchange was introduced to the retail industry in the 1980s. The potential benefits of data sharing notwithstanding, there has been continued reluctance on the part of data owners to share their data, for fear of sensitive information potentially making its way to competitors. Approaches that can help hide sensitive information could alleviate such concerns and increase the number of firms that are willing to share. Sensitive information in transactional databases often manifests itself in the form of association rules. Association rules can be concealed by altering transactions such that these sensitive rules stay hidden when the data are mined. The problem of hiding sensitive association rules is NP-hard, and to date, it has only been addressed via heuristic approaches. In this paper, we introduce a nonlinear integer formulation to hide sensitive association rules while maximizing the accuracy of the altered database. We then separate it into two problems: the sanitization problem, which hides sensitive association rules from a specific transaction while altering the transaction as minimally as possible, and the accuracy maximization problem, which maximizes the accuracy of the altered database, given a solution to the sanitization problem. We show how the sanitization problem can be represented as an integer program and propose a heuristic based on intuition from this formulation to solve it. Next, we formulate the accuracy maximization problem as a nonlinear integer program, show how it can be linearized, and derive various results that help reduce the size of the problem to be solved. Computational experiments are conducted on real and synthetic data sets, the largest of which has 100 million transactions. Our results show that although the nonlinear integer formulations are not practical, the linearizations and problem-reduction steps make a significant impact on solvability and solution time. We also find that there are substantial gains to be realized vis-a-vis existing approaches in terms of both solution quality and solution time and that hiding the rules directly (rather than by hiding the associated itemsets) can result in significantly fewer transactions being sanitized.
更多
查看译文
关键词
database sharing,maximizing accuracy,association rule hiding,data quality
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要