A Public and Reproducible Assessment of the Topics API on Real Data
arXiv (2024)
Abstract
The Topics API for the web is Google's privacy-enhancing alternative to
third-party cookies. Results of prior work have led to an ongoing
discussion between Google and research communities about the capability of
Topics to balance utility and privacy. The central point of contention
is largely the realism of the datasets used in these analyses and their
reproducibility: researchers have used data collected on a small sample of users or
generated synthetic datasets, while Google's results are inferred from a
private dataset. In this paper, we complement prior research by performing a
reproducible assessment of the latest version of the Topics API on the largest
publicly available dataset of real browsing histories. First, we measure
how unique and stable real users' interests are over time. Then, we evaluate whether
Topics can be used to fingerprint the users from these real browsing traces by
adapting methodologies from prior privacy studies. Finally, we call on web
actors to perform and enable reproducible evaluations by releasing anonymized
distributions. We find that 46% [...]
are uniquely re-identified across websites after only 1, 2, and 3 observations
of their topics by advertisers, respectively. This paper shows on real data
that Topics does not provide the same privacy guarantees to all users, further
highlighting the need for public and reproducible evaluations of the claims
made by new web proposals.