Analysis of Website Consistency and Implications Regarding Domain Trustworthiness

Seth Lifland, Daniel Jackowitz

Semantic Scholar (2015)

Abstract
Some websites are clearly more trustworthy than others, but website trustworthiness is difficult to define. This paper examines website consistency as a potential metric for website trustworthiness. We define website consistency as the degree to which a website is formed from identical resources across multiple fetches of the website. Website consistency involves more than the visual appearance of a webpage: a website can refer to the same resource with multiple URLs, thereby encoding extraneous identifying information in the URL itself. We define a perfectly consistent website as one in which every resource URL referred to by the top-level URL is identical and maps to identical contents across all fetches of the site. Our input data set is the Alexa Top 100 International sites list. We first fetch each site several times using a scriptable browser and, for each fetch, store a list of the resources requested by the top-level site. A processing script then analyzes the results of all fetches to compute statistics about the consistency of each website. Additionally, when slightly varying resource URLs map to the same contents, the processing script attempts to reduce this set of "synonym" URLs to a single URL containing only the parts common to all variations, and determines whether fetching the "reduced" URL yields the same resource as any URL in the original "synonym set." Our preliminary results demonstrate that website consistency varies widely across the sites in our data set. The success rate of reducing a set of synonym URLs to a single URL that still yields a resource with the same contents also varies widely. Future work is needed to adequately account for these variations, although we offer some conjectures as to their possible causes.
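The two processing steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' script, and it rests on several assumptions: each fetch is represented as a set of resource URLs, consistency is scored as the fraction of observed URLs that appear in every fetch, and the "parts common to all variations" are taken to be the query parameters whose name and value are shared by every URL in a synonym set. The function names `consistency_score` and `reduce_synonyms` are hypothetical. The sketch compares URLs only; checking that the reduced URL actually returns the same contents would need a real fetch and is left out.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def consistency_score(fetches):
    """Fraction of all observed resource URLs that appear in every fetch.

    `fetches` is a list of sets of resource URLs, one set per fetch.
    This scoring rule is an assumption, not the paper's exact statistic.
    """
    if not fetches:
        raise ValueError("need at least one fetch")
    seen_anywhere = set().union(*fetches)
    if not seen_anywhere:
        return 1.0  # no resources requested at all: trivially consistent
    seen_everywhere = set.intersection(*map(set, fetches))
    return len(seen_everywhere) / len(seen_anywhere)


def reduce_synonyms(urls):
    """Reduce a synonym set to one URL holding only the shared query parameters.

    Returns None when scheme, host, or path differ, since this sketch
    only handles variation in the query string (an assumption).
    """
    parts = [urlsplit(u) for u in urls]
    first = parts[0]
    base = (first.scheme, first.netloc, first.path)
    if any((p.scheme, p.netloc, p.path) != base for p in parts):
        return None
    # Intersect the (name, value) pairs across every URL in the set.
    common = set(parse_qsl(first.query, keep_blank_values=True))
    for p in parts[1:]:
        common &= set(parse_qsl(p.query, keep_blank_values=True))
    # Keep the shared pairs in the order they appear in the first URL.
    first_pairs = parse_qsl(first.query, keep_blank_values=True)
    kept = [pair for pair in first_pairs if pair in common]
    return urlunsplit((*base, urlencode(kept), ""))


if __name__ == "__main__":
    # Hypothetical data: two fetches of one site, differing in one resource.
    fetches = [
        {"http://example.com/a.js", "http://example.com/b.css"},
        {"http://example.com/a.js", "http://example.com/c.png"},
    ]
    print(consistency_score(fetches))
    print(reduce_synonyms([
        "http://example.com/img.png?v=2&sid=abc",
        "http://example.com/img.png?v=2&sid=xyz",
    ]))
```

Under this scoring rule a perfectly consistent site scores 1.0, and any URL that appears in only some fetches lowers the score. A fuller version would also compare response bodies, since the abstract's definition requires identical contents as well as identical URLs.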