Examining the Challenges in Archiving Instagram
CoRR (2024)
Abstract
To prevent the spread of disinformation on Instagram, we need to study the
accounts and content of disinformation actors. However, because these accounts
are malicious, Instagram often bans them, making them inaccessible from the
live web. The only
way we can study the content of banned accounts is through public web archives
such as the Internet Archive. However, archiving Instagram pages is
problematic in several ways. We focused on one issue in particular: many
Wayback Machine mementos of Instagram pages redirect to the Instagram login page. In
this study, we determined that mementos of Instagram account pages on the
Wayback Machine began redirecting to the Instagram login page in August 2019.
We also found that Instagram pages are poorly archived on Archive.today,
Arquivo.pt, and Perma.cc, in both quantity and quality. Moreover,
we were unsuccessful in all our attempts to archive Katy Perry's Instagram
account page on Archive.today, Arquivo.pt, and Conifer. Although in the
minority, replayable Instagram mementos exist in public archives and contain
valuable data for studying disinformation on Instagram. With that in mind, we
developed a Python script to scrape Instagram mementos. As of August 2023,
the Python script can scrape Wayback Machine archives of Instagram account
pages between November 7, 2012 and June 8, 2018.
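
To illustrate the kind of lookup such a script depends on, the following is a minimal sketch and not the authors' script. It assumes the Wayback Machine CDX API is used to list captures of an account page within the date range above, and that a capture counts as a login redirect when its final replay URL contains the `/accounts/login` path. The function names (`build_cdx_query`, `memento_url`, `redirects_to_login`) and the choice of the account URL form `instagram.com/<account>/` are hypothetical.

```python
from urllib.parse import urlencode

# Wayback Machine CDX endpoint for listing captures of a URL.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(account, start="20121107", end="20180608"):
    """Build a CDX query URL for captures of an Instagram account page.

    The default range mirrors the dates given in the abstract
    (November 7, 2012 to June 8, 2018). Restricting to HTTP 200
    captures is an assumption made for this sketch.
    """
    params = {
        "url": f"instagram.com/{account}/",
        "from": start,
        "to": end,
        "output": "json",
        "filter": "statuscode:200",
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def memento_url(timestamp, original):
    """Return the Wayback Machine replay URL for one capture."""
    return f"https://web.archive.org/web/{timestamp}/{original}"


def redirects_to_login(final_url):
    """Heuristic: the replay ended on the Instagram login page.

    `final_url` is the URL reached after following redirects; how it is
    obtained (for example, with an HTTP client) is left out of this sketch.
    """
    return "instagram.com/accounts/login" in final_url


if __name__ == "__main__":
    print(build_cdx_query("katyperry"))
    print(memento_url("20150101000000", "https://www.instagram.com/katyperry/"))
```

Keeping the URL construction and the redirect check as pure functions lets them be exercised without network access; the actual fetching and HTML parsing would sit on top of them.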