The ephemerality of online data: Are our photos safe?
The flip side of our seemingly indelible digital footprint is the images, emails and posts that suddenly disappear from the web
On September 1, 2017, Tuenti — the Spanish social media network launched in 2006, which for a time was the platform par excellence for teenagers — shut down permanently. With it, the more than six billion photos that users had uploaded disappeared. The network had warned this would happen and provided users with a tool to download their albums, but there were many people who, due to forgetfulness, carelessness or not being aware of what was to come, lost those photos.
Anyone who has more than a decade of experience on the internet will be able to tell stories about data that they once believed to be eternal disappearing into the ether. Emails evaporate when you stop using an email service (even if you don’t close your account); messages and posts cease to exist when a forum disappears; blogs are deleted when the platform that hosted them closes; photos are erased when the company imposes a new limit; migration failures cause 50 million songs to be lost.
The latest example is from MySpace, which in 2019 announced that oops, something had gone wrong and it was impossible to recover those lost songs, uploaded to the service between 2009 and 2015. A few days later, however, the Internet Archive, an organization dedicated to preserving the web, published a catalog with almost 500,000 of those audio files. In other words, users had lost access and MySpace had not made backup copies, but much of the data was hosted elsewhere. In this case, an academic group had downloaded all that music a few years earlier and sent it to the Internet Archive. But if the person who has lost their photos or emails does not know that they are on other servers or does not have access to them, they will feel that they have effectively disappeared. And actual disappearance, for that matter, is not so unlikely.
“All our online content will disappear completely sooner or later,” says Daniel Gayo Avello, professor in the Area of Computer Languages and Systems at the University of Oviedo in Spain. How long it will take for it to disappear, he explains, will depend on how actively we work to preserve it. “If all my photographs, videos, messages and emails are on some platform, their permanence depends, obviously, on the terms of use and the very survival of the platform. For example, depending on the terms of use, my content may disappear after a while of not accessing my account (i.e. I wouldn’t trust my Hotmail emails to still be there). On the other hand, if the company that owns the platform decides so, that content can disappear from one day to the next,” he elaborates.
Believing that the personal story we have been uploading or publishing in different corners of the internet will always be there is a somewhat naive attitude. Gayo Avello compares the web to a forest. “It can have been in one place for centuries and, although some of its trees may be centuries old, most of them are not. Trees grow, change, die, and the forest sometimes grows, but at other times it diminishes, either by chance events or by intentional actions. It’s the same with the web, some websites come and some go,” he explains.
There are figures on this: a recent Pew Research report indicates that 25% of the websites that existed at some point between 2013 and 2023 no longer exist. If we look at the oldest ones, those from 2013, the rate of disappearance rises to 38%. By way of example, Gayo Avello points to the Million Dollar Page website, a relic from almost 20 years ago that sought “a form of monetization that today seems quite childish: selling each pixel of a 1000x1000 pixel banner for $1. Each advertiser could buy the portion they wanted and have a link to their site. In 2014, less than 10 years after its launch, more than 20% of the targeted sites no longer existed,” he explains.
Returning to our personal files that are hosted on different services, should we begin to fear their disappearance? Are the images we have, for example, in Google Photos, in danger? Lorena González Manzano, cybersecurity specialist and member of the Computer Security Lab (COSEC) working group at Carlos III University in Spain, explains that “nothing is 100% secure and it can always be attacked.” However, “if the service provider is trustworthy or a large company (for example, Google), we can assume it is reasonably secure.”
A cyberattack could end up with data deleted, but it’s more common for the host to have “systems to prevent, both in the event of a cyberattack and in the event of a service being down, user data from being lost.” Furthermore, González Manzano continues, the attackers’ objective is usually not to delete the data, but simply to access it. “However, there could be ransomware-type attacks that access the service where our data is hosted, encrypt it and ask for money, either from us or from the company, in order to recover it or not to disclose it or make it public,” she says.
Disappearing data
The disappearance of websites and personal posts also means the loss of very valuable sources of documentation when writing the history of these decades. In order to preserve at least some of the richness of the web, organizations like Archive Team have been archiving web content for years. This content includes blogs (if they are associated with inactive Google accounts, they may disappear), YouTube videos, and public and relevant messages on Telegram.
“The main problem with working in digital environments is the ephemerality of data,” says Elisa García Mingo, doctor in Social Anthropology and professor at the Faculty of Political Sciences and Sociology at the Complutense University of Madrid. “We notice it because we are seeing it disappear in our research: an account that you follow, a website...,” she says.
A large part of scientific knowledge is also at risk. According to a study published earlier this year into how digital copies of scholarly articles are archived (in many cases no physical copy exists anymore), one third of publishers did not seem to have any kind of archival activity in place to preserve them. (And fewer than 1% of scholarly journal publishers kept copies in at least three archives.)
At the same time, digital ephemerality coexists with the opposite problem: the right to be forgotten. “By saving or publishing content while being aware of its permanence, you create an archive that you have no control over. It’s like having an archive, but not having control of the building it’s housed in, you don’t even have access to the staff who are managing it,” explains García Mingo, who studies digital sexual violence among young people.
How to preserve what we do want to save
In digital archiving, there are almost as many styles as there are people. García Mingo explains that it is a bit like what happened with physical documents. “There were people who, when developing the photos, selected them, organized them and made a very elaborate album, and those who simply put them in a box of cookies,” she says. The same thing happens in the digital world. “There are people who make an archive without archival awareness, and there are people who have a very high level of digital archiving. They are the two poles: from a giant trail that you leave in a kind of conscious chaos to the most elaborate practices, all the people who every year make an album or a calendar or a video summary,” she explains.
If what we want is to ensure that we will never suffer the shock of losing photos, emails or documents that we wanted to keep, the level of archiving must be raised a little higher. “The Library of Congress of the United States coined an acronym, IDOM, sometimes IDEOM, which means ‘identify, decide, export, organize and make copies’,” says Daniel Gayo Avello. Although the idea is simple, it requires “effort and perseverance.”
The expert explains the steps:
- “We must identify all the digital content we have and where (for example, photographs, videos, audio, messages, websites, other types of digital files, etc.).”
- Decide “what content is most important (for example, do we really need the 200 photos we took on that trip? Do I need a copy of all my emails?).”
- Depending on the content, we may need to export it: “emails, WhatsApp messages, our tweet archive…”.
- Organize material, which involves “giving meaningful names to files and organizing them into directory structures.” This part is key to later finding what we are looking for (Gayo Avello admits that he skips it, but then it takes him a long time to locate what he wants).
- Make copies. “The 3-2-1 rule can apply here: at least three copies of the data, using at least two different storage systems and with at least one copy in another physical location.”
What’s more, all this must be updated and maintained to ensure we are not saving content in obsolete formats that can no longer be opened.
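As a rough illustration of the 3-2-1 rule quoted above, the following Python sketch checks whether a set of copies meets its three conditions. The `Copy` class, its fields, the `check_3_2_1` function and the sample values are all hypothetical names chosen for this example; they do not come from the Library of Congress or from any backup tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One stored copy of an archive (all fields are illustrative)."""
    label: str     # human-readable name, e.g. "laptop"
    medium: str    # kind of storage system, e.g. "internal-disk", "cloud"
    location: str  # physical place where the copy lives, e.g. "home"


def check_3_2_1(copies: list[Copy], primary_location: str) -> list[str]:
    """Return the 3-2-1 requirements that the given copies fail to meet."""
    problems = []
    # Rule "3": at least three copies of the data.
    if len(copies) < 3:
        problems.append(f"only {len(copies)} copies, need at least 3")
    # Rule "2": at least two different storage systems.
    if len({c.medium for c in copies}) < 2:
        problems.append("all copies use the same storage system")
    # Rule "1": at least one copy in another physical location.
    if all(c.location == primary_location for c in copies):
        problems.append("no copy is kept in another physical location")
    return problems


photos = [
    Copy("laptop", "internal-disk", "home"),
    Copy("usb-drive", "external-disk", "home"),
    Copy("cloud-album", "cloud", "provider-datacenter"),
]
print(check_3_2_1(photos, primary_location="home") or "3-2-1 rule satisfied")
```

In this sample, removing the cloud copy would leave two copies that are both at home, so the function would report two unmet requirements.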
Lorena González Manzano recommends encrypting sensitive data on external drives. On the other hand, if we do not want to rely on any service, “we can buy a hard drive to store the data ourselves or, better yet, a NAS, which is a high-capacity hard drive that recovers the data even if some of it is damaged, for example, by a power outage.”