Internet Archive – Dataconomy
https://dataconomy.ru

Internet Archive data breach status: BRB within days
https://dataconomy.ru/2024/10/14/internet-archive-data-breach-outage-hacked/
Mon, 14 Oct 2024 09:10:35 +0000

The Internet Archive is gearing up to return “within days,” following a cyberattack that temporarily shut down the platform’s vast digital library and the Wayback Machine, exposing millions of users to data theft. Founder Brewster Kahle confirmed the news as his team worked around the clock to secure the site and strengthen its defenses against future breaches. But what exactly happened, and how should we rethink our assumptions about the security of public digital resources?

What’s the status of the Internet Archive data breach?

Last week, we reported on the Internet Archive data breach, which compromised over 31 million user accounts and sent shockwaves through the Archive’s global community. The scale of the attack became clear when visitors to the site were met with a pop-up warning of the breach. It didn’t take long for Troy Hunt, founder of the widely used Have I Been Pwned (HIBP) platform, to confirm the severity of the situation. According to a BleepingComputer report, Hunt received a file containing data from the breach, including email addresses, screen names, and bcrypt-hashed passwords, an all-too-familiar pattern in today’s world of cyberattacks. It was a serious reminder for millions of users that no platform, not even one as widely respected as the Internet Archive, is immune to these threats.

This was no simple breach. The attackers orchestrated a coordinated effort that included a Distributed Denial of Service (DDoS) attack alongside the data theft, briefly taking the Internet Archive offline. While we’ve grown used to hearing about such attacks on tech giants and social platforms, the breach of an organization like the Internet Archive—often regarded as a digital safe haven for historical records and research—hits differently. The implications stretch beyond the exposure of personal data; it’s a direct threat to the preservation of digital history.

Image: The Internet Archive is gearing up to return within days

But what’s even more unsettling is the unclear motive behind the attack. Jason Scott, an archivist at the Internet Archive, said on Mastodon that the attackers made no demands and asked for no ransom. Instead, they seemed driven purely by a desire to disrupt and destabilize, leaving the digital community wondering what comes next. The group claiming responsibility, an X account called SN_Blackmeta, had previously targeted the Internet Archive in May. This suggests a pattern of ongoing attempts to undermine the platform’s operations, raising concerns that we’re only seeing the beginning of a sustained campaign of attacks.

Brewster Kahle and his team moved quickly to mitigate the damage. They disabled the compromised JavaScript library used to deface the site and began upgrading their overall security protocols. Kahle hinted that, given the hackers’ aggressiveness, more attacks could be looming, but assured the public that the Archive’s data remains safe despite the current downtime.

For now, visitors to the Internet Archive’s site will be greeted by a notice stating that it’s “temporarily” offline, with no access to the Wayback Machine. According to Kahle, this is a necessary precaution while the team works to bolster its defenses. “The data is safe. Services are offline as we examine and strengthen them. Sorry, but needed,” he explained in a public statement, estimating that the platform will be back up in days rather than weeks.

Image: The current status of the Internet Archive

The broader question this breach raises is: How secure is the digital record of our collective memory? The Internet Archive is no ordinary platform—it’s a critical repository of knowledge, culture, and history. When such an essential service is compromised, it shakes our confidence in the Archive’s ability to protect itself and in the security of all public digital archives. This isn’t just about one attack; it’s about the growing vulnerability of the systems we rely on to preserve the past.


The internet of trusted things


As the Archive works to recover, users must face an uncomfortable truth. The platforms we trust with our personal data—and, more crucially, our collective digital heritage—are under constant threat. The real question isn’t whether the Internet Archive will be back online soon (it will), but whether we’re prepared for what comes next. Will we hold platforms accountable for their security failures, or will we allow the threat of cyberattacks to become just another part of digital life?

The Internet Archive will soon be operational again, but the repercussions of this breach will likely resonate long after its services are restored. This attack is a warning, and we’d be wise to pay attention. The question is, will we?


Featured image credit: Kerem Gülen/Unsplash

Details of Internet Archive breach reveal 31 million accounts got compromised
https://dataconomy.ru/2024/10/10/details-of-internet-archive-breach/
Thu, 10 Oct 2024 08:54:04 +0000

The Internet Archive breach has resulted in the exposure of 31 million user accounts, leaving many concerned about the security of their personal information. The breach was first revealed on Wednesday when visitors to the Internet Archive site encountered a pop-up warning about the attack.

The message referred users to Have I Been Pwned (HIBP), a platform where individuals can check if their information has been compromised in data leaks. HIBP’s operator, Troy Hunt, confirmed that he received a file containing data from the Internet Archive breach, which included email addresses, screen names, and bcrypt-hashed passwords.
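Storing bcrypt hashes instead of plaintext means the stolen file does not directly reveal anyone’s password, but it does not make weak passwords safe. bcrypt is not part of Python’s standard library, so the minimal sketch below swaps in the standard library’s `hashlib.pbkdf2_hmac`, a different salted, deliberately slow password hash, to illustrate the general idea. The iteration count and salt length are illustrative assumptions, not the Internet Archive’s actual settings, and `offline_guess` is a hypothetical helper showing what anyone holding a leaked salt and hash can still do: try candidate passwords one by one.

```python
import hashlib
import hmac
import os
from typing import List, Optional, Tuple

# Illustrative work factor and salt size (assumptions; bcrypt uses its own "cost" parameter).
ITERATIONS = 100_000
SALT_BYTES = 16


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest) using salted PBKDF2-HMAC-SHA256 as a stand-in for bcrypt."""
    if salt is None:
        salt = os.urandom(SALT_BYTES)  # fresh random salt, so equal passwords hash differently
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


def offline_guess(salt: bytes, expected: bytes, wordlist: List[str]) -> Optional[str]:
    """Hypothetical attacker loop: test each guess against a leaked (salt, hash) pair."""
    for guess in wordlist:
        if verify_password(guess, salt, expected):
            return guess
    return None


salt, stored = hash_password("correct horse")
print(verify_password("correct horse", salt, stored))
print(offline_guess(salt, stored, ["123456", "password", "correct horse"]))
```

Because every guess costs a full hash computation, slow hashes make large-scale guessing expensive, but a password that sits near the top of a common wordlist will still be found quickly.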

This cyberattack coincided with a Distributed Denial of Service (DDoS) attack, which further disrupted the Internet Archive’s services. As a result, the site briefly went offline, with visitors encountering a message stating that its services were temporarily unavailable.

Timeline of the Internet Archive breach

The Internet Archive breach was uncovered when HIBP received and validated a file containing sensitive data from the archive’s users. Hunt cross-checked the data and notified the Internet Archive about the breach on October 6th. While the Archive was in the process of handling the situation, the site was also hit by a DDoS attack, slowing down its operations and making it difficult for users to access the platform.

Jason Scott, an archivist at the Internet Archive, noted on Mastodon that the attackers didn’t make any specific demands. The group behind the breach seemed more focused on causing disruption, with no clear motive for the attack.

Image: Jason Scott stated that the attackers did not make any specific demands during the breach

Security measures following the Internet Archive breach

Brewster Kahle, founder of the Internet Archive, confirmed the Internet Archive breach and outlined the immediate steps taken to secure the platform. The Internet Archive team disabled the compromised JavaScript library that was used to deface the site, while also upgrading their overall security measures. Kahle also hinted that more attacks could be on the horizon, given the aggressive nature of the hackers.

An X account called SN_Blackmeta claimed responsibility for the DDoS attack and the Internet Archive breach, and even suggested that more attacks were planned. This group had previously targeted the Internet Archive in May, indicating a pattern of recurring disruption attempts aimed at the platform.

The aftermath of Internet Archive data breach

One of the most concerning aspects of the Internet Archive breach is that, according to HIBP, 54% of the affected email addresses had already appeared in previous data breaches. This raises the risk of further attacks on users who may have reused passwords across multiple platforms.
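For a rough sense of scale, applying that percentage to the rounded figure of 31 million accounts (the reported total is slightly higher):

$$0.54 \times 31{,}000{,}000 \approx 16{,}740{,}000$$

In other words, roughly 16.7 million of the exposed accounts are tied to email addresses that were already circulating in earlier breach data.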

Even though the Internet Archive is back online, the platform continues to work on improving its security and restoring full functionality.

Users are advised to follow the Internet Archive’s official X account for updates on the recovery process.

Is it safe to use the Internet Archive?

Using the Internet Archive can still be considered relatively safe, but there are important factors to keep in mind, especially in light of recent security incidents like the Internet Archive breach.

Following the breach, the Internet Archive has taken steps to enhance its security: it disabled the compromised JavaScript library and upgraded its overall security measures to prevent future incidents. If you have an Internet Archive account, change your password immediately, and also change it on any other platform where you used the same password. Using a unique, strong password for each account is good practice.
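Choosing a genuinely new password is the step that matters most here. The sketch below, which assumes Python 3 and uses only the standard library, shows two things: generating a random password with the `secrets` module, and checking a candidate password against HIBP’s Pwned Passwords range API (`https://api.pwnedpasswords.com/range/{prefix}`). That API uses a k-anonymity scheme, so only the first five characters of the password’s SHA-1 hash ever leave your machine. The helper names are illustrative and the user-agent string is an arbitrary placeholder. Note that this checks whether a password appears in previously leaked password lists; it does not tell you whether your email address was part of the Internet Archive breach.

```python
import hashlib
import secrets
import string
import urllib.request
from typing import Tuple

ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 20) -> str:
    """Build a random password using a cryptographically secure RNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def split_sha1(password: str) -> Tuple[str, str]:
    """Return (first 5 hex chars, remaining 35) of the uppercase SHA-1 digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range_response(body: str, suffix: str) -> int:
    """Parse 'SUFFIX:COUNT' lines and return the count for our suffix (0 if absent)."""
    for line in body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate == suffix:
            return int(count)
    return 0


def pwned_count(password: str) -> int:
    """Query the range API; only the 5-character hash prefix is sent over the network."""
    prefix, suffix = split_sha1(password)
    request = urllib.request.Request(
        f"https://api.pwnedpasswords.com/range/{prefix}",
        headers={"User-Agent": "password-check-sketch"},  # placeholder UA string
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        body = response.read().decode("utf-8")
    return count_in_range_response(body, suffix)
```

For example, `pwned_count(generate_password())` would be expected to return 0, whereas a common password such as `"password"` returns a large count. Any nonzero result is a signal to pick something else.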

The breach involved the exposure of personal information, including email addresses and hashed passwords. If your account information was part of the breach, it’s wise to monitor your email and accounts for any suspicious activity.


Featured image credit: Emre Çıtak/Ideogram AI

Hachette v. Internet Archive: If the Archive were an AI tool, would the ruling change?
https://dataconomy.ru/2024/09/05/hachette-v-internet-archive-ai/
Thu, 05 Sep 2024 11:11:05 +0000

The Internet Archive has lost a significant legal battle after the US Court of Appeals upheld a ruling in Hachette v. Internet Archive, finding that its book digitization and lending practices violated copyright law. The case stemmed from the Archive’s National Emergency Library initiative during the pandemic, which allowed unrestricted digital lending of books and sparked backlash from publishers and authors. The court rejected the Archive’s fair use defense, although it acknowledged its nonprofit status. This ruling strengthens authors’ and publishers’ control over their works. But it immediately reminded me of how AI tools are trained on data from across the internet, including books. If the nonprofit Internet Archive’s use is not fair use, how can paid AI tools justify using this data?

Despite numerous AI copyright lawsuits, disputes over text from news outlets have so far rarely produced harsh rulings against AI companies; many have instead ended in licensing partnerships with major publishers.

You might argue that the cases differ because the Internet Archive lends the books themselves, while AI tools only learn from them. But although AI tools draw on all of their training data to generate, say, your essay, a well-crafted prompt can still get them to produce specific excerpts or detailed passages.

The Hachette v. Internet Archive case highlights significant concerns about how AI models acquire training data, especially when it involves copyrighted materials like books. AI systems often rely on large datasets, including copyrighted texts, raising similar legal challenges regarding unlicensed use. If courts restrict the digitization and use of copyrighted works without permission, AI companies may need to secure licenses for the texts used in training, adding complexity and potential costs. This could limit access to diverse, high-quality datasets, ultimately affecting AI development and innovation.

Additionally, the case underlines the limitations of the fair use defense in the context of transformative use, which is often central to AI’s justification for using large-scale text data. If courts narrowly view what constitutes fair use, AI developers might face more restrictions on how they access and use copyrighted books. This tension between protecting authors’ rights and maintaining open access to knowledge could have far-reaching consequences for the future of AI training practices and the ethical use of data.

Need a deeper dive into the case? Here is everything you need to know about it.

Hachette v. Internet Archive explained

Hachette v. Internet Archive is a significant legal case that centers around copyright law and the limits of the “fair use” doctrine in the context of digital libraries. The case began in 2020, when several large publishing companies—Hachette, HarperCollins, Penguin Random House, and Wiley—sued the Internet Archive, a nonprofit organization dedicated to preserving digital copies of websites, books, and other media.

The case focused on the Archive’s practice of scanning books and lending them out online.

The story behind the Internet Archive lawsuit

The Open Library project, run by the Internet Archive, was set up to let people borrow books digitally. Here’s how it worked:

  • The Internet Archive bought physical copies of books.
  • They scanned these books into digital form.
  • People could borrow a digital version, but only one person at a time could check out a book, just like borrowing a physical book from a regular library.

The Internet Archive thought this was legal because they only let one person borrow a book at a time. They called this system Controlled Digital Lending (CDL). The idea was to make digital lending work just like physical library lending.

When the COVID-19 pandemic hit in early 2020, many libraries had to close, making it hard for people to access books. To help, the Internet Archive launched the National Emergency Library (NEL) in March 2020. This program changed things:

  • The NEL allowed multiple people to borrow the same digital copy of a book at the same time. This removed the one-person-at-a-time rule.
  • The goal was to give more people access to books during the pandemic, especially students and researchers who were stuck at home.
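The practical difference between the two systems comes down to a single rule. The toy model below is a hypothetical sketch (the `Title` and `LendingLibrary` names are invented for illustration and are not the Internet Archive’s software): with the owned-to-loaned cap switched on, as under CDL, a title can have no more simultaneous loans than the physical copies the library owns; with the cap switched off, as under the NEL, any number of readers can borrow the same scan at once.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Title:
    """A scanned book in this hypothetical model; copies_owned = physical copies held."""
    name: str
    copies_owned: int
    borrowers: Set[str] = field(default_factory=set)


class LendingLibrary:
    """Toy lending policy: CDL-style cap on simultaneous loans, or no cap (NEL-style)."""

    def __init__(self, enforce_owned_to_loaned: bool = True) -> None:
        self.enforce_owned_to_loaned = enforce_owned_to_loaned

    def max_loans(self, title: Title) -> Optional[int]:
        # CDL: never lend more copies than are owned. NEL-style: no limit at all.
        return title.copies_owned if self.enforce_owned_to_loaned else None

    def borrow(self, title: Title, user: str) -> bool:
        if user in title.borrowers:
            return True  # this reader already holds a loan for the title
        limit = self.max_loans(title)
        if limit is not None and len(title.borrowers) >= limit:
            return False  # every owned copy is checked out; the reader must wait
        title.borrowers.add(user)
        return True

    def return_book(self, title: Title, user: str) -> None:
        title.borrowers.discard(user)


book = Title("Example Book", copies_owned=1)
cdl = LendingLibrary(enforce_owned_to_loaned=True)
print(cdl.borrow(book, "alice"), cdl.borrow(book, "bob"))  # second reader is refused

emergency_copy = Title("Example Book", copies_owned=1)
nel = LendingLibrary(enforce_owned_to_loaned=False)
print(nel.borrow(emergency_copy, "alice"), nel.borrow(emergency_copy, "bob"))
```

With one owned copy, a second CDL borrower is turned away until the first returns the book, while the NEL-style library lends the same title to everyone who asks, which is exactly the change the publishers objected to.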

While the NEL was meant to be temporary, it upset authors and publishers. They argued that letting many people borrow the same digital copy without permission was like stealing their work.

The publishers’ lawsuit

In June 2020, the big publishers sued the Internet Archive. They claimed:

  • The Internet Archive did not have permission to scan their books or lend them out digitally.
  • By doing this, the Internet Archive was violating their copyright, which gives them the exclusive right to control how their books are copied and shared.
  • The NEL’s approach, which let many people borrow digital copies at once, was especially harmful to their business and was essentially piracy.

The publishers argued that the Internet Archive’s actions hurt the market for their books. They said people were getting free digital versions instead of buying ebooks or borrowing from licensed libraries.

Internet Archive’s defense

The Internet Archive defended itself by claiming that its work was protected by fair use. Fair use allows limited use of copyrighted material without permission for purposes like education, research, and commentary. The Archive made these points:

  • They were providing a transformative service by giving readers access to physical books in a new, digital form.
  • They weren’t making a profit from this, as they’re a nonprofit organization with the mission of preserving knowledge and making it accessible.
  • The NEL was a temporary response to the pandemic, and they were trying to help people who couldn’t access books during the crisis.

They also pointed to their Controlled Digital Lending system as a way to respect copyright laws. Under CDL, only one person could borrow a book at a time, just like in a physical library.

The court’s decisions

District Court Ruling (March 2023)

In March 2023, a federal court sided with the publishers. Judge John G. Koeltl ruled that the Internet Archive’s actions were not protected by fair use. He said:

  • The Internet Archive’s digital lending was not transformative because they weren’t adding anything new to the books. They were simply copying them in digital form, which wasn’t enough to qualify for fair use.
  • The court also found that the Archive’s lending hurt the market for both printed and digital versions of the books. By offering free digital copies, the Internet Archive was seen as competing with publishers’ ebook sales.
  • The court concluded that the Archive had created derivative works, which means they made new versions of the books (digital copies) without permission.

Appeals Court Ruling (September 2024)

The Internet Archive appealed the decision to a higher court, the US Court of Appeals for the Second Circuit, hoping to overturn the ruling. However, the appeals court also ruled in favor of the publishers but made one important clarification:

  • The court recognized that the Internet Archive is a nonprofit organization and not a commercial one. This distinction was important because commercial use can often weaken a fair use defense, but in this case, the court acknowledged that the Archive wasn’t motivated by profit.
  • Even so, the court agreed that the Archive’s nonprofit status was not enough to make its lending fair use.

Bottom line

The Hachette v. Internet Archive case has shown that even nonprofits like the Internet Archive can’t freely digitize and lend books without violating copyright law. This ruling could also affect how AI companies use copyrighted materials to train their systems. If nonprofits face such restrictions, AI tools might need to get licenses for the data they use. AI companies have already started signing licensing deals, but I wonder what happens to the data they trained on before those deals existed.


Featured image credit: Eray Eliaçık/Bing
