

If this were true (which is nearly impossible since you said “all”), stuff like Anubis wouldn’t exist since you could just toss up a crowd-sourced robots.txt and be done with it.




This is mostly showing up through AI scraping right now, and many of those scrapers intentionally ignore robots.txt. At the moment, LLM scrapers are basically just bad actors on the internet. US courts have also ruled in favor of a number of AI companies that were sued, so it’s unlikely anything will change. Effectively, if you don’t like the status quo, stuff like this is one of your few options.
And that’s without even getting into whether we actually want these companies to improve their models before the problems of energy consumption and potential displacement of human workers are resolved.
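
To make the robots.txt point concrete, here is a rough sketch using Python's standard `urllib.robotparser`. The bot name `ExampleAIBot`, the rules, and the URL are all made up for illustration. The check only tells a crawler what the site is asking for; nothing enforces the answer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; "ExampleAIBot" is an invented name.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return what robots.txt asks of this user agent for this URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# A well-behaved crawler runs this check before every request and
# backs off when it returns False.
polite_bot_may_fetch = is_allowed("ExampleAIBot", "https://example.com/post/1")
other_client_may_fetch = is_allowed("SomeBrowser", "https://example.com/post/1")

# A scraper that never calls is_allowed() is not stopped by anything
# here: the file is a request, not an access control.
```

That is why a crowd-sourced robots.txt only helps against crawlers that choose to honor it, and why people reach for tools that actually sit in front of the request.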


Corporations want the existing copyright system for their own products but simultaneously want to freely scrape data from everyone else.
I mean, they say earlier that music is actually well-preserved, but that it’s disproportionately popular music. If the goal is to preserve everything, I’d expect them to go for the stuff that isn’t likely to be in some random audiophile’s collection.
- Over-focus on the most popular artists. There is a long tail of music which only gets preserved when a single person cares enough to share it. And such files are often poorly seeded.
- We primarily used Spotify’s “popularity” metric to prioritize tracks. View the top 10,000 most popular songs in this HTML file (13.8MB gzipped).
- For popularity>0, we got close to all tracks on the platform. The quality is the original OGG Vorbis at 160kbit/s. Metadata was added without reencoding the audio (and an archive of diff files is available to reconstruct the original files from Spotify, as well as a metadata file with original hashes and checksums).
- For popularity=0, we got files representing about half the number of listens (either original or a copy with the same ISRC). The audio is reencoded to OGG Opus at 75kbit/s — sounding the same to most people, but noticeable to an expert.
Perhaps I’m reading this wrong, but is this not a little backwards? Since unpopular music is poorly preserved, shouldn’t the focus be on getting the least popular music first?
So, is it basically treating every message as a “group” message, where it gets sent to some system WhatsApp account as well as to your intended receiver? That’s what I’m assuming, based on them supposedly being able to see deleted messages. It would also let them say it’s technically still “E2EE”, since it is indeed E2EE to your receiver; it’s just E2EE to them as well.