Google Built Its Empire Scraping The Web. Now It’s Suing To Stop Others From Scraping Google

Sunshine (she/her)@piefed.ca · 19 days ago

mesa@piefed.social · 19 days ago

Google and OpenAI suck:

Google’s legal theory has another significant problem: the requirement that a TPM must “effectively control” access. Just last week, a court rejected Ziff Davis’s attempt to turn robots.txt into a 1201 violation when OpenAI allegedly ignored its crawling restrictions. The court’s reasoning is directly applicable here:

OpenAI slammed my small server into the ground until I put fail2ban on top. It was really bad, like thousands-of-requests-per-second bad.

apftwb@lemmy.world · 19 days ago

How does fail2ban prevent scraping? My understanding was that fail2ban works on failed login attempts.

mesa@piefed.social · 19 days ago

There are some premade scripts out there that make it do more. I have it hooked up to nginx and other such logs. It's commonly used to catch failed login attempts on web login portals, not just ssh. It can work with any grep-able log file.

I just took two scripts other people had made, verified they worked on my mini PC, and set it loose. Within about 10 min it caught most scrapers and banned the IPs.
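
For anyone curious what "works with any grep-able log file" means in practice, here is a rough Python sketch of the general idea: match each log line with a regex, count hits per IP inside a time window, and flag any IP that goes over a threshold. This is not fail2ban's code and not the scripts mentioned above. The function name `find_offenders`, the thresholds, and the assumption that the log is in nginx's default "combined" format are all made up for illustration.

```python
import re
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Assumption: nginx "combined" format, e.g.
# 203.0.113.5 - - [10/Oct/2024:13:55:00 +0000] "GET / HTTP/1.1" 200 612 "-" "bot"
LINE_RE = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\]')


def find_offenders(lines, max_hits=100, window=timedelta(seconds=10)):
    """Return the set of IPs with more than max_hits requests inside any window.

    Hypothetical helper: the thresholds are illustrative, not recommendations.
    """
    recent = defaultdict(deque)  # ip -> timestamps still inside the window
    offenders = set()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # not an access-log line we understand
        ip = m.group("ip")
        if ip in offenders:
            continue  # already flagged, nothing more to learn
        ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        hits = recent[ip]
        hits.append(ts)
        # Drop timestamps that have aged out of the window.
        while ts - hits[0] > window:
            hits.popleft()
        if len(hits) > max_hits:
            offenders.add(ip)
    return offenders
```

A real setup would then hand the flagged IPs to the firewall and lift the ban after some time. The sketch only covers the detection half and assumes the log lines arrive in time order.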