I knew what happens to referrers when going from HTTP to HTTPS, but I hadn't heard about the meta referrer, and I'd been wondering why, even though my own site is HTTPS, I see so many bare-domain referrers. Now I know. Thanks to both the author and the submitter.
Personally, I switched from Ubuntu to Fedora a couple of days ago because I've had it with Canonical. It's practically the first time I've switched distros since I started using Linux (though I've admined servers running other distros and other OSes in the meantime). So far I'm satisfied with Fedora.
1. Google's web crawlers are not "bypassing" the paywall. It's the paywall that lets crawlers through, i.e. exactly the reverse of what the author implies with their headline.
2. The idea that this is somehow new is wrong. The way for a server to identify crawlers has "always" been to look at the user-agent and, when done right, the IP, verified either by checking the net block owner or by doing a PTR lookup and then confirming that the A or AAAA record for the claimed host points back at the same IPv4 or IPv6 address. I do agree that paywalling is a more recent phenomenon, at least with regard to how popular it is among sites today, but the concept of presenting different content to crawlers than to visitors arose much earlier, and it's something Google has long been aware of and has made sure to delist sites for when found. If anything, Google has since moved a bit in the direction of allowing it, in that Google News permits it when declared, as others ITT have explained.
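The verification scheme described above can be sketched roughly as follows. This is a minimal Python sketch, not production code: the domain suffixes are an assumption based on Google's published "Verifying Googlebot" guidance, and a real server would check the self-reported User-Agent before paying for the DNS round-trips.

```python
import socket

# Suffixes under which Google's crawler PTR names live (assumption:
# taken from Google's "Verifying Googlebot" documentation).
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def host_is_google(host: str) -> bool:
    """Pure check: does a PTR hostname fall under Google's crawler domains?"""
    # Anchor on the dot so e.g. "fakegooglebot.com" doesn't match.
    return host.rstrip(".").endswith(GOOGLE_SUFFIXES)


def verify_google_crawler(ip: str) -> bool:
    """PTR lookup of the connecting IP, then forward-confirm the claimed host."""
    try:
        host, _, _ = socket.gethostbyaddr(ip)  # reverse (PTR) lookup
    except OSError:
        return False
    if not host_is_google(host):
        return False
    try:
        # The A/AAAA records of the claimed host must include the same IP.
        addrs = {info[4][0] for info in socket.getaddrinfo(host, None)}
    except OSError:
        return False
    return ip in addrs
```

The forward-confirmation step is what stops spoofing: anyone can forge a User-Agent, and anyone controlling their own reverse zone can forge a PTR record, but only Google controls the forward A/AAAA records under its domains.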
So in my view, the author is jumping to incorrect conclusions based on an incomplete understanding of what's actually going on here. And what about the HN readership: how did this article get voted so highly, and why don't I see these issues raised by anyone else? Or maybe I'm just crazy?
> Google's web crawlers are not "bypassing" the paywall. It's the paywall that lets crawlers through, i.e. exactly the reverse of what the author implies with their headline.
Don't nitpick. It's just a shortened version of "How to 'Be' Google's Web Crawler to Bypass Paywalls". You get it. I get it. Everyone gets it.
inb4 Facebook resolves this issue by banning anyone who's connected 24/7. (That wouldn't solve the problem anyway, btw -- a small group of people could conspire to pull this data at irregular intervals and then share it with one another, getting a more complete picture while still staying reasonably undetectable if done right.)