The websites a Brave user browses are anonymously relayed to Brave's servers for indexing/training. So they crawl the web without a crawler, and website operators can't do anything about it.
I'm Sampson, from the Brave team. The Web Discovery Project is a clever approach. For Brave to compete with Google, and offer a truly novel index of the Web, a novel approach must be taken. The WDP is an opt-in, privacy-preserving approach which gives Brave a fighting chance against the Search incumbents. Due to our preference for "Can't be evil" over "Don't be evil," the WDP is not only designed with privacy and anonymity as a prerequisite, but it is also open-source for public scrutiny and evaluation: https://github.com/brave/web-discovery-project.
It's not a clever approach; it's basically scraping Google results, because that's where your users are searching. You follow the breadcrumbs from Google searches.
Cliqz's entire history was based on this kind of thing: milking other search engines by just deducing their ranking methods. It's parasitic. There's no cleverness about it.
I don't know a lot about this particular approach, but your comment that it's just using Google results is blatantly false. It all depends on the search engine that the Brave user is using, or no search engine at all if they type the URL directly into the address bar.
That's genius!