Cloudflare, the network provider that helps run many of the world’s websites, has made a couple of interesting announcements this year. Back in the spring, it announced AI Labyrinth, a kind of honeypot that confuses and, ultimately, aims to stop AI web crawlers when they visit a website looking to scrape data. In July, Cloudflare also announced a marketplace that would allow website owners to charge AI companies for permission to scrape their data.
The two initiatives are separate but intertwined. The first is about guarding data; the second is about monetizing it. Cloudflare’s programs, as well as others that are in the works, are clearly in their infancy, and it might take years before anything resembling the fluid data economy Cloudflare envisions becomes a reality for a typical website owner. Still, the theory behind the marketplace should be particularly interesting to website owners.
All data can be valuable
All data is, of course, useful. If you run a website, it can help you understand your customers and become a better provider for them. For example, on an online social casino platform, player data, i.e., the titles people play, lets the provider showcase the most popular games so that new players can see at a glance what is popular. Or on a platform like Netflix, viewing data from the movies and TV shows we watch allows the streaming platform to make personalized recommendations.
AI, however, has an unquenchable thirst for data, especially if the stated goals of AGI (artificial general intelligence) or superintelligence are to be reached. To improve their LLMs, AI companies need access to more and more information, which helps the models contextualize and better understand the human world. This is all well and good, but data holders believe they should be compensated for helping to train those models.
AI companies have signed vast deals
You might have seen that AI companies have signed huge deals with data holders over the last year or two. These include OpenAI’s massive deal with Condé Nast, publisher of The New Yorker, Vanity Fair, and many more cultural magazines. OpenAI also signed a huge deal with News Corp. The New York Times, however, has sued OpenAI, seeking “billions” in damages for training on its content without permission.
The argument that Cloudflare is trying to set out is that it is all well and good for News Corp and Condé Nast to sign multi-million-dollar deals with AI companies, but what about the little guy? What if you have a cookery blog focused on healthy vegan snacks? Or what if you have a website that discusses the history of badminton in the United States? AI companies aren’t queuing up to pay these niche enterprises, so companies like Cloudflare are trying to create a scenario where AI companies are forced to pay (because their web crawlers are blocked) and where paying is easy (through a marketplace).
Some observers are of the mind that data on the web should be fair game for AI, and that companies should not be forced to pay. Yet there are both ethical and practical arguments against that. As more and more people use AI, they will visit websites less and less. So the person who runs a vegan cookery blog will get fewer hits and be less inclined to continue with that blog. This, in turn, creates a vacuum for AI: where does it learn about this niche style of cooking? The same goes for our history-of-badminton example and a myriad of other topics.
In the end, it should come down to the understanding that AI needs a vibrant ecosystem of data to meet its goals, but that ecosystem will only remain healthy if there is an incentive for the people creating it. We aren’t saying that Cloudflare’s marketplace is foolproof, or even that it will be successful, but it does at least attempt to identify a problem and create a solution for it.