Opting Out Of Training Google’s Bard And Future AIs: Take Control Over Your Web Content

Google’s use of large language models, such as its Bard AI, has raised concerns about data being collected and used without explicit consent. Web publishers can now decide whether their content contributes to the training of these AI models.

Key Takeaway

Web publishers can now opt out of letting Google use their content to train AI models like Bard. By adding a “User-Agent: Google-Extended” entry with a Disallow rule to their robots.txt file, publishers can exercise greater control over how their content is used.

The Choice is Yours: Disallowing “User-Agent: Google-Extended”

If you want more control over how your web content is used, you can add a “User-Agent: Google-Extended” entry to your site’s robots.txt file and follow it with a Disallow rule covering the paths you want to exclude (“Disallow: /” covers the whole site). The robots.txt file tells automated web crawlers which parts of your website they may access.
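As a rough illustration, the sketch below embeds a minimal robots.txt entry for the Google-Extended token and checks it locally with Python’s standard urllib.robotparser module. The helper name is_allowed, the ROBOTS_TXT constant, and the example.com URL are placeholders chosen for this example, not anything specified by Google. The parser only evaluates the rules as written; it says nothing about how any crawler actually behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body: "Disallow: /" blocks the whole site,
# but only for the Google-Extended token named in this entry.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/some-post"  # placeholder URL
    print(is_allowed(ROBOTS_TXT, "Google-Extended", page))  # False
    print(is_allowed(ROBOTS_TXT, "Googlebot", page))        # True
```

Because the entry names only Google-Extended, a differently named agent such as Googlebot is not matched by it, which is why the second check returns True.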

In a blog post, Google’s VP of Trust, Danielle Romain, acknowledges that web publishers have expressed a desire for greater choice and control over how their content is utilized by emerging generative AI use cases. This announcement comes as a response to those concerns.

Notably, the word “train” does not appear in the post, even though the collected data clearly serves as training material for machine learning models. Instead, Romain appeals to publishers by asking whether they are willing to help improve Bard and Vertex AI generative APIs, so that these models become more accurate and capable over time.

While framing the question in terms of consent is important, it is worth acknowledging that Google has already collected vast amounts of data without explicit permission. This raises questions about the authenticity of Google’s commitment to ethical data collection and consent.

The reality is that Google had unimpeded access to web data and is now seeking permission retroactively, creating the illusion of prioritizing consent and ethical data practices. If they were truly committed to such principles, this option would have been available years ago.

Coincidentally, Medium recently announced that it will universally block AI crawlers, such as those gathering data for models like Google’s Bard, until a more refined and granular solution can be developed. Medium’s action hints at the potential formation of a media coalition to counter the influence of AI crawlers.