Baidu Takes a Stand: Blocking Google and Bing from Scraping Its Content

In a strategic move that underscores the increasing competition for valuable data in the artificial intelligence (AI) sector, Chinese tech giant Baidu has blocked Google and Microsoft’s Bing from indexing content on its Wikipedia-style platform, Baidu Baike. This decision reflects Baidu’s intensified efforts to safeguard its data assets, which are critical for training generative AI models and applications.

The Change in Baidu’s Policy

On August 8, Baidu updated its robots.txt file, the plain-text file a website uses to tell search engine crawlers which pages they may access. The update instructs the Googlebot and Bingbot crawlers to stay out of Baidu Baike’s repository of nearly 30 million entries. The change is visible in records from the Wayback Machine, a digital archive service that provides historical snapshots of web pages.

Before this update, Google and Bing were allowed to crawl and index most of Baidu Baike’s content, with only some sections off-limits. Now the entire service is closed to both crawlers, reflecting Baidu’s determination to retain exclusive control over its data.
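
To illustrate how a robots.txt rule of this kind works, here is a minimal Python sketch using the standard library's robots.txt parser. The file contents, the example URL, and the helper name crawl_permissions are assumptions made for illustration; they are not a copy of Baidu's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, written for illustration only.
# It is NOT Baidu's real file; it just shows the kind of rule described above.
HYPOTHETICAL_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: Bingbot
Disallow: /

User-agent: Baiduspider
Allow: /
"""


def crawl_permissions(robots_txt, url, agents):
    """Return {agent: True/False} saying whether each crawler may fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    # Placeholder URL on a reserved example domain.
    page = "https://baike.example.com/item/some-entry"
    result = crawl_permissions(
        HYPOTHETICAL_ROBOTS_TXT, page, ["Googlebot", "Bingbot", "Baiduspider"]
    )
    for agent, allowed in result.items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

A "Disallow: /" line under a crawler's name covers every path on the site, which is how a single short entry can close off an entire service. Note that robots.txt is a request that well-behaved crawlers honor, not a technical access control.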

AI Training and the Growing Data Demand

This move comes in response to escalating global demand for high-quality data, driven by the rapid expansion of generative AI (GenAI) technologies like ChatGPT by OpenAI, launched in November 2022. Since then, major companies such as Google and Microsoft have intensified their efforts to acquire massive amounts of data to train their own AI models.

Baidu’s decision mirrors similar actions by other platforms, such as Reddit, which in July restricted all search engines except Google from indexing its content. Google, in turn, holds an exclusive, multi-million-dollar deal with Reddit for access to its data for AI training purposes. Microsoft has likewise moved to protect its search data, threatening to cut off competitors’ access if they continued using it to develop rival AI systems.

Implications for the AI Landscape

Baidu’s move is part of a broader trend among companies aiming to control and monetize their data in a highly competitive AI market. AI developers are also signing licensing deals with publishers: OpenAI’s agreement with Time magazine, for example, grants it access to over a century’s worth of archived content. As such deals multiply, the race for proprietary data is intensifying. By safeguarding its data, Baidu is likely positioning itself to retain a competitive edge in developing its own AI services, including its Ernie Bot and other AI-driven applications.

With Baidu’s proactive stance, other content-rich platforms may follow suit, restricting access to their data to maintain exclusivity and leverage in the rapidly evolving AI ecosystem.

The Broader Context of the US-China Tech Competition

Baidu’s decision also reflects the ongoing US-China tech rivalry, particularly in AI and data privacy domains. The competition has prompted firms from both countries to reassess how they manage and utilize their data assets to protect intellectual property and maintain market competitiveness.

As more companies around the globe strive to strengthen their AI capabilities, control over valuable data resources is set to become a critical factor in shaping the future of technology. Baidu’s decision to block Google and Bing from its content is an early indication of how data wars may shape the AI landscape in the years to come.


By securing its data resources, Baidu not only strengthens its own AI initiatives but also positions itself as a key player in the rapidly advancing field of generative AI, where data is the most valuable asset.
