An Internet hub with domestic and international news, analysis, original reporting, and popular features from the left, center, indies, centrists, moderates, and right

Earth to Newspapers: You CAN Control Your Content With Robots.txt

Newspapers act like they don’t have a clue. So Google directed this message at them:

The truth is that news publishers, like all other content owners, are in complete control when it comes not only to what content they make available on the web, but also who can access it and at what price. This is the very backbone of the web — there are many confidential company web sites, university databases, and private files of individuals that cannot be accessed through search engines. If they could, the web would be much less useful.

For more than a decade, search engines have routinely checked for permissions before fetching pages from a web site. Millions of webmasters around the world, including news publishers, use a technical standard known as the Robots Exclusion Protocol (REP) to tell search engines whether or not their sites, or even just a particular web page, can be crawled. Webmasters who do not wish their sites to be indexed can and do use the following two lines to deny permission:

User-agent: *
Disallow: /
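To see how a well-behaved crawler would read those two lines, here is a minimal sketch using Python's standard-library `urllib.robotparser`. It checks the deny-all rule quoted above and, for contrast, a rule that blocks only one page. The host `news.example.com`, the path `/archive/2009/paid-story.html`, and the use of `Googlebot` as the crawler name are assumptions made for illustration; they do not come from Google's post.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# The two-line rule quoted above: applies to every crawler and every path.
DENY_ALL = """\
User-agent: *
Disallow: /
"""

# Hypothetical rule that blocks a single page and leaves the rest crawlable.
DENY_ONE_PAGE = """\
User-agent: *
Disallow: /archive/2009/paid-story.html
"""

SITE = "https://news.example.com"  # assumed host, for illustration only

deny_all = build_parser(DENY_ALL)
deny_one = build_parser(DENY_ONE_PAGE)

# Under the deny-all rule, no URL on the site may be fetched.
print(deny_all.can_fetch("Googlebot", SITE + "/"))
print(deny_all.can_fetch("Googlebot", SITE + "/sports/story.html"))

# Under the single-page rule, only the listed page is off limits.
print(deny_one.can_fetch("Googlebot", SITE + "/archive/2009/paid-story.html"))
print(deny_one.can_fetch("Googlebot", SITE + "/sports/story.html"))
```

The parser only reports what the file asks for. Compliance is voluntary on the crawler's side, which is why the quoted passage describes search engines as checking for permission before fetching.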

But that, of course, is not what newspapers want. They want to be listed in Google results and also paid for the right. Google’s answer:

Today, more than 25,000 news organizations across the globe make their content available in Google News and other web search engines. They do so because they want their work to be found and read — Google delivers more than a billion consumer visits to newspaper web sites each month. These visits offer the publishers a business opportunity, the chance to hook a reader with compelling content, to make money with advertisements or to offer online subscriptions. If at any point a web publisher feels as though we’re not delivering value to them and wants us to stop indexing their content, they’re able to do so quickly and effectively.

Some proposals we’ve seen from news publishers are well-intentioned, but would fundamentally change, for the worse, the way the web works. Our guiding principle is that whatever technical standards we introduce must work for the whole web (big publishers and small), not just for one subset or field. There’s a simple reason behind this. The Internet has opened up enormous possibilities for education, learning, and commerce, so it’s important that search engines make it easy for those who want to share their content to do so, while also providing robust controls for those who want to limit access.

Emphasis mine.

Via Michael Masnick at Techdirt, “With the silly introduction last week of the AP’s attempt to create a weird and totally unnecessary new data feed to keep out aggregators and search engines, it seems that Google has gotten fed up.”

Also from Masnick on Techdirt, Journalist Demands Google Give Up Its ‘Fair Share’ To Newspapers. It’s an effective take-down of a piece by Peter Osnos, the Vice-Chairman of the Columbia Journalism Review:

There are numerous problems with the logic in the piece, but they can be summarized in two basic camps: a misunderstanding of the internet and a misunderstanding of economics.



3 Responses to “Earth to Newspapers: You CAN Control Your Content With Robots.txt”

  1. archangel says:

    Good article, Joe. The dust-up between AP and a blogger about a year ago was covered at TMV when we were covering the elections for the Ruckus on Newsweek magazine. It was one of those 'attack the really, really little guy' moves by the huge, huge, huge network AP. It got settled, but it was about copying content (fair use in the amount he took, to these eyes), and even though he cited it properly, AP wanted him to quit and never, ever quote them again.

    So, like the NYT and its subscription news service, which floundered hopelessly because bloggers and readers simply turned to other good papers (WSJ, CSM, LAT, ChiTrib, et al.) to utilize fair use while citing the author and source properly and often linking as well.

    There must be someone in leadership at AP who is trying to get others not to cross the Maginot Line. I think your update on UP is very interesting.

  2. [...] Earth to Newspapers: You CAN Use Robots.txt (themoderatevoice.com) [...]


© 2003-2011 The Moderate Voice