
Tuesday, October 27, 2015

A Tech Joke: Google Caught in Dichotomy

Before I reveal the joke, let me explain a little bit about 'Web Scraping'. 

Web scraping is a technique where a program crawls someone's website and extracts its content, usually without asking the owner for any sort of permission. It is similar to a human browsing a publicly available website, but it goes further in the sense that a program can do the same browsing in a much faster, more efficient and more structured manner.

Many people use web scraping for data mining or for comparing the prices of millions of products on large e-commerce sites. In fact, I have just finished writing my own (first ever) search engine, which uses the same web scraping technique to search thousands of products on Amazon and Flipkart and compare their prices.
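
To give a rough idea of what such a price comparison involves, here is a minimal sketch in Python. It is not the code of the search engine mentioned above: the store names, the `price` class and the sample HTML are all made up for illustration, and a real scraper would first have to download the pages and cope with each site's actual markup.

```python
import re
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Grabs the first number found inside an element with class="price".

    The class name "price" is a hypothetical convention, not any real site's markup.
    """

    def __init__(self):
        super().__init__()
        self._in_price = False
        self.price = None

    def handle_starttag(self, tag, attrs):
        # Start listening for text once we enter the (assumed) price element.
        if self.price is None and dict(attrs).get("class") == "price":
            self._in_price = True

    def handle_data(self, data):
        if self._in_price:
            # Match a number such as "1,299.00" and ignore any currency prefix.
            match = re.search(r"\d[\d,]*(?:\.\d+)?", data)
            if match:
                self.price = float(match.group().replace(",", ""))
                self._in_price = False

    def handle_endtag(self, tag):
        self._in_price = False


def extract_price(html):
    """Return the price found in one page of HTML, or None if there is none."""
    parser = PriceParser()
    parser.feed(html)
    return parser.price


def cheapest(pages):
    """pages maps a store name to the HTML of its product page."""
    prices = {store: extract_price(html) for store, html in pages.items()}
    found = {store: p for store, p in prices.items() if p is not None}
    return min(found.items(), key=lambda item: item[1]) if found else None


# Made-up pages standing in for HTML that a scraper would have downloaded.
sample_pages = {
    "store_a": '<div><span class="price">Rs. 1,299.00</span></div>',
    "store_b": '<div><span class="price">Rs. 1,249.50</span></div>',
}

print(cheapest(sample_pages))
```

The point is only that once a program has the HTML, pulling out one field and comparing it across sites takes a few lines, which is why scraping scales so much better than a human clicking through pages.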

Although web scraping is a handy technology, many frown upon it because it has a big gray area in terms of copyright violation, piracy, duplication of content and so on. In a way, it is seen as stealing content without the owner's permission. Google is one big organization that is seriously committed to killing off web scrapers.

There is one very famous guy at Google called Matt Cutts. He is the one responsible for making search engine results better. Bloggers often see him as evil because he punishes blogs and websites severely by ranking them low if they violate any of Google's guidelines, for example by having duplicate content or low authority. All in all, there are millions of bloggers who hate Matt Cutts because he does not allow their websites to show up in Google search.

Coming back to web scraping: even a company as big as Google sometimes finds it hard to determine which content is original and which has been scraped. Such is the complexity of dealing with web scraping.

So one day Matt Cutts tweeted this:

"If you see a scraper URL outranking the original source of content in Google, please tell us about it."

In response, an online entrepreneur named Dan Barker (probably a supporter of web scraping) replied to his tweet. The reply went viral because it was so hilarious. This is what he replied:


He nailed Matt Cutts by pointing out that Google itself is the world's largest web scraper and yet, at the same time, is strongly against web scraping. What a dichotomy he pointed out!
