Wednesday, January 13, 2010

How Google collects data about you and the Internet

We are watching you

Google has, perhaps more than any other company, realized that information is power. Information about the Internet, information about innumerable trends, and information about its users, YOU.

So how much does Google know about you and your online habits? It’s only when you sit down and actually start listing all of the various Google services you use on a regular basis that you begin to realize how much information you’re handing over to Google.

This has, as these things tend to do, given rise to various privacy concerns. It probably didn’t help when Google’s CEO, Eric Schmidt, recently went on the record saying: “If you have something that you don’t want anyone to know, maybe you shouldn’t be doing it in the first place.”

Now let’s have a look at how Google is gathering information from you, and about you.

Google’s information-gathering channels

Google’s stated mission is “to organize the world’s information and make it universally accessible and useful” and it is making good on this promise. However, Google is gathering even more information than most of us realize.

  • Searches (web, images, news, blogs, etc.) – Google is, as you all know, the most popular search engine in the world with a market share of almost 70% (for example, 66% of searches in the US are made on Google). Google tracks all searches, and now with search becoming more and more personalized, this information is bound to grow increasingly detailed and user-specific.
  • Clicks on search results – Not only does Google get information on what we search for, it also gets to find out which search results we click on.
  • Web crawling – Googlebot, Google’s web crawler, is a busy bee, continuously reading and indexing billions of web pages.
  • Website analytics – Google Analytics is by far the most popular website analytics package out there. Due to being free and still supporting a number of advanced features, it’s used by a large percentage of the world’s websites.
  • Ad serving – AdWords and AdSense are cornerstones of Google’s financial success, but they also provide Google with a lot of valuable data. Which ads are people clicking on, which keywords are advertisers bidding on, and which ones are worth the most? All of this is useful information.
  • Email – Gmail is one of the three largest email services in the world, together with competing options from Microsoft (Hotmail) and Yahoo. Email content, both sent and received, is parsed and analyzed. Even from a security standpoint this is a great service for Google. Google’s email security service, Postini, gets a huge amount of data about spam, malware and email security trends from the huge mass of Gmail users.
  • Twitter – “All your tweets are belong to us,” to paraphrase an early Internet meme. Google has direct access to all tweets that pass through Twitter after a deal made late last year.
  • Google Apps (Docs, Spreadsheets, Calendar, etc.) – Google’s office suite has many users and is of course a valuable data source to Google.
  • Google Public Profiles – Google encourages you to put a profile about yourself publicly on the Web, including where you can be found on social media sites and your homepage, etc.
  • Orkut – Google’s social network isn’t a success everywhere, but it’s huge in some parts of the world (mainly Brazil and India).
  • Google Public DNS – Google’s newly launched DNS service doesn’t just help people get fast DNS lookups, it helps Google too, because it will get a ton of statistics from this, for example what websites people access.
  • The Google Chrome browser – What is your web browsing behavior? What sites do you visit?
  • Google Finance – Aside from the finance data itself, what users search for and use on Google Finance is sure to be valuable data to Google.
  • YouTube – The world’s largest and most popular video site by far is, as you know, owned by Google. It gives Google a huge amount of information about its users’ viewing habits.
  • Google Translate – Helps Google perfect its natural language parsing and translation.
  • Google Books – Not huge for now, but has the potential to help Google figure out what people are reading and want to read.
  • Google Reader – By far the most popular feed reader in the world. What RSS feeds do you subscribe to? What blog posts do you read? Google will know.
  • FeedBurner – Most blogs use FeedBurner to publicize their RSS feeds, and every FeedBurner link is tracked by Google.
  • Google Maps and Google Earth – What parts of the world are you interested in?
  • Your contact network – Your contacts in Google Talk, Gmail, etc., make up an intricate network of users. And if those contacts also use Google, the network can be mapped even further. We don’t know if Google does this, but the data is there for the taking.
  • Coming soon – Chrome OS, Google Wave, more up-and-coming products from Google.

And the list could go on since there are even more Google products out there, but we think that by now you’ve gotten the gist of it… ;)

Much of this data is anonymized, but not always right away. Logs are kept for nine months, and cookies (for services that use them) aren’t anonymized until after 18 months. Even after that, the sheer amount of generic user data that Google has on its hands is a huge competitive advantage against most other companies, a veritable gold mine.

Google’s unstoppable data collection machine

There are many different aspects of Google’s data collection. The IP addresses that requests come from are logged, cookies are used for settings and tracking purposes, and if you are logged into your Google account, what you do on Google-owned sites can often be coupled to you personally, not just your computer.
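To make the difference between "your computer" and "you personally" a bit more concrete, here is a minimal Python sketch. It is purely illustrative: the log records, field names and the `build_profiles` helper are all made up by us, and we have no knowledge of how Google actually stores or joins its logs. The sketch simply groups activity by account when a user is logged in, and by cookie otherwise.

```python
from collections import defaultdict

# Hypothetical log records. Services, field names and values are invented
# for illustration only; this is not Google's actual log format.
LOG = [
    {"service": "search", "ip": "203.0.113.7", "cookie": "c-1",
     "account": None, "detail": "cheap flights"},
    {"service": "youtube", "ip": "203.0.113.7", "cookie": "c-1",
     "account": "alice", "detail": "travel vlog"},
    {"service": "reader", "ip": "198.51.100.2", "cookie": "c-2",
     "account": "alice", "detail": "tech blog feed"},
    {"service": "maps", "ip": "198.51.100.9", "cookie": "c-3",
     "account": None, "detail": "Lisbon"},
]


def build_profiles(records):
    """Group records by account if logged in, otherwise by cookie."""
    profiles = defaultdict(list)
    for record in records:
        if record["account"]:
            # Logged in: activity can be tied to a person.
            key = f"account:{record['account']}"
        else:
            # Not logged in: activity is only tied to a browser cookie.
            key = f"cookie:{record['cookie']}"
        profiles[key].append((record["service"], record["detail"]))
    return dict(profiles)


profiles = build_profiles(LOG)
for key, events in profiles.items():
    print(key, events)
```

Note how the two logged-in records end up in one profile even though they came from different computers and different services, while the anonymous records stay tied to a single cookie.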

In short, if you use Google services, Google will know what you’re searching for, what websites you visit, what news and blog posts you read, and more. As Google adds more services and its presence gets increasingly widespread, the so-called Googlization (a term coined by John Battelle and Alex Salkever in 2003) of almost everything continues.

The information you give to any single one of Google’s services wouldn’t be much to huff about. The really interesting dilemma comes when you use multiple Google services, and these days, who doesn’t?

Try using the Internet for a week without touching a single one of Google’s services. This means no YouTube, no Gmail, no Google Docs, no clicking on FeedBurner links, no Google search, and so on. Strictly, you’d even have to skip services that Google partners with, so, sorry, no Twitter either.

This increasing Googlization is probably why some people won’t want to use Google’s Chrome OS, which will be strongly coupled with multiple Google services and most likely give Google an unprecedented amount of data about your habits.

Why does Google do this?

As we stated in the very first sentence of this article, information is power.

With all this information at its fingertips, Google can group data together in very useful ways. Not just per user or visitor, but Google can also examine trends and behaviors for entire cities or countries.

Google can use the information it collects for a wide array of useful things. In all of the various fields where Google is active, it can make market decisions, do research, and refine its products with the help of this collected data.

For example, if you can discover certain market trends early, you can react effectively to the market. You can discover what people are looking for, what people want, and make decisions based on those discoveries. This is of course extremely useful to a large company like Google.

And let’s not forget that Google earns much of its money serving ads. The more Google knows about you, the more effectively it will be able to serve ads to you, which has a direct effect on Google’s bottom line.

It’s not just Google

It should be mentioned that Google isn’t alone in doing this kind of data collection. Rest assured that Microsoft is doing similar things with Bing and Hotmail, to name just one example.

The problem (if you want to call it a problem) with Google is that, like an octopus, its arms are starting to reach almost everywhere. Google has become so mixed up in so many aspects of our online lives that it is getting an unprecedented amount of information about our actions, behavior and affiliations online.

Google, an octopus?

Accessing Google’s data vault

To its credit, Google is making some of its enormous cache of data available to you as well via various services.

If Google can make that much data publicly available, just imagine the amount of data and the level of detail Google can get access to internally. And ironically, these services give Google even more data, such as what trends we are interested in, what sites we are trying to find information about, and so on.

An interesting observation when using these tools is that in many cases information can be found for everything except Google’s own products. For example, Ad Planner and Trends for Websites don’t show site statistics for Google sites, but you can find information about any other site.

No free lunch

Did you ever wonder why almost all of Google’s services are free of charge? Well, now you know. That old saying, “there ain’t no such thing as a free lunch,” still holds true. You may not be paying Google with dollars (aside from clicking on those Google ads), but you are paying with information. That doesn’t have to be a bad thing, but you should be aware of it.

Posted in Main on January 8th, 2010 by Pingdom
