This post is the latest in an ongoing series about how we harness the data we collect to improve our products and services for our users. - Ed.
As the head of the webspam team at Google, I'm in charge of making sure your search results are as relevant and informative as possible. Webspam, in case you've never heard of it, is the junk you see in search results when websites successfully cheat their way into higher positions or otherwise violate search engine quality guidelines. If you've never seen webspam, here's a good example of what you might see if you click on a spam link in the search results.
You can see how unhelpful such a page would be. This example has almost no original content and is filled with irrelevant links and information that is of little use to a user. We work hard to ensure you rarely see search results like this. Imagine how annoyed you would be if you clicked on a link from a Google search result and ended up on a page like this.
Searchers don't often see blatant, outright spam like this in search results today. But webspam was much more of an issue before Google became popular and before we were able to build effective anti-spam methods. In general, webspam can be a real annoyance, such as when a search on your own name returns links to porn pages as results. But for many searches, where getting relevant information is more critical, spam is a serious problem. For example, a search for prostate cancer that's full of spam instead of relevant links greatly diminishes the value of a search engine as a helpful tool.
Data from search logs is one tool we use to fight webspam and return cleaner and more relevant results. Logs data such as IP address and cookie information make it possible to create and use metrics that measure the different aspects of our search quality (such as index size and coverage, results "freshness," and spam).
Whenever we create a new metric, it's essential to be able to go over our logs data and compute new spam metrics using previous queries or results. We use our search logs to go "back in time" and see how well Google did on queries from months before. When we create a metric that measures a new type of spam more accurately, we not only start tracking our spam success going forward, but we also use logs data to see how we were doing on that type of spam in previous months and years.
The IP and cookie information is important for helping us apply this method only to searches that are from legitimate users as opposed to those that were generated by bots and other false searches. For example, if a bot sends the same queries to Google over and over again, those queries should really be discarded before we measure how much spam our users see. All of this--log data, IP addresses, and cookie information--makes your search results cleaner and more relevant.
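To make the idea concrete, here is a minimal sketch of discarding repeated bot queries before measuring how much spam users see. It is an illustration only, not Google's actual pipeline: the `LogEntry` fields, the `filter_bot_traffic` and `spam_rate` helpers, and the `max_repeats` threshold are all hypothetical names and values chosen for this example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    # Hypothetical log record; these field names are assumptions.
    ip: str
    cookie: str
    query: str
    result_was_spam: bool  # whether the result shown was later labeled spam


def filter_bot_traffic(entries, max_repeats=3):
    """Drop entries where one (ip, cookie) pair sent the same query
    more than max_repeats times. The threshold is an arbitrary assumption."""
    counts = Counter((e.ip, e.cookie, e.query) for e in entries)
    return [e for e in entries if counts[(e.ip, e.cookie, e.query)] <= max_repeats]


def spam_rate(entries):
    """Fraction of entries whose result was labeled spam (0.0 if empty)."""
    if not entries:
        return 0.0
    return sum(e.result_was_spam for e in entries) / len(entries)


if __name__ == "__main__":
    # One bot repeating a query ten times, plus two ordinary searches.
    logs = [LogEntry("10.0.0.1", "bot", "cheap pills", True)] * 10
    logs.append(LogEntry("10.0.0.2", "a", "prostate cancer", False))
    logs.append(LogEntry("10.0.0.3", "b", "my name", True))

    print("raw spam rate:", spam_rate(logs))
    print("filtered spam rate:", spam_rate(filter_bot_traffic(logs)))
```

Because the same functions can be run over older log entries, a newly defined metric can also be computed for earlier months, which is the "back in time" measurement described above.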
If you think webspam is a solved problem, think again. Last year Google faced a rash of webspam on Chinese domains in our index. Some spammers were purchasing large numbers of cheap .cn domains and stuffing them with misspellings and porn phrases. Savvy users may remember reading a few blogs about it, but most regular users never even noticed. The reason that a typical searcher didn't notice the odd results is that Google identified the .cn spam and responded with a fast-tracked engineering project to counteract that type of spam attack. Without our logs data to help identify the speed and scope of the problem, many more Google users might have been affected by this attack.
In an ideal world, the vast majority of our users wouldn't even need to know that Google has a webspam team. If we do our job well, you may see low-quality results from time to time, but you won't have to face sneaky JavaScript redirects, unwanted porn, gibberish-stuffed pages or other types of webspam. Our logs data helps ensure that Google detects and has a chance to counteract new spam trends before they lower the quality of your search experience.
June 29, 2008
Get outdoors with GO Georgia!
Our Atlanta office recently teamed up with the Parks, Recreation and Historic Sites Division of the Georgia Department of Natural Resources to support an initiative called Get Outdoors Georgia (GO Georgia). An effort to help Georgians get outdoors, get fit and enjoy their diverse natural resources, the initiative focuses on family-friendly, nature-based, healthy outdoor recreation opportunities throughout the state. As a founding sponsor of the program, Google will offer consultation on products including AdWords, Analytics, Maps, Earth, Picasa, Gadgets and a branded YouTube channel.
According to a 2007 report from the Trust for America's Health, Georgia is one of the "heaviest" states in the union, ranking 14th for adult obesity and 12th for overweight children (16+ percent of its youth overweight or obese). We're pleased that our products will play a part in an historic effort to improve the health and well-being of all Georgians. And today, we're expanding our relationship with GO Georgia by spending a day in Panola Mountain State Park. Atlanta Googlers will help to restore the park and remove growth not indigenous to the area, improving the experience for Georgians and other visitors when they get out and visit the park.
June 20, 2008
Google Code Jam is back
If you're a great sprinter, you've probably been in a few races. And if you're a great chess player, you've probably had your share of matches. But what do you do if you're a great programmer?
Well, if you're looking for the rush of competition, the feeling of matching your mind up against the greatest in the world, you can't do better than Google Code Jam. The contests are intense: you'll have two short hours to solve some fiendish algorithmic challenges. You'll read a problem, write your code, download our test cases, and tell us what you think the right answers are. If you're right, it's time to move on to another problem -- but if you're wrong, it's time to make a decision. Debug, or look for an easier challenge...?
Registration is now open, so you can find out more about the contest, and practice on some sample problems. Practice hard! If you make it to the top 500, you'll travel to a nearby Google office for our semifinal round. If you're in the top 100, we'll fly you to our Mountain View headquarters to compete with the world's very best.
Plug-ins converge on Washington
Last week Google.org and the Brookings Institution hosted a two-day conference in Washington to showcase plug-in electric vehicles and examine how the government can support their widespread adoption. An impressive lineup of Members of Congress, auto and utility executives, and technology experts spoke to a packed house about the potential of plug-ins to reduce oil dependence, lower the cost of driving, and fight global warming. Between panels, participants were treated to a display of the latest plug-in cars, including one of Google.org's RechargeIT cars, an electric sports car, and Detroit's answer to high gas prices.
There appeared to be overwhelming agreement that government leadership is necessary to make this industry transformation a reality. (A recent poll commissioned by Google.org shows that voters agree.) A second theme was the need to modernize and green the power grid as the country moves toward electrifying transportation. But with gas prices at record highs and enthusiasm for the promise of electric cars growing, the feeling in Washington last week was that plug-ins' time has come.
June 13, 2008
Our agreement to provide ad technology to Yahoo!
Today, we announced a non-exclusive advertising agreement that will provide Yahoo! with access to our AdSense for search and AdSense for content advertising programs on its U.S. and Canadian web properties. In addition, we will work to enable interoperability between our respective instant messaging services, giving users better, broader communication online.
We are proud of the advertising technologies we have built, which show users a relevant ad whether they are searching for a specific item or browsing the internet. This arrangement extends those benefits to Yahoo! and its many users, advertisers and publisher partners. We currently provide similar services to sites like AOL and Ask.com as well as many other partners, and we work closely with all of our partners to ensure that our partnership drives their long-term success.
Why did we make this agreement? Quite simply, we think it is good for users, advertisers and publishers. By offering Google's industry-leading technology to Yahoo!, the whole system becomes more efficient, and everyone benefits:
* Consumers will see more relevant ads when they are looking for information and browsing the web. And with interoperability between IM services, users will have easier access to even more of their contacts.
* Publishers currently in the Yahoo! Publisher Network will benefit from Google's advertising technology, potentially increasing the revenue they earn from their sites.
* Advertisers will have new ways to reach their target customers online more efficiently.
We also think this is good for competition. The truth is, this kind of arrangement is commonplace in many industries, and it doesn't foreclose robust competition. Toyota sells its hybrid technology to General Motors, even though they are the number one and number two car manufacturers globally. Canon provides laser printer engines for HP, despite also competing in the broader laser printer market. Google and Yahoo! will continue to be vigorous competitors, and that competition will help fuel innovation that is good for users.
It is important to say what this agreement is not:
* This is not a merger. Rather, we are merely providing access to our advertising technology to Yahoo! through our AdSense program.
* This does not remove a competitor from the playing field. Yahoo! will remain in the business of search and content advertising, which gives the company a continued incentive to keep improving and innovating. Even during this agreement, Yahoo! can use our technology as much or as little as it chooses.
* This does not prevent Yahoo! from making similar arrangements with others. This arrangement is not exclusive, meaning that Yahoo! could enter into similar arrangements with other companies.
* This does not increase Google's share of search traffic. Yahoo! will continue to run its own search engine and advertising programs, and the agreement will not increase Google's share of search traffic.
* This does not let Google raise prices for advertisers. Google does not set the prices manually for ads; rather, advertisers themselves determine prices through an ongoing competitive auction. We have found over years of research that an auction is by far the most efficient way to price search advertising and have no intention of changing that.
We have been in contact with regulators about this arrangement, and we expect to work closely with them to answer their questions about the transaction. Ultimately we believe that the efficiencies of this agreement will help preserve competition.
The Internet is a healthy, competitive environment where content creators, advertisers and users come together to access information, communicate and create new business opportunities. We think this deal extends these benefits -- it's good for users, advertisers and publishers and good for the industry.



