Search Quality is the name of the team responsible for the ranking of Google search results. Our job is clear: a few hundred million times a day, people ask Google questions, and within a fraction of a second Google needs to decide which of the billions of pages on the web to show them -- and in what order. Lately, we have been doing other things as well, but more on that later.
For something that is used so often by so many people, surprisingly little is known about ranking at Google. This is entirely our fault, and it is by design. We are, to be honest, quite secretive about what we do. There are two reasons for it: competition and abuse. Competition is pretty straightforward. No company wants to share its secret recipes with its competitors. As for abuse, if we make our ranking formulas too accessible, we make it easier for people to game the system. Security by obscurity is never the strongest measure, and we do not rely on it exclusively, but it does prevent a lot of abuse.
The details of the ranking algorithms are in many ways Google's crown jewels. We are very proud of them and very protective of them. By some estimates, more than one thousand programmer/scientist years have gone directly into their development, and the rate of innovation has not slowed down.
But being completely secretive isn’t ideal, and this blog post is part of a renewed effort to open up a bit more than we have in the past. We will try to periodically tell you about new things, explain old things, give advice, spread news, and engage in conversations. Let me start with some general pieces of information about our group. More blog posts will follow.
I should take a moment to introduce myself. My name is Udi Manber, and I am a VP of engineering at Google in charge of Search Quality. I have been at Google for over two years, and I have been working on search technologies for almost 20 years.
The heart of the group is the team that works on core ranking. Ranking is hard, much harder than most people realize. One reason for this is that languages are inherently ambiguous, and documents do not follow any set of rules. There are really no standards for how to convey information, so we need to be able to understand all web pages, written by anyone, for any reason. And that's just half of the problem. We also need to understand the queries people pose, which are on average fewer than three words, and map them to our understanding of all documents. Not to mention that different people have different needs. And we have to do all of that in a few milliseconds.
The most famous part of our ranking algorithm is PageRank, an algorithm developed by Larry Page and Sergey Brin, who founded Google. PageRank is still in use today, but it is now a part of a much larger system. Other parts include language models (the ability to handle phrases, synonyms, diacritics, spelling mistakes, and so on), query models (it's not just the language, it's how people use it today), time models (some queries are best answered with a 30-minute-old page, and some are better answered with a page that has stood the test of time), and personalized models (not all people want the same thing).
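As a rough illustration of what PageRank computes, the sketch below implements the textbook power-iteration form from Brin and Page's published work: each page splits a damped share of its score evenly across the pages it links to, and pages with no outgoing links spread their score uniformly. It is not Google's production ranking, which the post describes as one part of a much larger system; the `pagerank` function, the toy link graph, and the 0.85 damping factor (the value commonly cited from the original paper) are assumptions for illustration.

```python
def pagerank(
    links: dict[str, list[str]],
    damping: float = 0.85,
    tol: float = 1e-10,
    max_iter: int = 200,
) -> dict[str, float]:
    """Textbook PageRank via power iteration over a small link graph.

    `links` maps each page to the pages it links to. Returned scores sum to 1.
    """
    # Every page that appears as a source or a target gets a score.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(max_iter):
        # Score held by pages with no outlinks is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}

        # Each page passes a damped share of its score to every page it links to.
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share

        # Stop once the total change between iterations is negligible.
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    graph = {"a": ["c"], "b": ["c"], "c": ["a"]}
    for page, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
        print(f"{page}: {score:.4f}")
```

In this toy graph, `c` ends up with the highest score because both other pages link to it, while `b`, which has no inbound links, keeps only the baseline share of (1 - 0.85) / 3.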
Another team in our group is responsible for evaluating how well we're doing. This is done in many different ways, but the goal is always the same: improve the user experience. This is not the main goal; it is the only goal. There are automated evaluations every minute (to make sure nothing goes wrong), periodic evaluations of our overall quality, and, most importantly, evaluations of specific algorithmic improvements. When an engineer gets a new idea and develops a new algorithm, we test the idea thoroughly. We have a team of statisticians who look at all the data and determine the value of the new idea. We meet weekly (sometimes twice a week) to go over those new ideas and approve new launches. In 2007, we launched more than 450 new improvements, about 9 per week on average. Some of these improvements are simple and obvious; for example, we fixed the way Hebrew acronym queries are handled (in Hebrew, an acronym is marked by a double quote (") before the last letter, so IBM would be written IB"M). Others are very complicated; for example, we made significant changes to the PageRank algorithm in January. Most of the time we look for improvements in relevancy, but we also work on projects whose sole purpose is to simplify the algorithms. Simple is good.
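The post does not say how the Hebrew acronym fix works. As a hypothetical illustration only, a query-side normalizer could strip the acronym mark so that IB"M and IBM can be matched to each other. The sketch assumes the mark is either an ASCII double quote or the Hebrew gershayim character (U+05F4), which is the formal form of the mark and is often typed as a plain quote, and that it sits directly before the final letter; `normalize_acronym` and `query_variants` are made-up names.

```python
import re

# Hypothetical normalizer: an acronym mark is an ASCII double quote or the
# Hebrew punctuation GERSHAYIM (U+05F4), preceded by a word character and
# followed by exactly one word character at the end of the token, as in IB"M.
_ACRONYM_MARK = re.compile(r'(?<=\w)["\u05F4](?=\w$)')


def normalize_acronym(token: str) -> str:
    """Remove an acronym mark placed before the last letter: IB"M -> IBM."""
    return _ACRONYM_MARK.sub("", token)


def query_variants(token: str) -> set[str]:
    """Return the token plus its normalized form, so either spelling can match."""
    return {token, normalize_acronym(token)}


if __name__ == "__main__":
    for t in ['IB"M', 'צה"ל', '"quoted"']:
        print(t, "->", sorted(query_variants(t)))
```

A quote that does not sit before the final letter, such as the quotation marks around `"quoted"`, is left untouched, so ordinary quoted phrases are not rewritten.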
International search has been one of our key focus areas in the past two years. This means all spoken languages, not just the major ones. Last year, for example, we made major improvements in Azerbaijani, a language spoken by about 8 million people. In the past few months, we launched spell checking in Estonian, Catalan, Serbian, Serbo-Croatian, Ukrainian, Bosnian, Latvian, Filipino, Slovenian and Farsi. We organized a network of people all over the world who provide us with feedback, and we have a large set of volunteers from all parts of Google who speak different languages and help us improve search.
Another team is dedicated to new features and new user interfaces. Having a great engine is necessary for a great car, but it is not sufficient. The car has to be comfortable and easy to drive. The Google search user interface is quite simple. Very few of our users ever read our help pages, and they can do very well without them (but they're good reading nevertheless, and we're working to improve them). When we add new features we try to ensure that they will be intuitive and easy to use for everyone. One of the most visible changes we made in the past year was Universal Search. Others include the Google Notebook, Custom Search Engines, and of course, many improvements to iGoogle. The UI team is helped by a team of usability experts who conduct user studies and evaluate new features. They travel all over the world, and they even go to people's homes to see users in their natural habitat. (Don't worry, they do not come unannounced or uninvited!)
There is a whole team that concentrates on fighting webspam and other types of abuse. That team works on a variety of issues, from hidden text to off-topic pages stuffed with gibberish keywords, plus many other schemes that people use in an attempt to rank higher in our search results. The team spots new spam trends and works to counter those trends in scalable ways; like all other teams, they do it internationally. The webspam group works closely with the Google Webmaster Central team, so they can share insights with everyone and also listen to site owners.
There are other teams devoted to particular projects. In general, our organizational structure is quite informal. People move around, and new projects start all the time.
One of the key things about search is that users' expectations grow rapidly. Tomorrow's queries will be much harder than today's. Just as Moore's law describes computing power doubling roughly every 18 months, there is an unwritten law that doubles the complexity of our most difficult queries in a short time. This is impossible to measure precisely, but we all feel it. We know we cannot rest on our laurels; we have to work hard to meet the challenge. As I mentioned earlier, we will continue providing you with updates on search quality in the coming months, so stay tuned.
May 21, 2008
May 20, 2008
A peek into our search factory
Today we hosted an informal gathering -- a factory tour of sorts -- to offer a glimpse into what we think is most exciting about search, and where innovation is most likely to come from. We also gave an update on Google Health.
On the search front, we wanted to share news about the way we think search is expanding. When we talk about search, we mean images, news, finance, books, local, and geographical information as well as web search. These media types are becoming more and more integral to our core universal search, but each presents its own challenges, innovations, and triumphs. Today R.J. Pittman, Director of Search Properties, showed some of the amazing advances we've made in image search -- we now offer an early form of face recognition on advanced search, for example -- as well as how ads might work to enhance the user experience on image search. He also demonstrated the innovative technologies that Google News has deployed to support features like quotes from newsmakers and better quality search for local news.
Carter Maslan, Director of Local Search Quality, talked about our Geo products (Maps and Earth and their features) and the fact that they represent a considerable search problem: how do you take all of the information about the physical world and make it searchable? How do you label disputed borders? How can Street View help you find where you are going? Google Earth has helped archaeologists find things they've looked for for years (e.g., a Roman villa in someone's backyard). User-generated content is all the rage right now, but in addition to entertaining shared videos and photos, the user-generated content that we're seeing on geo products is profoundly useful and helps us better understand the world.
Then, we turned to core search quality and got the latest update on web search from Johanna Wright, Director of Search Quality. It's amazing to me how sophisticated web search has become in such a short period of time. We've accomplished a lot with universal search this past year by bringing new form and function to our results page. Now, our search quality team is turning its attention toward the ever-elusive "user intent" ("this is what I typed, here's what I meant"). This will help us make universal search even more useful. You'll get pictures or maps when that's what you meant. Understanding user intent also helps us break down language barriers and find the best possible answer regardless of what language it's in or where it lives on the web.
In terms of new products, we made Google Health publicly available. It offers users a safe and secure way to collect, store, and manage their medical records and health information online. How many of us have touched, or even seen, our medical records? In the information age, isn't it crazy that you don't have a copy of your medical records under your control? You could use those records to develop a better understanding of your health and ultimately get better care. It's your data about your own health; why shouldn't you own and control it?
Back in February, I wrote about how Google Health will harness the power of the Internet to put users in control of their own medical records. Data will stay with you -- if you change doctors, want a second opinion, or are traveling -- and not stay siloed or stuck in files or databases that you can't get to. To break down these information silos, we launched Google Health today with several partners and third-party services already integrated. These partners are as committed as we are to solving this urgent need. Our flagship partners include everyday brand names such as Walgreens, Quest Diagnostics and Longs Drugs, to name just a few.
In addition to helping you get better control of your medical information, we've also put strong privacy policies in place to keep your information safe and private. (Read more about this on our public policy blog.) There's a lot left to do in health -- literally thousands of partnerships to forge and petabytes of data to move around -- but we're looking forward to hearing feedback from early Google Health adopters about our first step.
Unrelated to Google Health but in the interest of helping people get healthier, we launched our Go for Good campaign with the Cleveland Clinic. The Walk for Good iGoogle gadget encourages you to be good to yourself by walking regularly and tracking your progress. If you finish week 15 of the program by October 25th this year and have completed at least half of the total walking program by then, you can vote to tell us which of the health charities from our list should receive part of a $100,000 donation.
Responding to the earthquake in China
One week ago, a magnitude 7.8 earthquake struck Sichuan. Everyone in China was shocked and then heartbroken as reported deaths climbed from 10,000 to 20,000 to more than 32,000 people. The death toll is still rising, and the number of injured and missing is many times greater.
But the Chinese people have faced this disaster with resilience, compassion, and courage. There have been non-stop airlifts, blood donations, and rescue missions. One bold executive drove hundreds of miles in his jeep, started digging, and saved several lives. Taxi drivers stopped carrying passengers and drove to affected areas to help. One hundred thousand brave soldiers risked (and some gave) their lives to look for every possible survivor. Here in Beijing more than a thousand families have volunteered to adopt children who have lost their parents. Everyone is eager to help -- and that includes Googlers.
Within hours of the earthquake, our China-based teams pulled together to use Google's resources and technology to help. At the request of the government, we obtained new satellite images of Sichuan province (Earth KML) to help them better focus their recovery efforts. We developed and launched a “lost loved one” search based on our Custom Search Engine (CSE). To populate the CSE index, hundreds of Googlers worked around the clock looking through published tables, hospital records, news reports, and community sites. We tuned our Chinese news search, video search, image search, blog search, and oneboxes. We also partnered to build community sites, and launched both homepage promotions and a map-based information page. Google China has an extremely dedicated and passionate team and I am deeply honored to work alongside them.
In addition to these efforts here in China, Googlers worldwide have also made substantial financial donations to the relief operations. As a company, we’ve committed $2 million for disaster relief and rebuilding, in addition to donating a large advertising budget for donation ads and public service announcements to aid organizations throughout the world.
We have also created a Google Checkout donations page so you can easily donate to Mercy Corps, which works with the China Foundation for Poverty Alleviation, or the Tsinghua Foundation, which works with the Red Cross Society of China. Both organizations have assured us that all of the proceeds will go directly to earthquake relief.
Our efforts are but one piece of a giant effort now underway, bringing together governments, private companies, NGOs and countless heroic individuals -- all striving to address this disaster as quickly and comprehensively as we can.
Please pray for the victims of the earthquake. May the injured rest and recover. May the survivors be resilient. May all of us learn from the Chinese people to turn our anxiety into courage, misery into compassion, and sorrow into love.
Opening our content network to third parties
Today, we're announcing that Google is accepting third-party advertising tags on the Google content network in North America. This will empower advertisers to work with approved third parties to serve and track display ads, including rich media ads, across the Google content network through AdWords, giving them more options, flexibility and control over their campaigns.
We had not accepted third-party tags in the past because we didn't have a process for reviewing ads to make sure that they comply with our format standards and policies, which were established to ensure that ads we serve provide the best possible user experience. Now that's in place.
Ad servers, rich media ad agencies and research firms can now go through a certification process that ensures the highest level of advertiser service and user experience. In fact, advertisers and agencies now have the ability to serve ads and measure performance through these certified third parties:
* Advertiser ad servers: DoubleClick (DFA), Mediaplex
* Rich media agencies: DoubleClick Rich Media, Eyeblaster, EyeWonder, Interpolls, PointRoll, Unicast
* Research firms: Dynamic Logic, IAG Research, InsightExpress, Factor TG
We will be certifying more third-party partners in the future.
Advertisers and agencies will now be able to manage their Google content network campaigns with the same systems they use for other online campaigns, which is helpful for determining the effectiveness of their online advertising mix. Further, this new service gives advertisers and agencies more opportunities to increase their return on investment and reach new audiences in informed and creative ways. The response from those testing early versions of the program has been positive.
For publishers on the network, this program offers a way to expand their advertiser base and enable advertisers to better understand the value of their inventory, with the goal of increasing their overall revenue. And they'll be able to show more compelling display ads to their visitors, enhancing their web experience.
May 19, 2008
Google Treasure Hunt update
Avast, matey! As announced on the Google Australia blog, we've launched Treasure Hunt -- a puzzle contest designed to test yer problem-solving skills in computer science, networking, and low-level UNIX trivia. You'll find the first of four brainteasers at http://treasurehunt.appspot.com/. A new puzzle will be posted every week for the next three weeks, and a few lucky gobs to submit correct answers to every question will receive a prize.
The second puzzle will be appearing soon -- to be exact, 936266827 seconds before Y2K38, so keep yer eyes open. We'll also be highlighting our Mountain View mother ship, so step smartly, lads and lasses, and good luck!
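The release time is given only as an offset from "Y2K38", the moment a signed 32-bit Unix `time_t` overflows. The sketch below converts that offset into a calendar time, assuming Y2K38 means the last representable second, 2**31 - 1 seconds after the Unix epoch (2038-01-19 03:14:07 UTC); counting from the overflow second itself would shift the answer by one second. The constant and function names are illustrative.

```python
from datetime import datetime, timezone

# Assumption: "Y2K38" is the last second a signed 32-bit time_t can hold.
Y2K38_TIMESTAMP = 2**31 - 1
OFFSET_SECONDS = 936266827  # figure quoted in the post


def release_time(offset: int = OFFSET_SECONDS) -> datetime:
    """Return the UTC moment `offset` seconds before the Y2K38 rollover."""
    return datetime.fromtimestamp(Y2K38_TIMESTAMP - offset, tz=timezone.utc)


if __name__ == "__main__":
    y2k38 = datetime.fromtimestamp(Y2K38_TIMESTAMP, tz=timezone.utc)
    print("Y2K38:  ", y2k38.isoformat())
    print("Release:", release_time().isoformat())
```

Under that assumption the offset works out to 2008-05-19 17:07:00 UTC, the same day as this update.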
In case ye missed out on the first week's puzzle, it's still available, so 'tis not too late! ARR! (Can you tell we can hardly wait to Talk Like a Pirate?)