When Google suggests queries as you type in a search box, how does it come up with those suggestions?
Autocomplete query suggestions
Internet search engines aim to identify documents or other items that are relevant to a user’s needs and to present the documents or items in a manner that is most useful to the user. Such activity often involves a fair amount of mind-reading–inferring from various clues what the user wants. Certain clues may be user specific. For example, knowledge that a user is making a request from a mobile device, and knowledge of the location of the device, can result in much better search results for such a user.
Google appears to have changed the documents it will present through these query predictions, in a way that may produce better guesses.
THE OLDER FIRST CLAIM
A computer-implemented method for processing query information, comprising: receiving query information at a server system, wherein the query information includes a portion of a query from a search requestor, the query information being received prior to receiving data indicating that the search requestor has completed the query and the portion of the query from the search requestor being only a portion of a final query; obtaining a set of predicted queries relevant to the portion of the query from the search requestor based upon the portion of the query from the search requestor and data indicative of search requestor behavior relative to previously submitted queries, wherein the set of predicted queries includes two or more predicted queries, and each predicted query is a prediction of a possible final query of the search requestor and wherein each predicted query includes the portion of the query and is different from each other query; ranking the predicted queries in the set of predicted queries according to a ranking criteria; providing the ranked set of predicted queries for display to the search requestor; determining whether an input is received from the search requestor selecting a predicted query, of the ranked set of predicted queries displayed to the search requestor, within a specified time; in response to a determination that the input from the search requestor selecting a displayed predicted query is not received within the specified time: obtaining a subsequent ranked set of predicted queries for the portion of the query from the search requestor, the predicted queries in the subsequent ranked set of predicted queries being ranked according to different criteria than the predicted queries in the ranked set of predicted queries; and providing the subsequent ranked set of predicted queries for display to the search requestor in response to receiving the query information.
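The flow in that older claim can be sketched in a few lines of Python. This is only an illustration under assumptions of mine: the query log, the two ranking criteria ("submissions" and "clicks"), and the function names are hypothetical and do not come from the patent, which does not say what the criteria are.

```python
# Hypothetical log of previously submitted queries and how searchers behaved.
QUERY_LOG = {
    "patent search": {"submissions": 120, "clicks": 30},
    "patent attorney": {"submissions": 80, "clicks": 60},
    "patent pending": {"submissions": 50, "clicks": 45},
    "pattern matching": {"submissions": 200, "clicks": 10},
}


def predict(partial_query, criterion, limit=3):
    """Return predicted queries that contain the typed portion, ranked by one criterion."""
    candidates = [q for q in QUERY_LOG if q.startswith(partial_query)]
    ranked = sorted(candidates, key=lambda q: QUERY_LOG[q][criterion], reverse=True)
    return ranked[:limit]


def suggest(partial_query, selected_within_time):
    """Return the ranked sets that would be shown for a partial query.

    If the searcher picks a prediction within the specified time, only the
    first set is shown. Otherwise a subsequent set, ranked by a different
    criterion, is provided as well.
    """
    shown = [predict(partial_query, "submissions")]
    if not selected_within_time:
        shown.append(predict(partial_query, "clicks"))
    return shown


first, second = suggest("patent", selected_within_time=False)
print(first)   # ranked by how often each query was submitted
print(second)  # same candidates, ranked by clicks instead
```

The point of the sketch is the timeout branch: the same partial query yields a differently ordered list when the searcher ignores the first one.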
THE NEWER FIRST CLAIM
The predicted query suggestions shown as autocomplete results in the new claims are based on a searcher’s previous searches and on documents that they may have looked at and interacted with previously. That is a change from the first version of the patent. This reminded me of another continuation patent from Google, which I blogged about in a post called Personalizing Search Results at Google. That one told us that Google might personalize search results by selecting them from the union of two different sets of documents. One of those sets is a set of “high-quality sites,” and the other includes documents that are considered “bias documents,” or pages that may have shown up in a person’s search or query history. The searcher may have visited those pages before, or seen them in a set of search results and not clicked through to them.
So, autocomplete query suggestions may end up returning documents that a person may have seen before (a positive bias, maybe), or interacted with in some way, such as not selecting them from search results (more of a negative bias). The part of the claim to watch for is the clause about “a behavior of the user relative to documents provided to the user in response to previous queries.”
A method performed by data processing apparatus, the method comprising: receiving, from a user device of a user, query data specifying a portion of a query entered by the user; selecting, based on the portion of the query and first criteria different from query text entered by the user, a first set of predicted queries that each predict a respective final query for the portion of the query; providing, to the user device, data that cause presentation of the first set of predicted queries at the user device; receiving, from the user device, a user request for additional predicted queries, wherein the user request is sent by the user device in response to user-initiated activity; in response to receiving the user request for additional predicted queries, selecting, based on the portion of the query and second criteria that is (i) different from the first criteria and (ii) different from query text entered by the user, a second set of predicted queries that each predict a respective final query for the portion of the query, wherein at least one of the first criteria or the second criteria is based upon a behavior of the user relative to documents provided to the user in response to previous queries received from the user; determining that the second set of predicted queries includes a given predicted query that is included in the first set of predicted queries; removing the given predicted query from the second set of predicted queries; and providing, to the user device, data that cause presentation of the second set of predicted queries at the user device.
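To make the steps of the newer claim easier to follow, here is a minimal Python sketch. The data and names are hypothetical: I am assuming the first criteria is overall query popularity and the second is how often this particular user clicked on documents returned for each query. The claim only requires that the two criteria differ and that at least one reflects the user's behavior with previously returned documents.

```python
# Hypothetical first criteria: how popular each query is across all searchers.
GLOBAL_POPULARITY = {
    "python tutorial": 900,
    "python download": 700,
    "python snake": 300,
    "python pandas": 250,
}

# Hypothetical second criteria: how many times this user clicked on documents
# that were returned for each of their previous queries.
USER_DOCUMENT_CLICKS = {
    "python pandas": 5,
    "python tutorial": 3,
    "python snake": 1,
}


def select_predictions(partial_query, scores, limit):
    """Pick the top-scoring predicted queries that extend the typed portion."""
    matches = [q for q in scores if q.startswith(partial_query)]
    return sorted(matches, key=scores.get, reverse=True)[:limit]


def first_set(partial_query):
    """First set of predictions, selected with the first criteria."""
    return select_predictions(partial_query, GLOBAL_POPULARITY, limit=2)


def additional_set(partial_query, already_shown):
    """Second set, selected with the second criteria after the user asks for more.

    Any prediction that already appeared in the first set is removed.
    """
    second = select_predictions(partial_query, USER_DOCUMENT_CLICKS, limit=3)
    return [q for q in second if q not in already_shown]


shown = first_set("python")
more = additional_set("python", shown)
print(shown)
print(more)
```

Note the last step of the claim in `additional_set`: a query that appears in both sets is dropped from the second one, so the searcher never sees the same suggestion twice.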
LINKS FROM SITES AND CLICK DATA
This is an interesting statement about autocomplete query suggestions that appears in both the older and the newer versions of the patent. It seems that if the search results for the query being typed link to results for suggested queries, that may be a sign that a query returning those linked-to pages would interest a searcher. That isn’t part of the change from the older version of this patent to the newer one, but it is an interesting aspect of both, and it shows the potential value of linking out to other sites and pages:
Clues about a user’s needs may also be more general. For example, search results can have an elevated importance, or inferred relevance, if a number of other search results link to them. If the linking results are themselves highly relevant, then the linked-to results may have a particularly high relevance. Such an approach to determining relevance may be premised on the assumption that, if authors of web pages felt that another web site was relevant enough to be linked to, then web searchers would also find the site to be particularly relevant. In short, the web authors “vote up” the relevance of the sites.
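As a rough illustration of that "voting" idea, the Python sketch below raises a result's relevance when other results link to it, with more relevant linkers counting for more. The function name, the 0.5 weight, and the sample data are assumptions of mine; the patent gives no formula.

```python
def link_boosted_scores(base_relevance, links, weight=0.5):
    """Add a share of each linking result's relevance to the results it links to.

    base_relevance: result id -> starting relevance score
    links: result id -> list of result ids it links to
    weight: hypothetical fraction of the linker's relevance passed along
    """
    boosted = dict(base_relevance)
    for source, targets in links.items():
        for target in targets:
            # Only count links that point at another result in the same set.
            if target in boosted:
                boosted[target] += weight * base_relevance.get(source, 0.0)
    return boosted


base = {"a": 1.0, "b": 0.5, "c": 0.2}
links = {"a": ["c"], "b": ["c"]}
print(link_boosted_scores(base, links))
```

In this toy data, result "c" starts with the lowest score but ends up nearly level with "a" because both other results link to it.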
Other various inputs may be used instead of, or in addition to, such techniques for determining and ranking search results. For example, user reactions to particular search results or search result lists may be gauged, so that results on which users often click will receive a higher ranking. The general assumption under such an approach is that searching users are often the best judges of relevance, so that if they select a particular search result, it is likely to be relevant, or at least more relevant than the presented alternatives.
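One simple way to picture ranking on user reactions is to reorder results by a smoothed click-through rate, as in the Python sketch below. The smoothing constants and the sample counts are hypothetical, and the patent does not say that click data is used in exactly this way.

```python
def rerank_by_clicks(results, impressions, clicks, prior_clicks=1, prior_impressions=10):
    """Order results so that those clicked more often per showing come first.

    The prior values are hypothetical smoothing terms that keep results with
    very little data from jumping to the extremes.
    """
    def click_rate(doc):
        shown = impressions.get(doc, 0) + prior_impressions
        clicked = clicks.get(doc, 0) + prior_clicks
        return clicked / shown

    return sorted(results, key=click_rate, reverse=True)


results = ["x", "y", "z"]
impressions = {"x": 100, "y": 100, "z": 0}
clicks = {"x": 5, "y": 40}
print(rerank_by_clicks(results, impressions, clicks))
```

Here "y" moves to the top because searchers chose it far more often than "x" when both were shown the same number of times.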
Particular embodiments of the described subject matter can be implemented to realize one or more of the following advantages. A search assistant receives query information from a search requestor, prior to the requestor indicating completion of inputting the query. Additionally information associated with previous user (or users) searches (such as click data associated with search results) is collected. From the received query information and the previous search information, a set of predicted queries is produced and provided to the search requestor for presentation.