Professional SEOs have been in the industry long enough to recognize that Google’s algorithm is huge, complicated, and impossible to reverse engineer. If you’ve been trying to game the search engines, it’s harder now than ever, and it definitely is not the path to sustained search engine rankings.
That said, understanding some things about how the algorithm works, and what the Google search engine’s goals are, remains key to successful SEO.
Discussion about the algorithm this year has been lighter than usual. I feel it’s necessary to remind ourselves every once in a while that even though we are marketers, SEO is still a technical discipline.
Let’s reacquaint ourselves with the search engine algorithm, and the direction it’s headed.
Google Then And Now
To understand how Google works today, we need to start by understanding how Google worked at its inception. For that, we need to turn to the original PageRank paper, published at Stanford and authored by Sergey Brin and Lawrence Page.
The paper outlines the two key features of the Google search engine, still vital to its success today.
The first feature is PageRank. PageRank is essentially an estimate of the probability that somebody would come across a page by randomly clicking links on the web. So if one page has a lot of links pointing toward it, a person is more likely to stumble across that site.
If, in turn, that heavily linked site links to another site, that link is more likely to bring people to the site that it links to than a typical link. From this we get the result that links from a more “authoritative” site pass more PageRank and thus contribute more to your rankings in the search results.
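The random-surfer model described above can be sketched in a few lines of Python. This is an illustrative power-iteration toy with a standard 0.85 damping factor, not Google's implementation; the graph and function names are my own:

```python
# Toy PageRank via power iteration. `graph` maps each page to the pages it links to.
def pagerank(graph, damping=0.85, iterations=50):
    pages = list(graph)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal probability everywhere
    for _ in range(iterations):
        # Every page gets a small baseline chance of a "random jump".
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, links in graph.items():
            if links:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(links)
                for target in links:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly across all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

web = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
}
ranks = pagerank(web)
# "c" is linked by both "a" and "b", so it accumulates the most rank.
```

Notice the feedback the article describes: "c" ranks highest because two pages link to it, and "a" benefits in turn because its one inbound link comes from the heavily linked "c".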
The other key feature was factoring in the anchor text of the link. Since, at the time, people used links primarily for citation or to describe what somebody would find if they clicked the link, the anchor text in a link was a good indicator of what was on the page.
That paper was published in 1998, which is now roughly twenty years ago.
A lot has changed.
Google’s success in the beginning was the result of analyzing the structure of the web in its “virgin” form, but once Google became a force in the market, its influence on that structure could not be separated from it. It wasn’t long before various link manipulation schemes came into play, and spam sites battled their way to the top of the search results once again.
It has been a never-ending battle between the spammers and the search engines from that point forward, with professional SEOs in the middle, looking for the best way to build long-term value into their marketing strategies.
The algorithm has changed in countless ways. Google embraced machine learning and used it to identify low-quality content with Panda, putting an end to the “content farm” industry. Penguin rooted out spam links and forever discredited paid links and other spam link building tactics. The importance of solid branding was hammered home by the Vince update, and RankBrain introduced intelligent interpretation of queries.
What does Google’s algorithm look like in 2018? How has it changed, and how will it change?
Google’s New Priority: Fake News
Google’s algorithm updates are driven in large part by how Google is being covered in the press. If you look back at the landscape in 2011, you will find that Google was receiving a great deal of bad press as a result of the rising influence of content farms on its search results.
Around that time, you’ll find posts from Business Insider, TechCrunch, ReadWriteWeb, and GigaOm, to name just a few, beating up on Google’s tolerance of content farms. These murmurings were bad PR for Google, and it responded, quite directly, by announcing the launch of Panda even before it went live. Panda was not the first response to such articles, either; the earlier “May Day” update was aimed squarely at combating low-quality content.
You may have noticed that, recently, Google’s algorithm is being most harshly criticized for failing to properly deal with “fake news.”
Business Insider is, again, one of the biggest critics. Multiple sources had informed them that Google had been tweaking its algorithms to more strongly reward sites that you were more likely to click on, something that the SEO industry had suspected for quite some time. Business Insider pointed out that this often causes fake news sites to perform well in the search results, even long after the stories have been disproven. Examples they gave included high rankings for fake stories about Obama banning the national anthem, and Hillary Clinton selling weapons to ISIS.
The Huffington Post likewise pointed out that Google is rewarding Holocaust denial sites over factually accurate sites, even for queries that don’t necessarily indicate that you are looking for Holocaust denial information. This was despite Google claiming to have made efforts to demote these sites.
Mashable also pointed out that Google has twice rewarded false, politically charged information about the gunmen in mass shootings.
Needless to say, these errors are even worse press for Google than the rise of content farms circa 2010, especially in such a politically charged climate.
In 2015, Google published a patent outlining Knowledge-Based Trust, an algorithm that extracts facts from sites to determine which facts are accurate based on popularity, uses those facts to identify which sites are trustworthy, feeds that trustworthiness back into the algorithm to determine which facts are accurate, and so on until the results converge.
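The patent's loop alternates between scoring facts by the trust of the sites asserting them and scoring sites by the accuracy of their facts, until the two converge. Here is a toy sketch of that feedback loop; the function name, the simple vote-counting scheme, and the data are all my own illustrative assumptions, not anything from the patent itself:

```python
# Hedged sketch of a Knowledge-Based-Trust-style loop.
# `claims` maps each site to the (subject, value) facts extracted from it.
def knowledge_based_trust(claims, iterations=20):
    trust = {site: 0.5 for site in claims}  # start every site at neutral trust
    best = {}
    for _ in range(iterations):
        # Step 1: score each fact by the total trust of the sites asserting it.
        votes = {}
        for site, facts in claims.items():
            for fact in facts:
                votes[fact] = votes.get(fact, 0.0) + trust[site]
        # For each subject, the highest-voted value is provisionally "true".
        best = {}
        for (subject, value), score in votes.items():
            if subject not in best or score > votes[(subject, best[subject])]:
                best[subject] = value
        # Step 2: a site's trust becomes the fraction of its facts judged true.
        for site, facts in claims.items():
            correct = sum(1 for (s, v) in facts if best[s] == v)
            trust[site] = correct / len(facts)  # assumes each site asserts >= 1 fact
    return trust, best

claims = {
    "site_a": [("capital_of_france", "Paris"), ("capital_of_italy", "Rome")],
    "site_b": [("capital_of_france", "Paris"), ("capital_of_italy", "Rome")],
    "site_c": [("capital_of_france", "Lyon"), ("capital_of_italy", "Rome")],
}
trust, facts = knowledge_based_trust(claims)
```

After a few iterations, the outlier claim from "site_c" loses the vote, and that site's trust score drops accordingly, which is the convergence behavior the patent describes.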
It’s not clear whether this algorithm was ever put into use in the Google search engine, but if not, now is about the time Google would be looking at ways to make it work. If so, Google is almost certainly considering boosting the relevancy of the signal.
Either way, Google’s fake news problem is bad PR, and we can expect a major update in 2018 that will punish sites that publish false information.
Google’s Embrace Of Machine Learning
Perhaps the biggest change to Google’s algorithm since its inception, and to its philosophy in general, has been its willingness to embrace machine learning.
Before the launch of Panda, Google was averse to using machine learning to tackle problems. The reason for this is that machine learning does not give the coders and software engineers behind the algorithm a clear understanding of what exactly it is doing.
A machine learning algorithm is simply given a set of data, a set of right and wrong answers, and then “trained” using that data set. Machine learning algorithms have a tendency to get stuck in local minima, unable to find more efficient solutions, and it’s difficult to predict how they will behave once they are given new data.
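As a toy illustration of that "train on right and wrong answers" loop, here is a minimal perceptron in plain Python. It has nothing to do with Google's actual systems; it just shows the shape of the process: make a prediction, compare it to the labeled answer, and nudge the weights toward the right answer:

```python
# Minimal perceptron: learns a linear rule from labeled examples.
def train_perceptron(samples, epochs=20, lr=0.1):
    # samples: list of ((x1, x2), label) pairs with label in {0, 1}
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        for (x1, x2), label in samples:
            pred = 1 if w1 * x1 + w2 * x2 + b > 0 else 0
            error = label - pred  # 0 if right; otherwise push weights toward the answer
            w1 += lr * error * x1
            w2 += lr * error * x2
            b += lr * error
    return w1, w2, b

# Label is 1 when x1 + x2 > 1 -- a linearly separable rule the model can learn.
data = [((0, 0), 0), ((1, 0), 0), ((0, 1), 0),
        ((1, 1), 1), ((2, 1), 1), ((1, 2), 1)]
w1, w2, b = train_perceptron(data)
```

The weights it ends up with were never written down by a human, which is exactly the opacity the paragraph above describes: the engineer chooses the data and the training procedure, not the final decision rule.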
Even so, certain types of problems are likely only solvable with the use of machine learning. The content quality classifier produced by Panda was evidently the first instance where Google found it necessary to make the leap.
Google’s reliance on predicting which links people are more likely to click on is likely a manifestation of this relatively new acceptance of machine learning, but the fake news problem is the flipside of this reliance, and the result of the aforementioned unpredictability.
One of the more striking examples of Google’s machine learning capabilities can be found in their Cloud Vision API. The algorithm has been trained on a huge data set of images and is capable of identifying a surprising number of objects in images with striking accuracy.
The implications for Google’s image search are obvious, but these results also hammer home just how important AI is becoming for search. If you needed more evidence, Google has so much machine learning capability at its fingertips that it now sells that capability as a cloud product in its own right. TensorFlow, likewise, was originally developed for internal use at Google but is now its own product. If that’s still not enough, you may be interested to know that Google’s AI systems are now smart enough to develop smarter versions of themselves.
Despite the unpredictability, machine learning affords Google’s search team the ability to judge your site in ways that are impossible to reverse engineer, even for Google’s own employees. This makes keeping Google’s business model in mind one of the most important skills for the modern SEO.
If the modern Google search engine is as influenced by machine learning algorithms as it is by code written by actual humans, we need to start thinking more and more about exactly what Google itself is trying to reward, rather than what it is currently rewarding, since more AI will inevitably bring more powerful changes.
To understand that on a level deeper than Google’s webmaster guidelines, we should take Google’s search quality evaluator guidelines into account.
These guidelines are used by human beings to rate the quality of sites manually. Those ratings are not generally used to adjust site rankings directly, but are instead used to train machine learning algorithms in order to rank sites in a similar manner automatically, without direct human input.
Here are a few of the most important things to learn from those guidelines:
- The quality standards are most rigorous for “Your Money Or Your Life” (YMYL) pages: pages that ask for your money, offer financial, legal, or medical advice, or otherwise have a major impact on your life.
- Expertise, authoritativeness, and trustworthiness are the primary factors that raters are asked to consider when rating a webpage.
- Google specifically wants ads to be ignorable. That is, they should not actually interfere with the user experience, only be present for those users who want to interact with them.
- Google cares more about the focus of a design than its aesthetic value. Google is not trying to reward pretty sites, but rather to punish sites that create a bad user experience through cluttered focus or intentional manipulation.
- Having contact information and background information is important.
- The reputation of the author comes into play.
- Low-quality sites have insubstantial information in the main content that is poorly written or unhelpful.
- For low-quality sites, the amount of main content available isn’t a good match for the user query.
- Low-quality sites feature authors who don’t have the right expertise for a given topic.
- Low-quality sites have a bad reputation.
- Low-quality sites offer supplementary information that is distracting instead of on topic and helpful.
The primary thing to take away from the guidelines, however, is how quality is determined. What makes a page valuable is that it meets the needs of the searcher. Pages are rated as “fully meets,” “highly meets,” “moderately meets,” “slightly meets,” and “fails to meet.”
If your page “fully meets” the needs of the searcher, it means that they were satisfied with the result and don’t need to look anywhere else to find more information.
Updates In 2017
To understand where things are headed for Google’s algorithm in 2018, we need to review how things changed in 2017. Perhaps because of the Google team’s increased radio silence on algorithm updates, the SEO industry hasn’t focused on them as much as in prior years.
In addition to Google’s radio silence, I believe more SEOs are waking up to the fact that we can’t reverse engineer the algorithm, that gaming it is becoming increasingly difficult, and that we should be focusing on SEO strategies that have marketing benefits outside of direct SEO.
All of that is true, but SEO is still very much a technical field, and I believe we need to keep up with changes in the algorithm in order to stay informed and ahead of future changes.
Let’s take a look at what changed in 2017.
– Mobile Interstitials – John Mueller confirmed that Google rolled out a penalty for mobile interstitial ads on January 11. This specifically targeted interstitials that showed up only on mobile devices and only after clicking through from Google search results. The interstitials targeted were pop-ups that covered the site content, standalone pages that users had to dismiss, and above-the-fold ads that looked like interstitials when the content was actually below the fold.
– Links and spam – An unconfirmed Google update occurred circa February 1, and seems to be either directly related to or similar in nature to Penguin. Most of the noise was related to private blog networks, networks of sites set up to create links.
– A Quality update – An unconfirmed update also occurred around February 7 that had a huge impact. The sites most heavily affected suffered from quality issues. Glenn Gabe stressed that quality here wasn’t just about content, but about aggressive ad placement, deception, broken features, autoplay, pop-ups, interstitials, and so on. Gabe drew attention to how tight the connection was with Google’s human quality rating guidelines.
– Fred – On March 8, several webmasters began to talk about a huge update taking place. The update, which had occurred on March 7, impacted sites with a specific set of issues, according to Barry Schwartz. Most sites were heavy on ads, had an unfocused topic, wrote content that seemed obviously written with rankings in mind, didn’t add value above and beyond other search results, and tended to see drops of between 50 and 90 percent of their traffic.
– May 17 – Another major update occurred on May 17 that also seemed to focus most heavily on quality related issues. Glenn Gabe pointed out once again the connection to Google’s quality rating guidelines and noted that the hardest hit sites had issues with aggressive advertising, UX barriers, thin content, deceptive ads, low content quality, and user interface issues. One example he shared featured both a content ad pop-up and a browser notification request at the same time. A site that had recovered, meanwhile, had refined its sponsored links and ad placement.
– June 25 – Search Engine Journal and others reported that a significant Google update had occurred around June 25. The update seems to have been quality related and, interestingly, focused primarily on sites ranking in positions 6 through 10. Evidently Google was interested in making an adjustment to the variety of pages making it to the first page.
– August 19 – In his analysis, Glenn Gabe found that this update seemed to be connected to some testing that occurred on August 14, since sites affected on August 14 also seemed to be affected on August 19. The factors that hit sites were similar to the other quality updates last year, with their connection to the quality ratings. Category pages that were ranking well without providing any main content were a common target for demotion. Deceptive ads were again a huge contributor. Sites with Flash elements and other broken UI were also hit hard.
– November 14 – Around this time another update hit. Once again, ad-heavy and thin content sites seemed to be the hardest hit.
A Changing SEO Landscape
In this post, I’ve tried to offer a comprehensive look at the state of Google’s algorithm and its near future. The ever growing influence of machine learning will push this industry towards an understanding and internalization of Google’s quality ratings, as well as predictions about how the press will influence the direction of its updates. A politically charged landscape is bound to influence the algorithm, as is Google’s need to keep AdSense profitable.
While I believe that a solid SEO strategy is built to last regardless of the decisions Google makes, the fate of this industry will always be intertwined with the future of search engines.