Methods Search Engines May Be Using to Determine E-A-T

We recently discussed Google’s August 1st broad core algorithm update, and how to recover from it or take advantage of the changes. We pointed out that Google updated their publicly available search quality evaluator guidelines on July 20th, and that the changes to the guidelines were likely put to use in the core update.

A major change to the guidelines was a much stronger emphasis on E-A-T (expertise, authority, and trustworthiness) as opposed to the more vague idea of “quality.” In our previous post, we shared our advice on how to improve your E-A-T score, as well as how to make other adjustments reflecting the updated rater guidelines.

In this post, we wanted to discuss some of the ways in which search engines like Google might be evaluating E-A-T.

We should warn you that we don’t have inside knowledge of how these algorithms work, and there’s a good chance that Google employed machine learning in updating the core algorithm, in which case even Google’s engineers couldn’t give you a step-by-step explanation of how the algorithm evaluates sites and queries. So what follows is necessarily speculative.

That said, I believe that some healthy speculation about which factors could be at play here is beneficial. While any actions you take should be taken with the end user in mind, and it’s helpful to imagine a hypothetical Google quality rater evaluating your site every step of the way, I also believe that consideration for how the algorithms might work can inform our efforts.

TrustRank-Like Evaluation of Authors

One of the most notable changes to the quality rater guidelines was a stronger emphasis on authors and content creators. In places where the guidelines previously mentioned only brands or websites, they now also ask raters to consider the individual author responsible for each piece of content.

Quality raters were asked to evaluate the expertise, authority, and trustworthiness of individual authors. An author who had a reputation for spreading deceptive, misleading, or false information would result in a low quality score.

Likewise, if the quality rater was unable to find information about the author, that could have a negative impact on the quality score, depending on the purpose of the page. This isn’t to say that an author needs to be famous for the page to receive high E-A-T, they merely need to have the appropriate level of expertise for the topic in question.

While we can’t know for sure how Google is evaluating author E-A-T, the algorithms at play most likely have some conceptual similarities to the algorithm known as TrustRank.

TrustRank was first developed as a collaboration between computer scientists at Stanford and Yahoo!, but Google has a patent for a similar algorithm, and a modern web search engine is almost certainly employing some variation on the concept.

The idea behind TrustRank is as follows:

  • A set of trustworthy “seed” websites or pages is first identified. These sites are evaluated by experts to determine if they are trustworthy.
  • TrustRank flows through links from these seed sites to the rest of the web, getting more and more diluted as it is split among more links and gets further and further away from the seed set.
  • Pages that are “closer” in the link graph of the web to the seed set are thus thought to be more trustworthy, while pages that are far away from the seed set are thought to be untrustworthy.
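The propagation described above can be sketched as a simple iteration over a link graph. To be clear, the seed set, damping value, and toy graph below are illustrative assumptions, not Google’s actual parameters:

```python
# Simplified TrustRank sketch: trust starts concentrated in hand-picked
# seed pages and flows through outlinks, diluted at each hop.

def trust_rank(links, seeds, damping=0.85, iterations=20):
    """links: {page: [outlinked pages]}; seeds: set of trusted pages."""
    pages = set(links) | {p for outs in links.values() for p in outs}
    # Initial trust is concentrated entirely in the seed set.
    seed_dist = {p: (1 / len(seeds) if p in seeds else 0.0) for p in pages}
    trust = dict(seed_dist)
    for _ in range(iterations):
        new = {}
        for p in pages:
            # Trust flowing in from pages that link to p, split among outlinks.
            inflow = sum(trust[q] / len(outs)
                         for q, outs in links.items() if p in outs)
            # Unlike vanilla PageRank, the "teleport" term returns to the
            # seeds, so pages far from the seed set stay low-trust.
            new[p] = (1 - damping) * seed_dist[p] + damping * inflow
        trust = new
    return trust

# A toy chain: trust dilutes with every hop away from the seed.
links = {"seed": ["a"], "a": ["b"], "b": ["spam"], "spam": []}
scores = trust_rank(links, seeds={"seed"})
```

Running this, the score drops monotonically with distance from the seed, which is the core intuition: proximity in the link graph to vetted pages is treated as a proxy for trustworthiness.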

While TrustRank is based on hyperlinks, a similar concept can be put to use to evaluate the trustworthiness of authors.

Consider the following variation on the concept:

  • A set of trustworthy “seed” authors is first identified.
  • AuthorRank flows from these trusted authors to other authors that are published on the same sites as them.
  • Diluted AuthorRank flows from these secondary authors to tertiary authors on other sites, and so on.
  • Authors that have never shared publication on a site with somebody who has high AuthorRank would thus be considered to have low or neutral AuthorRank.
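A toy version of this author-based variation might propagate trust across a co-publication graph, where authors are connected if they have been published on the same site. The graph, the “AuthorRank” name, and the decay factor below are all hypothetical:

```python
# Hypothetical "AuthorRank" sketch: trust flows from seed authors to
# anyone who shares a publication venue with them, decaying per hop.

def author_rank(published_on, seed_authors, decay=0.5, hops=3):
    """published_on: {author: set_of_sites}; returns a trust score per author."""
    trust = {a: (1.0 if a in seed_authors else 0.0) for a in published_on}
    for _ in range(hops):
        new = dict(trust)
        for author, sites in published_on.items():
            # An author inherits decayed trust from the most trusted
            # author they share at least one publication site with.
            peers = [b for b, s in published_on.items()
                     if b != author and sites & s]
            if peers:
                inherited = decay * max(trust[b] for b in peers)
                new[author] = max(new[author], inherited)
        trust = new
    return trust

pubs = {
    "seed_author": {"nytimes"},
    "colleague":   {"nytimes", "niche-blog"},
    "stranger":    {"niche-blog"},
    "isolated":    {"own-site"},
}
scores = author_rank(pubs, seed_authors={"seed_author"})
```

The author who shares a venue with the seed scores higher than the author two hops out, while the author with no shared venues at all stays at zero, matching the “low or neutral AuthorRank” case in the last bullet.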

Other variations on these can exist as well, such as:

  • “Anti” or “negative” AuthorRank could exist, in which a seed set of untrustworthy authors is identified and negative trustworthiness spreads by association.
  • A seed set of trustworthy or untrustworthy sites, and each author’s distance from them, could be used in place of a seed set of authors.
  • Proximity to authors on social media could potentially be put into play.

While we can’t be sure that Google is using this specific method to evaluate author trustworthiness, there is no harm in behaving as though they are. Preserving your reputation means only seeking publication on trusted platforms, and avoiding association with untrustworthy ones.

Bear in mind that trust is likely topic-dependent as well. Somebody with a high degree of expertise in neuroscience shouldn’t be taken as an expert on economics, and any TrustRank-inspired algorithm would likely weight expertise by the phrases and jargon used in different fields.

The Credibility of Contact and Biographical Information

The quality rater guidelines are careful to note that some authors won’t have much of a reputation, good or bad, outside of the site they exist on. The raters are told that for smaller sites and brands, a lack of information about the brand shouldn’t be taken as evidence of either a positive or negative reputation.

Even so, raters are repeatedly urged to check for contact information and to verify that it is accurate. They are also told to make sure that the creator of the content is an expert on the topic in question, particularly for “your money, your life” (YMYL) pages.

For example, a site giving law advice should feature content exclusively written by lawyers. Medical content should be created exclusively by medical professionals. Science content should be written by scientists or science journalists.

In cases such as these, where the author hasn’t been published elsewhere, reviewed by a customer, or mentioned by another expert, how would Google be able to evaluate expertise?

Under these circumstances, the most Google would have to work with is the author’s name, contact information, and any biographical information published on the site.

But since quality raters are asked to evaluate reputation based on third-party sources wherever possible, undoubtedly they would consider such biographical information more trustworthy if it could be verified elsewhere.

From our perspective as publishers and content creators, this means that we should be doing everything we can to make sure that our biographical and contact information is verifiable. We should seek to be listed publicly as alumni of any schools mentioned in our biographies. Our contact information should be identical across all websites and social media profiles, if possible. Our biographies across different sites should share similar information, and any organization we mention in our biography should ideally feature a profile for us somewhere on their site.

This is similar to the way in which citations are used in Google local search results. Local SEO demands the consistent use of a name, address, and phone number (NAP) across multiple authoritative sites. While it’s unlikely that Google would demand consistency this exact or rigorous for author reputation, the easier we can make it for a search engine to verify the accuracy of our biographical information, the better.
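A crude way to picture this kind of verification is to score how consistently biographical fields agree across profiles scraped from different sites. The field names and profiles below are invented for illustration, and real cross-site matching would be far more involved:

```python
# Toy consistency check: the more profiles agree on each biographical
# field, the more "verifiable" the author's information looks.

from collections import Counter

def consistency_score(profiles):
    """profiles: list of {field: value} dicts from different sites."""
    fields = {f for p in profiles for f in p}
    agreement = []
    for f in fields:
        values = [p[f] for p in profiles if f in p]
        # Fraction of profiles whose value matches the most common one.
        most_common_count = Counter(values).most_common(1)[0][1]
        agreement.append(most_common_count / len(values))
    return sum(agreement) / len(agreement)

profiles = [
    {"name": "Jane Doe", "employer": "Acme Law", "phone": "555-0100"},
    {"name": "Jane Doe", "employer": "Acme Law", "phone": "555-0100"},
    {"name": "Jane Doe", "employer": "Acme Legal", "phone": "555-0100"},
]
score = consistency_score(profiles)  # name and phone agree fully; employer only 2/3
```

A perfectly consistent author would score 1.0; every discrepancy, like the mismatched employer above, drags the score down, which mirrors the NAP-consistency intuition from local SEO.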

The Knowledge-Based Trust Algorithm

In February of 2015, Google published a paper about an algorithm referred to as knowledge-based trust (KBT) that was designed to evaluate the trustworthiness of web sources. With Google’s rater guidelines placing more emphasis on E-A-T, we now have reason to believe that this algorithm, or something similar, has either been added to the core algorithm, or its use as a factor has been more heavily weighted.

The knowledge-based trust algorithm works like this:

  • Facts are extracted from web pages in the form of “triples,” statements of the form: subject, predicate, object. The paper mentions that 16 different types of extractors are used to pull these facts from web pages and that they are cross-referenced for accuracy with Freebase (now folded into Wikidata).
  • A first estimate on the accuracy of any given fact is then found by compiling the popularity of the fact amongst various websites and the consistency of the facts amongst the various extraction methods.
  • Since we wouldn’t want to decide a fact is true simply because it is popular, the KBT algorithm then refines this estimate using the mutually reinforcing principles that “a source is accurate if its facts are correct” and “facts are correct if they are extracted from an accurate source.”
  • Using an iterative process, each source is awarded a trustworthiness score based on how accurate its facts are. This trustworthiness score is then used to revalue how accurate the facts are, which is then fed back into the algorithm again, and so on in a circular fashion.
  • This process continues until the trustworthiness score of each site converges to a static value.
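The steps above can be sketched as alternating updates between source trust and fact belief. This is a drastic simplification of the actual paper, which uses a probabilistic model with noisy extractors; the triples and update rules here are illustrative only:

```python
# Greatly simplified KBT-style iteration: source trust and fact belief
# reinforce each other until the scores stabilize.

def knowledge_based_trust(claims, iterations=30):
    """claims: {source: set of (subject, predicate, object) triples}."""
    sources = list(claims)
    facts = {t for triples in claims.values() for t in triples}
    trust = {s: 0.5 for s in sources}  # start every source at neutral trust
    belief = {}
    for _ in range(iterations):
        # A fact is believable in proportion to the trust of the sources stating it.
        for t in facts:
            supporters = [s for s in sources if t in claims[s]]
            belief[t] = sum(trust[s] for s in supporters) / sum(trust.values())
        # A source is trustworthy in proportion to how believable its facts are.
        for s in sources:
            trust[s] = sum(belief[t] for t in claims[s]) / len(claims[s])
    return trust, belief

claims = {
    "good_site":  {("paris", "capital_of", "france"),
                   ("tokyo", "capital_of", "japan")},
    "other_site": {("paris", "capital_of", "france")},
    "spam_site":  {("paris", "capital_of", "spain")},
}
trust, belief = knowledge_based_trust(claims)
```

After a few iterations the site stating the contradicted fact ends up with low trust, and the contradicted fact ends up with low belief, even though it started on equal footing, which is the circular reinforcement the bullets describe.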

In summary, your pages should be considered trustworthy if they tend to state facts which other trustworthy sites also tend to state.

Judging by the updates to Google’s quality rater guidelines, if any modifications were made to this algorithm, it was most likely to count each individual author as a source of trustworthiness, rather than websites or web pages.

If any other improvements have been made to the algorithm, they would most likely have come in the form of improvements to the extractors used to extract facts from websites, which at the time of the release weren’t particularly reliable.

To optimize for the KBT algorithm:

  • Cross-reference your facts with Wikidata, since it plays such an important part in the development and testing of this algorithm
  • Cite original sources for your facts
  • Fact check every piece of content you produce
  • Make sure that your more editorial, opinionated content still contains factual content to support it
  • Avoid working with authors who have published misleading or false information in the past
  • Avoid publishing on sites that have a history of publishing false information

The Anticipation of Users’ Subsequent Queries

This is one of the more speculative ways in which expertise may be measured, but it shows promise since it can be deployed even in the absence of reputation in the eyes of third parties. There are many cases where an author isn’t well known enough to earn a clear reputation, but this doesn’t mean the author is not an expert on the relevant subject matter.

The idea behind this measure is that an expert in a topic would be more likely to understand and anticipate the kinds of follow-up questions a user might have. For example, consider how an expert and a non-expert might respond to a query like “makeup for deep-set eyes.”

The non-expert would probably just read a tutorial or watch a YouTube video on the subject and regurgitate the steps for their audience.

An expert, on the other hand, would offer advice on the subject and then anticipate the searcher’s other questions: how the advice differs based on eye shape, how to determine eye shape and face shape, how to find shades that work well with a given skin tone, where to buy the right makeup, and the other common questions they repeatedly encounter when somebody approaches them with the initial question.

The difference between the expert and the non-expert is that their level of experience gives them the ability to predict what people need to know before they themselves know to ask for it.

This is also, to some extent, something that Google is capable of measuring.

Google has a history of user queries and knows what kinds of queries tend to follow other ones. We have seen Google using this to its advantage for years already, changing search results and autocomplete based on your search history, such as suggesting “uninstall winebottler” if you type “unins” after previously searching for “winebottler.”

If Google sees that your content responds to queries that tend to come shortly after the initial query that would have brought a user to you, then Google has reason to believe you have expertise on the topic. Your ability to predict what the user was going to search for next means that you probably have some experience with the topic, or at least that you have done extensive enough research to seem as though you do.

For this reason, we suggest being as comprehensive as possible with your content, digging deep and offering your users the most useful content possible, rather than simply addressing the surface-level query without deeper reflection.
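One crude way to self-audit this is to check what fraction of known follow-up queries your content actually touches on. In practice the follow-up list would come from “searches related to” and autocomplete research; the keyword-overlap matching below is a naive stand-in for real relevance scoring:

```python
# Naive coverage check: what fraction of likely follow-up queries does
# a piece of content address at all?

import re

def follow_up_coverage(content, follow_up_queries):
    words = set(re.findall(r"\w+", content.lower()))

    def covered(query):
        terms = re.findall(r"\w+", query.lower())
        # Treat a query as addressed if at least half its terms appear.
        return sum(t in words for t in terms) >= len(terms) / 2

    hits = sum(covered(q) for q in follow_up_queries)
    return hits / len(follow_up_queries)

content = """Makeup for deep-set eyes works best with light shimmer shades.
First determine your eye shape, then pick shades for your skin tone."""
queries = [
    "how to determine eye shape",
    "eyeshadow shades for skin tone",
    "best mascara brands",
]
coverage = follow_up_coverage(content, queries)
```

A low coverage score suggests the content answers only the surface-level query; raising it by genuinely addressing the follow-up questions, not by stuffing keywords, is the point of the advice above.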

For more depth on this, you can look at Google’s “searches related to” box, put your query into keyword.io for autocomplete suggestions, and run your own manual searches to see the personalized autocomplete results that go with your search history.

Provided you address these queries with authoritative content, it may help the search engines perceive your content as displaying expertise.

Jargon, Taxonomy, and Semantic Hierarchy

In the section on TrustRank-inspired methods of measuring author trustworthiness, we briefly mentioned that any measure of trust is likely topic-dependent, and that use of jargon likely plays a role in this.

One method a search engine could potentially use to estimate the expertise of a content creator is to estimate how well the author understands the ontology of the subject. In information science, an ontology is a semantic hierarchy: a way of naming and representing how various concepts, data, and entities are categorized and relate to one another.

In an important research paper, Google discussed Biperpedia, an ontology of 1.6 million pairs of classes and attributes, and 67,000 attribute names.

Biperpedia was built using extractors that pulled classes and their attributes from users’ queries, allowing Google to better understand long-tail queries, extract more facts from the web, and interpret the semantics of web tables.

Biperpedia demonstrates that Google is capable of understanding what types of attributes certain classes of nouns are capable of having.

Anybody who has seen a knowledge graph search result knows that Google is capable of doing this, for example understanding that Donald Trump has attributes such as net worth, birth date, height, spouse, children, and parents.

What the Biperpedia paper reveals, however, is that such results are just the tip of the iceberg. 1.6 million class-attribute pairs dig far deeper, and this paper is from 2014. Google’s understanding of semantic relationships undoubtedly digs far deeper than this in 2018.

Experts in a topic will display a deeper understanding of the semantic relationships within that subject matter. A non-expert will only be able to tell you if a wine is red or white, while a sommelier will break their description down into body, yeast, style, tannin, acidity, alcohol, spice, fruit, flower, herb, oak, and inorganic. Simply being aware of how to categorize a wine’s flavor and mouthfeel displays a degree of expertise absent in a non-expert.

Note that displaying an understanding of the semantic hierarchy of your subject matter is more in-depth than simply using jargon. It isn’t just about the words you use, but about the way you categorize and relate information within your domain of expertise.

Using buzzwords and jargon in a way that obfuscates rather than clarifies is not going to do you any favors.

It’s impossible to know whether or to what extent Google is using ontologies to evaluate the expertise of authors, and there are computational limits that would make this kind of evaluation at scale difficult. Still, there may be shortcuts a search engine could take in estimating expertise using related methods.

We need not assume Google is analyzing the ontology of an author’s work and comparing it to the established ontology within that subject matter. It could be as simple as comparing the complexity of a non-expert’s ontology with the complexity of an expert’s in a more general sense.

What I mean by this is that non-experts and experts tend to discuss topics in different ways. A non-expert is more likely to discuss things at the surface level, while an expert is more likely to dive deeper on a subject. Without understanding or interpreting what an author is actually saying, or comparing it to other experts in that particular field, an extractor could still determine how deep an individual document’s ontology goes, the number of attributes at play for each class, the number of subclasses, and so on.
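Under that assumption, a crude extractor would only need to count the distinct classes, the attributes per class, and the subclass nesting depth in a document’s extracted triples, without judging whether any of them are correct. The triple format and the wine-tasting examples below are invented for illustration:

```python
# Crude "ontology complexity" sketch: measure how many classes,
# attributes, and subclass levels appear in extracted triples.

from collections import defaultdict

def ontology_complexity(triples):
    """triples: iterable of (subject, predicate, object) statements."""
    attrs = defaultdict(set)   # class -> attribute predicates seen
    parent = {}                # subclass -> parent class
    for subject, predicate, obj in triples:
        if predicate == "subclass_of":
            parent[subject] = obj
        else:
            attrs[subject].add(predicate)

    def depth(cls):
        # Length of the subclass chain from cls up to a root class.
        return 1 + depth(parent[cls]) if cls in parent else 1

    classes = set(attrs) | set(parent) | set(parent.values())
    return {
        "classes": len(classes),
        "attributes": sum(len(a) for a in attrs.values()),
        "depth": max((depth(c) for c in classes), default=1),
    }

expert = [
    ("wine", "body", "full"),
    ("wine", "tannin", "high"),
    ("wine", "acidity", "medium"),
    ("red_wine", "subclass_of", "wine"),
    ("pinot_noir", "subclass_of", "red_wine"),
    ("pinot_noir", "fruit", "cherry"),
]
novice = [("wine", "color", "red")]

expert_stats = ontology_complexity(expert)
novice_stats = ontology_complexity(novice)
```

The sommelier-style description yields more classes, more attributes, and a deeper subclass chain than “the wine is red,” so even this shallow count separates the two, with no need to verify any individual claim.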

There is a psychological effect known as the Dunning-Kruger effect, in which a non-expert assumes that a subject is much less complicated than it actually is, and thus overestimates their proficiency with that subject. It stands to reason, then, that a non-expert’s take on a topic will have a much more shallow and simple ontology than an expert’s.

From our perspective as content creators, this is a reminder to be comprehensive, dig deep, and address the less obvious and more novel aspects of the subject at hand.

Conclusion

Google’s stronger emphasis on E-A-T and individual content creators in the new rater guidelines, and most likely in the core algorithm update, reflects not just a change in Google’s abilities, but also a change in its interests. Google and other tech companies have been accused of perpetuating fake news, and the search engine has made it clear that it wants to address this as rigorously as possible.

As Google moves from acting as a search engine to acting as a personal assistant capable of making calls and setting appointments for users, it becomes more important for them to be seen as a reliable source of accurate information, and the pressure for them to return pages with high E-A-T will only continue to grow.

Speculating about how search engines can estimate or measure E-A-T for authors and brands is useful and necessary work for those of us in the SEO industry if we want to make it as clear as possible to the algorithms at play that we know our topics.

At the same time, if we want to be successful in the long run, we must invest not just in appearing to have high E-A-T, but in legitimately having it. A strategy that encompasses both is the most fruitful investment of time and resources.