The Google Search document leak is big news in the world of SEO. In this post, we’re breaking through the hype to get to the bottom of what it all means at ground level for businesses that rely on the web for sales, and sharing what we believe are the top pieces of actionable advice that can be gleaned from the leak.
In case you missed it, Google had a rather large Search document leak occur recently and SEOs have been having a field day trying to make sense of it all.
There’s been a massive amount of hype around it, and much of the analysis from the many industry insiders who’ve been unravelling the contents of the leaked Google API documents has been deeply technical.
We have of course taken a deep dive into the fallout to see what there was to learn from it all. Now we’re clear on the most important aspects, we’re ready to sum up for you – in plain English – the essential parts you need to know, and more importantly, how they might affect your business in the online world.
What is the Google Search document leak?
On March 13th 2024, more than 2,500 documents, which appeared to emanate from Google’s internal Content API Warehouse, were released on the developer code-sharing platform GitHub by an automated bot called yoshi-code-bot.
The documents made their way into the hands of SEO consultant Erfan Azimi. He shared the findings with Rand Fishkin, co-founder of audience intelligence tool SparkToro and founder and former CEO of the globally renowned SEO tool Moz. As Fishkin had not been at the front line of search for six years, he collaborated with Mike King, iPullRank CEO. Both reviewed and analysed the leaked information and shared their thoughts on their respective blogs.
What did Fishkin and King discover, and why does it matter?
In a nutshell, information has been uncovered about how Google Search has been using, or is currently using, clicks, links, content, entities, Chrome data and other elements to rank content.
This is important, as some of this information may form part of Google’s closely guarded search ranking algorithm, which it has always been intensely secretive about.
How did Google react to the leak?
At first, Google stated that many of the assumptions being published in response to the leaked documents were out of context and based on incomplete information.
They also declined to comment on specific elements of the documents, confirming neither which were accurate, which were invalid, nor which are currently being used and how. There wasn’t even any indication as to whether the documents were authentic.
On 29th May, however, Google did confirm that the collection of documents is authentic. But that did not ease the widespread disgruntlement in the SEO community that had burgeoned in the 48 hours following the leak’s announcement.
The trouble was that the Google API documents showed clear details about ranking signals that Google had historically stated they did not use. And that’s why so many people felt they’d been lied to and misled by Google.
Why should we care?
The way Google decides how to rank a website has a huge impact on any business with a reliance on the web to drive sales. So, as you can imagine, if they’ve been saying one thing, but have been doing another, that’s not very helpful to anyone involved in managing SEO campaigns. And that’s putting it politely.
But not everyone in the SEO world feels that way. Some support Google. They could never bring themselves to believe that they’d be wronged by such a ‘trustworthy organisation’.
Between those two camps – the disgruntled and the believers – sits the rational, ‘prove it for yourself’ contingent: those of the opinion that Google sometimes tells the truth, but that it’s always best to do your own due diligence and test the theory.
And that’s pretty much where we’re sitting at this moment in time. We are ready to examine the theories, and see how we can draw tangible, actionable conclusions from them that will benefit our clients.
So, what next?
The Google API documents referred to more than 14,000 possible SEO ranking factors. Obviously we’re not going to unpick all of those in this post – we’ll spare you that.

But what we can do is share some of the SEO ranking factors that have come out of this revelation as being just as important as, or more important than, previously realised. And then we can draw some actionable conclusions.
So, here we go…
1. Links ARE important – but there’s more
Contrary to what we may have been led to believe, backlinks remain an important ranking signal. But there’s more… and it’s all about the right type of links.
The Google Search document leak has revealed that Google favours links from established sites with high domain authority, using non-spammy anchor text, and on popular, high-traffic pages.
Basically, links from a regularly updated news site that attracts many visitors each month are more beneficial than links from a site that hasn’t been updated in years and gets few visitors, even if the latter does have a high domain authority.
Fresh links, in other words, trump old links. And one way to attract fresh links is by investing in digital PR, pulling them in from regularly updated news sources.
Anchor text is also important. The clickable text of a link provides context about the linked page. Descriptive, relevant anchor text can enhance the relevance signals of a page and help it rank better for certain keywords.
However, excessive use of exact match keywords can result in penalties. Anchor text should therefore be naturally integrated within the content so that the reading experience remains seamless.
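As a rough way of putting that advice into practice, a backlink profile can be audited for over-use of exact-match anchors. The sketch below is illustrative only: the backlink data, keyword and 30% threshold are our own placeholders, not figures from the leaked documents.

```python
from collections import Counter

def anchor_text_report(backlinks, target_keyword):
    """Summarise anchor text across a list of backlinks and flag
    over-use of exact-match anchors (illustrative threshold only)."""
    anchors = [link["anchor"].strip().lower() for link in backlinks]
    counts = Counter(anchors)
    total = len(anchors)
    exact = counts.get(target_keyword.lower(), 0)
    exact_share = exact / total if total else 0.0
    return {
        "total_links": total,
        "exact_match": exact,
        "exact_match_share": round(exact_share, 2),
        # 30% is an arbitrary illustrative threshold, not a Google figure
        "flag_overuse": exact_share > 0.3,
    }

# Hypothetical backlink data, e.g. exported from a backlink tool
backlinks = [
    {"anchor": "best running shoes", "source": "news-site.example"},
    {"anchor": "best running shoes", "source": "blog.example"},
    {"anchor": "this in-depth guide", "source": "magazine.example"},
    {"anchor": "Acme's shoe guide", "source": "partner.example"},
]
print(anchor_text_report(backlinks, "best running shoes"))
```

If the exact-match share is flagged, that’s a prompt to diversify anchor text towards natural, descriptive phrasing.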

2. Google ‘site authority’ score does exist
‘Domain authority’ or ‘domain rating’ have traditionally been connected to specific SEO tools such as Moz, Semrush or Ahrefs to show the growth in authority or strength of a domain, mostly based on links.
Google has long had us believe that they don’t measure domain authority. But now we are led to question whether that’s the case.
Following the Google API document leak, it has come to light that Google does indeed have a feature named ‘siteAuthority’ which looks very much like an equivalent version of domain authority or domain rating.
Unfortunately, it’s not clear how Google calculates its site authority, or the extent to which it uses it within its ranking systems, if at all. But there are clues that it’s playing some sort of role in its quality signals.
So it looks like we should take this as confirmation that authority matters. Which returns us to the importance of backlink quality, and the value of fresh links coming from high quality news sites.
3. Content authors matter
Whilst there’s nothing specifically mentioned in the API leak about it and it’s never been confirmed as an SEO ranking factor, we all know the value of E-E-A-T as a quality signal, and it’s obviously important for creating user trust.
However, something we did learn was that Google tracks authors (known as ‘entities’) across the web and within a website.
There’s also a way for them to tell whether an author is the same person across different websites, and to cross-reference whether the entity on a page is actually the content’s author.
Now, we know that an important aspect of E-E-A-T is article authorship. So what we’ve learnt from this aspect of the leak is the importance of making sure a site’s blog posts carry an author bio and an author archive, preferably connected to a social media account or other relevant profile.
It’s also valuable to carry the author’s name into published news posts as the ‘expert quote’ to maintain consistency outside of the website.
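One common way of tying an author entity together across the web is schema.org `Person` markup in a page’s JSON-LD. The snippet below sketches how such a block might be generated; the names and URLs are placeholders, and this is one way of marking up authorship rather than anything prescribed by the leak.

```python
import json

def author_jsonld(name, url, same_as):
    """Build a schema.org Person block for an author bio.
    All names and URLs passed in here are placeholders."""
    return {
        "@context": "https://schema.org",
        "@type": "Person",
        "name": name,
        "url": url,          # link to the on-site author archive
        "sameAs": same_as,   # external profiles that tie the author entity together
    }

snippet = author_jsonld(
    "Jane Example",
    "https://www.example.com/author/jane-example/",
    ["https://www.linkedin.com/in/jane-example/"],
)
print(json.dumps(snippet, indent=2))
```

The resulting JSON-LD would sit in a `<script type="application/ld+json">` tag on the article page, linking the byline to the author archive and social profiles.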

4. Content dates are significant
The Google Search document leak has prompted a renewed look at the importance of paying attention to dates on published content.
The leak has shown that Google places significant value on fresh, up to date content.
Whilst there was no mention of any date-related demotions, it can be considered best practice to keep the dates attached to published content consistent across all the places Google reads them.
This is easier to explain by way of an example:
You publish an article on your website, say for example, ‘2022 Fashion Trends to Watch’. Its URL contains the year 2022. But a few months later, you update the article and change its title to 2023. Then, 12 months later, you make some more amends, and add ‘last updated 2024’ into the content.
So now you have an article with a 2022 URL, a 2023 title, and 2024 in the content. Together, this mishmash of dates could harm the ability of the article to rank, because it is confusing Google as to the date your content was published.
Really, the best advice is to avoid including dates in article URLs at all, and ensure when refreshing content, all the dates are changed consistently.
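A simple consistency check along these lines can be automated. The sketch below just extracts four-digit years from an article’s URL, title and body and flags a mismatch; the example article mirrors the mishmash described above, and the approach is our own illustration rather than anything from the leaked documents.

```python
import re

def find_years(text):
    """Extract four-digit years (2000-2099) from a string."""
    return set(re.findall(r"\b20\d{2}\b", text))

def date_consistency(url, title, body):
    """Flag articles whose URL, title and body mention different years."""
    years = find_years(url) | find_years(title) | find_years(body)
    return {"years_found": sorted(years), "consistent": len(years) <= 1}

# Hypothetical article matching the example above
check = date_consistency(
    "https://www.example.com/2022-fashion-trends/",
    "2023 Fashion Trends to Watch",
    "Last updated 2024. Here are the trends to look out for...",
)
print(check)
```

Run across a content inventory, a check like this surfaces the articles whose dates have drifted apart over successive refreshes.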
5. User experience is top of the list
We’ve always known that user experience is important. After all, why would we want our website users to have anything other than a positive experience?
The leaked data shows that user interaction, especially click data, plays an important role in ranking. The ‘NavBoost’ system uses clickstream data to rank pages based on user behaviour, rewarding sites with a higher level of engagement.
The API documents specifically mention features such as ‘goodClicks’, ‘badClicks’, and ‘lastLongestClicks’. This is basically showing that Google is tracking how users make their way around a site, how long they spend on certain pages, where they navigate to next, etc.
With this in mind, optimising for NavBoost must also involve enhancing user satisfaction: adding content that quickly meets user intent, and maintaining interest by providing a unique take that stands out from every other page.
A strong introduction that keeps the user from clicking back to the Search Engine Results Pages (SERPs) in just a few seconds is imperative to show Google that your content is useful. Strong calls to action, a visually appealing design and plenty of interactivity will all help the cause, as will subheadings that introduce the answers to common user queries.
It also means creating enticing meta descriptions and titles that improve the click through rate from the SERPs. This will also reduce bounce rate and increase dwell time, letting Google know that the content is relevant and valuable.
Finally, we have for a long time known the importance of a well-designed website for user experience. But there is a clear case for making this an ongoing concern.
In other words, instead of designing and launching a website and leaving it as it is, make an effort to keep on improving it. Do this based on user journey feedback and analysis over time, and by keeping things aligned with the latest Google updates.

6. Chrome data is being used to inform rankings
The leaked Google documents reveal that data collected from the Chrome browser has a role to play in search rankings. Despite Google having previously said on more than one occasion that it does not use Chrome data for rankings, it is clear now that it is actually doing so in order to refine and improve its algorithms.
The documents reveal that Chrome tracks user interactions such as clicks, time spent on a page, and browsing patterns. This data helps Google understand user preferences and behaviour, which informs its decisions on ranking pages based on what users are actually doing in the real world.
This information demonstrates the importance of focusing on enhancing user experience by reducing page load times, improving navigation, and upping the ante with content engagement. As we’ve already explored, engaging content is the name of the game.
7. Fresh content is key
Another key takeaway from the Google Search document leak was that content that is not regularly updated has the lowest storage priority for Google, and will be unlikely to appear in the search results for fresh queries.
It’s therefore vital to update content regularly: enhance it with fresh information and unique opinions, and add new images and videos.
The leak has also uncovered that Google maintains a record of every version of a web page, in effect creating an internal ‘web archive’. However, only the past 20 versions of a page are used. So, by updating a page and allowing it to be crawled, it is possible to push out older versions.
It has also become clear that any pages on a website that aren’t topically relevant should be removed or blocked, as should poorly performing pages. Look to cull pages with low user metrics, and that have failed to attract backlinks.
Site-wide scores are continuously referred to across the leaked documents, so it is just as effective to delete the weakest pages as it is to optimise new pages.
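The culling step above can be sketched as a simple filter over a content inventory. The page data and thresholds here are entirely hypothetical, chosen for illustration – the leaked documents don’t specify any cut-off figures.

```python
def pages_to_review(pages, min_visits=100, min_backlinks=1):
    """Return URLs of pages whose engagement and backlink counts both
    fall below illustrative thresholds (not Google figures)."""
    return [
        p["url"] for p in pages
        if p["monthly_visits"] < min_visits and p["backlinks"] < min_backlinks
    ]

# Hypothetical inventory, e.g. merged from analytics and a backlink tool
pages = [
    {"url": "/guide-a/", "monthly_visits": 1200, "backlinks": 8},
    {"url": "/old-news/", "monthly_visits": 4, "backlinks": 0},
    {"url": "/thin-page/", "monthly_visits": 12, "backlinks": 0},
]
print(pages_to_review(pages))  # → ['/old-news/', '/thin-page/']
```

Pages flagged this way are candidates for a manual review, then a refresh, a redirect or removal – not an automatic deletion.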

The Google Search document leak – what next for SEO?
The Google Search document leak whipped up something of a frenzy in the SEO world.
But whilst there have been a number of revelations in terms of contradictions between what Google has been telling us and what is actually the case around its SEO ranking factors, much of the actionable advice really is what we’ve already been doing.
This is basically because so much of it is to do with creating the ultimate user experience, with content that engages the audience and addresses their queries, a clear navigation that guides people seamlessly to where they need to be, and a general, all-round positive, interactive experience.
The good news is that creating a positive user experience is what we’ve always been striving for anyway.
Figment is a London SEO agency with a proven track record of getting businesses found in Search. To discuss how we can help you improve your online visibility, you are welcome to get in touch.