TW-BERT: Good for users, good for SEO?

June 6, 2024 Digital channels Oban International

Written by Dave Cousin, Head of Organic

Dave has been with Oban International since 2021 as Head of Organic. With 19 years of experience in SEO, both client and agency side, Dave knows that SEO never stands still. An expert in International SEO, Dave has worked with some of the world’s biggest, most global brands.

In August 2023, Google researchers unveiled ‘End-to-End Query Term Weighting,’ introducing TW-BERT, a model designed to enhance search accuracy by integrating seamlessly with existing ranking systems. Though Google hasn’t officially confirmed using TW-BERT, this framework could significantly advance search intent understanding and query expansion. Read on to discover more.

First, there was BERT

In October 2018, Google introduced BERT, which officially rolled out in Search in late 2019. BERT, or Bidirectional Encoder Representations from Transformers, is a natural language processing (NLP) model that brought AI to search at an unprecedented level.

BERT’s role in Google’s algorithm is to enhance the understanding of the meaning or intent behind search queries. It uses a deep learning model, trained on a vast corpus of text, to learn how the elements of text in each language relate to one another, working at the level of wordpieces.

Essentially, BERT helps Google’s search engine to comprehend human language more accurately and naturally. This allows it to deliver better search results by identifying the most important parts of a query, especially for longer and more natural language searches.

However, despite this, Google attracted criticism

BERT initially delivered on its promise, with both Google and users noting an improvement in the quality of search results. However, in recent years, there has been growing criticism regarding the quality of Google’s results. Many users feel that the search results have deteriorated, expressing increasing dissatisfaction.

Are Google’s results really worse?

These aren’t just one-off examples and edge cases from individual users: mainstream publications have also noted that the quality of Google’s results is getting worse. Recent research, published in 2024 but using slightly older data, found that Google’s results have deteriorated, with more spam and irrelevant results, but also that all major search engines, including Bing, have the same problem.

This has been frustrating for SEOs who follow best practice and adhere to Google’s guidelines, only to see spam results appearing first. It isn’t BERT’s fault, though. Those creating spammy content adapted, arguably became better at it, and started using AI. This left Google with a problem to solve.

Hence the introduction of TW-BERT. TW-BERT is a simpler model compared to BERT, designed to address a significant flaw in BERT. It solves this issue so effectively that many Google researchers are likely wondering why they didn’t think of it first.

What is term weighting?

Term weighting allows the elements of a search term to be prioritised based on their importance. Google already employs a form of term weighting through BERT, but this hasn’t been applied at the individual word or word series level until now.

Before Google implemented term weighting, the frequency of a term was more important, and pages had a better chance of ranking highly if they simply matched all the words in a user’s query in order. This led to keyword stuffing, causing even ethical SEOs to focus heavily on keywords.

Let’s be honest, it’s a habit we’re still struggling to break, and we may be overly focused on keywords even though they still hold some value. If you want to find out whether TW-BERT finally renders keywords obsolete, keep reading.

Weighting wordpieces vs weighting words and n-grams

BERT was built to be fast (in order to maintain the speed of returning search results) and efficient in terms of computing power (so Google can still make a decent average margin on every search users make). This means it has to rely on a relatively small database, in large language model terms at least.

BERT has, we are told, around 30,000 wordpieces per language. Not every wordpiece has a direct relationship with every other wordpiece, but each will at least have an indirect relationship, meaning BERT can understand something in the order of 30,000^30,000 relationships. BERT’s database, or to be more precise its high-dimensional neural network, would have to be significantly bigger if it used words and n-grams instead of wordpieces.

What are wordpieces in the context of BERT & TW-BERT?

All human languages use wordpieces, with some languages using them more extensively, particularly those with many prefixes and suffixes. Wordpieces are elements of words that, while not always standalone words, have a basic meaning and can be combined in various ways to form new words.

For example, the wordpiece “news” appears in “newspaper,” “newsagent,” and “newsman,” and can also stand alone as a word. Other examples include suffixes like “-ly” in “happily,” “thankfully,” and “calmly,” or prefixes like “dis-” in “disquiet,” “disqualified,” and “disappointed.” These wordpieces follow specific usage patterns and contribute to the meanings of the words they form.
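To make the idea concrete, here is a minimal sketch of how a word can be split into wordpieces using greedy longest-match-first matching, the approach commonly used with WordPiece vocabularies. The vocabulary below is a toy one invented for this example (BERT’s real vocabulary is far larger and would split some of these words differently), and the function name is our own.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Split one lowercase word into wordpieces by greedy longest-match-first."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # "##" marks a piece that continues a word
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits, so the whole word is treated as unknown
        pieces.append(match)
        start = end
    return pieces


# Toy vocabulary, invented for this example only.
TOY_VOCAB = {"news", "##paper", "##agent", "##man",
             "happi", "calm", "##ly", "dis", "##quiet"}

for w in ["newspaper", "newsagent", "happily", "calmly", "disquiet"]:
    print(w, "->", wordpiece_tokenize(w, TOY_VOCAB))
```

With this toy vocabulary, “newspaper” splits into “news” and “##paper”, while a word with no matching pieces falls back to a single unknown token.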

What are n-grams in the context of BERT & TW-BERT?

Humans don’t think in wordpieces though, and when BERT weights the important wordpieces in a search term it can easily get things wrong by missing context.

But humans understand millions of n-grams. N-grams are words and series of words, and we know that long compound words can mean more than their constituent parts, while a combination of words can carry even more meaning.

We understand common phrases, sayings and the importance of word order. BERT doesn’t and can’t, because it would need something like 10,000,000^10,000,000 relationships (direct or indirect) in its neural network to compete with the human brain. The architecture and hardware to support this simply don’t exist right now, not to mention the cost and latency if they did.

TW-BERT as a statistical bridge

The researchers behind TW-BERT, in their paper “End-to-End Query Term Weighting,” describe TW-BERT as a statistical bridge. It isn’t simply a more powerful neural network. Instead, it builds on BERT’s existing capabilities.

A simplified version of the process is that TW-BERT identifies different n-grams (including 1, 2, and 3-word n-grams) within a search query and uses the wordpiece-level weightings BERT has already established. It applies these weightings to the n-grams based on their wordpieces.

For example, in the search term “Bose noise cancelling headphones with microphone,” if BERT gives higher weight to wordpieces like “phone,” “noise,” and “cancel,” then TW-BERT will highly weight n-grams such as “noise cancelling headphones,” “noise cancelling,” “cancelling headphones,” “headphones,” and “microphone,” roughly in that order.
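The simplified process above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper’s method: the wordpiece weights are invented numbers, a word is assumed to take the weight of its heaviest wordpiece, and an n-gram is assumed to take the sum of its words’ weights. All names here are our own.

```python
# Hypothetical wordpiece weights for each query word; the numbers are invented.
# The same piece can carry different weights in different words.
PIECE_WEIGHTS = {
    "bose": [("bose", 0.3)],
    "noise": [("noise", 0.9)],
    "cancelling": [("cancel", 0.8), ("##ling", 0.1)],
    "headphones": [("head", 0.2), ("##phone", 0.7), ("##s", 0.05)],
    "with": [("with", 0.05)],
    "microphone": [("micro", 0.3), ("##phone", 0.6)],
}


def ngrams(words, max_n=3):
    """Return every run of 1 to max_n consecutive words as a tuple."""
    return [
        tuple(words[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(words) - n + 1)
    ]


def word_weight(word):
    """Assumed pooling: a word is as important as its heaviest wordpiece."""
    return max(weight for _, weight in PIECE_WEIGHTS[word])


def ngram_weights(query, max_n=3):
    """Assumed aggregation: an n-gram's weight is the sum of its words' weights."""
    words = query.lower().split()
    return {
        " ".join(gram): round(sum(word_weight(w) for w in gram), 2)
        for gram in ngrams(words, max_n)
    }


weights = ngram_weights("Bose noise cancelling headphones with microphone")
for phrase, weight in sorted(weights.items(), key=lambda kv: -kv[1])[:5]:
    print(f"{weight:.2f}  {phrase}")
```

With these made-up numbers, “noise cancelling headphones” comes out on top. Summing naturally favours longer n-grams, which is a simplification; a real system would learn how to balance n-grams of different lengths.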

These weights are crucial when Google’s scoring mechanism evaluates the relevance of each candidate document in its index. A page mentioning “noise cancelling headphones” will rank higher than one mentioning only “headphones” or “Bose headphones.”
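As a rough illustration of how weighted n-grams could feed a scoring step, the sketch below adds up the weights of every query n-gram found in a page. It is a plain weighted-match sum that we have assumed for the example, not Google’s scoring mechanism, and the weights and page texts are invented.

```python
# Hypothetical n-gram weights for the query; the numbers are invented.
QUERY_WEIGHTS = {
    "noise cancelling headphones": 2.4,
    "noise cancelling": 1.7,
    "cancelling headphones": 1.5,
    "headphones": 0.7,
    "microphone": 0.6,
    "bose": 0.3,
}


def contains_phrase(doc_words, phrase):
    """True if the phrase's words appear consecutively in the document."""
    target = phrase.split()
    n = len(target)
    return any(doc_words[i:i + n] == target for i in range(len(doc_words) - n + 1))


def score(document, weights):
    """Sum the weights of every query n-gram found in the document."""
    doc_words = document.lower().split()
    total = sum(w for phrase, w in weights.items() if contains_phrase(doc_words, phrase))
    return round(total, 2)


# Invented page snippets standing in for candidate documents.
pages = {
    "page_a": "wireless noise cancelling headphones with a built-in microphone",
    "page_b": "bose headphones in black",
    "page_c": "over-ear headphones for running",
}

ranked = sorted(pages, key=lambda name: score(pages[name], QUERY_WEIGHTS), reverse=True)
for name in ranked:
    print(name, score(pages[name], QUERY_WEIGHTS))
```

Here the page that mentions “noise cancelling headphones” scores well above the pages that mention only “Bose headphones” or “headphones”, which mirrors the ordering described above.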

Additionally, BERT expands the search terms to include synonymous and contextually relevant terms. Therefore, terms like “noise-isolating headphones” might also receive high weighting.

Does TW-BERT work?

TW-BERT’s solution might seem overly simplistic, and one might assume it would be limited in understanding context compared to human language comprehension. However, it is able to indirectly understand language more like a human, as borne out by the results of the paper “End-to-End Query Term Weighting”.

While TW-BERT isn’t perfect and can still miss context or make mistakes, it performs better than BERT alone and even surpasses larger deep learning models. According to the researchers’ paper, TW-BERT excels in ‘zero-shot’ retrieval, where it handles terms it hasn’t been trained on. This is because wordpieces allow weighting of even unfamiliar terms based on important components they contain.

TW-BERT also performs better on longer search terms, up to 10 words, delivering the right documents in the correct order more often than BERT or other deep learning models.

This improvement is significant for SEOs, as it means Google is getting better at understanding long-tail terms and natural language, areas where search engines have historically struggled and only made incremental progress.

Is TW-BERT part of Google’s algorithm?

Yes, we believe so, although Google hasn’t confirmed it yet. Given Google’s investment in creating a solution that improves result quality and is easy to implement, it’s likely they’re using it.

There is also some evidence in Google’s results that it has rolled out. The ‘End-to-End Query Term Weighting’ paper was published in August 2023, but academic publications often have delays, so the model may have been in use before then. Many attribute fluctuations in Google’s SERPs from July 2023 to the rollout of TW-BERT.

We need an updated study to confirm if the quality of results has improved. Until then, it’s reasonable to assume TW-BERT has been implemented, and Google may still be experimenting with its impact, balancing the traditional ‘wordpiece’ weightings from BERT with the newer ‘n-gram’ weightings of TW-BERT.

What does TW-BERT mean for SEOs?

The advice on how to optimise for TW-BERT is nothing that decent SEOs haven’t been doing already, but TW-BERT may make these things more important:

How do we optimise for TW-BERT? We should prioritise optimising for user intent. As Google improves at understanding user needs, it will favour results that match intent and meaning over exact terms. Focus on providing a great experience with semantically and contextually relevant content, rather than content overly optimised for specific keywords.

So are keywords dead? Keywords remain relevant, and considering n-grams in search optimisation is valuable. However, it’s important to understand the intent behind keywords. When you find a keyword through research or Google Keyword Planner, especially for long-tail terms, remember there are many ways people might search for the same thing using different wording. Google won’t always show search volumes for these variations. For short, head terms, this is less of an issue since the keyword and the n-gram are usually the same.

Use of language in search

As Google improves its understanding of language through TW-BERT and other innovations, people will increasingly search as they speak. This will lead to more varied search queries, differences between regions and countries, and greater use of questions and detailed queries.

Additionally, the understanding and adoption of new concepts, technologies, products, and offerings vary across countries and languages, affecting opportunities and intent for the same terms. To succeed, we must focus on creating excellent content. Start with a keyword that has volume and clear intent but aim to capture traffic from numerous related search terms that match this intent, generating a significant volume of clicks.

Can we optimise for n-grams in search?

To some extent, we can anticipate varied search queries by identifying key n-grams that remain consistent, such as brands, technical terms, abbreviations, or proper nouns. Including these naturally in our content can improve our performance in Google’s scoring mechanism. However, it’s important to remember that Google incorporates many other ranking factors, so this approach should complement, not replace, other essential SEO tactics and techniques.
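One simple way to spot n-grams that stay consistent is to compare several variations of a query and keep the word sequences they all share. The sketch below is a minimal illustration of that idea; the query variations are invented and the function names are our own.

```python
from collections import Counter


def ngrams(words, max_n=3):
    """Return the set of 1 to max_n word sequences in a list of words."""
    return {
        " ".join(words[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(words) - n + 1)
    }


def consistent_ngrams(queries, min_share=1.0, max_n=3):
    """Return n-grams that appear in at least min_share of the queries."""
    counts = Counter()
    for query in queries:
        # ngrams() returns a set, so each n-gram counts once per query.
        counts.update(ngrams(query.lower().split(), max_n))
    needed = min_share * len(queries)
    shared = (gram for gram, count in counts.items() if count >= needed)
    # Longest n-grams first, then alphabetical.
    return sorted(shared, key=lambda gram: (-len(gram.split()), gram))


# Invented query variations for the same underlying intent.
VARIANTS = [
    "best bose noise cancelling headphones",
    "are bose noise cancelling headphones worth it",
    "bose noise cancelling headphones with microphone",
]

shared = consistent_ngrams(VARIANTS)
print(shared)
```

In this example the brand and product phrases survive across every variation, while words like “best” and “worth” drop out. Lowering `min_share` keeps n-grams that appear in most, rather than all, of the variations.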

How can we keep up with Google’s understanding of intent?

To stay competitive in search against the backdrop of Google’s ever-improving AI capabilities like TW-BERT, we need to focus on creating content that matches user intent. Here’s how we can do it:

1. Monitor Google’s changing SERPs regularly, especially for key terms and when rankings drop. As Google’s understanding of intent improves, changes will show up in SERPs sooner, which helps us understand how user intent evolves over time.

2. Recognise that shifts in intent might mean our content needs updating. For example, people might have initially searched for “ChatGPT” to learn what it is, but now they might be looking for the latest news, features, or login information.

3. Analyse the current search results to see if there’s a mix of content. If no single result fully satisfies user intent, that’s an opportunity to create better content.

4. Examine AI-generated summaries to understand how Google’s AI interprets queries and aim to provide even more valuable answers, potentially becoming a key source.

TW-BERT enhances search quality on its own, but by continually improving our content to meet evolving user needs, we can achieve better rankings and enhance the overall search experience.
