BERT: Google’s Newest Advancement in Natural Language Processing

If you are like me, you have been learning how to formulate a proper search query ever since you started using the internet. Early on it was vital to focus solely on keywords and to leave out anything extra that might throw off your search results. Typing “best Chicago daycares” and typing “Which daycare in Chicago is the best for my child to attend?” could give drastically different results. Today a website with good SEO tactics can appear in both, but there are still big differences in results depending on how you formulate your search query. With Google’s newest search algorithm announcement, there might be a new way to merge these search tactics and improve overall search results.

What is BERT?

You might have heard that Google recently announced an update to its search engine capabilities; in fact, it is the company's biggest search update in five years. So, what is this update and what makes it so important? Back in November 2018, Google announced its new open source AI technique, BERT. Bidirectional Encoder Representations from Transformers (BERT) is a Natural Language Processing (NLP) technique that Google uses to interpret search queries in order to improve search results. With BERT, Google plans on evaluating conversational search queries with more accuracy than before. This means that searching “Which daycare in Chicago is the best for my child to attend?” might actually start giving you the precision and results you expected. Google has been working on BERT throughout the last year and, as of October 2019, started applying it to searches in the US.

How does BERT Work?

To understand the mechanics behind BERT, it’s important to understand what bidirectional means. Of all the elements that make up the BERT acronym, bidirectional is the one that defines the process best. Bidirectional, meaning to go in two directions, refers to how the algorithm reads a search query. Google explains in a diagram the different methods by which a search query can be read and how they differ from the BERT method. With a bidirectional approach, each word or element of the query is read separately and compared with all of the other elements, regardless of whether they come before or after that specific element or how far away they are in the sequence. If we use the initial example of “Which daycare in Chicago is the best for my child to attend?”, it means that the word “best” is compared to all of the other words in the query, whether they come before or after it.
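The difference between reading in one direction and reading bidirectionally can be shown with a toy sketch. This is only an illustration of which words are visible as context, not how BERT itself is implemented; the function names are made up for this example.

```python
def left_to_right_context(tokens, index):
    """Words a strictly left-to-right reader has seen before tokens[index]."""
    return tokens[:index]


def bidirectional_context(tokens, index):
    """Every other word in the query, whether before or after tokens[index]."""
    return tokens[:index] + tokens[index + 1:]


query = "which daycare in chicago is the best for my child to attend"
tokens = query.split()
target = tokens.index("best")

# A one-directional reader only sees the six words before "best".
print(left_to_right_context(tokens, target))

# A bidirectional reader sees all eleven other words, including "for my child to attend".
print(bidirectional_context(tokens, target))
```

In the one-directional view, “best” is never related to “for my child”, which is exactly the part of the query that says what the searcher cares about.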

The bidirectional method also takes into account small connecting words such as “of” or “to”, something that has mostly been overlooked in previous query analysis algorithms. Major conjunctions such as “and” and “or” have long been accounted for in search analytics, seeing as swapping one for the other can give you drastically different results. For example, searching “Chicago AND Philadelphia” vs “Chicago OR Philadelphia” will give you either the next Bears vs Eagles game or a list of pros and cons of moving to either city.

Google has realized that words like “of” or “to” can make similar distinctions and need to be accounted for as well. In its announcement, Google uses the example “2019 Brazil traveler to USA need a visa” to emphasize how important “to” is to the search query. Without acknowledging “to”, results would be less relevant to the actual question being asked. Including “to” in the search process gives the context that this search is being made by a Brazilian traveling to America and not an American traveling to Brazil. Accounting for this difference returns more accurate results than simply omitting “to” from the search query.
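A small sketch shows why dropping “to” loses the direction of travel. The stop-word list and the reversed second query are assumptions made up for this illustration; they are not taken from Google's announcement or from any real search system.

```python
# Hypothetical stop-word list for illustration; real systems differ.
STOP_WORDS = {"to", "a", "of", "for"}


def keyword_bag(query):
    """Keyword-style view: lowercase, drop stop words, ignore word order."""
    return frozenset(w for w in query.lower().split() if w not in STOP_WORDS)


def ordered_tokens(query):
    """Order-preserving view that keeps small words such as 'to'."""
    return tuple(query.lower().split())


q1 = "2019 Brazil traveler to USA need a visa"
q2 = "2019 USA traveler to Brazil need a visa"  # reversed trip, made up for contrast

print(keyword_bag(q1) == keyword_bag(q2))        # True: the two trips look identical
print(ordered_tokens(q1) == ordered_tokens(q2))  # False: word order and "to" tell them apart
```

A keyword-only view cannot tell who is traveling where, while a view that keeps every word in order still can.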

If we take the initial example of “Which daycare in Chicago is the best for my child to attend?”, we can see the importance of including “for” and “to” in order to get more accurate results. With “for” and “to” included in this search, the BERT system would focus the results on what kinds of children these daycares specialize in serving and when they might have availability. If you owned a daycare with a survey page asking about potential attendees’ needs, you would be more likely to show up in the search results. Likewise, if you had an upcoming open house or a calendar of when new students were being accepted, you would rank higher than you would in a generic search about Chicago daycares.

Accounting for these minor conjunctions and prepositions is just a small example of what the BERT system takes into account when determining what a user is searching for. The BERT system’s precision means that every element of the search query holds value in determining the results, because the system better understands the context of the search without asking for further information. Overall, the BERT update aims to tailor search results more accurately than Google’s previous efforts.

What kinds of results are the BERT system providing?

It’s great that Google sees room for improvement in the way it analyzes search queries, but how well does the BERT system work? While BERT has only been around for a year so far and has existed mainly in the testing stage, its growth and progress have been tracked along the way. Google has provided comparison data for BERT’s accuracy and efficiency across several metrics. First, it gave a comparison using the SQuAD (Stanford Question Answering Dataset) test. SQuAD is a reading comprehension test built from questions about Wikipedia articles, where the answer to each question is either a span of text from the corresponding passage or, in some cases, purposely unanswerable. The SQuAD test has been run on multiple NLP programs and has also been measured against human performance. It gives two results: EM (exact match, the share of answers that match the reference answer exactly) and F1 (a score that balances precision and recall, giving credit for partially correct answers). The first results were given in November 2018, when BERT was first announced, and used SQuAD 1.1. BERT scored 87.4% EM and 93.2% F1, compared to the human performance scores of 82.3% EM and 91.2% F1.
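To make EM and F1 more concrete, here is a simplified sketch of how the two scores can be computed for a single answer. It assumes plain lowercase, whitespace-based word matching; the official SQuAD scoring also strips punctuation and articles, so treat this as an approximation, and the example answers are made up.

```python
from collections import Counter


def normalize(text):
    """Lowercase and split on whitespace (a simplification of the official rules)."""
    return text.lower().split()


def exact_match(prediction, truth):
    """1.0 if the predicted answer matches the reference word for word, else 0.0."""
    return float(normalize(prediction) == normalize(truth))


def f1_score(prediction, truth):
    """Word-overlap F1: the harmonic mean of precision and recall."""
    pred, gold = normalize(prediction), normalize(truth)
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)  # share of predicted words that are correct
    recall = overlap / len(gold)     # share of reference words that were found
    return 2 * precision * recall / (precision + recall)


prediction = "the Denver Broncos"  # hypothetical system answer
truth = "Denver Broncos"           # hypothetical reference answer

print(exact_match(prediction, truth))  # 0.0, because of the extra word "the"
print(f1_score(prediction, truth))     # about 0.8, partial credit for the overlap
```

This is why F1 is usually higher than EM in the reported results: an answer that is almost right scores zero on EM but still earns most of its F1 credit.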

Since November 2018, many more BERT tests have been run and the SQuAD test has been updated to SQuAD 2.0. The SQuAD 2.0 test puts more weight on unanswerable questions and now evaluates whether the system can recognize them and abstain from answering. The BERT system has also grown so large that Google has needed to create a smaller version called ALBERT to work around memory limitations when running tests. While the ALBERT system has fewer parameters, it has been able to participate in these NLP tests more efficiently. The most recent results using BERT technology in the SQuAD 2.0 test are from September 2019 and yielded 89.7% EM and 92.2% F1, compared to the human performance of 86.8% EM and 89.5% F1.
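The abstention rule can be sketched as follows. This is a minimal illustration that assumes an empty string stands for “no answer”; the three question results are invented for the example and the function name is hypothetical.

```python
def exact_match_v2(prediction, truth):
    """Exact match where an empty string means 'no answer'."""
    pred = " ".join(prediction.lower().split())
    gold = " ".join(truth.lower().split())
    return float(pred == gold)


# (system answer, reference answer) pairs, made up for illustration
examples = [
    ("1066", "1066"),  # answerable, answered correctly
    ("", ""),          # unanswerable, system correctly abstains
    ("paris", ""),     # unanswerable, system guesses anyway
]

score = sum(exact_match_v2(p, t) for p, t in examples) / len(examples)
print(score)  # two of the three are counted as correct
```

Under this kind of scoring, a system that always produces some answer is penalized on every unanswerable question, so knowing when to stay silent becomes part of the test.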

Another form of evaluation that BERT has been subjected to is GLUE (General Language Understanding Evaluation). GLUE is a project run by NYU with the goal of improving Natural Language Understanding (NLU) technology by testing its ability to decipher sentences and its accuracy in comprehension. In February 2019, BERT was reported with a score of 80.5, compared to the human baseline performance of 87.1. More recently, in September 2019, a score of 89.4 was reported using the ALBERT system. This is currently the second-highest GLUE score, behind T5 Team Google with a score of 89.7, which was reported in November 2019.

The third metric that Google uses to report on BERT’s efficiency is MultiNLI (Multi-Genre Natural Language Inference), another NYU project that measures sentence comprehension through matched and mismatched results. The baseline results for the MultiNLI test are 72.4% matched and 71.9% mismatched, compared to the BERT scores of 86.7% matched and 85.9% mismatched.

As you can see from the continued testing and reporting, the BERT system is only getting better. All of these tests are also used to report on the AI systems of other major corporations such as Microsoft and Facebook. BERT-based results consistently rank at or near the top of the lists, and when competitors’ scores start to catch up, a newly updated system comes out to take the lead once again. With only a year of intense testing so far, it’s hard to know whether this is where the BERT system tops out. But with the constant improvement to testing models and to the BERT system’s capabilities, whether alone or in combination with other Google products, it’s forecast to continue improving in accuracy and comprehension.

What is this update?

While Google has been working on BERT for a year already, this is only the beginning. First off, BERT is only being implemented in the US and will only influence 1 in 10 English-language search queries. In the grand scheme of Google’s reach this is a small starting point, but Google plans to extend BERT to other regions and languages as the system progresses. This announcement marks the start of a new era of search query methods for Google, but don’t think you have to start changing your entire SEO strategy because of it.

How does the BERT update affect me?

As with any algorithm update, big changes can arrive with seemingly nothing to notice. It’s important to keep in mind that BERT is being rolled out to a small share of searches at the moment and that it is a machine learning process that won’t make major changes overnight. BERT is open source and, like all major algorithms, an ongoing, self-learning process, which means that improvements will be made incrementally.

If you are someone who has consistently focused on SEO and created quality content over the last few years, there really isn’t much to worry about when it comes to adapting to the BERT update. As long as you have been making thoughtful content that meets the needs of your customers or your clients’ customers, you are already in good shape.

That being said, if you have a website that hasn’t been updated in a few years, you need to start updating it now more than ever. The goal of the BERT update is to provide users with better search results, and in order to do so it needs more specific content. If you haven’t been making specific pages addressing the top searches associated with your company, you’ve already fallen behind competitors, and the BERT update will only continue to widen the gap. The best thing to do is to make sure consumer inquiries are covered on your site or related sources with detail and close consideration.

In Conclusion

While a lot has been covered here about the BERT update, it should be noted that we are still in the early stages of how BERT will influence search results. Google has high hopes that BERT will revolutionize the way search queries are answered. If this bidirectional method truly yields better results than the status quo, it will mean not only an advancement for Google but also a need for competitors to catch up in order to stay relevant. That being said, no matter how revolutionary or inadequate BERT ends up being in the long run, the fact that it is slowly being integrated into Google’s search algorithm means it won’t be a daunting change. While most of you will want to be ahead of the curve and prepared for the future, it’s important to remember that these changes are only a small piece of the SEO pie. Whether BERT is a trend or a mainstay, the important thing is to focus on answering consumer needs, creating content and building websites that enhance the customer’s experience. With good attention to detail, customers will find a reason to return to your site and stick around longer.
