Machine Learning and SEO Competitor Research

Many SEO professionals are learning Python and other scripting languages in 2021 to up their game. Another relevant trend in the SEO industry is machine learning. Machine learning, or ML, can help you with competitor research and more. We’ll tell you how to use machine learning to conduct competitor research and make your job easier. 

Definition of Machine Learning

Machine learning is a branch of artificial intelligence (AI) and computer science that uses data and algorithms to mimic the way people learn. It works by exploring data and analyzing patterns. 

Why Machine Learning is Important for SEO Purposes

One thing many great SEO experts do is analyze the search engine results pages (SERPs) and their competitors to see what they are doing to get a high ranking. In the past, SEOs used spreadsheets to collect and analyze data from the SERPs, with various columns containing different data like number of links, number of words, etc. While this tactic does work, Excel is limited in what it can do, even when you know tricks like INDEX + MATCH and VLOOKUP. Plus, there are now more complicated factors that affect search ranking, like mobile usability, social media, page speed, schema markup, and more. 

Now, however, languages like Python and R let you handle millions of rows of information at once. You can use machine learning on competitor data to learn: 

  • Which ranking factors account for the differences in rankings of websites
  • The winning benchmark 
  • How much a unit change in the factor is worth, in terms of ranking 

ML Problem for Competitor Analysis

ML solves several different kinds of problems, such as categorizing things, which is called classification, or predicting a continuous number, which is called regression. Competitor analysis is a regression problem, because the quality of a competitor’s SEO is denoted by its Google ranking, which can be treated as a continuous number. 

Outcome Metric

Because the ML problem is regression, the outcome metric is rank. Rank isn’t affected by seasonality, and competitor rank is third-party data that you can collect with SEO tools and software, unlike competitors’ user traffic and conversions, which are not available to you. 


Now that we know the outcome metric, we need to determine the features, or independent variables. The data types for the features vary. For example, first paint measured in seconds would be a numeric feature. Sentiment with the categories positive, neutral, and negative would be a factor (a categorical feature). You will want to cover as many kinds of ranking factors as possible, such as technical, content, and user experience, in order to conduct the most comprehensive research. 
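
As a rough illustration of the two data types, here is a minimal pandas sketch. The column names and values are made up for the example, and one-hot encoding is only one of several ways to prepare a factor for a model.

```python
import pandas as pd

# Hypothetical sample rows; column names and values are illustrative only.
features = pd.DataFrame({
    "first_paint_seconds": [0.8, 1.9, 3.2],               # numeric feature
    "sentiment": ["positive", "neutral", "negative"],     # factor
})

# Store sentiment as a categorical with a fixed set of levels.
features["sentiment"] = pd.Categorical(
    features["sentiment"], categories=["negative", "neutral", "positive"]
)

# One-hot encode the factor so that numeric-only models can use it.
encoded = pd.get_dummies(features, columns=["sentiment"])
print(encoded.dtypes)
```

Each sentiment level becomes its own column, while the numeric feature passes through unchanged.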

How to Do the Math

Since rankings are numeric, you will want to explain the difference in rank numerically. You can do this with: 

rank ~ w_1*feature_1 + w_2*feature_2 + … + w_n*feature_n

The tilde (~) means “explained by,” n is the number of features, and each w is the weighting of the corresponding feature. 
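
To make the weighted sum concrete, here is a small sketch in plain Python. The feature names, weights, and page values are invented for illustration; in practice the weights are what the model learns from your data.

```python
# Hypothetical weights, chosen only to illustrate the formula.
weights = {"page_speed_seconds": 2.0, "site_depth": 1.5}

def explained_rank(page):
    """Return w_1*feature_1 + ... + w_n*feature_n for one page."""
    return sum(w * page[name] for name, w in weights.items())

# Hypothetical page: loads in 1.5 seconds and sits 2 clicks from the home page.
page = {"page_speed_seconds": 1.5, "site_depth": 2}
print(explained_rank(page))
```

A tree-based model such as XGBoost does not literally learn one weight per feature, so treat this as the intuition behind the formula rather than a description of the trained model.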

How to Use Machine Learning to Discover Competitor Secrets

Now that you know how to do the math, you’re ready to see how machine learning can help you discover more about your competition. We will assume that at this point you have gathered your SERPs data (“serps_data”) and that it has been joined, transformed, cleaned, and is ready for modeling. Your data will at least need to contain the Google ranking and the feature data you want to test. Your columns might include:

  • Google_rank
  • Page_speed
  • Flesch_kincaid_reading_ease
  • Sentiment
  • Site_depth
  • Amp_version_available
  • Internal_page_rank
  • Referring_domains_count
  • Avg_domain_authority_backlinks
  • Title_keyword_string_distance
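
Before modeling, it can help to confirm that the dataset really has the outcome column and no gaps. The helper below, check_serps_frame, is a hypothetical name introduced for this sketch, and the sample frame is made up.

```python
import pandas as pd

def check_serps_frame(df, outcome="Google_rank"):
    """Return a list of problems found; an empty list means the frame looks usable."""
    problems = []
    if outcome not in df.columns:
        problems.append(f"missing outcome column {outcome}")
    if df.shape[1] < 2:
        problems.append("no feature columns")
    for col in df.columns:
        if df[col].isna().any():
            problems.append(f"column {col} has missing values")
    return problems

# Hypothetical sample with one missing feature value.
sample = pd.DataFrame({"Google_rank": [1, 2, 3], "Site_depth": [1, 2, None]})
print(check_serps_frame(sample))
```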

How to Train Your Machine Learning Model

In order to train your model, a good starting point is XGBoost, because it often delivers strong results on tabular data like this. If you want to try an alternative, you could use RandomForest, AdaBoost, or, for large datasets, LightGBM. 

Here is how to use XGBoost in Python on your SERPs dataset. Keep in mind that this is a basic example. For a real client, you will want to try various model algorithms on a training data sample, evaluate them, and then choose the best model. 

Import the Libraries

The first step is to import the libraries and load your data with this bit of code:

import xgboost as xgb

import pandas as pd

from sklearn.metrics import mean_squared_error

serps_data = pd.read_csv('serps_data.csv')

Set the Model Variables

# your SERPs data with everything but the Google_rank column

# note: text columns such as Sentiment must be numerically encoded before fitting

serp_features = serps_data.drop(columns=['Google_rank'])

# your SERPs data with just the Google_rank column

rank_actual = serps_data.Google_rank

Instantiate the Model

serps_model = xgb.XGBRegressor(objective='reg:squarederror', random_state=1231)

Fit the Model

serps_model.fit(serp_features, rank_actual)

Generate the Model Predictions

rank_pred = serps_model.predict(serp_features)

Evaluate the Model Accuracy 

mse = mean_squared_error(rank_actual, rank_pred)
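
The basic example scores the model on the same rows it was trained on, which tends to flatter it. A minimal sketch of the "try several models, evaluate, choose the best" step might look like the following. It assumes scikit-learn is installed, uses synthetic data in place of serps_data, and swaps in two scikit-learn regressors as stand-ins for whichever candidates you want to compare.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for serps_data: 200 pages, 3 numeric features.
rng = np.random.default_rng(1231)
X = rng.normal(size=(200, 3))
y = 10 + 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=200)

# Hold out a quarter of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1231
)

candidates = {
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=1231),
    "gradient_boosting": GradientBoostingRegressor(random_state=1231),
}

scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    # RMSE on rows the model has not seen, in the same units as rank.
    scores[name] = mean_squared_error(y_test, model.predict(X_test)) ** 0.5

best_name = min(scores, key=scores.get)
print(scores, best_name)
```

Taking the square root of the mean squared error gives an error in rank positions, which is usually easier to explain to a client than the squared value.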

What You Can Learn from the Data

Here are just a few things you can learn from your model: 

The Most Predictive Drivers of Rank 

The data will tell you the most influential SERP features or ranking factors in order of importance. Every market or industry will be different. 
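
One common way to rank the drivers is permutation importance, which measures how much the error grows when a feature’s values are shuffled. The sketch below assumes scikit-learn is installed and uses synthetic data with borrowed column names; the random forest is a stand-in for whatever regressor you trained.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
names = ["Page_speed", "Site_depth", "Referring_domains_count"]
X = rng.normal(size=(300, 3))
# By construction, the first feature drives rank most and the second not at all.
y = 10 + 4 * X[:, 0] + 1 * X[:, 2] + rng.normal(scale=0.3, size=300)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
result = permutation_importance(model, X, y, n_repeats=5, random_state=0)

# Sort features from most to least influential.
ranked = sorted(zip(names, result.importances_mean), key=lambda p: p[1], reverse=True)
for name, score in ranked:
    print(name, round(float(score), 3))
```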

How Much a Ranking Factor is Worth

You can also see how much a change in each factor is worth in terms of rank positions. 
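
A simple way to estimate this is a what-if comparison: raise one feature by a unit, hold everything else fixed, and average the change in predicted rank. The function unit_change_effect is a hypothetical helper written for this sketch, and the least-squares fit on synthetic data stands in for a trained regressor.

```python
import numpy as np

def unit_change_effect(predict, X, column, step=1.0):
    """Average change in predicted rank when one feature rises by `step`."""
    shifted = X.copy()
    shifted[:, column] += step
    return float(np.mean(predict(shifted) - predict(X)))

# Synthetic data where each extra unit of feature 0 adds 2 rank positions.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 2))
y = 5 + 2 * X[:, 0] - 1 * X[:, 1]

# Fit a plain least-squares model as a stand-in for the trained regressor.
design = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

def predict(data):
    return np.column_stack([np.ones(len(data)), data]) @ coef

print(unit_change_effect(predict, X, column=0))
```

Any object with a predict method that accepts an array could be passed in the same way, so the helper is not tied to one model type.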

The Winning Benchmark for a Ranking Factor

The data will also tell you the winning benchmark for each ranking factor. Although there might be general SEO rules you follow for each ranking factor, the actual benchmark for each factor can vary depending on the market or industry. 
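
One rough way to read off a benchmark is to compare typical feature values on top-ranking pages with the rest. The rows below are invented, and treating positions 1 to 3 as "winning" is an assumption you would tune for your market.

```python
import pandas as pd

# Hypothetical rows; real values would come from your SERPs data.
pages = pd.DataFrame({
    "Google_rank": [1, 2, 3, 8, 9, 10],
    "Page_speed": [1.0, 1.2, 1.4, 2.8, 3.0, 3.5],
    "Referring_domains_count": [120, 90, 100, 20, 35, 10],
})

# Split pages into the assumed "winning" group and everything else.
top = pages[pages["Google_rank"] <= 3]
rest = pages[pages["Google_rank"] > 3]

# Median feature value per group, side by side.
benchmark = pd.DataFrame({
    "top_3_median": top.drop(columns="Google_rank").median(),
    "rest_median": rest.drop(columns="Google_rank").median(),
})
print(benchmark)
```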

You can use this data to help you optimize your content in order to beat your competition in the SERPs. 

How to Automate Your SEO Competitor Analysis with Machine Learning

While ML analysis of SEO competitor data is important, it is even more helpful when it is ongoing. A one-time machine learning analysis is just a snapshot of the SERPs, which are always changing and evolving, as is Google’s algorithm. A continuous stream of data collection and analysis gives you a better overall view of what is really happening on the SERPs, and why, in your industry (or your clients’). This is why it is key that you automate your SEO competitor analysis with machine learning. 

This is where data warehouse and dashboard systems purpose-built for SEO come in. These systems pull your data daily from your chosen SEO tools, combine the data, and use machine learning to share insights in a front-end application of your choice, like Google Data Studio.

In order to build your own automated system, you will need to deploy an ETL (extract, transform, and load) pipeline on cloud infrastructure like Amazon Web Services or Google Cloud Platform. Extracting refers to the daily calling of your SEO tool APIs. Transforming is the cleaning and analysis of your data. Loading is depositing the finished result into your data warehouse. This allows you to automate your data collection, analysis, and visualization in one place. 
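
The three ETL stages can be sketched with nothing but the Python standard library. Everything here is a stand-in: extract returns hard-coded rows instead of calling a real SEO tool API, and an in-memory SQLite database plays the role of the data warehouse.

```python
import sqlite3
from datetime import date

def extract():
    """Stand-in for calling SEO tool APIs; returns hard-coded sample rows."""
    return [
        {"url": "https://example.com/a", "google_rank": "1", "page_speed": "1.2"},
        {"url": "https://example.com/b", "google_rank": "", "page_speed": "2.9"},
    ]

def transform(rows):
    """Drop rows without a rank and convert text fields to numbers."""
    today = date.today().isoformat()
    return [
        (today, r["url"], int(r["google_rank"]), float(r["page_speed"]))
        for r in rows
        if r["google_rank"]
    ]

def load(conn, records):
    """Append cleaned records to a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS serps "
        "(collected_on TEXT, url TEXT, google_rank INTEGER, page_speed REAL)"
    )
    conn.executemany("INSERT INTO serps VALUES (?, ?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")  # in-memory database as a stand-in warehouse
load(conn, transform(extract()))
row_count = conn.execute("SELECT COUNT(*) FROM serps").fetchone()[0]
print(row_count)
```

In a real deployment, a scheduler on your cloud platform would run these steps daily, and the model training and dashboard would read from the warehouse table.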

Machine Learning and SEO Competitor Research

It can be difficult to conduct SEO competitor research, but machine learning can make it easier, especially if you automate the process. When you use machine learning on your competitors’ data, you can learn what the key ranking drivers are, identify the winning benchmarks for them, and learn just how much lift in rank your optimizations can deliver. If you need help using machine learning for SEO competitor research, contact SEO Design Chicago today and our SEO experts can help you.


  • What is machine learning?
  • How do I set up an ML model? 
  • What data do I need for an ML model on competitor research?
  • How does machine learning help my SEO competitor research?
  • How do I automate machine learning for SEO competitor research?
