Bot Traffic in Digital Advertising: How Ad Tech Vendors Serve Fortune 500 and Government Ads to Bots
- 1 Adalytics Study
- 2 Background: Bots and Bot Traffic
- 3 Invalid Traffic Classifications
- 4 Industry Standards and Guidelines
- 5 Bot Detection Vendor Claims
- 6 Research Methodology
- 7 Key Research Findings
- 7.1 IAS Bot Classification Failures
- 7.2 Publishers Serving Ads to Declared Bots
- 7.3 Ad Tech Vendors Serving Ads to Bots
- 7.4 Brands Whose Ads Were Served to Bots
- 7.5 Bot Avoidance Services Failing
- 7.6 YouTube TrueView Ads Served to Bots
- 7.7 Google Serving Government Ads to Its Own Data Center Bots
- 7.8 Publishers Successfully Blocking Bot Ad Delivery
- 8 Discussion and Implications
- 9 Caveats and Limitations
- 10 Conclusion
A comprehensive analysis by Adalytics uncovered a troubling reality in digital advertising: despite claims of sophisticated bot detection and filtration, major ad tech vendors are serving ads from Fortune 500 companies and government agencies to declared bots operating out of data centers. The investigation analyzed petabytes of web traffic data from three different bot sources over seven years, revealing that thousands of brands had their ads delivered to bots despite employing “bot avoidance” technology from verification vendors.
The research found that vendors that publicly claim to filter bot traffic pre-bid (including Google, The Trade Desk, Microsoft Xandr, Index Exchange, PubMatic, and many others) were consistently serving ads to bots, including some that operate from known data center IPs and openly declare themselves as non-human via their user agent headers.
Particularly concerning was the discovery that US government agencies like the Department of Homeland Security, US Navy, US Army, Department of Veterans Affairs, Centers for Disease Control and Prevention, and Healthcare.gov were among those whose ads were served to bots. The research also showed that, in the observed sample, Integral Ad Science (IAS) labeled declared bots as valid human traffic 16% of the time and non-declared bots as human 77% of the time.
Adalytics Study
Over the course of the last year, Adalytics conducted several research reports and worked with various public sector and Fortune 500 advertisers to analyze their marketing data. Several media buyers reported paying millions annually for bot avoidance technology, yet were still being charged for ad impressions served to openly declared bots, including search engine crawlers and even classification bots operated by the very verification companies they paid to prevent bot traffic.
These observations, combined with evidence of US government ads being served to bots, prompted Adalytics to conduct this exploratory research to better understand how ad tech vendors allow digital ads to be served to bots in data centers.
Background: Bots and Bot Traffic
What Are Bots?
According to Cloudflare, “Bot traffic describes any non-human traffic to a website or an app.” Not all bot traffic is malicious—some bots perform essential services like search engine indexing or digital assistant functions. However, even benign unauthorized bots can disrupt site analytics and generate click fraud.
HUMAN Security, a bot detection company, distinguishes between human traffic (people visiting sites to view products) and bot traffic (automated software visiting sites). Bot traffic can distort analytics, inflate costs, slow site performance, and contaminate data used for business decisions.
Prevalence of Bot Traffic
Multiple industry sources estimate significant bot presence online:
- Cloudflare: “over 40% of all Internet traffic is comprised of bot traffic”
- Audit Bureau of Circulations (ABC UK): “Up to 40% of web traffic is invalid”
- Imperva (2024): “almost 50% of internet traffic comes from non-human sources” with bad bots comprising “nearly one-third of all traffic”
- Akamai Technologies (2024): “bots compose 42% of overall web traffic, and 65% of these bots are malicious”
Financial Impact
The World Federation of Advertisers estimates ad fraud will “exceed $50bn globally by 2025,” making it “second only to the drugs trade as a source of income for organized crime.” According to the Association of National Advertisers (ANA), ad fraud costs the marketing industry approximately “$51 million per day” with losses projected to reach “$100 billion annually by 2023.”
Invalid Traffic Classifications
The Media Rating Council (MRC) defines “Invalid Traffic” (IVT) as traffic that fails to meet quality or validity criteria, or that otherwise does not represent legitimate traffic. It is divided into two categories:
- General Invalid Traffic (GIVT): Traffic identifiable through routine filtration, such as list-based or standardized parameter checks, including:
- Known invalid data-center traffic
- Declared bots, spiders, and crawlers
- Non-browser user-agent headers
- Pre-fetch or pre-rendered traffic
- Sophisticated Invalid Traffic (SIVT): Traffic that is more difficult to detect and requires advanced analytics to identify, including:
- Automated browsing from dedicated devices
- Bots masquerading as legitimate users
- Invalid proxy traffic
Industry Standards and Guidelines
Several industry bodies have established standards for detecting and filtering bot traffic (a simplified filtering sketch follows this list):
- The IAB Tech Lab Spiders and Bots List provides reference files to identify known bots
- The OpenRTB Protocol outlines responsibilities for exchanges and bidders to reject non-human traffic
- The Trustworthy Accountability Group (TAG) Certified Against Fraud program requires participants to filter both GIVT and SIVT
- The Media Rating Council (MRC) requires “filtration of invalid data-center traffic originating from IPs associated to the three largest known hosting entities: Amazon AWS, Google and Microsoft”
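To make the GIVT side of these standards concrete, here is a minimal Python sketch of pre-bid filtering along the lines the MRC and IAB describe. The user-agent patterns and IP ranges below are illustrative placeholders, not the licensed IAB Tech Lab list or any vendor's actual blocklist:

```python
import ipaddress

# Illustrative placeholders: the real IAB Tech Lab Spiders and Bots list is
# licensed, and real data-center ranges come from the cloud providers'
# published IP lists (AWS, Google Cloud, Microsoft Azure).
BOT_UA_PATTERNS = ["httparchive", "bot", "spider", "crawler"]
DATA_CENTER_NETWORKS = [
    ipaddress.ip_network("35.192.0.0/12"),  # example Google Cloud block
    ipaddress.ip_network("52.0.0.0/10"),    # example AWS block
]

def is_givt(user_agent: str, client_ip: str) -> bool:
    """True if a bid request looks like General Invalid Traffic:
    a declared bot user agent, or an IP in a known data-center range."""
    ua = user_agent.lower()
    if any(p in ua for p in BOT_UA_PATTERNS):
        return True  # declared bots, spiders, and crawlers
    ip = ipaddress.ip_address(client_ip)
    return any(ip in net for net in DATA_CENTER_NETWORKS)

# A pre-bid filter would simply decline to bid on flagged requests:
print(is_givt("ExampleCrawler/1.0 (+https://example.com/bot)", "35.200.1.2"))  # True
```

The study's central finding is that traffic failing even this trivial sort of check was still being monetized.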
Bot Detection Vendor Claims
Several vendors have received MRC accreditations for bot detection and prevention, including HUMAN Security (formerly White Ops), Integral Ad Science (IAS), and DoubleVerify.
HUMAN Security claims to verify “more than ten trillion digital interactions per week” with a “multilayered detection methodology” delivering responses “in 10 milliseconds or less before a bid is made.”
Many ad tech platforms have publicly announced partnerships with HUMAN Security, including:
- The Trade Desk claimed to “block non-human impressions at the front door” with “no level of fraud that is acceptable”
- Index Exchange stated it allows buyers to purchase “with confidence that its supply chain is protected against invalid traffic before a bid request is ever sent”
- Microsoft Xandr announced its platform “protects before a bid is even made”
- Google claimed integration with HUMAN “serves as an extra safety check for our invalid traffic defenses”
Research Methodology
Adalytics used three different bot traffic sources for its study:
- HTTP Archive: A non-profit that crawls millions of URLs monthly, operating from Google Cloud data centers with a declared bot user agent that’s been on the IAB Tech Lab Spiders and Bots list since 2013.
- Anonymous Web Crawler: A vendor that crawls approximately seven million websites monthly from known data center IPs but doesn’t declare itself as a bot via its user agent.
- URLScan.io: A security tool that scans suspicious URLs, sometimes declaring itself as a normal browser rather than a bot, operating from data center IPs.
This methodology was designed to use datasets where the “ground truth” of bot traffic was certain, eliminating ambiguity about whether traffic was human or non-human.
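For context on what “declared” means here: a declared bot simply announces itself in the User-Agent header of every request. A hypothetical declared crawler (the bot name and URL below are placeholders) looks like this:

```python
import urllib.request

# Placeholder identity; real declared bots similarly name themselves and
# often link to documentation in the User-Agent string.
DECLARED_BOT_UA = "ExampleResearchBot/1.0 (+https://example.com/bot-info)"

req = urllib.request.Request(
    "https://example.com/",
    headers={"User-Agent": DECLARED_BOT_UA},  # openly non-human
)
with urllib.request.urlopen(req) as resp:
    body = resp.read()

# Every server and ad tech vendor in the delivery chain can read this same
# header, which is why serving ads to such traffic counts as GIVT.
print(f"fetched {len(body)} bytes while openly declared as a bot")
```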
Adalytics also examined how IAS’s publisher services pixel classified bot traffic by analyzing the `fr` parameter in responses from IAS’s endpoint, where `fr=true` indicates bot traffic and `fr=false` indicates human traffic.
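As a sketch of how such pixel responses can be tallied (assuming they have been captured, for example from HAR files of each page load; the JSON shape below is a simplified assumption, not IAS's documented API):

```python
import json

# Hypothetical captured IAS pixel response bodies from repeated bot visits.
captured = ['{"fr": true}', '{"fr": false}', '{"fr": false}']

flags = [json.loads(body)["fr"] for body in captured]
human_rate = flags.count(False) / len(flags)
print(f"declared bot labeled as human {human_rate:.0%} of the time")  # 67% here
```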
Through digital forensics of ad source code, Adalytics analyzed which supply-side platforms were involved in transactions and extracted information about bot avoidance services charged to advertisers. For instance, many US Navy ads contained base64-encoded references to “charge-allDoubleVerifyBotAvoidance.”
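A minimal version of that decoding step might look like the following sketch: find base64-looking tokens in an ad's markup, decode them, and search for known billing markers (the regex and the toy markup are illustrative; real ad source code is far messier):

```python
import base64
import re

MARKERS = ("charge-allDoubleVerifyBotAvoidance",
           "charge-allIntegralSuspiciousActivity")

def find_markers(ad_html: str) -> list[str]:
    """Decode base64-looking tokens in ad markup and return any
    bot-avoidance billing markers found inside them."""
    hits = []
    for token in re.findall(r"[A-Za-z0-9+/=]{24,}", ad_html):
        try:
            decoded = base64.b64decode(token, validate=True).decode("utf-8", "ignore")
        except ValueError:
            continue  # not actually base64
        hits.extend(m for m in MARKERS if m in decoded)
    return hits

# Toy example: a marker hidden inside a base64 blob in the ad's source.
blob = base64.b64encode(b'{"fee":"charge-allDoubleVerifyBotAvoidance"}').decode()
print(find_markers(f'<script data-x="{blob}"></script>'))
```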
Key Research Findings
IAS Bot Classification Failures
IAS’s publisher services pixel, which helps publishers optimize ad delivery, frequently misclassified bots as human traffic:
- URLScan.io’s bot was labeled as valid human traffic (`fr=false`) 77% of the time
- The HTTP Archive bot, despite being on the IAB’s bot list since 2013, was labeled as human traffic 16% of the time
- In some cases, IAS inconsistently labeled the same bot visiting the same page as both human and non-human
Publishers Serving Ads to Declared Bots
Many publishers using IAS and DoubleVerify’s publisher optimization tools still served ads to bots, including:
- Wall Street Journal, Washington Post, Reuters, CNN
- Weather.com, Fandom, Forbes
- Condé Nast publications like Wired and The New Yorker
For example, on April 23, 2023, when URLScan.io’s bot visited TripAdvisor.com, IAS correctly labeled it as invalid traffic (`fr=true`), yet an Air New Zealand ad was still served by The Trade Desk and Google, with the ad’s source code referencing “charge-allIntegralSuspiciousActivity.”
Ad Tech Vendors Serving Ads to Bots
Vendors claiming to partner with HUMAN Security to prevent ads from being served to bots were frequently observed serving ads to bots, including declared bots from known data centers:
- Google DV360: Served the highest number of ads to HTTP Archive’s declared bot, approximately 15 times more than The Trade Desk.
- The Trade Desk: Served ads to bots through various supply-side platforms:
- The Trade Desk OpenPath (direct publisher integration)
- Microsoft Xandr
- Index Exchange
- PubMatic
- Magnite
- And other SSPs
- Other vendors observed serving ads to bots included Yahoo DSP, Amazon DSP, Comcast FreeWheel Beeswax, Epsilon, AdTheorent, and Basis Technologies.
Notably, some demand-side platforms like Basis Technologies effectively filtered declared bot traffic, with their Platform Operations Director Chris Coupland stating: “Basis strives to remove all invalid traffic (IVT) as per MRC guidelines… We maintain blocklists of known data center IP addresses… and actively enforce the IAB’s Spiders and Bots list.”
Brands Whose Ads Were Served to Bots
US Government Agencies
- US Navy, Army, Air Force
- Department of Homeland Security (TSA)
- Department of Veterans Affairs
- Centers for Disease Control and Prevention
- US Postal Service
- Healthcare.gov (ads served hundreds of thousands of times to HTTP Archive’s bot)
- NYPD
Fortune 500 Companies
- Procter & Gamble, Unilever, Kenvue (J&J Consumer)
- JPMorgan Chase, Bank of America, American Express
- Microsoft, IBM, HP
- Coca-Cola, Hershey’s, Diageo
- T-Mobile, Visa, MasterCard
Many of these ads included references to bot avoidance services in their source code.
Bot Avoidance Services Failing
The research found numerous examples of ads containing references to bot avoidance technology still being served to bots:
- DoubleVerify Bot Avoidance: Ads for US Navy, New York state government, IBM, Singaporean Government, Visa, Novo Nordisk (Ozempic), and Pfizer included “charge-allDoubleVerifyBotAvoidance” references yet were served to bots.
- IAS Suspicious Activity: Brands like Starbucks, JPMorgan Chase, NYSERDA (NY government), Coca-Cola, USPS, and Microsoft had ads with “charge-allIntegralSuspiciousActivity” references served to bots.
- DoubleVerify Scibids AI: Progressive Insurance and Ontario Lottery (Canadian government) ads using this AI optimization technology still reached bots. Even ads for Scibids itself were served to bots.
YouTube TrueView Ads Served to Bots
Google’s TrueView video ads, which should be skippable and audible according to Google’s policies, were observed being served to bots in muted, auto-playing formats on Google Video Partner sites. Examples included ads for Senator Mike Lee and the Timberlyne Group.
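As a sketch of how the “muted, auto-playing” observation might be checked in captured player markup (assuming the `<video>` element appears in the captured HTML; real players are usually assembled by JavaScript, so client-side forensics would inspect the live DOM instead):

```python
from html.parser import HTMLParser

class MutedAutoplayCheck(HTMLParser):
    """Flag <video> tags that both auto-play and start muted."""
    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag == "video" and {"autoplay", "muted"} <= {name for name, _ in attrs}:
            self.found = True

checker = MutedAutoplayCheck()
checker.feed('<video autoplay muted src="ad.mp4"></video>')  # toy captured markup
print("muted auto-play player:", checker.found)  # True
```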
Google Serving Government Ads to Its Own Data Center Bots
Google Ad Manager was observed serving millions of ad impressions to bots, including declared bots operating out of Google’s own data centers. Healthcare.gov ads were served hundreds of thousands of times to HTTP Archive’s declared bot operating from Google Cloud between 2022 and 2024.
Publishers Successfully Blocking Bot Ad Delivery
Some publishers consistently prevented ads from being served to declared bots:
- Reuters and Wall Street Journal blocked users declaring themselves as bots
- Arena Group properties (Parade, Autoblog, TheStreet, etc.) blocked bot access
- Bloomberg implemented challenges for declared bots
In contrast, sites like washingtonpost.com, nytimes.com, weather.com, and fandom.com served ads to declared bots.
Arena Group’s representatives commented: “We believe it’s inappropriate to serve ads to bots… Serving ads to bots is a betrayal of [advertiser] trust… Dealing with bots and other types of fraud is the responsibility of every company in the advertising supply chain.”
Discussion and Implications
The study reveals a disturbing contradiction between industry claims and practices. Despite public statements about bot filtration, many ad tech vendors—including those with MRC accreditations and TAG certifications—consistently serve ads to bots, even obviously declared ones.
As US Senators Mark Warner and Chuck Schumer noted in a 2018 letter to the FTC, there appears to be “willful blindness to fraudulent activity in the online ad market.” Industry expert Mikko Kotila told the Financial Times that middlemen “often turn a blind eye to fake traffic” because “doing so is in their financial interest.”
The Association of National Advertisers aptly observed that “a false sense of security enables fraud to thrive.” This research suggests that despite substantial investment in fraud prevention technology, basic bot filtration measures are failing.
For advertisers, these findings raise critical questions:
- Why pay for bot avoidance technology that doesn’t prevent ads from reaching declared bots?
- How much ad spend is wasted on non-human traffic?
- Why are government agencies’ ads being served to bots?
The World Federation of Advertisers recommends that “brands need to develop in-house expertise” and “demand full transparency of investment,” while the ANA advises advertisers to “refuse payment on non-human traffic in media contracts.”
Caveats and Limitations
This observational study has several limitations:
- It makes no assertions about causality, intent, or quantitative impact
- It doesn’t recommend excluding specific vendors or publishers
- It doesn’t assign fault to any specific party in the complex digital media supply chain
- It’s based on client-side forensics which may yield false positives
Conclusion
The digital advertising ecosystem has created sophisticated detection and filtration systems that supposedly prevent ads from being served to bots. Yet this research reveals that these systems frequently fail to catch even the most obvious bot traffic—declared bots operating from known data centers.
As Shailin Dhar pointed out, “Advertisers, why do we spend our efforts chasing ‘super sophisticated botnets’ operated by the world’s ‘most devious cybercriminals,’ when we haven’t stopped basic data-center/server-farms from eating up ad budgets?”
The findings suggest that despite industry claims, bot traffic remains a significant problem in digital advertising—affecting not just Fortune 500 companies but also government agencies spending taxpayer dollars. Perhaps most troublingly, it appears that some of the very companies tasked with preventing this problem are failing at their core mission.
Advertisers would be wise to demand greater transparency, conduct independent verification, and consider the example of publishers like Arena Group who take concrete steps to prevent serving ads to bots. As Arena’s representatives noted, “allowing ads to be served to bot traffic could enable additional short-term revenue, but not without significant downsides,” including “decreased value for our advertisers.”
Until the industry addresses these fundamental gaps in bot detection and prevention, advertisers will continue to waste millions on non-human impressions while verification vendors provide what the ANA calls “a false sense of security.”