The word “bot” is a double entendre. It refers both to the larval stage of the botfly, a rather gruesome internal parasite (the details are not for the squeamish), and to a software application that runs automated scripts on the internet in a highly repetitive fashion, also with parasitic effects. Internet bots can be programmed for good or evil. This post concerns the latter kind.
Use of Bots for Ad Fraud
The dominant model of compensation for digital advertising remains the CPM, or cost per thousand impressions. Impressions are simply ad views, that is, the successful delivery of a specific advertisement to a consumer’s web browser. (We adopt the fiction that an ad has been “viewed” so long as it appears for some period of time on the consumer’s screen.) There is robust debate about what an impression actually means for non-static ads, such as videos, for which compensation may flow even if the consumer sees only a few seconds of the entire video. A publisher, such as a website, offers advertisers a sliver of its online real estate at a given CPM rate and is thereafter compensated by advertisers at that rate for the number of impressions generated.
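Because CPM payment scales linearly with the impression count, every impression a bot fakes turns directly into money paid. A minimal Python sketch of the arithmetic follows; the function name and all figures are hypothetical, chosen only for illustration and not drawn from any real campaign.

```python
from decimal import Decimal

def cpm_payment(impressions: int, cpm_rate: Decimal) -> Decimal:
    """Amount owed for a number of impressions at a CPM rate.

    CPM is a price per 1,000 impressions, so the payment is
    (impressions / 1000) * cpm_rate. Decimal avoids floating-point
    rounding surprises when working with currency.
    """
    if impressions < 0:
        raise ValueError("impressions must be non-negative")
    return cpm_rate * impressions / 1000

# Hypothetical campaign: 250,000 impressions at a $4.00 CPM,
# of which 50,000 were actually generated by bots.
rate = Decimal("4.00")
total = cpm_payment(250_000, rate)
wasted = cpm_payment(50_000, rate)
print(f"Total paid: ${total}; paid for bot impressions: ${wasted}")
```

In this made-up example, one impression in five is fake, so one dollar in five of the spend goes to impressions no human saw.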
Online ad purchasing nearly always occurs through ad exchanges, with Demand-Side Platforms (DSPs) managing the purchase of ad inventory across multiple exchanges on an automated basis and Supply-Side Platforms (SSPs) managing publishers’ available inventory. Bidding occurs automatically, in real time. Impressions are tracked, and payments flow according to the exchanges’ transaction records and the publishers’ own counts of impressions delivered.
Mischief can arise when non-human bots generate what appear to be valid impressions, thereby triggering payment, even though no human ever views them. Bots can fool publishers because they are in many respects indistinguishable from legitimate traffic. Indeed, since bots often take over ordinary consumer desktops, the traffic they generate emanates from addresses associated with legitimate consumers. Technical solutions to root out this fraud therefore often focus on identifying suspicious traffic patterns that signal mass automation rather than human behavior.
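Detection vendors generally do not publish their exact methods, so what follows is only a toy illustration of the pattern-based approach. It assumes a log of (client ID, timestamp) pairs for ad requests and flags clients that either request ads faster than a person plausibly would or do so at intervals too regular to be human. The function name and both thresholds are hypothetical; real systems weigh many more signals.

```python
from collections import defaultdict
from statistics import mean, pstdev

def flag_suspicious_clients(events, max_per_minute=60.0, min_cv=0.1):
    """Flag client IDs whose ad requests look automated (illustrative heuristic).

    events: iterable of (client_id, timestamp_in_seconds) pairs.
    A client is flagged if its average request rate exceeds max_per_minute,
    or if the gaps between its requests are suspiciously regular, i.e. their
    coefficient of variation (std dev / mean) falls below min_cv.
    """
    times = defaultdict(list)
    for client_id, ts in events:
        times[client_id].append(ts)

    flagged = set()
    for client_id, stamps in times.items():
        stamps.sort()
        if len(stamps) < 3:
            continue  # too few requests to judge either signal
        gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
        span = stamps[-1] - stamps[0]
        rate = 60.0 * len(gaps) / span if span > 0 else float("inf")
        avg_gap = mean(gaps)
        cv = pstdev(gaps) / avg_gap if avg_gap > 0 else 0.0
        if rate > max_per_minute or cv < min_cv:
            flagged.add(client_id)
    return flagged

# Hypothetical log: a person browsing irregularly and a bot firing every 2 seconds.
events = [("human-1", t) for t in (0, 7, 31, 45, 110)]
events += [("bot-1", t) for t in (0, 2, 4, 6, 8, 10)]
print(flag_suspicious_clients(events))  # {'bot-1'}
```

Here the bot is caught by its clockwork regularity rather than its speed; a bot that adds random jitter would evade that check, which is one reason real detectors combine many signals.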
Mischief can also arise when bots infiltrate the publisher side, offering exchanges what appears to be legitimate publisher inventory but is in fact spoofed. Such inventory may carry legitimate-looking URLs, but those URLs have a string of randomly generated digits, letters or words appended to them and lead the visitor to a dead address with no content. Nevertheless, the advertiser is fooled into paying for fake impressions on these non-existent pages in the mistaken belief that it is buying legitimate inventory.
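One simple countermeasure, sketched here as an illustration rather than as any vendor’s actual method, is for a buyer to compare the URLs offered in bid requests against pages known to exist on the publisher’s site (collected, for example, from its sitemap or a crawl). Pages padded with random characters will not appear in that list. The helper names are hypothetical, and the normalization rules (ignoring the scheme, query string, a leading “www.” and trailing slashes) are assumptions a real system would need to tune.

```python
from typing import Iterable, List, Tuple
from urllib.parse import urlparse

def normalize(url: str) -> Tuple[str, str]:
    """Reduce a URL to (host, path), ignoring scheme, query, a leading "www." and trailing slashes."""
    parts = urlparse(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path.rstrip("/") or "/"
    return host, path

def unknown_pages(bid_urls: Iterable[str], known_pages: Iterable[str]) -> List[str]:
    """Return the bid-request URLs that match no page known to exist on the publisher's site."""
    known = {normalize(u) for u in known_pages}
    return [u for u in bid_urls if normalize(u) not in known]

# Hypothetical publisher pages and bid-request URLs.
known_pages = [
    "https://www.example-news.com/",
    "https://www.example-news.com/politics/budget-vote",
]
bid_urls = [
    "https://example-news.com/politics/budget-vote?utm_source=feed",
    "https://www.example-news.com/politics/budget-vote-8f3kq2zx91",
]
print(unknown_pages(bid_urls, known_pages))
# ['https://www.example-news.com/politics/budget-vote-8f3kq2zx91']
```

A page missing from the list is not proof of fraud, since new articles appear constantly, so a check like this would more plausibly feed a review queue than block spend outright.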
As we have previously reported, bots have been implicated in the theft of digital ad spend to the tune of millions of dollars per day and up to $6.5 billion globally. This has caused justifiable alarm in the industry, which has rushed to restore confidence through a mix of high-tech and low-tech solutions, such as ads.txt. Still, many in the industry have tended to view this kind of digital ad fraud as a “long-tail” problem, primarily affecting less valuable inventory, lower-end publishers, and less sophisticated advertisers.
Within the last week, however, Danish adtech firm AdForm published a white paper that throws cold water on this prevailing view. Its investigation turned up a menacing new bot, which it calls “HyphBot”, that it estimates has been defrauding advertisers to the tune of about $500,000 per day. AdForm found that this particular bot is an equal-opportunity criminal, stealing from premium publishers offering high-end inventory as well as from lower-end sites. At its peak, HyphBot is estimated to have generated up to 1.5 billion requests per day and fake traffic on more than 34,000 different domains. Most troublingly, it evaded ordinary detection for many months: “HyphBot was built by smart people. Our analysis showed that full URLs tracked via our Ad Server match 99% of the time the URL that was offered in the bid request. This means that the browser actually ‘believes’ that it’s visiting these URLs.” Only through sophisticated forensic analysis and experimentation was AdForm finally able to uncover the extent of the fraud.
How many other HyphBots continue to operate? We don’t know, but advertisers and publishers had better develop a strategy now for dealing with the inevitable fallout when such schemes are uncovered, because these frauds are large and people are going to want their money back. See, e.g., Uber v. Fetch (Uber alleges that defendant Fetch Media “squandered tens of millions of dollars to purchase nonexistent, nonviewable and/or fraudulent advertising”).
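ads.txt, mentioned above as one of the industry’s low-tech responses, is an IAB Tech Lab standard: a publisher posts a plain-text file at the root of its domain (/ads.txt) listing the ad systems and seller account IDs authorized to sell its inventory, and buyers can decline inventory offered by sellers not on the list. The sketch below parses the comma-separated record lines (ad system domain, seller account ID, DIRECT or RESELLER, optional certification authority ID) and checks a seller against them. It is a simplified reading of the format that merely skips variable lines such as “contact=…”, and the domains and IDs in the example are made up.

```python
from typing import Set, Tuple

Record = Tuple[str, str, str]  # (ad system domain, seller account ID, relationship)

def parse_ads_txt(text: str) -> Set[Record]:
    """Parse ads.txt content into a set of authorized-seller records (simplified).

    Comments (#...) and blank lines are dropped, variable lines such as
    "contact=..." are skipped, and records without a DIRECT/RESELLER
    relationship are ignored.
    """
    records: Set[Record] = set()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = [f.strip() for f in line.split(",")]
        if "=" in fields[0] or len(fields) < 3:
            continue  # variable line or malformed record
        domain, account_id, relationship = fields[0].lower(), fields[1], fields[2].upper()
        if relationship in ("DIRECT", "RESELLER"):
            records.add((domain, account_id, relationship))
    return records

def is_authorized(records: Set[Record], ad_system: str, account_id: str) -> bool:
    """True if this ad system and seller account appear in the publisher's ads.txt."""
    return any(d == ad_system.lower() and a == account_id for d, a, _ in records)

# Hypothetical ads.txt for a publisher, with made-up ad systems and IDs.
ADS_TXT = """\
# ads.txt for example-news.com
ssp-one.example, pub-12345, DIRECT, 0000aaaa
ssp-two.example, 998877, RESELLER
contact=adops@example-news.com
"""

records = parse_ads_txt(ADS_TXT)
print(is_authorized(records, "ssp-one.example", "pub-12345"))  # True
print(is_authorized(records, "ssp-one.example", "pub-99999"))  # False
```

Domains are compared case-insensitively while account IDs are compared exactly; that split is a simplifying assumption of this sketch.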
Bots are Bad for Democracy
In an open letter to the Chairman of the Federal Communications Commission, New York Attorney General Eric Schneiderman decries the FCC’s refusal to respond to his office’s requests for information about the infiltration of bots into the FCC’s notice-and-comment process. The FCC recently released for notice and comment a draft proposal to withdraw its classification of broadband ISPs as common carriers under Title II of the Communications Act, the classification underpinning its net neutrality rules. Schneiderman’s letter alleges that hundreds of thousands of fake, automated comments (almost certainly from bots) flooded the FCC’s inbox in the lead-up to the proposed rule. These comments, which hijacked the identities of real, ordinary citizens, drowned out legitimate commentary submitted to the agency by others.
While mass mailing of scripted postcards and e-mails to Congress and agencies has occurred for many years, the use of automated bots for this purpose takes the practice to an extreme and threatens to subvert the normal feedback loop that characterizes a healthy democracy.
Twitter is also suffering from an infestation of politically motivated bots. It has adopted technical tools that attempt to root out the problem by temporarily banning accounts that are suddenly followed by bots. Ironically, this method may also shut down researchers who are investigating the problem, since their work attracts automated bot armies that respond to anything they post on Twitter. Because Twitter’s ad revenue, and the compensation received by famous influencers on the platform, depend on the legitimacy of its traffic, Twitter is focusing great attention on the problem.
Bots will be the technical story of the decade. Their creators are often creative and subversive, and until our systems evolve to block or account for them, those systems will remain vulnerable to fraud. Our economy and government depend on clear and uncorrupted signals and information. At the moment, those signals are being disrupted. The clock is ticking.