AI LLM Bots? Don’t Get Distracted by the 'AI' Part

By Illia Bromberg and Christine Ferrusi Ross

Whenever something captures our interest, we tend to think of it as completely different from everything that came before. In reality, most innovations build on past knowledge, so you may be underestimating what you already know. Bots used to train artificial intelligence models are a good example of this phenomenon. We often get questions about how to block them because they’re so “new,” and customers worry about the potential adverse impact of letting the bots have so much of their content. These questions imply that AI bots are totally new and need to be handled differently, when in reality companies can and do manage these bots effectively.

Managing AI bots is particularly relevant to retailers, as bots can scrape content to train chatbots and large language models (LLMs). Those bots, and other web crawlers, continuously ping sites looking for new information and then immediately scrape it. This can degrade site performance, put the retailer’s product information in places the retailer didn’t authorize, lead to inventory hoarding of hot items, and have other harmful consequences.

There are two points to remember when considering how to manage these bots:

They tend to identify themselves: legitimate bots used in training AI such as chatbots and LLMs share their identity with bot management companies so they can be included in a known bot directory. Once they’re included in the directory, they can easily be managed as known bots.
Bots scraping data to be used in AI are just that: scraper bots. They scrape data to train an LLM. Scrapers are a well-understood problem with established solutions in bot management.

Let’s start with known bots. ChatGPT, for example, advertises a clearly defined user agent, which makes it easy to identify and to block if desired. But it gets more interesting than that. It’s important to think about the bot’s purpose, not its tooling. A bot may be labeled an AI bot because AI uses the data it collects, but its purpose is more critical to categorization. Is the AI bot scraping for news aggregation? Enterprise data aggregation? Media search? Those are all pre-existing categories, and AI bots performing those functions are classified there.
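To make the idea of classifying by purpose concrete, here is a minimal Python sketch of a known bot directory lookup. It assumes a simple substring match on the User-Agent header. The directory entries, the category labels, and the `ExampleNewsBot` name are illustrative assumptions, not the contents of any real bot directory.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KnownBot:
    name: str
    ua_token: str  # substring expected in the User-Agent header
    category: str  # purpose-based category, not "AI" vs. "non-AI"


# Hypothetical directory entries, for illustration only.
KNOWN_BOTS = [
    KnownBot("GPTBot", "GPTBot", "ai-training-scraper"),
    KnownBot("Googlebot", "Googlebot", "search-engine"),
    KnownBot("ExampleNewsBot", "ExampleNewsBot", "news-aggregation"),
]


def classify(user_agent: str) -> Optional[KnownBot]:
    """Return the first directory entry whose token appears in the user agent."""
    ua = user_agent.lower()
    for bot in KNOWN_BOTS:
        if bot.ua_token.lower() in ua:
            return bot
    return None  # not a self-identified known bot


if __name__ == "__main__":
    match = classify("Mozilla/5.0 (compatible; GPTBot/1.0)")
    print(match.category if match else "unknown")
```

A user agent is only a claim made by the client, so a lookup like this is a starting point for categorization rather than proof of identity.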

While it might seem ideal to have a category called “AI bots” in a bot directory so you can block all of them at once, that approach can have negative, unintended consequences. For example, prominent AI chatbots like Google’s Gemini and Microsoft Bing’s chatbot don’t identify as a unique category of bot. Instead, they use their parent companies’ existing search engine crawlers for data collection and are categorized as search engine bots. These crawlers are easy to block because they’re known, but blocking them risks hurting a company’s SEO standing.

You may still want to block some AI bots within a category, but not the whole category. A good bot management strategy will allow you to take action on one specific bot within a category. For example, you can allow the coupon scraping bots of companies you work with and block those you don’t.
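One way to picture per-bot control inside a category is a policy lookup in which a bot-specific rule takes precedence over the category rule. The sketch below is a simplified illustration under that assumption. The bot names, category labels, and action names are hypothetical and are not taken from any product’s configuration.

```python
from typing import Dict

# Hypothetical category-level defaults.
CATEGORY_POLICY: Dict[str, str] = {
    "coupon-scraper": "block",
    "search-engine": "allow",
}

# Hypothetical per-bot overrides, e.g. a coupon partner you work with.
BOT_OVERRIDES: Dict[str, str] = {
    "PartnerCouponBot": "allow",
}

# Fallback for bots whose category has no rule.
DEFAULT_ACTION = "monitor"


def decide(bot_name: str, category: str) -> str:
    """Pick an action: a per-bot override wins, then the category rule, then the default."""
    if bot_name in BOT_OVERRIDES:
        return BOT_OVERRIDES[bot_name]
    return CATEGORY_POLICY.get(category, DEFAULT_ACTION)


if __name__ == "__main__":
    print(decide("PartnerCouponBot", "coupon-scraper"))
    print(decide("OtherCouponBot", "coupon-scraper"))
```

Checking the override first lets you keep a strict category default while still making exceptions for specific bots.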

Now let’s look at AI bots that don’t clearly identify themselves. They aren’t “known,” but we can still detect them and recognize them as scrapers. Even more often than unidentified bots, we see threat actors trying to impersonate known bots. Impersonation can be detected as well.

The image below illustrates how a threat actor tries to impersonate the user agent of the Claude/Anthropic AI bot.

Source: Akamai Technologies

Good bot management should involve the ability to detect unknown scraper bots as well as impersonators of known bots. This means that any unwanted scrapers can be blocked by default.
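The article does not describe how impersonation is detected. One widely used check, shown here only as a hedged sketch, compares the identity claimed in the user agent with the networks the bot operator is known to send traffic from. The `ExampleAIBot` name and the address range (a documentation-only network from RFC 5737) are assumptions for illustration, and real detection typically combines many more signals.

```python
import ipaddress
from typing import Dict, List

# Hypothetical mapping from a user-agent token to the operator's networks.
# 192.0.2.0/24 is a documentation-only range, used here as a stand-in.
PUBLISHED_NETWORKS: Dict[str, List[ipaddress.IPv4Network]] = {
    "ExampleAIBot": [ipaddress.ip_network("192.0.2.0/24")],
}


def assess(user_agent: str, source_ip: str) -> str:
    """Compare the identity claimed in the user agent with the source address."""
    ip = ipaddress.ip_address(source_ip)
    ua = user_agent.lower()
    for token, networks in PUBLISHED_NETWORKS.items():
        if token.lower() in ua:
            if any(ip in net for net in networks):
                return "verified"
            return "impersonator"  # claims the identity from an unexpected network
    return "unclaimed"  # no known identity claimed; needs other scraper detection


if __name__ == "__main__":
    print(assess("ExampleAIBot/1.0", "192.0.2.10"))
    print(assess("ExampleAIBot/1.0", "198.51.100.7"))
```

Requests that come back as "unclaimed" are not necessarily harmless. They are the unknown scrapers described above, and detecting them takes methods that do not rely on the user agent.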

All of this is to say, don’t let the letters “AI” in front of the word “bot” make you think you have to do something completely different from your existing bot management. You should always look for what’s new and different in bot tooling, but don’t neglect what you already know.

Illia Bromberg is a member of the product management team at Akamai for bot manager. Prior to becoming a PM, he spent more than 20 years in various technical customer-facing roles, architecting and building enterprise software for customers all around the world.

Christine Ferrusi Ross leads product marketing for Akamai’s abuse and fraud protection portfolio. She’s passionate about helping companies turn the tables on attackers and ensuring trust online. Prior to Akamai, she worked with blockchain and security startups. She also spent many years as an analyst helping organizations buy and manage emerging technologies and services.