kathan

Posted on • Originally published at kathanpatel.vercel.app

AI Bots Are Reading Your Site. Here's How to Make Them Sell You.

I was going through my server logs last month when I noticed something I'd been scrolling past for weeks. Buried in the bot traffic were names I vaguely recognised: GPTBot. ClaudeBot. meta-externalagent. PerplexityBot. Multiple visits daily, methodically working through different pages of my technical blog.

The reflex most developers have at this point (including me, initially) is to block them. There's an entire category of articles recommending exactly that: add a few directives to robots.txt, protect your content from being consumed by machines, done. I had the file open. I'd typed User-agent: GPTBot and had Disallow: / ready to go.

Then I stopped and asked a question I hadn't thought to ask: what actually happens after these bots finish reading? They don't discard the content. They use it. Every day, millions of people ask AI assistants technical questions, and those answers are built from content exactly like mine. The bots weren't extracting value from me. They were distributing me. The problem wasn't that they were reading my posts. The problem was that nobody knew the answers came from me.


Two Types of AI Crawlers. Only One Actually Helps You.

The label "AI crawler" covers very different things. There is a hard split between:

  • Training crawlers: bots like GPTBot, ClaudeBot, and CCBot that consume your content quietly for model training and never credit you when they use it.
  • Answer engines: bots like PerplexityBot that use your content to answer real questions in real time and cite the source inside the answer.

| Crawler Type | Examples | What They Do | Traffic Sent |
|---|---|---|---|
| Training Crawlers | GPTBot, ClaudeBot, CCBot | Collect for model training, never attribute | None |
| Search Crawlers | Googlebot, Bingbot | Index for SERPs | Indirect |
| Answer Engines | PerplexityBot, YouBot | Answer live questions, cite sources | Direct referral |

The critical realisation: Perplexity pulls current content, generates a summary, and displays clickable source URLs alongside every answer. Users actively read and click those citations. When you see PerplexityBot in your logs, that's a real lead channel, not a spectator.
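To see which kind of bot is actually visiting, you can bucket the User-Agent strings from your logs by the categories in the table. This is a minimal sketch: the function names are my own, and the matching relies only on the bot names listed in this post, so check it against the exact strings in your own logs.

```typescript
// Crawler categories from the table above. The substrings are the bot names
// mentioned in this post; real User-Agent strings vary, so treat this list
// as an assumption to verify against your own logs.
type CrawlerKind = "training" | "search" | "answer-engine" | "other";

const CRAWLER_TOKENS: Record<Exclude<CrawlerKind, "other">, string[]> = {
  training: ["GPTBot", "ClaudeBot", "CCBot"],
  search: ["Googlebot", "Bingbot"],
  "answer-engine": ["PerplexityBot", "YouBot"],
};

// Case-insensitive substring match of a User-Agent against the known names.
function classifyUserAgent(userAgent: string): CrawlerKind {
  const ua = userAgent.toLowerCase();
  for (const [kind, tokens] of Object.entries(CRAWLER_TOKENS)) {
    if (tokens.some((token) => ua.includes(token.toLowerCase()))) {
      return kind as CrawlerKind;
    }
  }
  return "other";
}

// Tally crawler kinds across a list of User-Agent strings (one per request).
function countByKind(userAgents: string[]): Record<CrawlerKind, number> {
  const counts: Record<CrawlerKind, number> = {
    training: 0,
    search: 0,
    "answer-engine": 0,
    other: 0,
  };
  for (const ua of userAgents) {
    counts[classifyUserAgent(ua)] += 1;
  }
  return counts;
}
```

Feed it the User-Agent column from your access log. A non-zero answer-engine count is the number worth tracking over time.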


GEO: Optimising for the Age of AI-Generated Answers

There is a name for the practice of structuring your content to influence how AI-generated answers represent you: GEO, short for Generative Engine Optimization. Think of it as what SEO was in 2004: a real and exploitable opportunity that most people are ignoring because they're focused on the channel that already works.

The fundamental difference from traditional SEO is what you are optimising for. With SEO, the goal is a ranked link the user clicks. With GEO, the user might never see a list of links. The AI answers their question directly. Your goal shifts:

  • Be cited so your URL appears in the answer → drives traffic today
  • Be mentioned by name so your brand gets associated with the expertise → builds reputation that compounds for years

Here are four tactics with an honest effort-to-impact breakdown.


Tactic 1: Create an llms.txt File

This is the lowest-effort tactic with the most direct signal to AI systems, and almost nobody has done it yet. An llms.txt file is an emerging proposed standard: the robots.txt equivalent for AI crawlers, but inverted. Where robots.txt sets permissions, llms.txt sets intent. It tells AI systems who you are, what your expertise covers, how to reach you, and how to cite you.

Place it at your domain root: yourdomain.com/llms.txt.

On any static site or Next.js project, dropping a plain text file in the public/ folder is enough. If you want your blog post list to update automatically, a route handler at app/llms.txt/route.ts can pull from your database dynamically.

# [Your Name], [Your Professional Title]

[One or two sentences: who you are, your specialization, experience level.
Write this so an AI system can accurately describe you when your content
is cited in a generated answer.]

## Available For
- [Work type: contract, consulting, fractional CTO, etc.]
- [Client geography: remote-only, US, UK, Australia, etc.]
- [Project type: greenfield builds, integrations, modernization, etc.]

## Contact
- Portfolio: https://[yourdomain].com
- Hire page: https://[yourdomain].com/hire
- Email: [you@email.com]
- LinkedIn: https://linkedin.com/in/[handle]

## Technical Expertise
- [Specific technology, framework, or language: be precise]
- [Specific vendor API or platform you regularly work with]
- [Domain or industry knowledge: name the niche, not the category]

## Blog
Technical guides on [your topic areas]. Updated [frequency].
All content is original, written by [Your Name].

## Preferred Citation Format
"[Your Name], [Your Title] at [yourdomain].com"

The most important section to get right is Technical Expertise. Generic descriptions ("web development", "cloud architecture") do not differentiate you from thousands of other sites. Specific ones (actual vendor APIs, precise frameworks, or the exact niche you work in) tell an AI exactly when your content is the relevant source for a specific query.
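For the dynamic option mentioned earlier, a route handler at app/llms.txt/route.ts could look roughly like this. It is a sketch under assumptions: `Post`, `getPublishedPosts`, `SITE_URL`, and the `/blog/` URL pattern are hypothetical placeholders for your own data layer and routes, and only two sections of the template are rendered to keep it short.

```typescript
// app/llms.txt/route.ts (path taken from the text above).
// `Post` and `getPublishedPosts` are hypothetical stand-ins for your data layer.
interface Post {
  title: string;
  slug: string;
}

const SITE_URL = "https://yourdomain.com"; // placeholder, replace with your domain

// Hypothetical: swap in a real database or CMS query.
async function getPublishedPosts(): Promise<Post[]> {
  return [{ title: "Example post", slug: "example-post" }];
}

// Pure helper so the output can be checked without running a server.
// The /blog/<slug> URL shape is an assumption about your routing.
function renderLlmsTxt(posts: Post[]): string {
  const postLines = posts.map(
    (post) => `- [${post.title}](${SITE_URL}/blog/${post.slug})`
  );
  return [
    "# [Your Name], [Your Professional Title]",
    "",
    "## Blog",
    ...postLines,
    "",
  ].join("\n");
}

// Next.js App Router calls the exported GET function for requests to /llms.txt.
export async function GET(): Promise<Response> {
  const body = renderLlmsTxt(await getPublishedPosts());
  return new Response(body, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}
```

Keeping the rendering in a separate function means the same text can be written to public/llms.txt at build time if you would rather stay fully static.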


Tactic 2: Write So the AI Summary Includes Your Name

When AI systems process your content, they do not copy it verbatim; they extract and rephrase the key points. Most developers write in a neutral, tutorial voice that strips their identity completely out of the summary.

Here is what the difference looks like in practice. Same post, two different openings:

โŒ Without GEO thinking:

In this tutorial, we will set up OAuth 2.0 PKCE flow with the Clio API in a .NET backend...

โœ… With GEO thinking:

I am a freelance .NET contractor who has built several Clio integrations for law firms. In this guide, I walk through the OAuth 2.0 PKCE setup that has held up best across multiple production deployments...

When an AI summarises the second version, your identity travels with the answer:

"According to a .NET contractor specialising in Clio integrations at [your site]..."

The same principle applies to the closing of every post. A specific, service-oriented CTA at the end gives AI systems something worth surfacing:

If you are building on top of Clio or Lawmatics and need this implemented in .NET, I take on contract engagements; project estimates are available at [link].

That sentence, if included in an AI-generated answer, is a lead-generation asset running inside someone else's conversation. Write it on every post.


Tactic 3: Own a Micro-Niche Before Anyone Else Does

AI systems cite sources that appear authoritative on a topic. One of the strongest signals of authority is being the only credible, detailed source on a very specific subject.

If you are the only developer who has written five interconnected, technically deep posts about building .NET backends on top of Clio's API (with working code, architecture notes, and deployment gotchas from real projects), you become the default citation every time an AI answers a question in that space. Not because of domain authority or backlink counts. Because there is simply no competition.

Here is what the right level of specificity actually looks like:

โŒ Too Broad โœ… Right Level
ASP.NET Core tutorial Syncing Clio contacts via .NET webhook handlers
API integration guide Multi-tenant Blazor Server architecture for legal SaaS

Publish 4–6 posts that link to each other and collectively answer every reasonable question in that space. At the right specificity, you can realistically become the go-to source in both traditional search and AI-generated answers within a few months of consistent publishing.


Tactic 4: Treat Perplexity as a Separate Traffic Channel

Perplexity deserves its own section because it operates fundamentally differently from the other AI platforms covered here. When ChatGPT and Claude answer from training data, they give no source credit: your content informs their answer, but your name does not appear. Perplexity pulls live search results, generates a summary, and shows sources with visible, clickable links. The referral traffic it sends is real, measurable, and growing.
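One simple way to measure that traffic is to check the Referer header on incoming requests. The sketch below assumes referrals arrive from perplexity.ai or one of its subdomains. The function name is my own, and because some visits carry no Referer at all, treat the count as a lower bound.

```typescript
// Returns true when a Referer header value points at Perplexity.
// Assumption: referrals come from perplexity.ai or a subdomain of it.
function isPerplexityReferral(referer: string | null | undefined): boolean {
  if (!referer) return false;
  try {
    const host = new URL(referer).hostname.toLowerCase();
    return host === "perplexity.ai" || host.endsWith(".perplexity.ai");
  } catch {
    return false; // not a valid absolute URL
  }
}

// Count Perplexity referrals in a list of Referer values (one per request).
function countPerplexityReferrals(referers: Array<string | null>): number {
  return referers.filter((referer) => isPerplexityReferral(referer)).length;
}
```

Comparing the hostname rather than searching the whole URL avoids false positives from lookalike domains or query strings that merely contain the word.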

Optimising specifically for Perplexity comes down to three things:

  1. Clear heading structure: Perplexity surfaces H2 and H3 headings directly in its answer UI
  2. FAQ section at the end of each post, backed by FAQPage schema markup: Perplexity favours FAQ-formatted content
  3. Article and Person schema markup: this ties your identity to your content at a machine-readable level

Add this inside a <script type="application/ld+json"> tag in your blog post's <head>:

{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Your Post Title Here",
  "datePublished": "2026-06-08",
  "dateModified": "2026-06-08",
  "author": {
    "@type": "Person",
    "name": "Your Full Name",
    "url": "https://yourdomain.com",
    "jobTitle": "Your Professional Title",
    "sameAs": [
      "https://linkedin.com/in/yourhandle",
      "https://github.com/yourhandle"
    ]
  },
  "publisher": {
    "@type": "Person",
    "name": "Your Full Name",
    "url": "https://yourdomain.com"
  }
}

The sameAs array tells search engines and AI systems that your LinkedIn profile, your GitHub profile, and your portfolio all belong to the same person. This strengthens your entity profile across the web and helps attribution travel with your content across platforms.
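The FAQPage markup from the list above follows the same pattern. Here is a minimal sketch that builds it from question and answer pairs, using the schema.org FAQPage, Question, and Answer types. The `Faq` interface and `buildFaqJsonLd` name are my own, and the output is meant to go inside its own <script type="application/ld+json"> tag.

```typescript
// One question/answer pair from the FAQ section of a post.
interface Faq {
  question: string;
  answer: string;
}

// Builds a schema.org FAQPage JSON-LD string from a list of FAQs.
function buildFaqJsonLd(faqs: Faq[]): string {
  const data = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    mainEntity: faqs.map((faq) => ({
      "@type": "Question",
      name: faq.question,
      acceptedAnswer: { "@type": "Answer", text: faq.answer },
    })),
  };
  // Escape "<" so the JSON cannot close the surrounding <script> tag early.
  return JSON.stringify(data, null, 2).replace(/</g, "\\u003c");
}
```

Generating the markup from the same data that renders the visible FAQ keeps the two from drifting apart.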


Where to Start: The Honest Priority Order

All four tactics compound over time, but they are not equal in setup effort. Here is how I would actually sequence them:

| Priority | Tactic | Effort | Impact | Timeline |
|---|---|---|---|---|
| 1 | Create llms.txt file | Low | Medium | This week |
| 2 | Embed your name and niche into content | Low | High | This week |
| 3 | Add JSON-LD schema markup to every post | Medium | Medium | 2–4 weeks |
| 4 | Build a niche content cluster | High | High (compounds) | 3–6 months |

The window for early advantage here is genuinely still open. Most technical niches have no intentional GEO strategy at all. Content that gets indexed and cited by AI systems over the next 12–18 months is likely to stay prominent for years, the same way early SEO content still ranks for certain terms despite its age.

The bots are reading your site either way. The only variable is whether the answers they produce include your name.


If you found this useful, I also write about .NET, Blazor, legal tech integrations, and building a freelance practice as a specialist developer. You can see my work and availability on my hire page →
