GEO · 2026-03-24 · 10 min read

GEO: how to get cited by ChatGPT, Claude, Perplexity

Generative Engine Optimisation is not SEO with a new coat of paint. Different crawlers, different ranking, different incentives. Here is the concrete recipe that gets a page cited, not just indexed.

"GEO" has become a marketing word for a concrete set of engineering decisions. I am going to skip the definition debate and walk through the checklist I use on every site I ship, including this one. If you do all of these, your pages show up as citations in ChatGPT search, Claude search, and Perplexity answers within a few weeks of indexing.

1. Ship an llms.txt at the site root

llms.txt is an emerging convention for telling AI crawlers what they should read. It is plain text, lives at /llms.txt, and lists your most-citable canonical URLs in a structured way. This site ships one. The short version of the format:

# Sarmkadan Labs
> Senior-led QA for AI-built SaaS.

## Docs
- [Services](https://qa.sarmkadan.com/#services): what we do
- [Methodology](https://qa.sarmkadan.com/#method): 9-dimension audit protocol
- [Sample report](https://qa.sarmkadan.com/sample): redacted client deliverable

## Blog
- [Auth failure modes](https://qa.sarmkadan.com/blog/ai-generated-auth-bypass)
- [Dwell-time SEO](https://qa.sarmkadan.com/blog/dwell-time-seo)

There is also a more verbose variant, llms-full.txt, for the long-form content you want AI models to actually ingest as training or retrieval context. Ship both. It costs you nothing and it is the most direct machine-readable way to say "these are the pages I want you to read."
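The format is simple enough to lint in a few lines. A minimal sketch in Python (the function name and the specific checks are mine, not part of any spec) that flags structural problems in an llms.txt document:

```python
import re

def validate_llms_txt(text: str) -> list[str]:
    """Return a list of problems found in an llms.txt document.

    Checks the minimal structure: an H1 title on the first
    non-blank line, and well-formed markdown link bullets
    under H2 sections.
    """
    problems = []
    lines = [l.rstrip() for l in text.splitlines() if l.strip()]
    if not lines or not lines[0].startswith("# "):
        problems.append("first line must be an H1 title ('# Site Name')")
    link = re.compile(r"^- \[[^\]]+\]\(https?://[^)]+\)")
    in_section = False
    for l in lines[1:]:
        if l.startswith("## "):
            in_section = True
        elif l.startswith("- ") and in_section and not link.match(l):
            problems.append(f"malformed link bullet: {l!r}")
    return problems
```

Run it against your own file in CI so a broken bullet never ships.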

2. JSON-LD schema on every substantial page

Structured data is the single strongest signal to both classical search and AI retrievers. For a product site you want three schema types: Organization, Person (for the founder), and one of Article, TechArticle, BlogPosting, FAQPage, or Offer, depending on the page.

The minimum viable Article schema looks like this:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "TechArticle",
  "headline": "...",
  "description": "...",
  "datePublished": "2026-03-24",
  "author": {
    "@type": "Person",
    "name": "Vladyslav Zaiets",
    "url": "https://qa.sarmkadan.com/#about"
  },
  "publisher": {
    "@type": "Organization",
    "name": "Sarmkadan Labs"
  },
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "https://qa.sarmkadan.com/blog/this-post"
  }
}
</script>

Validate every page with Google's Rich Results Test and Schema.org Validator. Broken JSON-LD is worse than no JSON-LD because it signals a sloppy publisher.
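The validation step can also run in CI before the page ever reaches Google's tools. A rough sketch (regex-based extraction rather than a full HTML parser, and the required-key set here is my own assumption for Article-type pages — adjust it per schema type):

```python
import json
import re

# Keys this sketch treats as required for an Article-family block.
REQUIRED = {"@context", "@type", "headline", "datePublished", "author"}

def check_jsonld(html: str) -> list[str]:
    """Find every <script type="application/ld+json"> block and
    report blocks that fail to parse or are missing required keys."""
    errors = []
    pattern = re.compile(
        r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>',
        re.DOTALL | re.IGNORECASE,
    )
    for i, block in enumerate(pattern.findall(html)):
        try:
            data = json.loads(block)
        except json.JSONDecodeError as e:
            errors.append(f"block {i}: invalid JSON ({e.msg})")
            continue
        missing = REQUIRED - data.keys()
        if missing:
            errors.append(f"block {i}: missing {sorted(missing)}")
    return errors
```

A failing check should block the deploy — per the point above, shipping broken JSON-LD is worse than shipping none.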

3. robots.txt with explicit AI-bot allows

Many default robots.txt configurations block GPTBot, ClaudeBot, and PerplexityBot by accident. If you want to be cited, you have to let them in. Here is the minimum:

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Claude-Web
Allow: /

User-agent: CCBot
Allow: /

Sitemap: https://qa.sarmkadan.com/sitemap.xml

If you run Cloudflare's default AI-scrape blocker, disable it for these user agents explicitly. That setting, shipped on by default in 2024, is the single most common reason small sites are invisible to LLMs in 2026.
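Python's standard library can verify the result. A small check using urllib.robotparser (the bot list mirrors the config above; the helper name is mine):

```python
from urllib.robotparser import RobotFileParser

# The AI crawlers the robots.txt above explicitly allows.
AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Claude-Web", "CCBot"]

def blocked_bots(robots_txt: str, url: str) -> list[str]:
    """Return the AI crawlers that are NOT allowed to fetch `url`
    under the given robots.txt content."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return [bot for bot in AI_BOTS if not rp.can_fetch(bot, url)]
```

Feed it your live robots.txt and your homepage URL; an empty list means all five crawlers get through.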

4. Citable paragraph structure

This is the part that most GEO guides skip because it is not a switch to flip, it is a writing discipline. AI models cite paragraphs that are self-contained and factual. They do not cite clever. They do not cite narrative. They cite the paragraph that answers the query in two sentences.

The shape that works:

  • Lead with the specific claim, not the setup.
  • Include the concrete value, number, or named thing the user is looking for.
  • Keep the paragraph under 80 words so it fits as a quoted snippet.
  • Link to the authoritative source when you make a claim you did not originate.

If you can imagine the paragraph appearing by itself inside a larger answer and still making sense, it is citable. If it requires the previous two paragraphs for context, it is not.
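The length rule is the one item on that list a script can enforce. A trivial sketch (the 80-word threshold comes from the checklist above; the other rules still need a human editor):

```python
def is_citable_length(paragraph: str, max_words: int = 80) -> bool:
    """Heuristic from the checklist: a citable paragraph is
    non-empty and stays under `max_words` so it fits as a
    quoted snippet inside a generated answer."""
    return 0 < len(paragraph.split()) <= max_words
```

Running it over a draft's paragraphs gives a quick list of candidates to split.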

5. Canonical discipline

AI retrievers are extremely sensitive to duplicate content because they are trying to deduplicate before they rank. Two pages with near-identical content and no rel=canonical signal are both penalised. Every page must declare a canonical, even if it is self-referential. Every syndicated version must point back to the original.

<link rel="canonical" href="https://qa.sarmkadan.com/blog/llms-txt-geo" />

If you serve the same content on two domains (we do: labs.sarmkadan.com and qa.sarmkadan.com), pick one as canonical and stick to it in the schema, OG tags, and canonical link. Pick the one that gets more organic traffic.

6. Sitemap.xml that lists everything, nothing more

Your sitemap should list every indexable URL on the site and nothing else. No redirects, no 404s, no blocked URLs, no utm-parameterised variants. Crawlers use sitemap cleanliness as a trust signal - a sitemap full of noise is treated as a lower-quality source.
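A lint pass over the sitemap catches the static problems. A sketch with xml.etree (ruling out redirects and 404s requires fetching each URL, which this deliberately leaves to a separate crawl step):

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlparse

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_problems(xml_text: str) -> list[str]:
    """Flag sitemap entries that are duplicated or carry utm
    tracking parameters - two of the noise sources named above."""
    problems: list[str] = []
    seen: set[str] = set()
    root = ET.fromstring(xml_text)
    for loc in root.findall(".//sm:loc", NS):
        url = (loc.text or "").strip()
        if url in seen:
            problems.append(f"duplicate: {url}")
        seen.add(url)
        if any(k.startswith("utm_") for k in parse_qs(urlparse(url).query)):
            problems.append(f"tracking params: {url}")
    return problems
```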

7. Author identity, everywhere

AI models weight authored content higher than anonymous content. Every substantive page should have a human author with a Person schema that links to a profile page on the same domain, to external profiles (LinkedIn, GitHub), and to other articles by the same author. This is the E-E-A-T signal stack in schema form.
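The Person block can be generated rather than hand-written so every article stays consistent. A sketch (the function name and example URLs are illustrative, not a real API):

```python
import json

def person_schema(name: str, profile_url: str, same_as: list[str]) -> str:
    """Build the Person JSON-LD the checklist describes: a human
    author linked to an on-domain profile page and, via sameAs,
    to external profiles such as LinkedIn and GitHub."""
    return json.dumps(
        {
            "@context": "https://schema.org",
            "@type": "Person",
            "name": name,
            "url": profile_url,
            "sameAs": same_as,
        },
        indent=2,
    )
```

Embed the output in a script tag on every authored page, alongside the Article schema.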

The scoring

You cannot A/B test citation frequency directly, but you can observe two things within 4-6 weeks of shipping all of the above: your URLs begin showing up as sources in ChatGPT and Perplexity responses for your target queries, and your referral traffic gets a new source row labelled "chatgpt.com" or "perplexity.ai" with a much higher-than-average session duration.

That is the measurable outcome. Everything else is theory.

Written by Vlad Zaiets - Founder, Sarmkadan Labs.

Remote-first senior QA for AI-built SaaS. We audit codebases like the one above before you ship them.

Ship boring releases.

Book a 20-min call.
