How We Run a Technical GEO Setup for a B2B SaaS Platform

    A step-by-step walkthrough of a technical GEO setup for a SaaS platform whose site works for humans but is half-invisible to AI crawlers: how we audit crawler access, fix structured data and rendering, and verify in server logs and repeated sampling that answer engines can actually read the site.

    Methodology walkthrough

    This page shows, step by step, how we run this type of engagement. Where figures appear, they illustrate the mechanics - client results are published only with written permission and supporting data.

    Focus: AI readability (crawl, schema, structure)
    Program: Technical Setup
    First checkpoint: 4-8 weeks (directional signals reviewed)

    The Typical Challenge

    A B2B SaaS platform has a polished marketing site and solid documentation, and human visitors have no complaints. But the pages that matter render most of their content client-side, the robots.txt was last touched years ago and quietly denies several AI crawlers, structured data is missing or invalid, and the company is named three different ways across the web. When buyers ask ChatGPT or Perplexity about the category, assistants either skip the site entirely or describe the product from stale third-party sources - because the retrieval systems behind those answers never get clean access to the pages that would correct them.

    Our Approach

    1. AI-Crawler Access Audit

    Access comes first - if AI crawlers cannot fetch a page, nothing else on it matters. The first one to two weeks are spent establishing, with log evidence, exactly which crawlers reach which pages.

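    • Pull 30-90 days of server and CDN logs and filter for AI user agents: GPTBot, OAI-SearchBot, PerplexityBot, ClaudeBot, CCBot, and peers - recording what each fetches and what gets denied. Google-Extended is a robots.txt control token rather than a separate user agent, so it is reviewed in robots.txt, not in logs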
    • Review robots.txt line by line: which AI crawlers are allowed or disallowed, and whether each choice is deliberate - opting out of training crawls (Google-Extended, CCBot) and allowing retrieval crawlers (OAI-SearchBot, PerplexityBot) are separate decisions that many default configs conflate
    • Check CDN, WAF, and bot-management rules that silently reject AI user agents even where robots.txt allows them - a common gap that never shows up in a robots.txt review alone
    • Specify llms.txt (a curated, plain-markdown map of priority pages for retrieval systems) and llms-full.txt (the expanded full-text companion), so systems that use them get a clean, current view of the site
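
    The robots.txt review above can be partly automated. The sketch below uses Python's standard urllib.robotparser to report which crawler tokens a given robots.txt allows on a priority URL. The robots.txt content and the example.com URL are hypothetical, chosen only to show training opt-outs and retrieval allowances expressed as separate rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out of training crawls, allow retrieval crawlers.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: *
Disallow: /admin/
"""

AGENTS = ["GPTBot", "OAI-SearchBot", "PerplexityBot", "ClaudeBot", "Google-Extended", "CCBot"]


def crawler_access(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Return, per crawler token, whether robots.txt permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# Hypothetical priority URL.
access = crawler_access(ROBOTS_TXT, AGENTS, "https://example.com/product/")
for agent, allowed in access.items():
    print(f"{agent:16} {'allowed' if allowed else 'disallowed'}")
```

    A check like this only covers what robots.txt declares. CDN, WAF, and bot-management rules can still reject a crawler that robots.txt allows, which is why the log review remains the source of truth.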

    2. Structured Data & Schema Markup

    With access confirmed, we make the site machine-legible. The deliverable is per-template JSON-LD specifications the client's developers can paste and adapt, not a generic best-practices memo.

    • Baseline audit of existing markup across every template, validated with Google's Rich Results Test and the Schema.org validator, with errors and warnings logged per template
    • Organization schema with legal name, logo, founding details, and sameAs links; Product or Service schema on offering pages; FAQPage where real questions are answered; Article on blog and docs content; BreadcrumbList sitewide
    • A small set of correct, validated types on the right templates - sprawling markup that fails validation is worse than none, because invalid blocks can be discarded by parsers and markup that contradicts the visible page gives machines conflicting facts about the site
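
    As an illustration of what a per-template specification can look like, here is a minimal sketch that builds Organization JSON-LD and wraps it in a script tag. The property names (legalName, logo, foundingDate, sameAs) are schema.org Organization properties; the helper names and every value for "Acme" are hypothetical placeholders, not client data.

```python
import json


def organization_jsonld(name, legal_name, url, logo, founding_date, same_as):
    """Build an Organization JSON-LD document as a plain dict."""
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "legalName": legal_name,
        "url": url,
        "logo": logo,
        "foundingDate": founding_date,
        "sameAs": list(same_as),
    }


def as_script_tag(data):
    """Serialize JSON-LD for embedding in HTML, escaping '</' so the payload cannot close the tag."""
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Hypothetical values for illustration only.
org = organization_jsonld(
    name="Acme",
    legal_name="Acme, Inc.",
    url="https://example.com/",
    logo="https://example.com/logo.png",
    founding_date="2015",
    same_as=[
        "https://www.linkedin.com/company/example",
        "https://github.com/example",
    ],
)
print(as_script_tag(org))
```

    Generating the markup from one data source per template keeps it in step with the page; the output should still go through the validators listed above before it ships.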

    3. Entity Consistency

    AI systems reconcile mentions across the web into one entity. If the company is "Acme", "Acme Platform", and "Acme.io" in different places, that reconciliation gets unreliable - and so do the answers.

    • Settle one canonical name and one canonical description, then audit for variants across the site, LinkedIn, Crunchbase, GitHub, review platforms, and directories
    • Point sameAs links in the Organization schema at the authoritative profiles, so machines can connect the site to its verified presences
    • Build or fix a real about page that states plainly what the company is, who it serves, and the founding facts - the page assistants most often quote when describing a brand
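
    The variant audit can start from a simple table of where the company is listed and what name each listing shows. The sketch below flags listings whose displayed name differs from the canonical one. The listing data is hypothetical, and treating case and whitespace differences as equivalent is an assumption that a stricter audit might not make.

```python
import re

CANONICAL_NAME = "Acme"

# Hypothetical audit input: where the company is listed and the name shown there.
LISTINGS = {
    "website": "Acme",
    "linkedin": "Acme Platform",
    "crunchbase": "Acme.io",
    "github": "acme ",
}


def normalize(name: str) -> str:
    """Collapse whitespace and ignore case so trivial differences are not flagged."""
    return re.sub(r"\s+", " ", name).strip().casefold()


def find_variants(canonical: str, listings: dict[str, str]) -> dict[str, str]:
    """Return the listings whose displayed name does not match the canonical name."""
    target = normalize(canonical)
    return {source: name for source, name in listings.items() if normalize(name) != target}


variants = find_variants(CANONICAL_NAME, LISTINGS)
for source, name in variants.items():
    print(f"{source}: shows {name!r}, expected {CANONICAL_NAME!r}")
```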

    4. Crawlability & Rendering

    Most AI crawlers do not execute JavaScript. A page that looks complete in a browser can be nearly empty to the systems that feed answer engines, so we test what a non-rendering crawler actually receives.

    • Fetch priority pages with JavaScript disabled and as raw HTTP requests, then compare against the rendered version - any copy that only exists after hydration is invisible to most AI retrieval
    • Recommend moving priority content to server-side rendering or static generation where the stack allows; at minimum, core product copy belongs in the initial HTML
    • Canonical hygiene: one canonical URL per page, consistent trailing-slash and parameter handling, and redirect chains flattened to single hops
    • Sitemap freshness: truthful lastmod dates, every priority page present, stale and redirected URLs removed - crawlers use sitemaps to budget attention, and a stale one wastes it
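
    The raw-HTML comparison can be sketched with the standard library alone. The example below extracts the text a non-rendering crawler would receive from the initial HTML (ignoring script, style, and template contents) and reports which priority phrases are missing. Both HTML samples and the phrase list are hypothetical; in practice the input would be the body of a plain HTTP GET for each priority page.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text outside <script>, <style>, and <template> elements."""

    SKIP = {"script", "style", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def missing_phrases(raw_html: str, phrases: list[str]) -> list[str]:
    """Return the phrases that do not appear in the text of the initial HTML."""
    parser = VisibleText()
    parser.feed(raw_html)
    text = " ".join(" ".join(parser.chunks).split()).casefold()
    return [p for p in phrases if p.casefold() not in text]


# Hypothetical client-rendered shell: the copy exists only inside a script payload.
CSR_SHELL = (
    "<html><head><title>Acme</title></head><body><div id='root'></div>"
    "<script>window.__DATA__ = {\"headline\": \"Automate approvals in minutes\"}</script>"
    "</body></html>"
)
# Hypothetical server-rendered version of the same page.
SSR_PAGE = (
    "<html><head><title>Acme</title></head><body>"
    "<h1>Automate approvals in minutes</h1></body></html>"
)
PHRASES = ["Automate approvals in minutes"]

print("client-rendered, missing:", missing_phrases(CSR_SHELL, PHRASES))
print("server-rendered, missing:", missing_phrases(SSR_PAGE, PHRASES))
```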

    5. Answer-Ready Content Structure

    Once crawlers can fetch and parse the pages, the remaining question is whether the content is structured so an assistant can lift an accurate answer from it without reading the whole page.

    • Extractable definitions: a direct one-to-two sentence answer immediately under each question-style heading, before any elaboration
    • FAQ sections built from questions buyers actually ask assistants, matched to the FAQPage schema from step 2
    • Comparison tables with consistent columns and honest rows - tabular data is among the easiest content for retrieval systems to quote correctly
    • Self-contained sections whose headings mirror real queries, so a passage still makes sense when quoted without the surrounding page
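
    One way to keep the visible FAQ and the FAQPage schema from step 2 matched is to generate the markup from the same question-and-answer pairs that render on the page. This is a minimal sketch under that assumption; the FAQPage, Question, and Answer types and the mainEntity and acceptedAnswer properties come from schema.org, while the function name and the sample pair are hypothetical.

```python
import json


def faq_jsonld(pairs):
    """Build FAQPage JSON-LD from (question, answer) pairs that also appear on the page."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


# Hypothetical question with a direct one-to-two sentence answer.
FAQ_PAIRS = [
    (
        "What is workflow automation software?",
        "Workflow automation software routes tasks and approvals between people "
        "and systems according to predefined rules.",
    ),
]

faq = faq_jsonld(FAQ_PAIRS)
print(json.dumps(faq, indent=2))
```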

    What We Work Toward

    Technical work is more verifiable than most marketing work, but AI answers remain non-deterministic - so we manage toward directional movement on observable signals rather than promised placements. On an engagement like this, the signals we want to see move:

    AI crawler fetch evidence

    GPTBot, PerplexityBot, ClaudeBot, and peers appearing in server logs fetching the priority pages - with denials trending toward zero. The most direct proof that the access work landed.
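
    A sketch of how this signal can be pulled from logs, assuming the common combined log format. The log lines, shortened user-agent strings, and the choice to count 401, 403, and 429 as denials are all assumptions for illustration; real formats and status conventions vary by server and CDN.

```python
import re
from collections import Counter

# Crawler tokens to look for in the user-agent field. Google-Extended is omitted
# because it is a robots.txt token and does not appear as a user agent in logs.
AI_AGENTS = ("GPTBot", "OAI-SearchBot", "PerplexityBot", "ClaudeBot", "CCBot")

# Assumption: these statuses are treated as denials.
DENIED_STATUSES = {401, 403, 429}

LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

# Hypothetical log lines in combined log format.
SAMPLE_LOG = """\
203.0.113.7 - - [12/Mar/2025:10:01:22 +0000] "GET /product/ HTTP/1.1" 200 5120 "-" "GPTBot/1.2"
203.0.113.8 - - [12/Mar/2025:10:02:10 +0000] "GET /docs/ HTTP/1.1" 403 312 "-" "PerplexityBot/1.0"
203.0.113.9 - - [12/Mar/2025:10:03:45 +0000] "GET /pricing/ HTTP/1.1" 200 4096 "-" "ClaudeBot/1.0"
198.51.100.4 - - [12/Mar/2025:10:04:01 +0000] "GET / HTTP/1.1" 200 9000 "-" "Mozilla/5.0 (Macintosh)"
203.0.113.7 - - [12/Mar/2025:10:05:30 +0000] "GET /docs/ HTTP/1.1" 403 312 "-" "GPTBot/1.2"
"""


def summarize(lines):
    """Count fetched, denied, and other responses per AI crawler token."""
    summary = {}
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue
        ua = match["ua"].casefold()
        agent = next((a for a in AI_AGENTS if a.casefold() in ua), None)
        if agent is None:
            continue
        status = int(match["status"])
        if status in DENIED_STATUSES:
            outcome = "denied"
        elif status < 400:
            outcome = "fetched"
        else:
            outcome = "other"
        summary.setdefault(agent, Counter())[outcome] += 1
    return summary


summary = summarize(SAMPLE_LOG.splitlines())
for agent, counts in sorted(summary.items()):
    print(f"{agent:14} fetched={counts['fetched']} denied={counts['denied']}")
```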

    Structured-data validity coverage

    The share of priority templates carrying valid, warning-free schema in validation tooling - tracked as a simple coverage checklist that trends toward complete.
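
    The coverage figure is simple arithmetic over the per-template validation log. A minimal sketch, assuming each template's error and warning counts have been recorded by hand from the validators; the template names and counts below are hypothetical.

```python
# Hypothetical validation log: error and warning counts per priority template.
VALIDATION_RESULTS = {
    "home": {"errors": 0, "warnings": 0},
    "product": {"errors": 0, "warnings": 2},
    "pricing": {"errors": 0, "warnings": 0},
    "blog_post": {"errors": 1, "warnings": 0},
    "docs_page": {"errors": 0, "warnings": 0},
}


def validity_coverage(results):
    """Return (share of templates with no errors and no warnings, templates still failing)."""
    if not results:
        raise ValueError("no templates to evaluate")
    passing = {
        name for name, r in results.items() if r["errors"] == 0 and r["warnings"] == 0
    }
    return len(passing) / len(results), sorted(set(results) - passing)


coverage, remaining = validity_coverage(VALIDATION_RESULTS)
print(f"coverage: {coverage:.0%}; still failing: {', '.join(remaining)}")
```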

    Assistant citations of key pages

    The site's own product and docs pages appearing as cited sources in answers to a fixed panel of tracked prompts - sampled repeatedly and reported with variance notes, not single screenshots.
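
    One way to write a variance note is to report the citation rate for each prompt with an interval rather than a bare percentage. The sketch below uses the Wilson score interval, which is our choice for illustration rather than a prescribed method; the prompts are taken from the replication list and the hit counts are hypothetical.

```python
import math


def wilson_interval(hits: int, runs: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z=1.96 gives roughly 95% coverage)."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    p = hits / runs
    denom = 1 + z * z / runs
    center = (p + z * z / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z * z / (4 * runs * runs)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical sampling results: (prompt, runs where the site was cited, total runs).
SAMPLES = [
    ("What are the best B2B SaaS platforms for workflow automation?", 6, 20),
    ("Compare the leading workflow automation platforms for mid-size companies", 2, 20),
]

for prompt, hits, runs in SAMPLES:
    low, high = wilson_interval(hits, runs)
    print(f"{hits}/{runs} cited ({low:.0%} to {high:.0%}): {prompt}")
```

    With only a few dozen runs per prompt the interval stays wide, which is the point: a change of one or two citations between weeks is usually within the noise.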

    Branded search interest

    Growth in branded query impressions in Google Search Console - the most reliable downstream proxy we have for buyers who see the brand in an AI answer and then search for it.

    Key Principles

    1. Access before optimization - schema and content structure only matter once crawlers can actually fetch the pages; log evidence beats assumptions every time.
    2. Deliberate crawler policy - allowing retrieval crawlers and opting out of training crawls are separate decisions; robots.txt should encode an actual policy, not inherited defaults.
    3. Valid and specific over broad and broken - a small set of correct schema types on the right templates outperforms sprawling markup that fails validation.
    4. Directional measurement - crawler logs, validation coverage, and repeated prompt sampling instead of one-off screenshots; no guarantees, only observable movement.

    How We Measure

    • Data sources: server and CDN logs (AI crawler user agents), Google Search Console, schema validation tooling, and repeated ChatGPT / Perplexity / Claude prompt sampling
    • Timeframe: Weekly log and sampling review; first checkpoint at 4-8 weeks
    • Metric definition: Crawler fetch evidence = requests from AI crawler user agents to priority URLs, observed in server and CDN logs. Schema validity coverage = share of priority templates passing validation without errors or warnings. AI citations = the site's pages appearing as cited sources in assistant responses to a fixed panel of tracked prompts, observed through repeated sampling. See our methodology page for detailed definitions.

    Validation & Evidence Standards

    How Results Get Validated on a Real Engagement

    On a live engagement, every reported metric is cross-checked across multiple data sources. We combine platform analytics, third-party tools, and observational methods to confirm directional trends.

    Validation tools we use:

    • Server and CDN log analysis filtered to AI crawler user agents
    • Google Search Console (crawl stats, index coverage, branded query tracking)
    • Google Rich Results Test and the Schema.org validator for markup checks
    • Repeated prompt sampling across ChatGPT, Perplexity, and Claude

    Cross-validation methods:

    • Crawler user agents checked against published IP ranges where available, since user agents can be spoofed
    • Rendering findings confirmed with JavaScript disabled and via raw HTTP fetches, not just browser previews
    • Citation observations confirmed through repeated sampling across days and model versions, not single runs
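
    The IP-range check can be sketched with Python's ipaddress module. The CIDR blocks below are reserved documentation ranges used as placeholders, not any vendor's real ranges; on an engagement they would be loaded from the list each crawler operator publishes.

```python
import ipaddress

# Placeholder ranges (reserved documentation blocks), NOT real crawler ranges.
PUBLISHED_RANGES = {
    "GPTBot": ["192.0.2.0/24"],
    "PerplexityBot": ["198.51.100.0/24", "2001:db8::/32"],
}


def ip_in_ranges(ip: str, cidrs: list[str]) -> bool:
    """Return True if ip falls inside any of the given CIDR blocks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidrs)


def verify_request(claimed_agent: str, ip: str) -> str:
    """Classify a logged request as verified, spoofed, or unverifiable."""
    cidrs = PUBLISHED_RANGES.get(claimed_agent)
    if not cidrs:
        return "unverifiable"  # no published ranges available for this agent
    return "verified" if ip_in_ranges(ip, cidrs) else "spoofed"


print(verify_request("GPTBot", "192.0.2.44"))
print(verify_request("GPTBot", "203.0.113.9"))
print(verify_request("ClaudeBot", "203.0.113.9"))
```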

    About This Walkthrough

    This walkthrough shows exactly how we run this type of engagement. Where figures appear, they illustrate the mechanics. We publish client numbers only with written permission and supporting data exports - transparency about method over dressed-up numbers.

    Measurement Limitations

    AI outputs are non-deterministic and vary by prompt wording, model version, and time. Our measurements are proxy-based and observational, not precise counts. Results should be interpreted as directional indicators rather than absolute guarantees. See our methodology page for detailed measurement definitions.

    Replication Prompts

    These are the kinds of prompts we track on an engagement like this. Try them (or your own category prompts) yourself - AI responses vary by model, wording, and time, so treat any single run as directional:

    1. What are the best B2B SaaS platforms for workflow automation?
    2. Compare the leading workflow automation platforms for mid-size companies
    3. Which sources describe what this platform does and who it is for?

    Not Sure What AI Crawlers See on Your Site?

    Start with an AI Visibility Risk Audit, or go straight to Technical Setup if you already know the foundation needs work.
