How it works

Content for humans. Redacted for AI bots.

WARD lets a website serve the same URL to everyone — but physically remove protected content before it reaches an AI bot. Not a request. Not a header the bot can ignore. The content is simply absent from the response.

robots.txt asks. WARD enforces.

robots.txt

A polite note at the door: “please don't.” The content is still in the response. A bot that ignores the rule gets everything. Compliance rests on trust.

WARD

The protected content is stripped from the HTML before it is sent. A bot can't ignore what it never receives. Enforcement happens on the server, not on the bot's good behaviour.

The mechanism

Three layers

Detection

Is this visitor an AI bot?

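WARD identifies the requester by matching its User-Agent header against a maintained registry of known AI crawlers (GPTBot, ClaudeBot, CCBot and others). Google-Extended is a robots.txt control token rather than a User-Agent string, so it cannot be detected this way. Humans and ordinary search crawlers pass through untouched.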
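
A minimal sketch of this detection step, assuming a plain substring match on the User-Agent header. The three-entry registry and the function name detectAiCrawler are hypothetical and only illustrate the idea; they are not the API of the ward-protocol package.

```javascript
// Hypothetical registry: tokens that known AI crawlers include in their
// User-Agent header. A real deployment would load a maintained list.
const AI_CRAWLER_TOKENS = ["GPTBot", "ClaudeBot", "CCBot"];

/**
 * Returns the matching registry token, or null for humans and
 * ordinary search crawlers (or when no User-Agent was sent).
 */
function detectAiCrawler(userAgent) {
  if (typeof userAgent !== "string" || userAgent === "") return null;
  const ua = userAgent.toLowerCase();
  const hit = AI_CRAWLER_TOKENS.find((token) => ua.includes(token.toLowerCase()));
  return hit || null;
}
```

Returning the matched token rather than a boolean lets a later step log which crawler was served a redacted page.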

Policy

What is this site's rule for AI?

The site declares its rules in a machine-readable file at /.well-known/ward.json. The policy is public — anyone, including the AI company, can read exactly what is and isn't allowed.

{
  "version": "1.0",
  "default": {
    "training": false,
    "summarization": true,
    "attribution": "required"
  }
}
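
As a rough sketch, a consumer of this file might parse it and look up a purpose as shown below. The validation checks, the function names, and the rule that a purpose is allowed only when it is explicitly set to true are assumptions made for illustration; the page does not define a schema beyond the example above.

```javascript
// The example policy from above, as it would arrive in an HTTP response body.
const policyText = `{
  "version": "1.0",
  "default": { "training": false, "summarization": true, "attribution": "required" }
}`;

// Parse and minimally validate a ward.json body.
// Assumption: only the fields shown in the example are checked.
function parseWardPolicy(text) {
  const policy = JSON.parse(text);
  if (policy === null || typeof policy !== "object") {
    throw new Error("ward.json: expected a JSON object");
  }
  if (typeof policy.version !== "string") {
    throw new Error("ward.json: missing version");
  }
  if (policy.default === null || typeof policy.default !== "object") {
    throw new Error("ward.json: missing default");
  }
  return policy;
}

// Assumption: a purpose is allowed only when the policy sets it to true,
// so unknown or missing purposes are treated as not allowed.
function isPurposeAllowed(policy, purpose) {
  return policy.default[purpose] === true;
}

const policy = parseWardPolicy(policyText);
console.log(isPurposeAllowed(policy, "training"));      // false
console.log(isPurposeAllowed(policy, "summarization")); // true
```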

Enforcement

Remove the protected content before delivery.

Server-side middleware parses the page and replaces every block marked with data-ward="redact" with a placeholder comment that states the reason for the redaction. The bot receives valid HTML, just without the protected content.
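
The redaction step could look roughly like the sketch below. It is a dependency-free approximation that scans tags with regular expressions instead of using a real HTML parser, so it assumes well-formed markup, double-quoted attributes, and no ">" inside attribute values. The function names and the exact placeholder wording are assumptions; a production middleware would more likely use a proper parser.

```javascript
// Elements that never have a closing tag.
const VOID_TAGS = /^(area|base|br|col|embed|hr|img|input|link|meta|source|track|wbr)$/i;
// An opening tag that carries data-ward="redact"; group 1 is the tag name.
const REDACT_OPEN = /<([a-zA-Z][a-zA-Z0-9-]*)(?=[\s>\/])[^>]*\sdata-ward="redact"[^>]*>/;

// Build the comment that replaces a redacted block.
function placeholder(openTag) {
  const m = /\sdata-ward-reason="([^"]*)"/.exec(openTag);
  // Keep only word characters and spaces so the reason cannot break the comment.
  const reason = m ? m[1].replace(/[^\w ]/g, "") : "unspecified";
  return `<!-- Content redacted per WARD policy. Reason: ${reason} -->`;
}

// Index just past the close tag that matches an element whose opening tag
// ends at `from`, counting nested elements of the same name.
function findElementEnd(html, tagName, from) {
  const tag = new RegExp(`<(/?)${tagName}(?=[\\s>/])[^>]*>`, "gi");
  tag.lastIndex = from;
  let depth = 1;
  let m;
  while ((m = tag.exec(html)) !== null) {
    depth += m[1] === "/" ? -1 : 1;
    if (depth === 0) return tag.lastIndex;
  }
  return html.length; // fail closed: an unclosed block is redacted to the end
}

function redact(html) {
  let out = "";
  let rest = html;
  let m;
  while ((m = REDACT_OPEN.exec(rest)) !== null) {
    const openEnd = m.index + m[0].length;
    const end = VOID_TAGS.test(m[1]) ? openEnd : findElementEnd(rest, m[1], openEnd);
    out += rest.slice(0, m.index) + placeholder(m[0]);
    rest = rest.slice(end);
  }
  return out + rest;
}
```

When no matching close tag is found, the sketch removes everything up to the end of the document. Over-redacting a malformed page seems the safer failure mode here than leaking the protected block.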

One request, end to end

Same page, same URL. The response is assembled per visitor.

Request → Detect → Redact → Response
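
The flow above can be sketched as a single per-request step. This is an illustration under stated assumptions, not the ward-protocol middleware: createWardStep is a hypothetical name, and the detect and redact functions passed in are deliberately simplistic stand-ins so the example runs on its own.

```javascript
// Builds the per-request step. `detect`, `redact` and `log` are injected so
// the pipeline stays independent of any particular implementation.
function createWardStep({ detect, redact, log }) {
  return function wardStep(userAgent, html) {
    const bot = detect(userAgent);
    if (bot === null) return html; // humans and search crawlers: untouched
    const body = redact(html);
    log({ bot, redacted: body !== html, at: new Date().toISOString() });
    return body;
  };
}

// Simplistic stand-ins, for demonstration only.
const hits = [];
const step = createWardStep({
  detect: (ua) => (/GPTBot|ClaudeBot|CCBot/i.test(ua || "") ? "ai-bot" : null),
  redact: (html) =>
    html.replace(
      /<address data-ward="redact">[\s\S]*?<\/address>/g,
      "<!-- Content redacted per WARD policy -->"
    ),
  log: (entry) => hits.push(entry),
});

const html = '<p>Hi</p><address data-ward="redact">Max Mustermann</address>';
console.log(step("Mozilla/5.0 (Windows NT 10.0) Chrome/120.0", html));
console.log(step("Mozilla/5.0 (compatible; ClaudeBot/1.0)", html));
```

Because the same URL now yields different bodies depending on the User-Agent, a cache or CDN in front of the server would typically need to account for that, for example with a Vary: User-Agent response header.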

What a human receives

<address data-ward="redact"
         data-ward-reason="pii">
  Max Mustermann
  Musterstraße 1, 12345 Berlin
  info@example.de
</address>

What an AI bot receives

<!-- Content redacted
     per WARD policy
     Reason: pii -->

German law requires an Impressum (legal notice) that shows a real name and address to the public (§5 DDG, formerly §5 TMG). WARD keeps it visible to people while removing it from the response sent to a training crawler.

In practice

Three steps to enforce

Publish your rules at /.well-known/ward.json.
Mark sensitive blocks in your HTML with data-ward="redact".
Add the middleware. It detects, redacts, and logs every bot hit.

npm install ward-protocol

The standard (ward.json plus data-ward markup) is MIT-licensed and open. GitHub: ward-protocol

Common questions

Can't a bot just fake its User-Agent?

It can. But WARD's policy is public and machine-readable, and Art. 53 of the EU AI Act obliges providers of general-purpose AI models to maintain a policy for identifying and honouring rights reservations such as opt-outs. A crawler that spoofs its identity to bypass a stated opt-out is documenting its own non-compliance. Behavioural fingerprinting raises the cost of spoofing further.

How is this different from Cloudflare's AI controls?

WARD is an open standard, not a single vendor's product. It works on any stack, and it controls access at the level of individual content blocks — not just whole domains. The rules travel with the site, readable by anyone.

Isn't serving different content to bots cloaking?

No. Cloaking deceives search engines for ranking. WARD leaves search crawlers untouched, applies only to AI bots, and the policy is published openly at a well-known URL. Nothing is hidden — the rule is declared.

Where does the revenue come from?

The standard stays free. Website owners pay for monitoring and compliance evidence (this dashboard). Later, WARD-Pay lets publishers charge bots for access via privacy-preserving micropayments (GNU Taler) instead of redacting.

See what bots take from your site

Start monitoring in minutes, or read the integration guide first.
