Custom Log API

The Custom Log API lets you send normalized web request logs to Ansehn directly over HTTPS from any source — your web server, CDN, edge worker, serverless platform, or a small log-shipping script. Unlike the AWS CloudFront Integration, which is tied to a specific provider, the Custom Log API accepts a simple JSON payload, so you can stream AI crawler activity into Ansehn no matter where your logs live.

Once connected, Ansehn automatically detects AI crawler traffic (GPTBot, OAI-SearchBot, ClaudeBot, PerplexityBot, Gemini, and more) and surfaces it in your Server Logs dashboard and Our Pages analytics — the same views used by every other ingestion method.

Which method should I use?

Method	Best for	Setup effort
Manual Upload	One-time analysis, quick checks, any web server	Low — upload a `.log`/`.txt` file from your dashboard
AWS CloudFront Integration	Continuous monitoring of CloudFront distributions	Medium — one-time AWS setup, then fully automated
Custom Log API (this page)	Continuous, automated monitoring from any platform	Medium — point a log drain or script at our endpoint

How it works

Your platform (web server, CDN, edge worker, script)
        │
        │  Batched request logs (JSON over HTTPS)
        ▼
Ansehn Custom Log API  (POST /v1/logs/custom)
        │
        │  API key auth · AI crawler detection · validation
        ▼
Your Server Logs & Our Pages Analytics

You collect request logs on your side, batch them into JSON, and POST them to Ansehn using an API key you generate in your dashboard. Ansehn validates each record, detects AI crawler traffic by User-Agent, stores only the recognized AI crawler rows, and returns a transparent summary of exactly what was received, stored, and dropped.

Prerequisites

An Ansehn account with an active project for the domain you want to monitor.
An API key with the logs:ingest scope (see Step 1).
The ability to send an HTTPS POST request from your platform (any language or tool that can make HTTP requests).

Quick start

Generate a logs:ingest API key in your Ansehn dashboard (Step 1 below).
Collect your request logs and shape them into the JSON format below.
POST a batch to https://ingest.ansehn.com/v1/logs/custom with your API key and an Idempotency-Key.
Read the JSON response to confirm how many AI crawler rows were stored.
Open your Server Logs dashboard to see Daily Visits, Top Pages, and Top Crawlers populate.

Step 1 — Generate an API key in Ansehn

API key generation is available in your dashboard under Project Settings → API Keys.

Navigate to Projects → [Your Project] → Settings → API Keys
Click + New API Key
Give the key a descriptive name (e.g., Production Log Shipper)
Under Scopes, select logs:ingest
Under Project, select the project for the domain you want to monitor
Click Generate and copy the key immediately — it is only shown once

Your key will look like: ank_a1b2c3d4e5f6...

The key is scoped to a single project. Ansehn derives the project from the key itself, so you never send a project ID in the request body — logs are always attributed to the project the key belongs to.

API reference

Endpoint

POST https://ingest.ansehn.com/v1/logs/custom

Authentication

Send your API key using either header:

x-ansehn-api-key: ank_your_key

Authorization: Bearer ank_your_key

Required headers

Header	Value
`Content-Type`	`application/json`
`Idempotency-Key`	A stable, unique ID for this batch (see below)

Optional:

Header	Value
`Content-Encoding`	`gzip` (recommended for larger batches)
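A small helper can assemble these headers in one place. The following is a minimal Python sketch: the function name `build_headers` and its keyword flags are illustrative only, not part of any Ansehn SDK. It sends `x-ansehn-api-key` by default and switches to the `Authorization: Bearer` form when asked.

```python
def build_headers(
    api_key: str,
    idempotency_key: str,
    *,
    bearer: bool = False,
    gzipped: bool = False,
) -> dict:
    """Return request headers for POST /v1/logs/custom (hypothetical helper)."""
    headers = {
        "Content-Type": "application/json",
        "Idempotency-Key": idempotency_key,
    }
    # Either authentication header is accepted; one of them is enough.
    if bearer:
        headers["Authorization"] = f"Bearer {api_key}"
    else:
        headers["x-ansehn-api-key"] = api_key
    # Only declare gzip when the body has actually been gzip-compressed.
    if gzipped:
        headers["Content-Encoding"] = "gzip"
    return headers
```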

Idempotency

The Idempotency-Key header is required. It protects you from accidental double-counting when a network retry resends the same batch.

Same key + same body → the batch is recognized as a duplicate and not ingested again. You get 200 OK with "status": "duplicate".
Same key + different body → rejected with 409 Conflict. Each unique batch must use its own key.

Use a stable, unique key per batch — never a random value regenerated on retry, and never the same key for different data. A good key encodes the source and the exact log window, for example: logs-prod-edge-001-2026-06-08T12:00:00Z-shard-a. Avoid keys like Date.now() that change on every retry.
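One way to follow this rule is to derive the key deterministically from the batch's source, window start, and shard, so a retry recomputes exactly the same key. The sketch below assumes your shipper knows those three values; the function name and the `logs-…-shard-…` layout simply mirror the example key above and are not required by the API.

```python
from datetime import datetime, timezone


def make_idempotency_key(source: str, window_start: datetime, shard: str) -> str:
    """Build a deterministic Idempotency-Key from the source, log window, and shard.

    The same inputs always produce the same key, so a retry of a batch reuses it,
    while a different source, window, or shard yields a different key.
    """
    if window_start.tzinfo is None:
        # A naive datetime would be interpreted in local time; require aware input.
        raise ValueError("window_start must be timezone-aware")
    stamp = window_start.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"logs-{source}-{stamp}-shard-{shard}"
```

If your windows vary in length, consider encoding the window end as well, so two differently sized windows that start at the same moment never share a key.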

Request body

You can send either a wrapped object:

{
  "logs": [
    {
      "timestamp": "2026-06-08T12:00:00Z",
      "method": "GET",
      "host": "example.com",
      "path": "/blog/article",
      "status_code": 200,
      "ip": "203.0.113.10",
      "user_agent": "Mozilla/5.0 ... GPTBot/1.2",
      "referer": "https://example.com",
      "bytes_sent": 12345,
      "duration_ms": 120
    }
  ]
}

…or a bare array, for easier integration:

[
  {
    "timestamp": "2026-06-08T12:00:00Z",
    "method": "GET",
    "host": "example.com",
    "path": "/blog/article",
    "status_code": 200,
    "ip": "203.0.113.10",
    "user_agent": "Mozilla/5.0 ... GPTBot/1.2"
  }
]

Required fields

Field	Type	Notes
`timestamp`	ISO 8601 string, epoch seconds, or epoch milliseconds	Must fall within the accepted retention window (see below)
`method`	string	HTTP method; uppercased and normalized (unknown verbs stored as `OTHER`)
`host`	string	Host header, max 255 chars
`path`	string	Request path, max 2048 chars
`status_code`	integer	Between 100 and 599
`ip`	string	May be raw, anonymized, or hashed (see Privacy)
`user_agent`	string	Max 1024 chars; used for AI crawler detection
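Checking these constraints before sending can catch malformed rows early, although the server remains the authority (it also normalizes `method` and enforces the timestamp window, which this check does not). The sketch below is a hypothetical client-side helper; the length limits come from this table and the Limits table further down.

```python
REQUIRED_FIELDS = ("timestamp", "method", "host", "path", "status_code", "ip", "user_agent")
# Length limits taken from the field and Limits tables on this page.
MAX_LENGTHS = {"host": 255, "path": 2048, "user_agent": 1024, "referer": 2048, "referrer": 2048}


def validate_record(record: dict) -> list:
    """Return a list of problems found in one log record; empty means it looks valid."""
    problems = []
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            problems.append(f"missing {field}")
    for field, limit in MAX_LENGTHS.items():
        value = record.get(field)
        if isinstance(value, str) and len(value) > limit:
            problems.append(f"{field} longer than {limit} chars")
    status = record.get("status_code")
    # bool is a subclass of int in Python, so reject it explicitly.
    valid_status = isinstance(status, int) and not isinstance(status, bool) and 100 <= status <= 599
    if status is not None and not valid_status:
        problems.append("status_code must be an integer between 100 and 599")
    return problems
```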

Optional fields

Field	Type	Maps to
`referer` / `referrer`	string	Referrer
`bytes_sent` / `response_size`	integer	Response size in bytes
`duration_ms`	number	Request duration in milliseconds (converted to seconds internally)
`time_taken`	number	Request duration in seconds
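If your server writes the common NGINX/Apache "combined" access-log format, each line can be mapped onto these fields with a regular expression. This is a sketch under two assumptions: the log uses the standard combined layout, and you supply `host` yourself, because the combined format does not record the Host header (nor a duration). User-Agents containing escaped quotes will not match this simple pattern.

```python
import re
from datetime import datetime
from typing import Optional

# Combined format: ip ident user [time] "METHOD path protocol" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_combined(line: str, host: str) -> Optional[dict]:
    """Map one combined-format line to a Custom Log API record, or None if it does not match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    # Times look like 08/Jun/2026:12:00:00 +0000 (%b expects English month names).
    ts = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
    record = {
        "timestamp": ts.isoformat(),
        "method": m["method"],
        "host": host,  # not present in the combined format, supplied by the caller
        "path": m["path"],
        "status_code": int(m["status"]),
        "ip": m["ip"],
        "user_agent": m["user_agent"],
    }
    # "-" is how the combined format writes an absent value.
    if m["referer"] not in ("", "-"):
        record["referer"] = m["referer"]
    if m["bytes"] != "-":
        record["bytes_sent"] = int(m["bytes"])
    return record
```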

Compression

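For larger batches, gzip your JSON body and set Content-Encoding: gzip. Compression shrinks the request body that counts toward the 4 MB limit, which is why it is the recommended approach for high-volume sites. The 16 MB decompressed limit and the 1,000-record limit still apply after compression.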
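A minimal way to prepare a compressed body in Python is shown below; `encode_batch` is a hypothetical helper name. This page does not say whether the idempotency check compares the compressed or the decompressed body, so the sketch passes `mtime=0` to `gzip.compress`: by default the gzip header embeds the current time, and recompressing the same JSON on a retry would otherwise produce different bytes.

```python
import gzip
import json


def encode_batch(records: list, compress: bool = True) -> tuple:
    """Serialize records as {"logs": [...]} and optionally gzip them.

    Returns (body_bytes, extra_headers). Reuse the same bytes when retrying.
    """
    body = json.dumps({"logs": records}, separators=(",", ":")).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if compress:
        # mtime=0 keeps the gzip header stable, so identical input gives identical bytes.
        body = gzip.compress(body, mtime=0)
        headers["Content-Encoding"] = "gzip"
    return body, headers
```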

Limits

Limit	Value
Records per request	1,000
Compressed request body	≤ 4 MB
Decompressed body	≤ 16 MB
Requests per minute	60 per API key
`user_agent` length	1,024 chars
`path` length	2,048 chars
`host` length	255 chars
`referrer` length	2,048 chars

For high-volume sites, send small batches frequently (e.g., every 30–60 seconds) rather than a few very large requests.
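A shipper can enforce these limits before sending by cutting records into chunks of at most 1,000 and halving any chunk whose encoded size is still too large. This is a sketch, not an official client; it assumes "MB" in the table means 10^6 bytes (the more conservative reading) and compresses the same way as a deterministic gzip sender would.

```python
import gzip
import json
from typing import Iterator, List

MAX_RECORDS = 1_000
# Assumption: "MB" in the Limits table is read as 10**6 bytes.
MAX_COMPRESSED_BYTES = 4_000_000
MAX_DECOMPRESSED_BYTES = 16_000_000


def _fits(records: List[dict]) -> bool:
    raw = json.dumps({"logs": records}, separators=(",", ":")).encode("utf-8")
    return len(raw) <= MAX_DECOMPRESSED_BYTES and len(gzip.compress(raw, mtime=0)) <= MAX_COMPRESSED_BYTES


def _split_until_fits(chunk: List[dict]) -> Iterator[List[dict]]:
    # Halve an oversized chunk until each piece fits the byte limits.
    # A single record is small given the per-field length caps.
    if len(chunk) <= 1 or _fits(chunk):
        yield chunk
        return
    mid = len(chunk) // 2
    yield from _split_until_fits(chunk[:mid])
    yield from _split_until_fits(chunk[mid:])


def batches(records: List[dict]) -> Iterator[List[dict]]:
    """Yield request-sized batches: at most 1,000 records each and within the byte limits."""
    for start in range(0, len(records), MAX_RECORDS):
        yield from _split_until_fits(records[start:start + MAX_RECORDS])
```

Each yielded batch then gets its own Idempotency-Key, for example by using the chunk position as the shard.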

Accepted timestamp window

Server Logs data is retained for 90 days. To avoid storing records that would immediately expire or land in stale partitions, the API only accepts timestamps within:

now − 90 days  ≤  timestamp  ≤  now + 10 minutes

Records outside this window are rejected individually and reported in the response. Backfilling data older than 90 days is not supported.
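To avoid sending rows that will only be rejected, a shipper can apply the same window check locally. The sketch below accepts the three documented timestamp forms. How the API tells epoch seconds from epoch milliseconds is not stated here, so the 1e11 threshold is an assumption, as is treating an ISO string without an offset as UTC.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Union

RETENTION = timedelta(days=90)
FUTURE_MARGIN = timedelta(minutes=10)


def to_utc(value: Union[str, int, float]) -> datetime:
    """Convert an ISO 8601 string, epoch seconds, or epoch milliseconds to an aware UTC datetime."""
    if isinstance(value, str):
        # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on.
        dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
        # Assumption: a string without an offset is already UTC.
        return dt.replace(tzinfo=timezone.utc) if dt.tzinfo is None else dt.astimezone(timezone.utc)
    # Assumption: numbers at or above 1e11 are milliseconds (1e11 seconds is thousands of years away).
    seconds = value / 1000 if value >= 1e11 else value
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def in_accepted_window(value: Union[str, int, float], now: Optional[datetime] = None) -> bool:
    """True if now - 90 days <= timestamp <= now + 10 minutes."""
    now = now or datetime.now(timezone.utc)
    return now - RETENTION <= to_utc(value) <= now + FUTURE_MARGIN
```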

What gets stored

Ansehn stores only recognized AI crawler rows. You can safely send your full, unfiltered request logs — non-AI-crawler traffic is detected and dropped before it is ever persisted, which keeps your storage footprint and privacy exposure low.

Every response tells you exactly what happened:

{
  "status": "accepted",
  "batch_id": "f1c2…",
  "request_id": "logs-prod-edge-001-2026-06-08T12:00:00Z-shard-a",
  "deduplicated": false,
  "received_records": 1000,
  "accepted_records": 998,
  "stored_ai_crawler_records": 42,
  "dropped_non_crawler_records": 956,
  "rejected_records": 2,
  "quarantined_records": 0,
  "errors": [
    {
      "index": 17,
      "field": "timestamp",
      "message": "timestamp is outside the accepted retention window"
    }
  ]
}

received_records — rows in your batch.
accepted_records — rows that passed validation.
stored_ai_crawler_records — AI crawler rows actually written to your analytics.
dropped_non_crawler_records — valid rows that were not AI crawlers (intentionally not stored).
rejected_records — malformed rows; see errors for details.

A genuine batch with no AI crawlers returns stored_ai_crawler_records: 0 with "status": "accepted". A duplicate retry returns "status": "duplicate" instead — the two are always distinguishable.
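A shipper can turn this response into a one-line log entry plus one line per rejected row. The helper below is hypothetical and reads only the fields shown in the example response. For a duplicate it reads nothing beyond `status`, since the other fields of a duplicate response are not documented here, and it assumes `index` is the record's position in the submitted batch.

```python
def summarize(response: dict) -> list:
    """Turn a Custom Log API response into short human-readable lines (hypothetical helper)."""
    if response.get("status") == "duplicate":
        # A retry of a batch that was already ingested: nothing new was stored.
        return ["duplicate batch, not ingested again"]
    lines = [
        f"received {response['received_records']}, "
        f"stored {response['stored_ai_crawler_records']} AI crawler rows, "
        f"dropped {response['dropped_non_crawler_records']} non-crawler rows, "
        f"rejected {response['rejected_records']}"
    ]
    # Assumption: `index` is the record's position in the submitted batch.
    for error in response.get("errors", []):
        lines.append(f"row {error['index']} ({error['field']}): {error['message']}")
    return lines
```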

Response status codes

Condition	Status
Success (including zero AI crawler rows)	`200 OK`
Duplicate retry (same key + same body)	`200 OK` (`"status": "duplicate"`)
Missing or invalid API key	`401 Unauthorized`
API key missing the `logs:ingest` scope	`403 Forbidden`
Missing `Idempotency-Key` or invalid JSON	`400 Bad Request`
Idempotency key reused with a different body	`409 Conflict`
Too many records or body too large	`413 Payload Too Large`
Rate limit exceeded	`429 Too Many Requests`
No valid rows in the batch	`422 Unprocessable Entity`
Temporary ingest issue	`503 Service Unavailable` (safe to retry with the same `Idempotency-Key`)
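A retry loop only needs to distinguish transient failures from permanent ones. The sketch below retries 503 (documented as safe with the same Idempotency-Key), network errors, and 429. Retrying 429 after a pause is an assumption rather than something this table states. Every other status is returned immediately, because resending a 400, 401, 403, 409, 413, or 422 unchanged will fail again. `send` is a hypothetical callable that you provide, which must resend the identical body and key each time.

```python
import time
from typing import Callable, Dict, Tuple

# 503 is documented as safe to retry; retrying 429 after a pause is an assumption.
RETRYABLE_STATUSES = {429, 503}


def send_with_retry(
    send: Callable[[], Tuple[int, Dict]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, Dict]:
    """Call `send` until it returns a non-retryable status or attempts run out.

    `send` must resend the SAME body with the SAME Idempotency-Key each time,
    so a retry of an already-stored batch comes back as a duplicate.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        last = attempt == max_attempts - 1
        try:
            status, body = send()
        except OSError:  # e.g. timeouts and connection errors
            if last:
                raise
        else:
            if status not in RETRYABLE_STATUSES or last:
                return status, body
        sleep(base_delay * 2 ** attempt)  # exponential backoff: 1s, 2s, 4s, ...
    raise RuntimeError("unreachable")
```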

Examples

cURL

curl https://ingest.ansehn.com/v1/logs/custom \
  -X POST \
  -H "Content-Type: application/json" \
  -H "x-ansehn-api-key: ank_your_key" \
  -H "Idempotency-Key: logs-prod-edge-001-2026-06-08T12:00:00Z-shard-a" \
  --data '{
    "logs": [
      {
        "timestamp": "2026-06-08T12:00:00Z",
        "method": "GET",
        "host": "example.com",
        "path": "/blog/ai-search",
        "status_code": 200,
        "ip": "203.0.113.10",
        "user_agent": "Mozilla/5.0; compatible; GPTBot/1.2; +https://openai.com/gptbot",
        "referer": "-",
        "bytes_sent": 18422,
        "duration_ms": 91
      }
    ]
  }'

Node.js

// Node.js 18+ provides a global fetch; top-level await requires an ES module.
const response = await fetch("https://ingest.ansehn.com/v1/logs/custom", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "x-ansehn-api-key": process.env.ANSEHN_API_KEY,
    // Stable per batch: encode the source and exact log window.
    "Idempotency-Key": "logs-prod-edge-001-2026-06-08T12:00:00Z-shard-a",
  },
  body: JSON.stringify({
    logs: [
      {
        timestamp: "2026-06-08T12:00:00Z",
        method: "GET",
        host: "example.com",
        path: "/",
        status_code: 200,
        ip: "203.0.113.10",
        user_agent: "GPTBot/1.2",
      },
    ],
  }),
});

// Log the HTTP status and the ingest summary (or error details).
console.log(response.status, await response.text());

Python

import os

import requests

response = requests.post(
    "https://ingest.ansehn.com/v1/logs/custom",
    headers={
        "Content-Type": "application/json",
        # Read the key from the environment instead of hard-coding it.
        "x-ansehn-api-key": os.environ["ANSEHN_API_KEY"],
        # Stable per batch: encode the source and exact log window.
        "Idempotency-Key": "logs-prod-edge-001-2026-06-08T12:00:00Z-shard-a",
    },
    json={
        "logs": [
            {
                "timestamp": "2026-06-08T12:00:00Z",
                "method": "GET",
                "host": "example.com",
                "path": "/",
                "status_code": 200,
                "ip": "203.0.113.10",
                "user_agent": "GPTBot/1.2",
            }
        ]
    },
    # requests has no default timeout; without one, a stalled connection can hang forever.
    timeout=30,
)

print(response.status_code, response.json())

Privacy & IP addresses

The ip field accepts whatever you send — raw, anonymized (e.g., 203.0.113.xx), or hashed. Because IP addresses are personal data in many jurisdictions, we recommend anonymizing or hashing IPs before sending them if your use case does not require the raw value. AI crawler detection relies on the User-Agent, not the IP, so anonymized IPs do not reduce detection accuracy.
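Both options can be implemented with the Python standard library. In this sketch, anonymization zeroes the host bits (keeping the /24 of an IPv4 address and the /48 of an IPv6 address), so `203.0.113.10` becomes `203.0.113.0` instead of the `xx` style shown above. Hashing uses HMAC-SHA256 with a secret you keep: a plain unkeyed SHA-256 of an IPv4 address is easy to reverse by hashing all 2^32 addresses. The prefix lengths and function names are illustrative choices, not Ansehn requirements.

```python
import hashlib
import hmac
import ipaddress


def anonymize_ip(ip: str) -> str:
    """Zero the host bits: keep the /24 of an IPv4 address or the /48 of an IPv6 address."""
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 48
    return str(ipaddress.ip_network(f"{addr}/{prefix}", strict=False).network_address)


def hash_ip(ip: str, secret: bytes) -> str:
    """Keyed hash (HMAC-SHA256): stable per IP, and hard to brute-force without the secret."""
    return hmac.new(secret, ip.encode("utf-8"), hashlib.sha256).hexdigest()
```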

All data is scoped exclusively to your project and isolated from other Ansehn customers. Your API key is stored as a one-way hash and cannot be recovered after generation.

Supported AI crawlers

Ansehn detects a broad and growing set of AI crawlers from your User-Agent strings (case-insensitive), including:

OpenAI: GPTBot, OAI-SearchBot, ChatGPT-User
Anthropic: ClaudeBot, Claude-User, Claude-SearchBot
Perplexity: PerplexityBot, Perplexity-User
Google: Gemini, Google-Extended
Meta: meta-externalagent
…and more across providers such as ByteDance, Amazon, Apple, Common Crawl, Cohere, You.com, Mistral, and DuckDuckGo.

Found a new AI crawler we don't yet recognize? Let us know and we'll add support.

Verification

After sending your first batch:

Confirm the response shows stored_ai_crawler_records greater than 0 (your batch must contain recognized AI crawler User-Agents).
Open your Ansehn Server Logs dashboard — activity should appear under Daily Visits within a minute or two.
The connection status indicator next to the tab bar shows whether logs are flowing and when the last ingest occurred. Hover over it for diagnostics.

Troubleshooting

Symptom	Meaning	Fix
`200 OK` but `stored_ai_crawler_records` is 0	The batch contained no recognized AI crawler User-Agents	Confirm your logs include AI bot traffic; check the User-Agent strings
`401 Unauthorized`	Missing, invalid, or expired API key	Regenerate the key in Project Settings → API Keys and update your shipper
`403 Forbidden`	Key lacks the `logs:ingest` scope	Generate a key with the `logs:ingest` scope
`400 Bad Request`	Missing `Idempotency-Key`, or the body is not valid JSON	Add the header; validate your JSON
`409 Conflict`	An `Idempotency-Key` was reused with different content	Use a new, unique key for each distinct batch
`413 Payload Too Large`	Over 1,000 records or body too large	Send smaller batches; enable `Content-Encoding: gzip`
`429 Too Many Requests`	Over 60 requests/minute for the key	Reduce request frequency or batch more rows per request
`422 Unprocessable Entity`	No valid rows after validation	Check the `errors` array for per-row reasons (e.g., bad timestamps)
Rows missing from the dashboard	Records outside the 90-day retention window	Only send logs from the last 90 days

Frequently asked questions

Do I need to pre-filter my logs to AI crawlers only?

No. You can send your full request logs. Ansehn detects AI crawler traffic server-side and stores only those rows. Pre-filtering is optional and only reduces bandwidth.
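If you do want to pre-filter to save bandwidth, a case-insensitive substring match on the User-Agent is one simple approach. This is only an approximation, not Ansehn's detector: the token list below covers just the crawlers named on this page, and Ansehn's own list is broader, so this filter will drop some rows that Ansehn would have stored. Sending unfiltered logs avoids that trade-off.

```python
# Tokens taken from the "Supported AI crawlers" list on this page (incomplete by design).
AI_CRAWLER_TOKENS = (
    "gptbot", "oai-searchbot", "chatgpt-user",
    "claudebot", "claude-user", "claude-searchbot",
    "perplexitybot", "perplexity-user",
    "gemini", "google-extended",
    "meta-externalagent",
)


def looks_like_ai_crawler(user_agent: str) -> bool:
    """Case-insensitive substring match against the tokens above."""
    ua = user_agent.lower()
    return any(token in ua for token in AI_CRAWLER_TOKENS)
```

Usage: `batch = [r for r in records if looks_like_ai_crawler(r["user_agent"])]`.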

Will this slow down my website?

No. Log shipping happens off the critical path on your side; your visitors are never affected.

What happens if a request fails or times out?

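Retry with the same Idempotency-Key and the same request body. If the original batch was already stored, the retry is recognized as a duplicate and not counted twice. Do not change the batch contents between attempts: reusing a key with a different body is rejected with 409 Conflict.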

Can multiple sources feed the same project?

Yes. Use one API key per source if you like, and give each batch its own Idempotency-Key. All data flows into the same project.

How is this different from the CloudFront integration?

The CloudFront integration is purpose-built for AWS CloudFront and requires no code. The Custom Log API is provider-agnostic — use it for any web server, CDN, edge platform, or custom log pipeline.

Pro tip: Compare crawler activity here with your insights in Citation Analytics to understand how crawling correlates with actual citations in AI answers.

Need help?

If you run into any issues during setup or have questions about the integration, email the Ansehn support team and mention "Custom Log API" in the subject line. We're happy to help you wire up your log pipeline.
