Server Logs Analytics

Custom Log API

The Custom Log API lets you send normalized web request logs to Ansehn directly over HTTPS from any source — your web server, CDN, edge worker, serverless platform, or a small log-shipping script. Unlike the AWS CloudFront Integration, which is tied to a specific provider, the Custom Log API accepts a simple JSON payload, so you can stream AI crawler activity into Ansehn no matter where your logs live.

Once connected, Ansehn automatically detects AI crawler traffic (GPTBot, OAI-SearchBot, ClaudeBot, PerplexityBot, Gemini, and more) and surfaces it in your Server Logs dashboard and Our Pages analytics — the same views used by every other ingestion method.

Which method should I use?

| Method | Best for | Setup effort |
| --- | --- | --- |
| Manual Upload | One-time analysis, quick checks, any web server | Low: upload a .log/.txt file from your dashboard |
| AWS CloudFront Integration | Continuous monitoring of CloudFront distributions | Medium: one-time AWS setup, then fully automated |
| Custom Log API (this page) | Continuous, automated monitoring from any platform | Medium: point a log drain or script at our endpoint |


How it works

Your platform (web server, CDN, edge worker, script)
        │
        │  Batched request logs (JSON over HTTPS)
        ▼
Ansehn Custom Log API  (POST /v1/logs/custom)
        │
        │  API key auth · AI crawler detection · validation
        ▼
Your Server Logs & Our Pages Analytics

You collect request logs on your side, batch them into JSON, and POST them to Ansehn using an API key you generate in your dashboard. Ansehn validates each record, detects AI crawler traffic by User-Agent, stores only the recognized AI crawler rows, and returns a transparent summary of exactly what was received, stored, and dropped.


Prerequisites

  • An Ansehn account with an active project for the domain you want to monitor.

  • An API key with the logs:ingest scope (see Step 1).

  • The ability to send an HTTPS POST request from your platform (any language or tool that can make HTTP requests).


Quick start

  1. Generate a logs:ingest API key in your Ansehn dashboard (Step 1 below).

  2. Collect your request logs and shape them into the JSON format below.

  3. POST a batch to https://ingest.ansehn.com/v1/logs/custom with your API key and an Idempotency-Key.

  4. Read the JSON response to confirm how many AI crawler rows were stored.

  5. Open your Server Logs dashboard to see Daily Visits, Top Pages, and Top Crawlers populate.

Step 1 — Generate an API key in Ansehn

API key generation is available in your dashboard under Project Settings → API Keys.

  1. Navigate to Projects → [Your Project] → Settings → API Keys

  2. Click + New API Key

  3. Give the key a descriptive name (e.g., Production Log Shipper)

  4. Under Scopes, select logs:ingest

  5. Under Project, select the project for the domain you want to monitor

  6. Click Generate and copy the key immediately — it is only shown once

Your key will look like: ank_a1b2c3d4e5f6...

The key is scoped to a single project. Ansehn derives the project from the key itself, so you never send a project ID in the request body — logs are always attributed to the project the key belongs to.


API reference

Endpoint

POST https://ingest.ansehn.com/v1/logs/custom

Authentication

Send your API key using either header:

x-ansehn-api-key: ank_your_key
Authorization: Bearer ank_your_key

Required headers

| Header | Value |
| --- | --- |
| Content-Type | application/json |
| Idempotency-Key | A stable, unique ID for this batch (see below) |

Optional:

| Header | Value |
| --- | --- |
| Content-Encoding | gzip (recommended for larger batches) |

Idempotency

The Idempotency-Key header is required. It protects you from accidental double-counting when a network retry resends the same batch.

  • Same key + same body → the batch is recognized as a duplicate and not ingested again. You get 200 OK with "status": "duplicate".

  • Same key + different body → rejected with 409 Conflict. Each unique batch must use its own key.

Use a stable, unique key per batch — never a random value regenerated on retry, and never the same key for different data. A good key encodes the source and the exact log window, for example: logs-prod-edge-001-2026-06-08T12:00:00Z-shard-a. Avoid keys like Date.now() that change on every retry.
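One way to follow this advice is to derive the key purely from values that identify the batch. The sketch below is only an illustration: the function name and the source/window/shard split are assumptions modeled on the example key above, not a format Ansehn requires.

```python
from datetime import datetime, timezone


def make_idempotency_key(source: str, window_start: datetime, shard: str) -> str:
    """Return a key that is identical on every retry of the same batch."""
    if window_start.tzinfo is None:
        raise ValueError("window_start must be timezone-aware")
    # Render the window start in UTC so the key does not depend on local time.
    start = window_start.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"logs-{source}-{start}-shard-{shard}"


key = make_idempotency_key(
    "prod-edge-001", datetime(2026, 6, 8, 12, 0, tzinfo=timezone.utc), "a"
)
print(key)  # logs-prod-edge-001-2026-06-08T12:00:00Z-shard-a
```

Because the key depends only on the source, the log window, and the shard, a retry of the same batch reproduces it exactly, while a different window or shard yields a different key.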

Request body

You can send either a wrapped object:

{
  "logs": [
    {
      "timestamp": "2026-06-08T12:00:00Z",
      "method": "GET",
      "host": "example.com",
      "path": "/blog/article",
      "status_code": 200,
      "ip": "203.0.113.10",
      "user_agent": "Mozilla/5.0 ... GPTBot/1.2",
      "referer": "https://example.com",
      "bytes_sent": 12345,
      "duration_ms": 120
    }
  ]
}

…or a bare array, for easier integration:

[
  {
    "timestamp": "2026-06-08T12:00:00Z",
    "method": "GET",
    "host": "example.com",
    "path": "/blog/article",
    "status_code": 200,
    "ip": "203.0.113.10",
    "user_agent": "Mozilla/5.0 ... GPTBot/1.2"
  }
]

Required fields

| Field | Type | Notes |
| --- | --- | --- |
| timestamp | ISO 8601 string, epoch seconds, or epoch milliseconds | Must fall within the accepted retention window (see below) |
| method | string | HTTP method; uppercased and normalized (unknown verbs stored as OTHER) |
| host | string | Host header, max 255 chars |
| path | string | Request path, max 2048 chars |
| status_code | integer | Between 100 and 599 |
| ip | string | May be raw, anonymized, or hashed (see Privacy) |
| user_agent | string | Max 1024 chars; used for AI crawler detection |

Optional fields

| Field | Type | Maps to |
| --- | --- | --- |
| referer / referrer | string | Referrer |
| bytes_sent / response_size | integer | Response size in bytes |
| duration_ms | number | Request duration in milliseconds (converted to seconds internally) |
| time_taken | number | Request duration in seconds |
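If your logs are in the common Nginx/Apache "combined" access log format, the fields above can be filled in with a small parser. This is a hedged sketch rather than an official converter: the regular expression and the function name are assumptions, the combined format has no host, so it is passed in separately, and this page does not say whether query strings belong in path (the sketch leaves them in).

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Combined format: ip ident user [time] "METHOD path protocol" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def combined_line_to_record(line: str, host: str) -> Optional[dict]:
    """Convert one combined-format log line into a Custom Log API record."""
    match = COMBINED.match(line)
    if match is None:
        return None  # Skip lines that do not follow the expected format.
    # Example time value: 08/Jun/2026:12:00:00 +0000
    when = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")
    record = {
        "timestamp": when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "method": match["method"].upper(),
        "host": host,
        "path": match["path"],
        "status_code": int(match["status"]),
        "ip": match["ip"],
        "user_agent": match["user_agent"],
    }
    # Optional fields are only added when the log line carries a real value.
    if match["referer"] not in ("", "-"):
        record["referer"] = match["referer"]
    if match["bytes"] != "-":
        record["bytes_sent"] = int(match["bytes"])
    return record
```

Lines that do not match return None, so a shipper can count and skip them instead of sending rows that would be rejected.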

Compression

For larger batches, gzip your JSON body and set Content-Encoding: gzip. This helps keep the request under the 4 MB compressed-body limit (the decompressed body must still be 16 MB or less) and is the recommended approach for high-volume sites.
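A minimal sketch of compressing a batch with Python's standard library follows. The helper name is hypothetical, and the byte limits assume that "MB" in the Limits section means 1024 × 1024 bytes, which this page does not specify.

```python
import gzip
import json

MAX_COMPRESSED_BYTES = 4 * 1024 * 1024     # assumption: "4 MB" means 4 MiB
MAX_DECOMPRESSED_BYTES = 16 * 1024 * 1024  # assumption: "16 MB" means 16 MiB


def build_gzip_request(records: list) -> tuple:
    """Return (body, headers) for a gzip-compressed batch."""
    raw = json.dumps({"logs": records}).encode("utf-8")
    # mtime=0 keeps the compressed bytes identical across retries of the same batch.
    body = gzip.compress(raw, mtime=0)
    if len(raw) > MAX_DECOMPRESSED_BYTES or len(body) > MAX_COMPRESSED_BYTES:
        raise ValueError("batch exceeds the documented size limits; send fewer records")
    headers = {"Content-Type": "application/json", "Content-Encoding": "gzip"}
    return body, headers
```

With requests, pass the result as data=body (not json=...) and merge these headers with your API key and Idempotency-Key headers.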

Limits

| Limit | Value |
| --- | --- |
| Records per request | 1,000 |
| Compressed request body | ≤ 4 MB |
| Decompressed body | ≤ 16 MB |
| Requests per minute | 60 per API key |
| user_agent length | 1,024 chars |
| path length | 2,048 chars |
| host length | 255 chars |
| referrer length | 2,048 chars |

For high-volume sites, send small batches frequently (e.g., every 30–60 seconds) rather than a few very large requests.
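Staying under the 1,000-record limit can be as simple as slicing your buffered rows before sending. The helper below is an illustrative sketch; its name and default are assumptions based on the table above.

```python
from typing import Iterator, List


def chunk_records(records: List[dict], size: int = 1000) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most `size` records each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(records), size):
        yield records[start:start + size]


batches = list(chunk_records([{"row": i} for i in range(2500)]))
print([len(batch) for batch in batches])  # [1000, 1000, 500]
```

Each slice is a distinct batch, so it needs its own Idempotency-Key (for example, by giving each slice its own shard suffix).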

Accepted timestamp window

Server Logs data is retained for 90 days. To avoid storing records that would immediately expire or land in stale partitions, the API only accepts timestamps within:

now − 90 days  ≤  timestamp  ≤  now + 10 minutes

Records outside this window are rejected individually and reported in the response. Backfilling data older than 90 days is not supported.
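If you want to avoid sending rows that will be rejected, the same window can be checked client-side before a batch is built. This is a sketch under the assumption that your timestamps are timezone-aware datetimes; the function name is hypothetical, and the server's own clock remains authoritative.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

RETENTION = timedelta(days=90)
FUTURE_TOLERANCE = timedelta(minutes=10)


def in_accepted_window(timestamp: datetime, now: Optional[datetime] = None) -> bool:
    """True if now - 90 days <= timestamp <= now + 10 minutes."""
    now = now or datetime.now(timezone.utc)
    return now - RETENTION <= timestamp <= now + FUTURE_TOLERANCE


now = datetime(2026, 6, 8, 12, 0, tzinfo=timezone.utc)
print(in_accepted_window(now - timedelta(days=89), now))     # True
print(in_accepted_window(now - timedelta(days=91), now))     # False
print(in_accepted_window(now + timedelta(minutes=11), now))  # False
```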


What gets stored

Ansehn stores only recognized AI crawler rows. You can safely send your full, unfiltered request logs — non-AI-crawler traffic is detected and dropped before it is ever persisted, which keeps your storage footprint and privacy exposure low.

Every response tells you exactly what happened:

{
  "status": "accepted",
  "batch_id": "f1c2…",
  "request_id": "logs-prod-edge-001-2026-06-08T12:00:00Z-shard-a",
  "deduplicated": false,
  "received_records": 1000,
  "accepted_records": 998,
  "stored_ai_crawler_records": 42,
  "dropped_non_crawler_records": 956,
  "rejected_records": 2,
  "quarantined_records": 0,
  "errors": [
    {
      "index": 17,
      "field": "timestamp",
      "message": "timestamp is outside the accepted retention window"
    }
  ]
}

  • received_records: rows in your batch.

  • accepted_records: rows that passed validation.

  • stored_ai_crawler_records: AI crawler rows actually written to your analytics.

  • dropped_non_crawler_records: valid rows that were not AI crawlers (intentionally not stored).

  • rejected_records: malformed rows; see errors for details.

A genuine batch with no AI crawlers returns stored_ai_crawler_records: 0 with "status": "accepted". A duplicate retry returns "status": "duplicate" instead — the two are always distinguishable.
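A shipper can turn this summary into a short log line so that rejected rows do not go unnoticed. The sketch below reads only the fields shown in the sample response above; the function name is hypothetical.

```python
def describe_ingest(summary: dict) -> str:
    """Render an ingest response as a short, human-readable report."""
    if summary.get("status") == "duplicate":
        return "duplicate batch: nothing new was ingested"
    stored = summary.get("stored_ai_crawler_records", 0)
    dropped = summary.get("dropped_non_crawler_records", 0)
    rejected = summary.get("rejected_records", 0)
    lines = [
        f"stored {stored} AI crawler rows, "
        f"dropped {dropped} non-crawler rows, rejected {rejected} rows"
    ]
    # Each error entry points at the offending row by its index in the batch.
    for error in summary.get("errors", []):
        lines.append(f"  row {error['index']}: {error['field']}: {error['message']}")
    return "\n".join(lines)
```

Called with the sample response above, it reports 42 stored, 956 dropped, and 2 rejected rows, followed by one line for the error at index 17.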

Response status codes

| Condition | Status |
| --- | --- |
| Success (including zero AI crawler rows) | 200 OK |
| Duplicate retry (same key + same body) | 200 OK ("status": "duplicate") |
| Missing or invalid API key | 401 Unauthorized |
| API key missing the logs:ingest scope | 403 Forbidden |
| Missing Idempotency-Key or invalid JSON | 400 Bad Request |
| Idempotency key reused with a different body | 409 Conflict |
| Too many records or body too large | 413 Payload Too Large |
| Rate limit exceeded | 429 Too Many Requests |
| No valid rows in the batch | 422 Unprocessable Entity |
| Temporary ingest issue | 503 Service Unavailable (safe to retry with the same Idempotency-Key) |
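The status codes above suggest a simple retry policy, sketched here with exponential backoff. Retrying 503 with the same Idempotency-Key is documented above; treating 429 as retryable after a wait is an assumption, as are the function name, the attempt count, and the delays. The send callable is a placeholder for your own function that POSTs one batch (always with the same Idempotency-Key) and returns the HTTP status code.

```python
import time
from typing import Callable

RETRYABLE_STATUSES = {429, 503}  # 429 is an assumption; 503 is documented as safe


def send_with_retry(
    send: Callable[[], int],
    attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() up to `attempts` times and return the last status code."""
    status = send()
    for attempt in range(1, attempts):
        if status not in RETRYABLE_STATUSES:
            break  # Success or a client error that a retry will not fix.
        sleep(base_delay * 2 ** (attempt - 1))  # Waits 1s, 2s, 4s, ...
        status = send()
    return status
```

Client errors such as 400, 401, 403, 409, 413, and 422 are returned immediately, since resending the same request would fail the same way.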


Examples

cURL

curl https://ingest.ansehn.com/v1/logs/custom \
  -X POST \
  -H "Content-Type: application/json" \
  -H "x-ansehn-api-key: ank_your_key" \
  -H "Idempotency-Key: logs-prod-edge-001-2026-06-08T12:00:00Z-shard-a" \
  --data '{
    "logs": [
      {
        "timestamp": "2026-06-08T12:00:00Z",
        "method": "GET",
        "host": "example.com",
        "path": "/blog/ai-search",
        "status_code": 200,
        "ip": "203.0.113.10",
        "user_agent": "Mozilla/5.0; compatible; GPTBot/1.2; +https://openai.com/gptbot",
        "referer": "-",
        "bytes_sent": 18422,
        "duration_ms": 91
      }
    ]
  }'

Node.js

// Requires Node.js 18+ for the built-in fetch; run as an ES module for top-level await.
const response = await fetch("https://ingest.ansehn.com/v1/logs/custom", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "x-ansehn-api-key": process.env.ANSEHN_API_KEY,
    // Stable per batch: encode the source and exact log window.
    "Idempotency-Key": "logs-prod-edge-001-2026-06-08T12:00:00Z-shard-a",
  },
  body: JSON.stringify({
    logs: [
      {
        timestamp: "2026-06-08T12:00:00Z",
        method: "GET",
        host: "example.com",
        path: "/",
        status_code: 200,
        ip: "203.0.113.10",
        user_agent: "GPTBot/1.2",
      },
    ],
  }),
});

// Print the HTTP status and the ingest summary returned by the API.
console.log(response.status, await response.json());

Python

import os

import requests

response = requests.post(
    "https://ingest.ansehn.com/v1/logs/custom",
    headers={
        "Content-Type": "application/json",
        # Read the key from the environment instead of hard-coding it.
        "x-ansehn-api-key": os.environ["ANSEHN_API_KEY"],
        # Stable per batch — encode the source and exact log window.
        "Idempotency-Key": "logs-prod-edge-001-2026-06-08T12:00:00Z-shard-a",
    },
    json={
        "logs": [
            {
                "timestamp": "2026-06-08T12:00:00Z",
                "method": "GET",
                "host": "example.com",
                "path": "/",
                "status_code": 200,
                "ip": "203.0.113.10",
                "user_agent": "GPTBot/1.2",
            }
        ]
    },
    timeout=10,  # Fail fast instead of hanging on a stalled connection.
)

print(response.json())

Privacy & IP addresses

The ip field accepts whatever you send — raw, anonymized (e.g., 203.0.113.xx), or hashed. Because IP addresses are personal data in many jurisdictions, we recommend anonymizing or hashing IPs before sending them if your use case does not require the raw value. AI crawler detection relies on the User-Agent, not the IP, so anonymized IPs do not reduce detection accuracy.

All data is scoped exclusively to your project and isolated from other Ansehn customers. Your API key is stored as a one-way hash and cannot be recovered after generation.
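The anonymization and hashing options mentioned above can be sketched with Python's standard library. Both helpers are illustrations, not something Ansehn requires: the prefix lengths (/24 for IPv4, /48 for IPv6) and the choice of a keyed SHA-256 hash are assumptions, so pick values that match your own privacy policy.

```python
import hashlib
import hmac
import ipaddress


def anonymize_ip(ip: str) -> str:
    """Zero the host part: the last octet for IPv4, the last 80 bits for IPv6."""
    address = ipaddress.ip_address(ip)
    prefix = 24 if address.version == 4 else 48
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


def hash_ip(ip: str, secret: bytes) -> str:
    """Keyed hash, so the value cannot be reversed by hashing every address."""
    return hmac.new(secret, ip.encode("utf-8"), hashlib.sha256).hexdigest()


print(anonymize_ip("203.0.113.10"))           # 203.0.113.0
print(anonymize_ip("2001:db8:abcd:1234::1"))  # 2001:db8:abcd::
```

Hashing keeps repeat visits from the same address distinguishable, while truncation does not; either value is still a valid string for the ip field.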


Supported AI crawlers

Ansehn detects a broad and growing set of AI crawlers from your User-Agent strings (case-insensitive), including:

  • OpenAI: GPTBot, OAI-SearchBot, ChatGPT-User

  • Anthropic: ClaudeBot, Claude-User, Claude-SearchBot

  • Perplexity: PerplexityBot, Perplexity-User

  • Google: Gemini, Google-Extended

  • Meta: meta-externalagent

  • …and more across providers such as ByteDance, Amazon, Apple, Common Crawl, Cohere, You.com, Mistral, and DuckDuckGo.

Found a new AI crawler we don't yet recognize? Let us know and we'll add support.
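If you choose to pre-filter on your side to save bandwidth, a case-insensitive substring check against the names listed above is one possible approach. This sketch covers only the crawlers named on this page, and substring matching is an assumption about how detection works, so a local filter like this may drop traffic that Ansehn would have recognized.

```python
# Only the names listed on this page; Ansehn's own list is broader.
AI_CRAWLER_TOKENS = (
    "GPTBot", "OAI-SearchBot", "ChatGPT-User",
    "ClaudeBot", "Claude-User", "Claude-SearchBot",
    "PerplexityBot", "Perplexity-User",
    "Gemini", "Google-Extended",
    "meta-externalagent",
)


def looks_like_ai_crawler(user_agent: str) -> bool:
    """Case-insensitive substring match against the known crawler names."""
    lowered = user_agent.lower()
    return any(token.lower() in lowered for token in AI_CRAWLER_TOKENS)


print(looks_like_ai_crawler("Mozilla/5.0 (compatible; gptbot/1.2)"))       # True
print(looks_like_ai_crawler("Mozilla/5.0 (Windows NT 10.0) Firefox/126"))  # False
```

Sending unfiltered logs remains the safer default, because newly supported crawlers are then picked up without any change on your side.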


Verification

After sending your first batch:

  1. Confirm the response shows stored_ai_crawler_records greater than 0 (your batch must contain recognized AI crawler User-Agents).

  2. Open your Ansehn Server Logs dashboard — activity should appear under Daily Visits within a minute or two.

  3. The connection status indicator next to the tab bar shows whether logs are flowing and when the last ingest occurred. Hover over it for diagnostics.


Troubleshooting

| Symptom | Meaning | Fix |
| --- | --- | --- |
| 200 OK but stored_ai_crawler_records is 0 | The batch contained no recognized AI crawler User-Agents | Confirm your logs include AI bot traffic; check the User-Agent strings |
| 401 Unauthorized | Missing, invalid, or expired API key | Regenerate the key in Project Settings → API Keys and update your shipper |
| 403 Forbidden | Key lacks the logs:ingest scope | Generate a key with the logs:ingest scope |
| 400 Bad Request | Missing Idempotency-Key, or the body is not valid JSON | Add the header; validate your JSON |
| 409 Conflict | An Idempotency-Key was reused with different content | Use a new, unique key for each distinct batch |
| 413 Payload Too Large | Over 1,000 records or body too large | Send smaller batches; enable Content-Encoding: gzip |
| 429 Too Many Requests | Over 60 requests/minute for the key | Reduce request frequency or batch more rows per request |
| 422 Unprocessable Entity | No valid rows after validation | Check the errors array for per-row reasons (e.g., bad timestamps) |
| Rows missing from the dashboard | Records outside the 90-day retention window | Only send logs from the last 90 days |


Frequently asked questions

Do I need to pre-filter my logs to AI crawlers only?

No. You can send your full request logs. Ansehn detects AI crawler traffic server-side and stores only those rows. Pre-filtering is optional and only reduces bandwidth.

Will this slow down my website?

No, provided log shipping runs off the critical path on your side (for example, in a background job or log drain rather than inside the request handler). Visitor requests are then unaffected.

What happens if a request fails or times out?

Retry with the same Idempotency-Key. If the original batch was already stored, the retry is recognized as a duplicate and not counted twice.

Can multiple sources feed the same project?

Yes. Use one API key per source if you like, and give each batch its own Idempotency-Key. All data flows into the same project.

How is this different from the CloudFront integration?

The CloudFront integration is purpose-built for AWS CloudFront and requires no code. The Custom Log API is provider-agnostic — use it for any web server, CDN, edge platform, or custom log pipeline.

Pro tip: Compare crawler activity here with your insights in Citation Analytics to understand how crawling correlates with actual citations in AI answers.


Need help?

If you run into any issues during setup or have questions about the integration, reach out to us at [email protected] and mention "Custom Log API" in the subject line. We're happy to help you wire up your log pipeline.
