Custom Log API
The Custom Log API lets you send normalized web request logs to Ansehn directly over HTTPS from any source — your web server, CDN, edge worker, serverless platform, or a small log-shipping script. Unlike the AWS CloudFront Integration, which is tied to a specific provider, the Custom Log API accepts a simple JSON payload, so you can stream AI crawler activity into Ansehn no matter where your logs live.
Once connected, Ansehn automatically detects AI crawler traffic (GPTBot, OAI-SearchBot, ClaudeBot, PerplexityBot, Gemini, and more) and surfaces it in your Server Logs dashboard and Our Pages analytics — the same views used by every other ingestion method.
Which method should I use?
Method | Best for | Setup effort |
|---|---|---|
| One-time analysis, quick checks, any web server | Low: upload a |
AWS CloudFront Integration | Continuous monitoring of CloudFront distributions | Medium: one-time AWS setup, then fully automated |
Custom Log API (this page) | Continuous, automated monitoring from any platform | Medium: point a log drain or script at our endpoint |
How it works
Your platform (web server, CDN, edge worker, script)
│
│ Batched request logs (JSON over HTTPS)
▼
Ansehn Custom Log API (POST /v1/logs/custom)
│
│ API key auth · AI crawler detection · validation
▼
Your Server Logs & Our Pages Analytics
You collect request logs on your side, batch them into JSON, and POST them to Ansehn using an API key you generate in your dashboard. Ansehn validates each record, detects AI crawler traffic by User-Agent, stores only the recognized AI crawler rows, and returns a transparent summary of exactly what was received, stored, and dropped.
Prerequisites
An Ansehn account with an active project for the domain you want to monitor.
An API key with the logs:ingest scope (see Step 1).
The ability to send an HTTPS POST request from your platform (any language or tool that can make HTTP requests).
Quick start
Generate a logs:ingest API key in your Ansehn dashboard (Step 1 below).
Collect your request logs and shape them into the JSON format below.
POST a batch to https://ingest.ansehn.com/v1/logs/custom with your API key and an Idempotency-Key.
Read the JSON response to confirm how many AI crawler rows were stored.
Open your Server Logs dashboard to see Daily Visits, Top Pages, and Top Crawlers populate.
Step 1 — Generate an API key in Ansehn
API key generation is available in your dashboard under Project Settings → API Keys.
Navigate to Projects → [Your Project] → Settings → API Keys
Click + New API Key
Give the key a descriptive name (e.g., Production Log Shipper)
Under Scopes, select logs:ingest
Under Project, select the project for the domain you want to monitor
Click Generate and copy the key immediately — it is only shown once
Your key will look like: ank_a1b2c3d4e5f6...
The key is scoped to a single project. Ansehn derives the project from the key itself, so you never send a project ID in the request body — logs are always attributed to the project the key belongs to.
API reference
Endpoint
POST https://ingest.ansehn.com/v1/logs/custom
Authentication
Send your API key using either header:
x-ansehn-api-key: ank_your_key
Authorization: Bearer ank_your_key
Required headers
Header | Value |
|---|---|
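Content-Type | application/json |
x-ansehn-api-key (or Authorization) | ank_your_key (or Bearer ank_your_key) |
Idempotency-Key | A stable, unique ID for this batch (see below) |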
Optional:
Header | Value |
|---|---|
Content-Encoding | gzip |
|
Idempotency
The Idempotency-Key header is required. It protects you from accidental double-counting when a network retry resends the same batch.
Same key + same body → the batch is recognized as a duplicate and not ingested again. You get 200 OK with "status": "duplicate".
Same key + different body → rejected with 409 Conflict. Each unique batch must use its own key.
Use a stable, unique key per batch — never a random value regenerated on retry, and never the same key for different data. A good key encodes the source and the exact log window, for example: logs-prod-edge-001-2026-06-08T12:00:00Z-shard-a. Avoid keys like Date.now() that change on every retry.
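One way to get such a key is to derive it from values that identify the batch, rather than from the clock at send time. The sketch below is only an illustration: make_idempotency_key and its three parameters are hypothetical names, and the key layout simply mirrors the example above.

```python
from datetime import datetime, timezone


def make_idempotency_key(source: str, window_start: datetime, shard: str) -> str:
    """Build a key from the log source, the start of the log window, and a shard label.

    The same (source, window_start, shard) always yields the same key, so a
    retry of the same batch reuses it, while a different window or shard
    gets a different key. window_start must be timezone-aware.
    """
    # Normalize to UTC so the key does not depend on the sender's time zone.
    stamp = window_start.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"logs-{source}-{stamp}-shard-{shard}"


window = datetime(2026, 6, 8, 12, 0, 0, tzinfo=timezone.utc)
key = make_idempotency_key("prod-edge-001", window, "a")
print(key)  # logs-prod-edge-001-2026-06-08T12:00:00Z-shard-a
```

Because the key depends only on what is in the batch, a retry after a timeout produces the same value without your shipper having to remember it.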
Request body
You can send either a wrapped object:
{
  "logs": [
    {
      "timestamp": "2026-06-08T12:00:00Z",
      "method": "GET",
      "host": "example.com",
      "path": "/blog/article",
      "status_code": 200,
      "ip": "203.0.113.10",
      "user_agent": "Mozilla/5.0 ... GPTBot/1.2",
      "referer": "https://example.com",
      "bytes_sent": 12345,
      "duration_ms": 120
    }
  ]
}
…or a bare array, for easier integration:
[
  {
    "timestamp": "2026-06-08T12:00:00Z",
    "method": "GET",
    "host": "example.com",
    "path": "/blog/article",
    "status_code": 200,
    "ip": "203.0.113.10",
    "user_agent": "Mozilla/5.0 ... GPTBot/1.2"
  }
]
Required fields
Field | Type | Notes |
|---|---|---|
timestamp | ISO 8601 string, epoch seconds, or epoch milliseconds | Must fall within the accepted retention window (see below) |
method | string | HTTP method; uppercased and normalized (unknown verbs are stored under a fallback value) |
host | string | Host header, max 255 chars |
path | string | Request path, max 2048 chars |
status_code | integer | Between 100 and 599 |
ip | string | May be raw, anonymized, or hashed (see Privacy) |
user_agent | string | Max 1024 chars; used for AI crawler detection |
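A quick client-side check can catch malformed rows before you send them. This is a minimal sketch based only on the constraints listed in the table above. validate_record is a hypothetical helper, and Ansehn's own server-side validation may apply further rules.

```python
REQUIRED_FIELDS = ("timestamp", "method", "host", "path", "status_code", "ip", "user_agent")
MAX_LENGTHS = {"host": 255, "path": 2048, "user_agent": 1024}


def validate_record(record: dict) -> list:
    """Return a list of problems found in one log record (empty if none)."""
    problems = []
    # Every required field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if field not in record or record[field] in ("", None):
            problems.append(f"{field} is missing")
    # String fields with a documented maximum length.
    for field, limit in MAX_LENGTHS.items():
        value = record.get(field)
        if isinstance(value, str) and len(value) > limit:
            problems.append(f"{field} exceeds {limit} chars")
    # status_code must be an integer in the HTTP status range.
    status = record.get("status_code")
    if status is not None and not (isinstance(status, int) and 100 <= status <= 599):
        problems.append("status_code must be an integer between 100 and 599")
    return problems


good = {
    "timestamp": "2026-06-08T12:00:00Z",
    "method": "GET",
    "host": "example.com",
    "path": "/blog/article",
    "status_code": 200,
    "ip": "203.0.113.10",
    "user_agent": "Mozilla/5.0 ... GPTBot/1.2",
}
print(validate_record(good))  # []
```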
Optional fields
Field | Type | Maps to |
|---|---|---|
referer | string | Referrer |
bytes_sent | integer | Response size in bytes |
duration_ms | number | Request duration in milliseconds (converted to seconds internally) |
| number | Request duration in seconds |
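If your logs are plain access-log lines, you need a small mapping step to produce these fields. The sketch below assumes the widely used "combined" access log format written by NGINX and Apache; that format, the COMBINED pattern, and the combined_to_record helper are assumptions for illustration, not something Ansehn prescribes. The combined format does not include the host, so it is passed in separately.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# ip ident user [time] "METHOD path protocol" status bytes "referer" "user agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def combined_to_record(line: str, host: str) -> Optional[dict]:
    """Convert one combined-format log line into a record, or None if it does not parse."""
    m = COMBINED.match(line)
    if m is None:
        return None
    # Example time value: 08/Jun/2026:12:00:00 +0000
    ts = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
    record = {
        "timestamp": ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "method": m["method"],
        "host": host,
        "path": m["path"],
        "status_code": int(m["status"]),
        "ip": m["ip"],
        "user_agent": m["user_agent"],
    }
    # Optional fields are only included when the log line carries a value.
    if m["referer"] not in ("", "-"):
        record["referer"] = m["referer"]
    if m["bytes"] != "-":
        record["bytes_sent"] = int(m["bytes"])
    return record


line = (
    '203.0.113.10 - - [08/Jun/2026:12:00:00 +0000] "GET /blog/article HTTP/1.1" '
    '200 12345 "https://example.com" "Mozilla/5.0 (compatible; GPTBot/1.2)"'
)
record = combined_to_record(line, host="example.com")
print(record["timestamp"], record["path"], record["status_code"])
```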
Compression
For larger batches, gzip your JSON body and set Content-Encoding: gzip. This keeps you comfortably under request size limits and is the recommended approach for high-volume sites.
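Compressing a batch takes only the standard library in most languages. Here is a minimal Python sketch; encode_batch is a hypothetical helper, and the size check reads "4 MB" as 4,000,000 bytes, which is the stricter of the two common interpretations.

```python
import gzip
import json

MAX_COMPRESSED_BYTES = 4_000_000    # "4 MB", read conservatively (assumption)
MAX_DECOMPRESSED_BYTES = 16_000_000  # "16 MB", read conservatively (assumption)


def encode_batch(records: list) -> tuple:
    """Serialize records as a wrapped JSON object and gzip the result.

    Returns (body, headers), where headers holds the two content headers
    that describe the body. Authentication and Idempotency-Key headers
    still have to be added by the caller.
    """
    raw = json.dumps({"logs": records}, separators=(",", ":")).encode("utf-8")
    if len(raw) > MAX_DECOMPRESSED_BYTES:
        raise ValueError("batch is larger than 16 MB before compression")
    body = gzip.compress(raw)
    if len(body) > MAX_COMPRESSED_BYTES:
        raise ValueError("compressed batch is larger than 4 MB; send fewer records")
    headers = {"Content-Type": "application/json", "Content-Encoding": "gzip"}
    return body, headers


records = [{"timestamp": "2026-06-08T12:00:00Z", "method": "GET", "path": "/"}]
body, headers = encode_batch(records)
print(len(body), headers["Content-Encoding"])
```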
Limits
Limit | Value |
|---|---|
Records per request | 1,000 |
Compressed request body | ≤ 4 MB |
Decompressed body | ≤ 16 MB |
Requests per minute | 60 per API key |
user_agent | 1,024 chars |
path | 2,048 chars |
host | 255 chars |
| 2,048 chars |
For high-volume sites, send small batches frequently (e.g., every 30–60 seconds) rather than a few very large requests.
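Splitting a stream of records into request-sized batches can be sketched as below. The batches helper is a hypothetical name. The 1,000-record cap comes from the table above, and 60 requests per minute works out to an average of at most one request per second per key.

```python
from typing import Iterable, Iterator

MAX_RECORDS_PER_REQUEST = 1000
MIN_SECONDS_BETWEEN_REQUESTS = 60 / 60  # 60 requests per minute per API key


def batches(records: Iterable[dict], size: int = MAX_RECORDS_PER_REQUEST) -> Iterator[list]:
    """Yield consecutive lists of at most `size` records."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # the final, possibly smaller batch
        yield batch


sizes = [len(b) for b in batches({"n": i} for i in range(2500))]
print(sizes)  # [1000, 1000, 500]
```

Each yielded batch is a distinct payload, so each one needs its own Idempotency-Key.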
Accepted timestamp window
Server Logs data is retained for 90 days. To avoid storing records that would immediately expire or land in stale partitions, the API only accepts timestamps within:
now − 90 days ≤ timestamp ≤ now + 10 minutes
Records outside this window are rejected individually and reported in the response. Backfilling data older than 90 days is not supported.
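You can apply the same window on your side to skip records that would be rejected anyway. This is a sketch under assumptions: to_utc and in_accepted_window are hypothetical helpers, and the rule for telling epoch seconds from epoch milliseconds (numbers of 1e11 and above are treated as milliseconds) is a local heuristic, not Ansehn's documented behavior.

```python
from datetime import datetime, timedelta, timezone


def to_utc(value) -> datetime:
    """Parse an ISO 8601 string, epoch seconds, or epoch milliseconds into UTC."""
    if isinstance(value, (int, float)):
        # Heuristic (assumption): 1e11 seconds is far in the future, so
        # anything that large is taken to be milliseconds.
        seconds = value / 1000 if value >= 1e11 else value
        return datetime.fromtimestamp(seconds, tz=timezone.utc)
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # treat naive values as UTC
    return parsed.astimezone(timezone.utc)


def in_accepted_window(value, now=None) -> bool:
    """True if now - 90 days <= timestamp <= now + 10 minutes."""
    now = now or datetime.now(timezone.utc)
    ts = to_utc(value)
    return now - timedelta(days=90) <= ts <= now + timedelta(minutes=10)


now = datetime(2026, 6, 8, 12, 0, 0, tzinfo=timezone.utc)
print(in_accepted_window("2026-06-08T11:59:00Z", now))  # True
print(in_accepted_window("2026-01-01T00:00:00Z", now))  # False
```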
What gets stored
Ansehn stores only recognized AI crawler rows. You can safely send your full, unfiltered request logs — non-AI-crawler traffic is detected and dropped before it is ever persisted, which keeps your storage footprint and privacy exposure low.
Every response tells you exactly what happened:
{
  "status": "accepted",
  "batch_id": "f1c2…",
  "request_id": "logs-prod-edge-001-2026-06-08T12:00:00Z-shard-a",
  "deduplicated": false,
  "received_records": 1000,
  "accepted_records": 998,
  "stored_ai_crawler_records": 42,
  "dropped_non_crawler_records": 956,
  "rejected_records": 2,
  "quarantined_records": 0,
  "errors": [
    {
      "index": 17,
      "field": "timestamp",
      "message": "timestamp is outside the accepted retention window"
    }
  ]
}
received_records: rows in your batch.
accepted_records: rows that passed validation.
stored_ai_crawler_records: AI crawler rows actually written to your analytics.
dropped_non_crawler_records: valid rows that were not AI crawlers (intentionally not stored).
rejected_records: malformed rows; see errors for details.
A genuine batch with no AI crawlers returns stored_ai_crawler_records: 0 with "status": "accepted". A duplicate retry returns "status": "duplicate" instead — the two are always distinguishable.
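A log shipper can turn this response into a short line for its own logs. The sketch below uses only the field names shown in the example response; summarize is a hypothetical helper, and the sample dictionary is a trimmed copy of that example.

```python
def summarize(response: dict) -> str:
    """Describe an ingest response in one or more human-readable lines."""
    if response.get("status") == "duplicate":
        return "duplicate batch, nothing ingested again"
    stored = response.get("stored_ai_crawler_records", 0)
    dropped = response.get("dropped_non_crawler_records", 0)
    rejected = response.get("rejected_records", 0)
    lines = [
        f"stored {stored} AI crawler rows, "
        f"dropped {dropped} non-crawler rows, rejected {rejected}"
    ]
    # Each error points at the position of the offending record in the batch.
    for err in response.get("errors", []):
        lines.append(f"  record {err['index']}: {err['field']}: {err['message']}")
    return "\n".join(lines)


sample = {
    "status": "accepted",
    "stored_ai_crawler_records": 42,
    "dropped_non_crawler_records": 956,
    "rejected_records": 2,
    "errors": [
        {
            "index": 17,
            "field": "timestamp",
            "message": "timestamp is outside the accepted retention window",
        }
    ],
}
print(summarize(sample))
```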
Response status codes
Condition | Status |
|---|---|
Success (including zero AI crawler rows) | |
Duplicate retry (same key + same body) | 200 OK |
Missing or invalid API key | |
API key missing the logs:ingest scope | |
Missing Idempotency-Key header or invalid JSON | |
Idempotency key reused with a different body | 409 Conflict |
Too many records or body too large | |
Rate limit exceeded | |
No valid rows in the batch | |
Temporary ingest issue | |
Examples
cURL
curl https://ingest.ansehn.com/v1/logs/custom \
  -X POST \
  -H "Content-Type: application/json" \
  -H "x-ansehn-api-key: ank_your_key" \
  -H "Idempotency-Key: logs-prod-edge-001-2026-06-08T12:00:00Z-shard-a" \
  --data '{
    "logs": [
      {
        "timestamp": "2026-06-08T12:00:00Z",
        "method": "GET",
        "host": "example.com",
        "path": "/blog/ai-search",
        "status_code": 200,
        "ip": "203.0.113.10",
        "user_agent": "Mozilla/5.0; compatible; GPTBot/1.2; +https://openai.com/gptbot",
        "referer": "-",
        "bytes_sent": 18422,
        "duration_ms": 91
      }
    ]
  }'
Node.js
await fetch("https://ingest.ansehn.com/v1/logs/custom", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "x-ansehn-api-key": process.env.ANSEHN_API_KEY,
    // Stable per batch: encode the source and exact log window.
    "Idempotency-Key": "logs-prod-edge-001-2026-06-08T12:00:00Z-shard-a",
  },
  body: JSON.stringify({
    logs: [
      {
        timestamp: "2026-06-08T12:00:00Z",
        method: "GET",
        host: "example.com",
        path: "/",
        status_code: 200,
        ip: "203.0.113.10",
        user_agent: "GPTBot/1.2",
      },
    ],
  }),
});
Python
import requests

response = requests.post(
    "https://ingest.ansehn.com/v1/logs/custom",
    headers={
        "Content-Type": "application/json",
        "x-ansehn-api-key": "ank_your_key",
        # Stable per batch: encode the source and exact log window.
        "Idempotency-Key": "logs-prod-edge-001-2026-06-08T12:00:00Z-shard-a",
    },
    json={
        "logs": [
            {
                "timestamp": "2026-06-08T12:00:00Z",
                "method": "GET",
                "host": "example.com",
                "path": "/",
                "status_code": 200,
                "ip": "203.0.113.10",
                "user_agent": "GPTBot/1.2",
            }
        ]
    },
    timeout=10,  # seconds; without a timeout, requests can wait indefinitely
)
print(response.json())
Privacy & IP addresses
The ip field accepts whatever you send — raw, anonymized (e.g., 203.0.113.xx), or hashed. Because IP addresses are personal data in many jurisdictions, we recommend anonymizing or hashing IPs before sending them if your use case does not require the raw value. AI crawler detection relies on the User-Agent, not the IP, so anonymized IPs do not reduce detection accuracy.
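Both approaches can be done with the Python standard library before a record leaves your infrastructure. The following is a sketch, not a prescribed scheme: mask_ip and hash_ip are hypothetical helpers, the /24 and /48 prefix lengths are a common convention chosen here as an assumption, and masking zeroes the trailing bits rather than writing "xx".

```python
import hashlib
import hmac
import ipaddress


def mask_ip(ip: str) -> str:
    """Zero the host part of an address: last octet for IPv4, last 80 bits for IPv6."""
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 48
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


def hash_ip(ip: str, secret: bytes) -> str:
    """Keyed SHA-256 hash of the address.

    The secret stays on your side. The same address always maps to the same
    hash, so repeat visits remain linkable without exposing the raw IP.
    """
    return hmac.new(secret, ip.encode("utf-8"), hashlib.sha256).hexdigest()


print(mask_ip("203.0.113.10"))  # 203.0.113.0
print(hash_ip("203.0.113.10", secret=b"replace-with-your-own-secret")[:16])
```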
All data is scoped exclusively to your project and isolated from other Ansehn customers. Your API key is stored as a one-way hash and cannot be recovered after generation.
Supported AI crawlers
Ansehn detects a broad and growing set of AI crawlers from your User-Agent strings (case-insensitive), including:
OpenAI: GPTBot, OAI-SearchBot, ChatGPT-User
Anthropic: ClaudeBot, Claude-User, Claude-SearchBot
Perplexity: PerplexityBot, Perplexity-User
Google: Gemini, Google-Extended
Meta: meta-externalagent
…and more across providers such as ByteDance, Amazon, Apple, Common Crawl, Cohere, You.com, Mistral, and DuckDuckGo.
Found a new AI crawler we don't yet recognize? Let us know and we'll add support.
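If you want to pre-filter on your side to save bandwidth, a case-insensitive substring check against the names listed above is one possible approach. This sketch is not Ansehn's detection logic: looks_like_ai_crawler is a hypothetical helper, and its token list covers only the crawlers named on this page, so it would drop crawlers that Ansehn recognizes but that are not listed here. Sending unfiltered logs avoids that risk.

```python
AI_CRAWLER_TOKENS = (
    "GPTBot", "OAI-SearchBot", "ChatGPT-User",
    "ClaudeBot", "Claude-User", "Claude-SearchBot",
    "PerplexityBot", "Perplexity-User",
    "Gemini", "Google-Extended",
    "meta-externalagent",
)
_LOWERED_TOKENS = tuple(token.lower() for token in AI_CRAWLER_TOKENS)


def looks_like_ai_crawler(user_agent: str) -> bool:
    """Case-insensitive check for any listed crawler name inside the User-Agent."""
    ua = user_agent.lower()
    return any(token in ua for token in _LOWERED_TOKENS)


print(looks_like_ai_crawler("Mozilla/5.0; compatible; GPTBot/1.2; +https://openai.com/gptbot"))  # True
print(looks_like_ai_crawler("Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/125.0"))  # False
```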
Verification
After sending your first batch:
Confirm the response shows stored_ai_crawler_records greater than 0 (your batch must contain recognized AI crawler User-Agents).
Open your Ansehn Server Logs dashboard; activity should appear under Daily Visits within a minute or two.
The connection status indicator next to the tab bar shows whether logs are flowing and when the last ingest occurred. Hover over it for diagnostics.
Troubleshooting
Symptom | Meaning | Fix |
|---|---|---|
stored_ai_crawler_records: 0 | The batch contained no recognized AI crawler User-Agents | Confirm your logs include AI bot traffic; check the User-Agent strings |
| Missing, invalid, or expired API key | Regenerate the key in Project Settings → API Keys and update your shipper |
| Key lacks the logs:ingest scope | Generate a key with the logs:ingest scope |
| Missing Idempotency-Key header or invalid JSON | Add the header; validate your JSON |
409 Conflict | An Idempotency-Key was reused with a different body | Use a new, unique key for each distinct batch |
| Over 1,000 records or body too large | Send smaller batches; enable gzip compression |
| Over 60 requests/minute for the key | Reduce request frequency or batch more rows per request |
| No valid rows after validation | Check the errors array in the response |
Rows missing from the dashboard | Records outside the 90-day retention window | Only send logs from the last 90 days |
Frequently asked questions
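Do I need to pre-filter my logs to AI crawlers only?
No. You can send your full request logs. Ansehn detects AI crawler traffic server-side and stores only those rows. Pre-filtering is optional and only reduces bandwidth.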
Will this slow down my website?
No. Log shipping happens off the critical path on your side; your visitors are never affected.
What happens if a request fails or times out?
Retry with the same Idempotency-Key. If the original batch was already stored, the retry is recognized as a duplicate and not counted twice.
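A retry loop that reuses the key might look like the sketch below. It rests on several assumptions: send_with_retry is a hypothetical helper, send stands for whatever function performs your POST and returns the HTTP status code, and treating 429 and 5xx as retryable follows general HTTP convention rather than a statement on this page.

```python
import time
from typing import Callable


def send_with_retry(
    send: Callable[[bytes, str], int],
    body: bytes,
    idempotency_key: str,
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send(body, idempotency_key) until it succeeds or attempts run out.

    The body and the key are never changed between attempts, so a batch that
    was already stored is reported as a duplicate instead of being counted twice.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            status = send(body, idempotency_key)
        except (TimeoutError, ConnectionError) as exc:
            last_error = exc  # network failure: outcome unknown, retry
        else:
            if status != 429 and status < 500:
                return status  # success, or a client error that a retry will not fix
            last_error = RuntimeError(f"retryable status {status}")
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)  # exponential backoff: 1s, 2s, 4s
    raise last_error
```

Passing send and sleep in as parameters keeps the helper independent of any HTTP library and makes it easy to exercise with a fake sender in tests.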
Can multiple sources feed the same project?
Yes. Use one API key per source if you like, and give each batch its own Idempotency-Key. All data flows into the same project.
How is this different from the CloudFront integration?
The CloudFront integration is purpose-built for AWS CloudFront and requires no code. The Custom Log API is provider-agnostic — use it for any web server, CDN, edge platform, or custom log pipeline.
Pro tip: Compare crawler activity here with your insights in Citation Analytics to understand how crawling correlates with actual citations in AI answers.
Need help?
If you run into any issues during setup or have questions about the integration, reach out to us at [email protected] and mention "Custom Log API" in the subject line. We're happy to help you wire up your log pipeline.