AWS CloudFront Integration
Ansehn's automated CloudFront integration uses Amazon Data Firehose to stream your CloudFront access logs directly to Ansehn in real time — no manual uploads, no scripts, no maintenance. Once connected, Ansehn will automatically detect and surface AI crawler activity (GPTBot, ClaudeBot, PerplexityBot, Gemini, and more) across your distribution.
Prerequisites
Before you begin, confirm you have the following:
- An active AWS account with permissions to create Amazon Data Firehose delivery streams
- A CloudFront distribution with Standard Logging v2 (available on all distributions; no additional CloudFront configuration is required beforehand)
- An Ansehn account with an active project for the domain associated with your distribution
How it works
CloudFront Distribution
│
│ Real-time log events (JSON)
▼
Amazon Data Firehose
│
│ HTTPS · GZIP · API Key auth
▼
Ansehn Ingest API → Your Crawler Analytics Dashboard
CloudFront streams log events to a Firehose delivery stream you control. Firehose batches and forwards those events securely to Ansehn using an API key you generate in your Ansehn dashboard. Zero latency is added to your end-user requests — all log forwarding happens asynchronously off the critical path.
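To make the delivery path concrete, here is a minimal sketch of the envelope Firehose uses for HTTP endpoint destinations: each record's bytes are base64-encoded inside a JSON body, and with GZIP content encoding the whole body is compressed. The log field values and the `requestId` below are illustrative, not real traffic.

```python
import base64
import gzip
import json

# Illustrative CloudFront log event; field names match the Step 5 selection,
# but the values here are made up for the example.
event = {"cs(User-Agent)": "GPTBot/1.2", "cs-uri-stem": "/blog/post-title", "sc-status": "200"}

# Firehose's HTTP endpoint delivery wraps each record as base64 inside a JSON
# envelope, then (with GZIP content encoding enabled) compresses the HTTP body.
envelope = {
    "requestId": "example-request-id",
    "timestamp": 1700000000000,
    "records": [{"data": base64.b64encode(json.dumps(event).encode()).decode()}],
}
body = gzip.compress(json.dumps(envelope).encode())

# Receiving side: decompress the body, then base64-decode each record.
batch = json.loads(gzip.decompress(body))
records = [json.loads(base64.b64decode(r["data"])) for r in batch["records"]]
print(records[0]["cs(User-Agent)"])  # GPTBot/1.2
```

You never implement this yourself; Firehose and the Ansehn Ingest API handle both sides. It is shown only so the "batches and forwards" step is not a black box.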
Step-by-step setup
Step 1 — Generate an API key in Ansehn
API keys are generated in your Ansehn dashboard:
- Navigate to Projects → [Your Project] → Settings → API Keys
- Click + New API Key
- Give the key a descriptive name (e.g., Production CloudFront)
- Under Scopes, select logs:ingest
- Under Project, select the project for the domain associated with your CloudFront distribution
- Click Generate and copy the key immediately — it is only shown once
Your key will look like: ank_a1b2c3d4e5f6...
Step 2 — Create a Firehose delivery stream
- Sign in to the AWS Console and navigate to Amazon Data Firehose
- Click Create Firehose stream
- Configure the stream:
- Source: Direct PUT
- Destination: HTTP endpoint
- Stream name: Something descriptive, e.g., ansehn-cloudfront-logs
Step 3 — Configure the HTTP endpoint
Within the Destination settings section:
| Field | Value |
|---|---|
| HTTP endpoint URL | https://ingest.ansehn.com/v1/logs/aws_cloudfront |
| Access key | Your Ansehn API key (see below for secure storage) |
| Content encoding | GZIP |
Storing your API key securely (recommended)
We strongly recommend storing your API key in AWS Secrets Manager rather than entering it as plain text in the Firehose configuration. Firehose natively supports Secrets Manager — this keeps your key encrypted at rest, auditable via CloudTrail, and rotatable without recreating the stream.
- Open AWS Secrets Manager and create a new secret
- Select Other type of secret and enter the following JSON: { "api_key": "ank_your_key_here" }
- Name the secret something descriptive (e.g., ansehn/cloudfront-logs/api-key)
- Back in the Firehose configuration, select Use AWS Secrets Manager and choose the secret you just created
If you prefer to enter the key directly (e.g., for a quick test), you can paste it into the Access key field — but we recommend migrating to Secrets Manager before going to production.
Under Backup settings, create or select an S3 bucket to store any failed delivery records. This is important — the S3 backup bucket is your safety net. If any log records cannot be delivered (e.g., during a brief service interruption), they are written to this bucket. Ansehn support can help you replay these records if needed. We recommend setting a 14-day retention policy on this bucket.
Step 4 — Attach the stream to your CloudFront distribution
- Open the CloudFront Console and select your distribution
- Click the Logging tab
- Click Add and select Amazon Kinesis Data Firehose (Kinesis Data Firehose is the legacy name for Amazon Data Firehose — they are the same service)
- Select the delivery stream you created in Step 2
Step 5 — Configure log fields
In the Additional settings section, select exactly the following 12 fields. Ansehn requires all of them to parse your logs correctly — omitting any field will cause those log records to be skipped.
| # | Field | Description |
|---|---|---|
| 1 | timestamp | Unix timestamp of the request |
| 2 | c-ip | Client IP address |
| 3 | sc-status | HTTP response status code |
| 4 | cs-method | HTTP request method (GET, POST, etc.) |
| 5 | cs-uri-stem | Request path (e.g., /blog/post-title) |
| 6 | cs-uri-query | Query string |
| 7 | cs(Host) | Host header value |
| 8 | cs(User-Agent) | Client user agent string |
| 9 | cs(Referer) | Referrer header |
| 10 | sc-content-type | Response content type |
| 11 | sc-bytes | Response size in bytes |
| 12 | time-taken | Request processing time in seconds |
Set the Output format to JSON.
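With Output format set to JSON, each delivered record is a JSON object keyed by the field names above. The sketch below shows the kind of user-agent matching this enables; the crawler substrings and the `classify` helper are illustrative assumptions, since Ansehn's actual detection runs server-side.

```python
import json

# Illustrative substrings identifying common AI crawlers in the user-agent
# header; Ansehn's real detection covers more bots and edge cases.
AI_CRAWLERS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]

def classify(record_json):
    """Return the matching AI crawler name for one JSON log record, else None."""
    ua = json.loads(record_json).get("cs(User-Agent)", "")
    return next((bot for bot in AI_CRAWLERS if bot in ua), None)

# A sample record carrying all 12 required fields, with made-up values.
line = json.dumps({
    "timestamp": "1700000000", "c-ip": "203.0.113.7", "sc-status": "200",
    "cs-method": "GET", "cs-uri-stem": "/blog/post-title", "cs-uri-query": "-",
    "cs(Host)": "example.com", "cs(Referer)": "-", "sc-content-type": "text/html",
    "sc-bytes": "5120", "time-taken": "0.042",
    "cs(User-Agent)": "Mozilla/5.0 AppleWebKit/537.36; compatible; GPTBot/1.2",
})
print(classify(line))  # GPTBot
```

If any of the 12 keys is absent, a record cannot be parsed into this shape, which is why the field selection must match exactly.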
Pre-submit checklist
Before clicking Submit, verify the following. Misconfiguration here is the #1 cause of logs not appearing in Ansehn:
- All 12 fields above are selected (no more, no fewer)
- Output format is set to JSON (not CSV or plain text)
- The correct Firehose delivery stream is selected (the one from Step 2)
- The stream is attached to the correct CloudFront distribution
Click Submit to save.
Verification
Once the integration is live and your Firehose stream is active, you can verify everything is working by:
- Visiting a page on your site from a browser
- Waiting approximately 60–90 seconds (Firehose default buffer interval)
- Checking your Ansehn Server Logs dashboard — you should see activity appear under Daily Visits
Your dashboard will also display a connection status indicator next to the tab bar showing whether logs are flowing and when the last ingest occurred. Hover over it for detailed diagnostics including total batches processed and AI crawler detection status.
What you will see in Ansehn
Once connected, your Server Logs dashboard will automatically show:
- Daily AI crawler visits — broken down by GPTBot, OAI-SearchBot, ClaudeBot, PerplexityBot, Gemini, Google-Extended, and more
- Top crawled pages — which URLs AI search engines are indexing most frequently
- Crawler breakdown per page — understand which AI assistants are most interested in which content
- Historical trends — track changes over time as you publish new content
All data is scoped to your project domain and isolated from other Ansehn customers.
Frequently asked questions
Will this slow down my website? No. Firehose processes logs entirely asynchronously. Your visitors experience zero added latency.
How long does it take for logs to appear? CloudFront streams logs to Firehose in real time. Firehose buffers records and forwards them in batches — typically every 60 seconds (configurable). Expect logs to appear in your Ansehn dashboard within 1–2 minutes of a request hitting your distribution.
What happens if the Ansehn API is temporarily unavailable? Firehose has built-in retry logic and will re-attempt delivery with exponential backoff. Failed records are saved to your S3 backup bucket so no data is permanently lost.
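Firehose's exact retry schedule is managed by AWS, but "exponential backoff" follows a standard shape: each failed attempt roughly doubles the wait, up to a cap. A minimal sketch, with illustrative base and cap values:

```python
def backoff_delay(attempt, base=1.0, cap=60.0):
    """Seconds to wait before retry number `attempt` (0-based): capped exponential."""
    return min(cap, base * (2 ** attempt))

print([backoff_delay(a) for a in range(7)])  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0]
```

The practical takeaway is that brief outages resolve themselves, and anything that exhausts retries lands in your S3 backup bucket rather than being dropped.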
Can I connect multiple CloudFront distributions to the same Ansehn project? Yes. You can generate multiple API keys and create separate Firehose streams — one per distribution — all feeding into the same Ansehn project.
Is my log data secure? All data is transmitted over HTTPS with GZIP encoding. Your API key is stored as a one-way hash — even Ansehn cannot recover it after generation. Log data is scoped exclusively to your organization and never shared.
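"Stored as a one-way hash" means Ansehn keeps only a digest of the key and compares digests at request time. Ansehn's actual scheme is not documented here; the sketch below shows the generic pattern with SHA-256, purely as an illustration.

```python
import hashlib
import hmac

def key_digest(api_key):
    """One-way digest of the key; the plaintext cannot be recovered from it."""
    return hashlib.sha256(api_key.encode()).hexdigest()

def verify(presented_key, stored_digest):
    # compare_digest performs a constant-time comparison of the two digests
    return hmac.compare_digest(key_digest(presented_key), stored_digest)

stored = key_digest("ank_a1b2c3d4e5f6")  # only this digest is persisted
print(verify("ank_a1b2c3d4e5f6", stored))  # True
print(verify("ank_wrong_key", stored))     # False
```

This is why the key is shown only once at generation time: if you lose it, it must be regenerated, not recovered.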
I don't use CloudFront — can I still use this? The CloudFront integration is the first automated source we are shipping. Integrations for Cloudflare, Fastly, Nginx, and Vercel are on the roadmap. In the meantime, manual log upload is available today from your Server Logs dashboard.
Troubleshooting
If logs are not appearing in your Ansehn dashboard after 5 minutes, use the following diagnostics:
1. Check Firehose delivery status
- Open the Amazon Data Firehose console
- Select your delivery stream (e.g., ansehn-cloudfront-logs)
- Click the Monitoring tab
- Look at these CloudWatch metrics:
| Metric | What it tells you |
|---|---|
| IncomingRecords | CloudFront is sending logs to Firehose. If 0, the logging config in Step 4 may be misconfigured. |
| DeliveryToHttpEndpoint.Success | Firehose successfully delivered batches to Ansehn. If 0 while IncomingRecords > 0, there is a delivery issue. |
| DeliveryToHttpEndpoint.Errors | Delivery failures. Check the S3 backup bucket for error details. |
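The decision logic in the table above can be sketched as a small triage function. The function name and messages are ours, not part of any AWS or Ansehn API; it simply encodes how to read the three metrics together.

```python
def diagnose(incoming, success, errors):
    """Map the three CloudWatch metrics to the most likely failure point."""
    if incoming == 0:
        return "no logs arriving: re-check the Step 4 logging attachment"
    if success == 0:
        return "delivery failing: inspect Errors and the S3 backup bucket"
    if errors > 0:
        return "partial failures: inspect the S3 backup bucket for error records"
    return "delivery healthy: confirm ingestion via the dashboard status indicator"

print(diagnose(incoming=120, success=0, errors=4))
```

Note the order: check that logs are arriving before investigating delivery, since a zero on IncomingRecords makes the other two metrics meaningless.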
2. Check the S3 backup bucket and dashboard for error details
Ansehn returns HTTP 200 for permanent errors (like invalid API keys) to prevent Firehose from endlessly retrying unrecoverable issues. This means a "successful" delivery status in Firehose does not guarantee data reached your dashboard. Always verify by checking the connection status indicator in your Server Logs dashboard.
| Symptom | Meaning | Fix |
|---|---|---|
| Firehose shows successful delivery but no data appears in Ansehn | API key is invalid, expired, lacks logs:ingest scope, or belongs to a different project | Check the connection status indicator in your Server Logs dashboard. Regenerate the key in Ansehn (Project Settings → API Keys) and update the Firehose configuration or Secrets Manager. |
| Firehose reports HTTP 400 responses | Payload rejected, likely missing or extra fields | Re-check the field selection in Step 5 (must be exactly 12 fields) |
| Firehose reports HTTP 500 responses | Temporary Ansehn service issue | Firehose will retry automatically; no action needed unless persistent |
3. Verify CloudFront is generating logs
- Open your CloudFront distribution → Logging tab
- Confirm a logging destination is listed and its status is Enabled
- Visit a page on your site and check the Firehose IncomingRecords metric after 60 seconds
4. Common mistakes
- Wrong field selection: Adding or removing fields from the required 12 will cause all records to be rejected. Double-check against the table in Step 5.
- Output format not JSON: If set to CSV or plain text, Ansehn cannot parse the logs. Must be JSON.
- Stream attached to wrong distribution: Verify the delivery stream is linked to the correct CloudFront distribution in the Logging tab.
- API key from wrong project: The API key must belong to the project that matches the domain on the CloudFront distribution. A key from a different project will be rejected.
Need help?
If you run into any issues during setup or have questions about the integration, reach out to us at support@ansehn.com and mention "CloudFront Integration" in the subject line. Our team is happy to walk through the AWS configuration with you directly.