# How-to: Handle ingestion failures

When sending usage events to Zenskar via API, ingestion can fail silently if your system doesn't handle errors correctly. This document explains how failures happen, how to detect them, and how to build a reliable pipeline that prevents data loss.

***

## Core concepts

### How ingestion can fail

Ingestion failures fall into two categories with very different implications.

**Rejected by Zenskar (non-retriable)**

Zenskar received the event but rejected it because the payload was invalid. The API responds with an HTTP `4xx` status code and a descriptive error message. If your system doesn't inspect that response, the event is silently dropped.

**Never reached Zenskar (retriable)**

The event was never ingested because of a network outage on your side, an intermediate routing failure, or a Zenskar service disruption (which can surface as a `5xx` response or as no response at all). Because the event was never recorded, Zenskar cannot alert you about it. If you don't retry or persist the event locally, it is lost permanently.

### What is a dead letter queue (DLQ)?

A dead letter queue (DLQ) is a holding area for events that failed to ingest, regardless of the reason. Instead of discarding a failed event, your system routes it to the DLQ so it can be retried, inspected, or manually replayed later. A DLQ is the primary mechanism for guaranteeing no usage data is lost.

```mermaid theme={null}
sequenceDiagram
    participant S as Your system
    participant Z as Zenskar API
    participant D as DLQ

    S->>Z: POST /ingest (event payload)

    alt 200 OK
        Z-->>S: 200 OK
        Note over S: Event ingested successfully

    else 4xx validation error
        Z-->>S: 4xx + error message
        S->>D: Write event + error reason (non-retriable)
        Note over D: Awaits manual inspection
        D-->>S: Corrected payload
        S->>Z: POST /ingest (corrected payload)
        Z-->>S: 200 OK

    else 5xx server error
        Z-->>S: 5xx
        loop Retry with exponential backoff
            S->>Z: POST /ingest (same payload)
            Z-->>S: 5xx
        end
        S->>D: Write event + error reason (retriable, retries exhausted)

    else No response (network error)
        S-xZ: POST /ingest (no response)
        loop Retry with exponential backoff
            S->>Z: POST /ingest (same payload)
            S-xZ: No response
        end
        S->>D: Write event + error reason (retriable, retries exhausted)
    end
```

### Retriable vs. non-retriable failures

| Failure type                 | Cause                        | Should you retry?                      |
| ---------------------------- | ---------------------------- | -------------------------------------- |
| `4xx` validation error       | Malformed or invalid payload | No: Fix the payload first, then retry. |
| Network / connectivity error | No response received         | Yes: Retry with backoff                |
| `5xx` server error           | Zenskar-side issue           | Yes: Retry with backoff                |

<Callout icon="📚" theme="default">
  **Important:** Retrying a `4xx` error without fixing the payload will always fail again. Route these events to the DLQ for inspection and correction before re-sending.
</Callout>
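
The table above can be reduced to a small classification helper. This is a minimal sketch: the function name `classify_outcome` and the string labels are illustrative assumptions, not part of any Zenskar SDK. A status of `None` stands for "no response received", and treating any other unexpected status as something to inspect is also an assumption.

```python
from typing import Optional


def classify_outcome(status: Optional[int]) -> str:
    """Map an HTTP status (or None for no response) to a handling category."""
    if status is None:
        return "retriable"        # network / connectivity error: retry with backoff
    if status == 200:
        return "ok"               # event accepted
    if 400 <= status < 500:
        return "non_retriable"    # fix the payload before re-sending
    if 500 <= status < 600:
        return "retriable"        # Zenskar-side issue: retry with backoff
    return "non_retriable"        # assumption: park anything unexpected for inspection
```

Keeping this decision in one place means the retry loop and the DLQ writer cannot disagree about which failures are worth retrying.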

***

## Quickstart guide

This walkthrough shows you how to send a usage event with basic error handling that routes failures to a DLQ. It assumes you are calling the Zenskar ingestion API directly over HTTP.

### Step 1: Send the event

Send a `POST` request with your event payload. A valid payload looks like this:

```json theme={null}
[
  {
    "data": {
      "campaign_id": "sample_campaign_id_8",
      "impressions": 74
    },
    "timestamp": "2025-06-28 23:44:47",
    "customer_id": "c03"
  }
]
```
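
As a rough illustration, the request can be built with Python's standard library. The URL below is a placeholder, not the real Zenskar endpoint, and the helper name `build_ingest_request` is hypothetical; add whatever authentication headers your Zenskar account requires.

```python
import json
import urllib.request

# Placeholder endpoint (assumption): replace with the ingestion URL for your account.
INGEST_URL = "https://zenskar.example.invalid/ingest"


def build_ingest_request(events: list) -> urllib.request.Request:
    """Build a POST request whose body is a JSON array of usage events."""
    body = json.dumps(events).encode("utf-8")
    return urllib.request.Request(
        INGEST_URL,
        data=body,
        method="POST",
        # Authentication headers are intentionally omitted; add your own here.
        headers={"Content-Type": "application/json"},
    )


events = [
    {
        "data": {"campaign_id": "sample_campaign_id_8", "impressions": 74},
        "timestamp": "2025-06-28 23:44:47",
        "customer_id": "c03",
    }
]
request = build_ingest_request(events)
# To actually send it: urllib.request.urlopen(request, timeout=10)
```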

### Step 2: Inspect the response

Always read the HTTP status code and response body. Receiving a response does not mean the event was accepted.

* `200`: Event accepted. No further action needed.
* `4xx`: Event rejected. Read the error message, fix the payload, then retry. Do not retry the original payload.
* `5xx` or no response: Delivery failed. Retry with exponential backoff (see Step 4).
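
The three outcomes above can be told apart in code roughly as follows. This sketch assumes Python's `urllib`, which raises `HTTPError` for `4xx` and `5xx` responses and `URLError` (an `OSError`) when no response arrives. The name `send_once` and the injectable `opener` parameter are illustrative choices made so the function can be exercised without a network.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple


def send_once(request: urllib.request.Request,
              opener=urllib.request.urlopen) -> Tuple[Optional[int], str]:
    """Send one request; return (status, body). Status is None if no response arrived."""
    try:
        with opener(request, timeout=10) as response:
            return response.status, response.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as exc:
        # 4xx / 5xx: the exception still carries the status code and the error body.
        return exc.code, exc.read().decode("utf-8", "replace")
    except OSError as exc:
        # No HTTP response at all: DNS failure, refused connection, timeout, and so on.
        return None, str(exc)
```

`HTTPError` must be caught before `OSError` because it is a subclass of it; otherwise every `4xx` would be mistaken for a network failure and retried.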

### Step 3: Route failures to a DLQ

If the event was rejected due to a validation error (`4xx`), write it to your DLQ immediately. If it could not be delivered (network error or `5xx`), write it to the DLQ once retries are exhausted (see Step 4). Include the original payload, the error reason, a timestamp, and the failure type (retriable vs. non-retriable) so you can process each event correctly later.
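
A minimal file-based sketch of such a DLQ record is shown here. The field names (`payload`, `error_reason`, `failed_at`, `failure_type`) and the JSON Lines layout are assumptions chosen for illustration; any durable store works.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def write_to_dlq(dlq_path: Path, payload: list, error_reason: str, retriable: bool) -> dict:
    """Append one failed request to a JSON Lines file and return the record written."""
    record = {
        "payload": payload,                                   # original request body
        "error_reason": error_reason,                         # status code and/or message
        "failed_at": datetime.now(timezone.utc).isoformat(),  # when the failure was recorded
        "failure_type": "retriable" if retriable else "non_retriable",
    }
    with dlq_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")
    return record
```

One record per line keeps the file appendable and easy to scan with standard tools when you later drain the queue.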

### Step 4: Retry retriable failures with backoff

For network errors and `5xx` responses, retry using exponential backoff with jitter to avoid thundering-herd problems. A reasonable starting point:

* Initial delay: 1 second
* Multiplier: 2×
* Maximum delay: 60 seconds
* Maximum attempts: 5

After exhausting retries, move the event to the DLQ rather than discarding it.
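
The parameters above translate into a retry loop along these lines. This is a sketch, not a reference implementation: it uses the "full jitter" variant (a random delay between zero and the exponential ceiling), and `send` is a hypothetical callable returning `(status, body)` with `status` set to `None` when no response arrived.

```python
import random
import time
from typing import Callable, Optional, Tuple

INITIAL_DELAY = 1.0   # seconds
MULTIPLIER = 2.0
MAX_DELAY = 60.0      # seconds
MAX_ATTEMPTS = 5


def backoff_delay(attempt: int, rng: Callable[[], float] = random.random) -> float:
    """Delay before retry number `attempt` (1-based), with full jitter."""
    ceiling = min(MAX_DELAY, INITIAL_DELAY * MULTIPLIER ** (attempt - 1))
    return rng() * ceiling


def send_with_retries(send: Callable[[], Tuple[Optional[int], str]],
                      sleep: Callable[[float], None] = time.sleep):
    """Call `send` up to MAX_ATTEMPTS times; return (status, body, attempts)."""
    status, body = send()
    attempts = 1
    # Only network errors (None) and 5xx are retried; 4xx is returned immediately.
    while attempts < MAX_ATTEMPTS and (status is None or 500 <= status < 600):
        sleep(backoff_delay(attempts))
        status, body = send()
        attempts += 1
    return status, body, attempts
```

If the returned status is still `None` or `5xx` after the final attempt, the caller writes the event to the DLQ as a retriable failure.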

### Step 5: Drain the DLQ

Periodically process the events in the DLQ. For non-retriable (`4xx`) failures, inspect the error message, correct the payload, and re-send. For retriable failures whose retries were exhausted, re-attempt delivery.
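
One way to structure a drain pass is sketched below. It assumes each DLQ record is a dictionary with `payload`, `failure_type`, and `error_reason` keys (hypothetical names), and that `resend` is a callable returning `(status, body)`.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def drain_dlq(records: Iterable[dict],
              resend: Callable[[list], Tuple[Optional[int], str]]) -> Tuple[List[dict], List[dict]]:
    """Re-send retriable records; return (still_failing, needs_manual_fix)."""
    still_failing: List[dict] = []
    needs_fix: List[dict] = []
    for record in records:
        if record["failure_type"] == "non_retriable":
            needs_fix.append(record)      # payload must be corrected before re-sending
            continue
        status, body = resend(record["payload"])
        if status == 200:
            continue                      # delivered: drop it from the DLQ
        if status is not None and 400 <= status < 500:
            # The replay exposed a validation problem: reclassify the record.
            needs_fix.append({**record, "failure_type": "non_retriable", "error_reason": body})
        else:
            still_failing.append(record)  # keep for the next drain pass
    return still_failing, needs_fix
```

The two returned lists become the new DLQ contents: one to replay on the next pass, one to hand to whoever corrects payloads.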

***

## How-to guides

### Choose a DLQ implementation

The right approach depends on your event volume and existing infrastructure.

| Method                                   | Description                                                           | Best for                                                      |
| ---------------------------------------- | --------------------------------------------------------------------- | ------------------------------------------------------------- |
| **File-based logging**                   | Write failed events to a local file                                   | Low volume, simple setups, local development                  |
| **Database table**                       | Store failed events in a dedicated table for review and manual replay | Moderate volume, teams that want SQL-queryable failure logs   |
| **Message queue (e.g. Kafka, RabbitMQ)** | Publish failed events to a dedicated DLQ topic or queue               | High volume, existing queue infrastructure                    |
| **Cloud-managed DLQ (e.g. AWS SQS DLQ)** | Use a managed queue with built-in retry and failure handling          | Cloud-native stacks, teams that prefer managed infrastructure |
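
As an illustration of the database-table option, the sketch below uses SQLite from Python's standard library. The table name `ingestion_dlq` and its columns are assumptions; adapt them to your own database.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS ingestion_dlq (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    payload      TEXT NOT NULL,
    error_reason TEXT NOT NULL,
    failure_type TEXT NOT NULL CHECK (failure_type IN ('retriable', 'non_retriable')),
    failed_at    TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def open_dlq(path: str = ":memory:") -> sqlite3.Connection:
    """Open (and if needed create) the DLQ table. Use a file path for durability."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def park_event(conn: sqlite3.Connection, payload: list, error_reason: str, failure_type: str) -> int:
    """Store one failed request and return its row id."""
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "INSERT INTO ingestion_dlq (payload, error_reason, failure_type) VALUES (?, ?, ?)",
            (json.dumps(payload), error_reason, failure_type),
        )
    return cursor.lastrowid


def pending(conn: sqlite3.Connection, failure_type: str) -> list:
    """List parked requests of one failure type, oldest first."""
    rows = conn.execute(
        "SELECT id, payload, error_reason FROM ingestion_dlq WHERE failure_type = ? ORDER BY id",
        (failure_type,),
    )
    return [(row_id, json.loads(payload), reason) for row_id, payload, reason in rows]
```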

### Make events idempotent

Before retrying, ensure your events carry a stable unique identifier (e.g. a UUID tied to the originating action). Submit this as part of the payload so that if a retry delivers a duplicate, Zenskar can deduplicate it on ingestion. This prevents double-counting usage when a network failure causes an event to be delivered more than once.
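
A stable identifier can be derived deterministically from the originating action, so that every retry of the same event carries the same value. The sketch below uses a name-based UUID (version 5). The namespace value and the `customer_id:action_id` naming scheme are hypothetical, and where the identifier goes in the payload depends on your metric schema.

```python
import uuid

# Hypothetical namespace: generate one UUID for your application and never change it.
EVENT_NAMESPACE = uuid.UUID("6f1c2f0e-3b7a-4c58-9a55-0d2f4d1b8e11")


def stable_event_id(customer_id: str, action_id: str) -> str:
    """Return the same UUID every time for the same customer and originating action."""
    return str(uuid.uuid5(EVENT_NAMESPACE, f"{customer_id}:{action_id}"))
```

Generating the identifier once and storing it with the event also works; what matters is that a retry never mints a new one.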

### Validate payloads before sending

Run basic schema validation on your side before calling the API. Check that:

* All required keys are present (`data`, `timestamp`, `customer_id`)
* All values match the expected types (see the Reference section below)
* No unexpected keys are included in the `data` object
* The timestamp is in the correct format (`YYYY-MM-DD HH:MM:SS`)
* The total payload size is under 1 MB

Catching these errors locally avoids unnecessary API calls and keeps your DLQ free of easily preventable failures.
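
The per-event checks in the list above might look like this in practice. It is a sketch under stated assumptions: `data_schema` is a hypothetical mapping from key name to Python type that you maintain to mirror your metric schema, and the type comparison is deliberately strict. The payload-size check applies to the whole request body rather than to a single event, so it is not part of this function.

```python
import re
from datetime import datetime

REQUIRED_KEYS = ("data", "timestamp", "customer_id")
TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def validate_event(event: dict, data_schema: dict) -> list:
    """Return a list of problems found; an empty list means the event looks valid."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in event]

    timestamp = event.get("timestamp")
    if timestamp is not None:
        try:
            if not (isinstance(timestamp, str) and TIMESTAMP_RE.fullmatch(timestamp)):
                raise ValueError(timestamp)
            datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")  # rejects e.g. month 13
        except ValueError:
            problems.append("timestamp must be YYYY-MM-DD HH:MM:SS")

    if "customer_id" in event and not isinstance(event["customer_id"], str):
        problems.append("customer_id must be a string")

    data = event.get("data")
    if data is not None and not isinstance(data, dict):
        problems.append("data must be an object")
    elif isinstance(data, dict):
        for key, expected in data_schema.items():
            if key not in data:
                problems.append(f"missing data key: {key}")
            elif type(data[key]) is not expected:  # strict: True is not accepted as an int
                problems.append(f"wrong type for data key: {key}")
        problems.extend(f"unexpected data key: {key}" for key in data if key not in data_schema)
    return problems
```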

***

## Reference

### Valid payload structure

```json theme={null}
[
  {
    "data": {
      "campaign_id": "sample_campaign_id_8",
      "impressions": 74
    },
    "timestamp": "2025-06-28 23:44:47",
    "customer_id": "c03"
  }
]
```

The request body must be a JSON array. Each element represents one usage event.

| Field         | Type     | Required | Notes                                                      |
| ------------- | -------- | -------- | ---------------------------------------------------------- |
| `customer_id` | String   | Yes      | Must match a customer in Zenskar                           |
| `timestamp`   | DateTime | Yes      | Format: `YYYY-MM-DD HH:MM:SS`                              |
| `data`        | Object   | Yes      | Keys and value types must match your metric schema exactly |
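
Because the body is an array of events and the payload must stay under 1 MB, a large backlog has to be split across several requests. The sketch below groups events greedily by serialized size. It assumes "1 MB" means 1,000,000 bytes and that the body is produced with `json.dumps` using its default separators; both are assumptions to adjust for your setup.

```python
import json

MAX_PAYLOAD_BYTES = 1_000_000  # assumption: 1 MB read as 10^6 bytes


def split_into_batches(events: list, max_bytes: int = MAX_PAYLOAD_BYTES) -> list:
    """Group events into lists whose JSON serialization stays under max_bytes."""
    batches, current = [], []
    current_size = 2  # the enclosing "[" and "]"
    for event in events:
        size = len(json.dumps(event).encode("utf-8"))
        extra = size + (2 if current else 0)  # ", " separator between array elements
        if current and current_size + extra >= max_bytes:
            batches.append(current)
            current, current_size = [], 2
            extra = size
        current.append(event)
        current_size += extra
    if current:
        batches.append(current)
    return batches
```

A single event that is larger than the limit on its own still ends up in a batch by itself; it cannot be fixed by splitting and should be routed to the DLQ for inspection.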

### HTTP error codes

<Callout icon="🚧" theme="warn">
  **Note:** The `404` status code below is returned by Zenskar specifically for unparseable JSON bodies. This is non-standard: most APIs use `400 Bad Request` for this case. If your HTTP client or logging tooling maps `404` to "resource not found," add explicit handling to avoid misclassifying this error.
</Callout>

| Status | Meaning                                           | Retriable?                     | Example error message                                              |
| ------ | ------------------------------------------------- | ------------------------------ | ------------------------------------------------------------------ |
| `404`  | Request body is not valid JSON (unparseable)      | No: Fix the JSON               | `invalid character '}' looking for beginning of object key string` |
| `413`  | Payload exceeds 1 MB                              | No: Split into smaller batches | `Payload too large`                                                |
| `422`  | Payload is valid JSON but fails schema validation | No: Fix the payload            | `Invalid type for key: impressions. Expected Int64, got string`    |
| `5xx`  | Zenskar server error                              | Yes: Retry with backoff        | —                                                                  |
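
The table above can be encoded as an explicit lookup, so that a generic HTTP client does not treat `404` as "resource not found". This is an illustrative sketch: the action labels and the `remediation_for` name are assumptions, and any status not listed in the table is sent to manual inspection.

```python
from typing import Optional

REMEDIATION = {
    404: "fix_json",      # returned here for an unparseable JSON body
    413: "split_batch",   # payload exceeds 1 MB
    422: "fix_payload",   # valid JSON that fails schema validation
}


def remediation_for(status: Optional[int]) -> str:
    """Return the follow-up action for a status code (None means no response)."""
    if status == 200:
        return "none"
    if status is None or 500 <= status < 600:
        return "retry_with_backoff"
    return REMEDIATION.get(status, "inspect_manually")
```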

### Validation error messages

When the API returns `422`, the response body contains a message describing the exact problem.

**Missing or unexpected keys**

```json theme={null}
{ "error": "Missing key: impressions" }
{ "error": "Unexpected key in payload: extra_field" }
```

**Type mismatches**

```json theme={null}
{ "error": "Invalid type for key: campaign_id. Expected String, got float64" }
{ "error": "Invalid type for key: impressions. Expected Int64, got string" }
{ "error": "Invalid type for key: value. Expected Float64, got string" }
{ "error": "Invalid type for key: is_active. Expected Bool, got string" }
```

**Date and time format errors**

```json theme={null}
{ "error": "Invalid type for key: start_date. Expected Date32, got string" }
{ "error": "Invalid type for key: timestamp. Expected Date32/DateTime64, got string" }
```

**UUID format errors**

```json theme={null}
{ "error": "Invalid type for key: user_id. Expected UUID, got string" }
```

**Nested object errors**

```json theme={null}
{ "error": "Invalid type for key: data. Expected Object, got string" }
{ "error": "Invalid type for key: nested_field. Expected Int64, got string" }
```
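
Since these messages follow a small number of shapes, they can be parsed into structured fields before being stored in the DLQ. The patterns below are inferred only from the example messages shown above, so treat them as a best-effort sketch: the actual wording may vary, and anything unrecognized falls back to `unknown`.

```python
import json
import re

_PATTERNS = [
    ("missing_key", re.compile(r"^Missing key: (?P<key>.+)$")),
    ("unexpected_key", re.compile(r"^Unexpected key in payload: (?P<key>.+)$")),
    ("type_mismatch", re.compile(
        r"^Invalid type for key: (?P<key>.+?)\. Expected (?P<expected>\S+), got (?P<got>\S+)$")),
]


def parse_validation_error(body: str) -> dict:
    """Turn a 422 response body into {'kind': ..., 'key': ...}, or kind 'unknown'."""
    try:
        message = json.loads(body)["error"]
    except (ValueError, KeyError, TypeError):
        return {"kind": "unknown", "raw": body}
    if isinstance(message, str):
        for kind, pattern in _PATTERNS:
            match = pattern.match(message)
            if match:
                return {"kind": kind, **match.groupdict()}
    return {"kind": "unknown", "raw": body}
```

Storing the parsed `kind` and `key` alongside the raw message makes it easy to group DLQ entries by root cause, for example every event rejected for the same mistyped field.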

### Worked examples

**Example 1: Type mismatch (`impressions` sent as a string instead of an integer)**

Request:

```json theme={null}
[
  {
    "data": { "campaign_id": "sample_campaign_id_8", "impressions": "74" },
    "timestamp": "2025-06-28 23:44:47",
    "customer_id": "c03"
  }
]
```

Response:

```json theme={null}
{ "error": "Invalid type for key: impressions. Expected Int64, got string" }
```

Fix: Send `74` (integer), not `"74"` (string).

***

**Example 2: Missing required key**

Request:

```json theme={null}
[
  {
    "data": { "campaign_id": "sample_campaign_id_8" },
    "timestamp": "2025-06-28 23:44:47",
    "customer_id": "c03"
  }
]
```

Response:

```json theme={null}
{ "error": "Missing key: impressions" }
```

Fix: Include all required fields defined in your metric schema.

***

**Example 3: Unexpected key in payload**

Request:

```json theme={null}
[
  {
    "data": { "campaign_id": "sample_campaign_id_8", "impressions": 74, "extra_field": "not_allowed" },
    "timestamp": "2025-06-28 23:44:47",
    "customer_id": "c03"
  }
]
```

Response:

```json theme={null}
{ "error": "Unexpected key in payload: extra_field" }
```

Fix: Remove any fields not defined in your metric schema.
