# Response Processing

This document explains how Sling processes API responses, including format handling, record extraction, and data transformations.

## Response Flow Overview

{% @mermaid/diagram content="graph TD
A[HTTP Response] --> B{Detect Format}
B -->|JSON| C[Parse JSON]
B -->|CSV| D[Parse CSV to Records]
B -->|XML| E[Parse XML to JSON]
B -->|Auto-detect| F[Check Content-Type Header]

C --> G[Apply JMESPath / jq]
D --> G
E --> G

G --> H[Extract Records Array]
H --> I[Deduplicate Records]
I --> J[Apply Processors]
J --> K[Output to Destination]

style A fill:#ff8c42,stroke:#ffffff,stroke-width:2px,color:#ffffff
style B fill:#ffd54f,stroke:#ffffff,stroke-width:2px,color:#000000
style G fill:#7cb342,stroke:#ffffff,stroke-width:2px,color:#ffffff
style J fill:#5c6bc0,stroke:#ffffff,stroke-width:2px,color:#ffffff
style K fill:#ab47bc,stroke:#ffffff,stroke-width:2px,color:#ffffff" %}

## Response Formats

Sling can automatically handle multiple response formats based on the API's `Content-Type` header or explicit configuration.

### Automatic Format Detection

By default, Sling detects the format from the `Content-Type` response header:

| Content-Type Header             | Format Detected | Processing                  |
| ------------------------------- | --------------- | --------------------------- |
| `application/json`              | JSON            | Direct JSON parsing         |
| `application/xml` or `text/xml` | XML             | Converted to JSON structure |
| `text/csv`                      | CSV             | Converted to JSON records   |
| Others                          | JSON (default)  | Attempts JSON parsing       |
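
The mapping in the table can be sketched in a few lines of Python. This is an illustration of the lookup logic only, not Sling's implementation; the function name `detect_format` is hypothetical, and stripping parameters such as `; charset=utf-8` before matching is an assumption.

```python
from typing import Optional


def detect_format(content_type: Optional[str]) -> str:
    """Map a Content-Type header value to a format name, following the table above."""
    # Assumption: parameters such as "; charset=utf-8" are ignored and case is normalized.
    media_type = (content_type or "").split(";", 1)[0].strip().lower()
    if media_type in ("application/xml", "text/xml"):
        return "xml"
    if media_type == "text/csv":
        return "csv"
    # application/json and any unrecognized type fall back to JSON.
    return "json"
```

Under these assumptions, a missing or unrecognized header such as `text/html` takes the JSON path, which is why an HTML error page typically surfaces as a JSON parsing failure.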

### Explicit Format Configuration

You can override automatic detection by specifying the format explicitly:

```yaml
response:
  format: json  # Force format interpretation
  records:
    jmespath: "data[]"
```

Supported format values:

* `json` - Standard JSON response
* `csv` - Comma-separated values
* `xml` - XML response
* `jsonl` or `jsonlines` - JSON Lines (one JSON object per line)

## Format-Specific Processing

### JSON Responses

JSON is the most common API response format. Sling parses the body and extracts records using JMESPath or jq:

```yaml
response:
  format: json  # Optional, auto-detected
  records:
    # Extract the array of user objects
    jmespath: "data.users[]"
    # or use jq syntax:
    # jq: ".data.users[]"
```

Example JSON response:

```json
{
  "data": {
    "users": [
      {"id": 1, "name": "Alice"},
      {"id": 2, "name": "Bob"}
    ]
  },
  "meta": {
    "total": 2
  }
}
```

### CSV Responses

CSV responses are automatically converted to JSON records:

```yaml
response:
  format: csv
  records:
    # For CSV, jmespath typically extracts all records
    jmespath: "[*]"
    primary_key: ["id"]
```

**CSV Processing Rules:**

1. First row is treated as the header row (column names)
2. Subsequent rows become records
3. Minimum 2 rows required (header + at least one data row)
4. Each row is converted to a JSON object with header names as keys

Example CSV response:

```csv
id,name,email
1,Alice,alice@example.com
2,Bob,bob@example.com
```

Becomes:

```json
[
  {"id": "1", "name": "Alice", "email": "alice@example.com"},
  {"id": "2", "name": "Bob", "email": "bob@example.com"}
]
```

> 📝 **Note:** CSV values are always strings. Use [processors](/concepts/api-specs/advanced.md#data-processors) to convert them to other types if needed.
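
The CSV processing rules above can be illustrated with Python's standard `csv` module. This is a minimal sketch of the conversion, not Sling's own code; the function name `csv_to_records` is hypothetical, and the error message is borrowed from the troubleshooting section of this page.

```python
import csv
import io
from typing import Dict, List


def csv_to_records(text: str) -> List[Dict[str, str]]:
    """Convert CSV text into a list of records keyed by the header row."""
    rows = list(csv.reader(io.StringIO(text)))
    # Rule 3: a header row plus at least one data row is required.
    if len(rows) < 2:
        raise ValueError("need at least 2 lines to build records from csv")
    header, data_rows = rows[0], rows[1:]
    # Rule 4: each row becomes an object with header names as keys.
    # Values stay strings, matching the note above.
    return [dict(zip(header, row)) for row in data_rows]
```

Calling `csv_to_records` on the example CSV above yields the two records shown, with `id` as the string `"1"` rather than the number `1`.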

### XML Responses

XML responses are automatically converted to JSON before record extraction:

```yaml
response:
  format: xml
  records:
    jmespath: "root.users.user[]"
```

Example XML response:

```xml
<root>
  <users>
    <user>
      <id>1</id>
      <name>Alice</name>
    </user>
    <user>
      <id>2</id>
      <name>Bob</name>
    </user>
  </users>
</root>
```

Becomes JSON:

```json
{
  "root": {
    "users": {
      "user": [
        {"id": "1", "name": "Alice"},
        {"id": "2", "name": "Bob"}
      ]
    }
  }
}
```

> ⚠️ **Warning:** XML-to-JSON conversion follows these conventions: attributes become fields with an `@` prefix, and text content becomes a `#text` field.
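
To make the `@` and `#text` conventions concrete, here is a small Python sketch of this style of conversion using the standard library. It is an approximation for illustration, not Sling's converter; the names `element_to_value` and `xml_to_json` are hypothetical, and details such as whitespace trimming and the handling of repeated tags are assumptions.

```python
import xml.etree.ElementTree as ET


def element_to_value(elem):
    """Convert one XML element into a string (leaf) or a dict (everything else)."""
    children = list(elem)
    text = (elem.text or "").strip()
    # A leaf element without attributes collapses to its text content.
    if not children and not elem.attrib:
        return text
    # Attributes become fields with an "@" prefix.
    node = {"@" + name: value for name, value in elem.attrib.items()}
    for child in children:
        value = element_to_value(child)
        if child.tag in node:
            # Assumption: repeated sibling tags are collected into a list.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    # Text next to attributes or children is stored under "#text".
    if text:
        node["#text"] = text
    return node


def xml_to_json(xml_text):
    root = ET.fromstring(xml_text)
    return {root.tag: element_to_value(root)}
```

In this sketch a tag that appears only once stays a single object rather than a one-element list, so an extraction path written for the multi-record case is worth testing against a single-record response as well.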

### JSON Lines (JSONL)

For streaming JSON responses where each line is a complete JSON object:

```yaml
response:
  format: jsonl
  records:
    # Each line is already a record
    jmespath: "[*]"
```

Example JSONL response:

```jsonl
{"id": 1, "name": "Alice", "email": "alice@example.com"}
{"id": 2, "name": "Bob", "email": "bob@example.com"}
{"id": 3, "name": "Charlie", "email": "charlie@example.com"}
```
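
Conceptually, a JSONL body is parsed one line at a time into a list of records, which is why `[*]` selects everything. A minimal Python sketch follows; the function name `parse_jsonl` is hypothetical, and skipping blank lines is an assumption rather than documented Sling behavior.

```python
import json


def parse_jsonl(text):
    """Parse JSON Lines text into a list of records, one per non-empty line."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # Assumption: blank lines are ignored.
        records.append(json.loads(line))
    return records
```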

## Record Extraction

After format conversion, records are extracted using JMESPath or jq expressions.

### Basic Extraction

```yaml
response:
  records:
    # Extract top-level array
    jmespath: "[*]"
```

### Nested Extraction

```yaml
response:
  records:
    # Extract nested array
    jmespath: "response.data.items[]"
```

### Conditional Extraction

```yaml
response:
  records:
    # Extract only active users (JMESPath)
    jmespath: "users[?status=='active']"
    # or with jq:
    # jq: '[.users[] | select(.status == "active")]'
```
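
For readers less familiar with JMESPath filters, the expression `users[?status=='active']` behaves like the list comprehension below. The sample payload is hypothetical and exists only to show the filtering logic.

```python
# Hypothetical response body used only for illustration.
response_json = {
    "users": [
        {"id": 1, "status": "active"},
        {"id": 2, "status": "inactive"},
        {"id": 3, "status": "active"},
    ]
}

# Keep only the users whose status equals "active".
active_users = [
    user for user in response_json["users"] if user.get("status") == "active"
]
```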

### Projection and Transformation

```yaml
response:
  records:
    # Extract and reshape data
    jmespath: "data[].{user_id: id, full_name: name, contact: email}"
```
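
The projection `data[].{user_id: id, full_name: name, contact: email}` builds a new object per item with renamed keys. A rough Python equivalent is sketched below; the helper name `project` is hypothetical. Missing source fields become `None`, mirroring how JMESPath yields `null` for absent keys.

```python
def project(data):
    """Reshape each item into an object with renamed keys."""
    return [
        {
            "user_id": item.get("id"),
            "full_name": item.get("name"),
            "contact": item.get("email"),
        }
        for item in data
    ]
```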

## Deduplication

When `primary_key` is defined, Sling automatically deduplicates records:

```yaml
response:
  records:
    jmespath: "data[]"
    primary_key: ["id"]  # Single field
    # OR
    # primary_key: ["id", "location_id"]  # Composite key
```

### Deduplication Strategies

#### 1. In-Memory Deduplication (Default)

For datasets small enough to track every unique key in memory:

```yaml
response:
  records:
    primary_key: ["id"]
    # Uses hash map in memory
```

**Characteristics:**

* Fast and accurate
* Memory usage grows with unique record count
* Suitable for datasets up to \~1 million records
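
The idea behind hash-based deduplication can be sketched as follows. This illustrates the general technique, not Sling's internals; the function name `deduplicate` is hypothetical, and keeping the first occurrence of each key is an assumption.

```python
def deduplicate(records, primary_key):
    """Drop records whose primary key values have already been seen."""
    seen = set()
    unique = []
    for record in records:
        # A composite key is represented as a tuple of the key field values.
        key = tuple(record.get(field) for field in primary_key)
        if key in seen:
            continue  # Assumption: the first occurrence wins.
        seen.add(key)
        unique.append(record)
    return unique
```

The `seen` set holds one entry per unique key, which is why memory usage grows with the number of unique records.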

#### 2. Bloom Filter Deduplication

For very large datasets where memory is constrained:

```yaml
response:
  records:
    primary_key: ["id"]
    duplicate_tolerance: "10000000,0.001"  # capacity,error_rate
```

**Characteristics:**

* Probabilistic deduplication: a small fraction of unique records (the false positive rate) may be mistaken for duplicates and dropped
* Fixed memory footprint
* Suitable for datasets with millions of records

**Format:** `"capacity,error_rate"`

* `capacity`: Expected number of unique records
* `error_rate`: Acceptable false positive rate (e.g., 0.001 = 0.1%)

> 💡 **Tip:** Use a Bloom filter for datasets over 1 million records or when memory is limited. Memory usage is fixed by the configured capacity and error rate: lower error rates use more memory.
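
To get a feel for how `capacity` and `error_rate` trade off against memory, the textbook Bloom filter sizing formulas can be used. The sketch below applies those standard formulas; Sling's actual Bloom filter may size itself differently, and both function names are hypothetical.

```python
import math


def parse_duplicate_tolerance(value):
    """Split a "capacity,error_rate" string into an int and a float."""
    capacity_text, error_rate_text = value.split(",")
    return int(capacity_text), float(error_rate_text)


def bloom_filter_size(capacity, error_rate):
    """Return (bits, hash_count) from the standard Bloom filter formulas."""
    # m = -n * ln(p) / (ln 2)^2
    bits = math.ceil(-capacity * math.log(error_rate) / (math.log(2) ** 2))
    # k = (m / n) * ln 2
    hash_count = max(1, round(bits / capacity * math.log(2)))
    return bits, hash_count


capacity, error_rate = parse_duplicate_tolerance("10000000,0.001")
bits, hash_count = bloom_filter_size(capacity, error_rate)
megabytes = bits / 8 / 1_000_000
```

With these formulas, the example setting of 10 million records at a 0.1% error rate works out to roughly 18 MB regardless of how large each record is, and tightening the error rate increases that figure.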

## Response State

All response data is accessible in the `response` state variable for use in expressions:

| Response Property  | Description             | Example Usage               |
| ------------------ | ----------------------- | --------------------------- |
| `response.status`  | HTTP status code        | `response.status == 200`    |
| `response.headers` | Response headers        | `response.headers.link`     |
| `response.text`    | Raw response body       | `length(response.text) > 0` |
| `response.json`    | Parsed JSON response    | `response.json.has_more`    |
| `response.records` | Extracted records array | `length(response.records)`  |

### Using Response State in Pagination

```yaml
pagination:
  next_state:
    cursor: '{jmespath(response.json, "pagination.next_cursor")}'
  stop_condition: 'jmespath(response.json, "has_more") == false'
```

### Using Response State in Rules

```yaml
rules:
  - action: retry
    condition: "response.status == 429"
    max_attempts: 5

  - action: stop
    condition: "length(response.records) == 0"
    message: "No more records available"
```

### Using Response State in [Processors](/concepts/api-specs/advanced.md#data-processors)

```yaml
processors:
  # Add metadata from response to each record
  - expression: "response.json.request_id"
    output: "record.api_request_id"

  # Conditional processing based on response
  - if: "response.status == 206"  # Partial content
    expression: "record.id"
    output: "queue.incomplete_records"
```

## Conditional Processing with IF Conditions

Processors support an optional `if` field to conditionally execute based on runtime conditions.

### Basic Syntax

```yaml
processors:
  # Only process non-null values
  - expression: "lower(record.email)"
    if: "!is_null(record.email) && record.email != ''"
    output: "record.email_normalized"

  # Only queue US customers
  - expression: "record.id"
    if: "record.country == 'US'"
    output: "queue.us_customer_ids"

  # Track max timestamp only for completed records
  - expression: "record.updated_at"
    if: "record.status == 'completed'"
    output: "state.last_completed_timestamp"
    aggregation: "maximum"
```

### How It Works

* **Evaluation**: The `if` condition is evaluated **before** the expression
* **Skip on False**: If false, the entire processor is skipped for that record
* **Access**: Has access to `record`, `state`, `response`, `env`, `secrets`
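
The evaluation order described above can be pictured as a loop in which the condition gates the expression. This is a conceptual sketch in Python, not Sling's engine; `apply_processor` and its parameters are hypothetical names.

```python
def apply_processor(records, expression, output_field, condition=None):
    """Apply one processor to each record, honoring an optional condition."""
    for record in records:
        # The condition is checked first; if false, the record is skipped
        # and the expression is never evaluated for it.
        if condition is not None and not condition(record):
            continue
        record[output_field] = expression(record)
    return records


records = [{"email": "Alice@Example.com"}, {"email": None}, {"email": ""}]
apply_processor(
    records,
    expression=lambda r: r["email"].lower(),
    output_field="email_normalized",
    condition=lambda r: r["email"] is not None and r["email"] != "",
)
```

Because the condition runs before the expression, calling `.lower()` on the null email never happens, which is the same protection the `!is_null(record.email)` check provides in the YAML example.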

### Common Patterns

```yaml
processors:
  # Null/empty checks
  - expression: 'cast(record.age, "int")'
    if: "!is_null(record.age)"
    output: "record.age_int"

  # Type validation with try_cast
  - expression: 'cast(record.value, "int")'
    if: "is_null(try_cast(record.value, 'int')) == false"
    output: "record.value_int"

  # Date filtering
  - expression: "record.id"
    if: "date_parse(record.created_at, 'auto') > date_add(now(), -7, 'day')"
    output: "queue.recent_ids"

  # Response-based conditions
  - expression: "response.json.request_id"
    if: "response.status == 200"
    output: "record.api_request_id"
```

> 💡 **Tip:** Always check for null before accessing field properties to avoid errors.

> ⚠️ **Warning:** IF conditions are evaluated for every record. Avoid expensive operations.

## Overwriting Records with `output: "record"`

Setting `output: "record"` replaces the entire record with the result of the expression. All existing fields are discarded unless explicitly included.

### Common Use Cases

**1. Select Specific Fields**

Keep only essential fields from large API responses:

```yaml
processors:
  - expression: >
      object(
        "user_id", record.id,
        "username", record.username,
        "email", record.email
      )
    output: "record"
```

**2. Rename Fields**

Transform field names to match your schema:

```yaml
processors:
  - expression: >
      object(
        "customer_id", record.id,
        "full_name", record.name,
        "contact_email", record.email
      )
    output: "record"
```

**3. Flatten Nested Data**

Convert nested structures into flat records using JMESPath or jq:

```yaml
processors:
  # Using JMESPath:
  - expression: >
      jmespath(record, "{
        id: id,
        name: user.profile.name,
        email: user.contact.email,
        country: user.address.country,
        plan_type: subscription.plan.type
      }")
    output: "record"

  # Or using jq:
  - expression: >
      jq(record, "{id, name: .user.profile.name, email: .user.contact.email, country: .user.address.country, plan_type: .subscription.plan.type}")
    output: "record"
```

**4. Add Computed Fields**

Create records with derived values:

```yaml
processors:
  - expression: >
      object(
        "order_id", record.id,
        "subtotal", record.subtotal,
        "tax", record.subtotal * 0.08,
        "total", record.subtotal * 1.08
      )
    output: "record"
```

### Important Warnings

⚠️ **All previous fields are discarded** - You must explicitly include every field you want to keep

⚠️ **Order matters** - Later processors see the overwritten record, so overwrite the record first and add derived fields afterward:

```yaml
processors:
  # First: Overwrite to simplify
  - expression: 'object("id", record.id, "name", record.name)'
    output: "record"

  # Then: Add new fields to simplified record
  - expression: "upper(record.name)"
    output: "record.name_upper"
```

⚠️ **Include primary keys** - For deduplication to work, primary key fields must be in the new record

> 💡 **Tip:** Use JMESPath projection syntax for cleaner nested data transformations.
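
The effect of overwriting and then adding a field can be traced with plain dictionaries. The record below is hypothetical sample data, and the two steps mirror the two processors in the ordering example above; this is an illustration of the semantics, not Sling code.

```python
# Hypothetical incoming record.
record = {
    "id": 1,
    "name": "Alice",
    "email": "alice@example.com",
    "internal_notes": "not needed downstream",
}

# Step 1, like output: "record": the whole record is replaced,
# so only the fields named here survive.
record = {"id": record["id"], "name": record["name"]}

# Step 2, like output: "record.name_upper": a field is added
# to the already simplified record.
record["name_upper"] = record["name"].upper()
```

After both steps the record contains `id`, `name`, and `name_upper`; `email` and `internal_notes` are gone because step 1 did not carry them over. Reversing the steps would discard `name_upper` as well.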

## Error Handling

### Invalid Response Format

When Sling cannot parse the response in the expected format, the cause is often an error response such as an HTML error page. Fail fast on error status codes:

```yaml
rules:
  - action: fail
    condition: "response.status >= 400"
    message: "API returned error: {response.status}"
```

### Empty or Missing Records

Handle cases where no records are found:

```yaml
pagination:
  # Stop if no records returned
  stop_condition: "length(response.records) == 0"
```

### Partial Responses

Some APIs return partial data, signaled by HTTP status 206 (Partial Content):

```yaml
rules:
  # Continue processing partial results
  - action: continue
    condition: "response.status == 206"
    message: "Partial content received, processing available data"
```

## Complete Example

Here's a comprehensive example that combines format selection, record extraction, deduplication, processors, rules, and pagination:

```yaml
endpoints:
  user_activity:
    request:
      url: "{state.base_url}/users/activity"
      parameters:
        limit: 100

    response:
      # Explicitly set format (usually auto-detected)
      format: json

      records:
        # Extract nested records
        jmespath: "data.activities[]"

        # Deduplicate by composite key
        primary_key: ["user_id", "activity_id"]

        # Limit total records for testing
        limit: 5000

        # Use Bloom filter for large datasets
        duplicate_tolerance: "1000000,0.001"

      processors:
        # Transform timestamp field
        - expression: 'date_parse(record.timestamp, "auto")'
          output: "record.activity_date"

        # Add response metadata
        - expression: "response.json.request_id"
          output: "record.api_request_id"

        # Track max timestamp for incremental sync
        - expression: "record.timestamp"
          output: "state.last_activity_timestamp"
          aggregation: maximum

        # Send user IDs to queue for detail lookup
        - expression: "record.user_id"
          output: "queue.user_ids"

      rules:
        # Retry on rate limit
        - action: retry
          condition: "response.status == 429"
          max_attempts: 5
          backoff: exponential

        # Continue on not found (user may have been deleted)
        - action: continue
          condition: "response.status == 404"
          message: "Resource not found, continuing"

        # Fail on auth errors
        - action: fail
          condition: "response.status == 401 || response.status == 403"
          message: "Authentication failed"

    pagination:
      next_state:
        cursor: '{jmespath(response.json, "pagination.next_cursor")}'
      stop_condition: 'is_null(jmespath(response.json, "pagination.next_cursor")) || length(response.records) == 0'
```

## Best Practices

### 1. Always Define Primary Keys

Even if the API doesn't explicitly require deduplication, defining primary keys helps ensure data quality:

```yaml
response:
  records:
    primary_key: ["id"]  # Prevents accidental duplicates
```

### 2. Use Appropriate Deduplication

Choose the right strategy based on your dataset size:

```yaml
# For < 1M records (default)
primary_key: ["id"]

# For > 1M records
primary_key: ["id"]
duplicate_tolerance: "10000000,0.001"
```

### 3. Handle Multiple Content Types

If your API might return different formats:

```yaml
rules:
  # Handle JSON errors
  - action: fail
    condition: 'response.status >= 400 && response.headers["content-type"] == "application/json"'
    message: "API error: {response.json.error}"

  # Handle HTML errors (often 500 errors)
  - action: fail
    condition: 'response.status >= 400 && jmespath(response.headers, "\"content-type\"") == "text/html"'
    message: "Server error (HTML response)"
```

### 4. Validate Records Structure

Use processors to validate critical fields:

```yaml
processors:
  # Ensure required field exists
  - expression: 'require(record.id, "Record missing required id field")'
    output: "record.id_validated"
```

### 5. Log Response Details for Debugging

During development, use processors to log response information:

```yaml
processors:
  # Log response summary
  - expression: >
      log("Response status: " + string(response.status) +
          ", Records: " + string(length(response.records)))
    output: ""  # Empty output means don't store anywhere
```

## Troubleshooting

### No Records Extracted

If you're not getting any records:

1. Check your JMESPath or jq expression:

```bash
sling conns test API_NAME --endpoints ENDPOINT_NAME --trace
```

2. Look at the raw response in trace output
3. Verify the path to your records array
4. Test expressions using online tools: [jmespath.org](https://jmespath.org/) for JMESPath, [jqplay.org](https://jqplay.org/) for jq

### CSV Parsing Errors

Common CSV issues:

```yaml
# Error: "need at least 2 lines to build records from csv"
# Solution: Ensure API returns header + at least one data row
```

### Deduplication Not Working

Verify your primary key fields exist:

```yaml
processors:
  # Log primary key values
  - if: "!is_null(record.id)"
    expression: 'log("Found ID: " + string(record.id))'
    output: ""
```

> 💡 **Tip:** Use `--trace` flag to see detailed response processing including format detection, record extraction, and deduplication results.


