# Response Processing

This document explains how Sling processes API responses, including format handling, record extraction, and data transformations.

## Response Flow Overview

{% @mermaid/diagram content="graph TD
A\[HTTP Response] --> B{Detect Format}
B -->|JSON| C\[Parse JSON]
B -->|CSV| D\[Parse CSV to Records]
B -->|XML| E\[Parse XML to JSON]
B -->|Auto-detect| F\[Check Content-Type Header]

```
C --> G[Apply JMESPath / jq]
D --> G
E --> G

G --> H[Extract Records Array]
H --> I[Deduplicate Records]
I --> J[Apply Processors]
J --> K[Output to Destination]

style A fill:#ff8c42,stroke:#ffffff,stroke-width:2px,color:#ffffff
style B fill:#ffd54f,stroke:#ffffff,stroke-width:2px,color:#000000
style G fill:#7cb342,stroke:#ffffff,stroke-width:2px,color:#ffffff
style J fill:#5c6bc0,stroke:#ffffff,stroke-width:2px,color:#ffffff
style K fill:#ab47bc,stroke:#ffffff,stroke-width:2px,color:#ffffff" %}
```

## Response Formats

Sling can automatically handle multiple response formats based on the API's `Content-Type` header or explicit configuration.

### Automatic Format Detection

By default, Sling detects the format from the `Content-Type` response header:

| Content-Type Header             | Format Detected | Processing                  |
| ------------------------------- | --------------- | --------------------------- |
| `application/json`              | JSON            | Direct JSON parsing         |
| `application/xml` or `text/xml` | XML             | Converted to JSON structure |
| `text/csv`                      | CSV             | Converted to JSON records   |
| Others                          | JSON (default)  | Attempts JSON parsing       |

### Explicit Format Configuration

You can override automatic detection by specifying the format explicitly:

```yaml
response:
  format: json  # Force format interpretation
  records:
    jmespath: "data[]"
```

Supported format values:

* `json` - Standard JSON response
* `csv` - Comma-separated values
* `xml` - XML response
* `jsonl` or `jsonlines` - JSON Lines (one JSON object per line)

## Format-Specific Processing

### JSON Responses

The most common API response format. Sling parses JSON and extracts records using JMESPath or jq:

```yaml
response:
  format: json  # Optional, auto-detected
  records:
    # Extract the array of user objects
    jmespath: "data.users[]"
    # or use jq syntax:
    # jq: ".data.users[]"
```

Example JSON response:

```json
{
  "data": {
    "users": [
      {"id": 1, "name": "Alice"},
      {"id": 2, "name": "Bob"}
    ]
  },
  "meta": {
    "total": 2
  }
}
```

### CSV Responses

CSV responses are automatically converted to JSON records:

```yaml
response:
  format: csv
  records:
    # For CSV, jmespath typically extracts all records
    jmespath: "[*]"
    primary_key: ["id"]
```

**CSV Processing Rules:**

1. First row is treated as the header row (column names)
2. Subsequent rows become records
3. Minimum 2 rows required (header + at least one data row)
4. Each row is converted to a JSON object with header names as keys

Example CSV response:

```csv
id,name,email
1,Alice,alice@example.com
2,Bob,bob@example.com
```

Becomes:

```json
[
  {"id": "1", "name": "Alice", "email": "alice@example.com"},
  {"id": "2", "name": "Bob", "email": "bob@example.com"}
]
```

> 📝 **Note:** CSV values are always strings. Use [processors](https://docs.slingdata.io/concepts/advanced#data-processors) to convert them to other types if needed.

### XML Responses

XML responses are automatically converted to JSON before record extraction:

```yaml
response:
  format: xml
  records:
    jmespath: "root.users.user[]"
```

Example XML response:

```xml
<root>
  <users>
    <user>
      <id>1</id>
      <name>Alice</name>
    </user>
    <user>
      <id>2</id>
      <name>Bob</name>
    </user>
  </users>
</root>
```

Becomes JSON:

```json
{
  "root": {
    "users": {
      "user": [
        {"id": "1", "name": "Alice"},
        {"id": "2", "name": "Bob"}
      ]
    }
  }
}
```

> ⚠️ **Warning:** XML to JSON conversion follows standard rules: attributes become fields with `@` prefix, text content becomes `#text` field.

### JSON Lines (JSONL)

For streaming JSON responses where each line is a complete JSON object:

```yaml
response:
  format: jsonl
  records:
    # Each line is already a record
    jmespath: "[*]"
```

Example JSONL response:

```jsonl
{"id": 1, "name": "Alice", "email": "alice@example.com"}
{"id": 2, "name": "Bob", "email": "bob@example.com"}
{"id": 3, "name": "Charlie", "email": "charlie@example.com"}
```

## Record Extraction

After format conversion, records are extracted using JMESPath or jq expressions.

### Basic Extraction

```yaml
response:
  records:
    # Extract top-level array
    jmespath: "[*]"
```

### Nested Extraction

```yaml
response:
  records:
    # Extract nested array
    jmespath: "response.data.items[]"
```

### Conditional Extraction

```yaml
response:
  records:
    # Extract only active users (JMESPath)
    jmespath: "users[?status=='active']"
    # or with jq:
    # jq: '[.users[] | select(.status == "active")]'
```

### Projection and Transformation

```yaml
response:
  records:
    # Extract and reshape data
    jmespath: "data[].{user_id: id, full_name: name, contact: email}"
```

## Deduplication

When `primary_key` is defined, Sling automatically deduplicates records:

```yaml
response:
  records:
    jmespath: "data[]"
    primary_key: ["id"]  # Single field
    # OR
    # primary_key: ["id", "location_id"]  # Composite key
```

### Deduplication Strategies

#### 1. In-Memory Deduplication (Default)

For datasets with reasonable record counts:

```yaml
response:
  records:
    primary_key: ["id"]
    # Uses hash map in memory
```

**Characteristics:**

* Fast and accurate
* Memory usage grows with unique record count
* Suitable for datasets up to \~1 million records

#### 2. Bloom Filter Deduplication

For very large datasets where memory is constrained:

```yaml
response:
  records:
    primary_key: ["id"]
    duplicate_tolerance: "10000000,0.001"  # capacity,error_rate
```

**Characteristics:**

* Probabilistic deduplication (small false positive rate)
* Fixed memory footprint
* Suitable for datasets with millions of records

**Format:** `"capacity,error_rate"`

* `capacity`: Expected number of unique records
* `error_rate`: Acceptable false positive rate (e.g., 0.001 = 0.1%)

> 💡 **Tip:** Use Bloom filter for datasets over 1 million records or when memory is limited. The error rate determines memory usage - lower rates use more memory.

## Response State

All response data is accessible in the `response` state variable for use in expressions:

| Response Property  | Description             | Example Usage               |
| ------------------ | ----------------------- | --------------------------- |
| `response.status`  | HTTP status code        | `response.status == 200`    |
| `response.headers` | Response headers        | `response.headers.link`     |
| `response.text`    | Raw response body       | `length(response.text) > 0` |
| `response.json`    | Parsed JSON response    | `response.json.has_more`    |
| `response.records` | Extracted records array | `length(response.records)`  |

### Using Response State in Pagination

```yaml
pagination:
  next_state:
    cursor: '{jmespath(response.json, "pagination.next_cursor")}'
  stop_condition: 'jmespath(response.json, "has_more") == false'
```

### Using Response State in Rules

```yaml
rules:
  - action: retry
    condition: "response.status == 429"
    max_attempts: 5

  - action: stop
    condition: "length(response.records) == 0"
    message: "No more records available"
```

### Using Response State in [Processors](https://docs.slingdata.io/concepts/advanced#data-processors)

```yaml
processors:
  # Add metadata from response to each record
  - expression: "response.json.request_id"
    output: "record.api_request_id"

  # Conditional processing based on response
  - if: "response.status == 206"  # Partial content
    expression: "record.id"
    output: "queue.incomplete_records"
```

## Conditional Processing with IF Conditions

Processors support an optional `if` field to conditionally execute based on runtime conditions.

### Basic Syntax

```yaml
processors:
  # Only process non-null values
  - expression: "lower(record.email)"
    if: "!is_null(record.email) && record.email != ''"
    output: "record.email_normalized"

  # Only queue US customers
  - expression: "record.id"
    if: "record.country == 'US'"
    output: "queue.us_customer_ids"

  # Track max timestamp only for completed records
  - expression: "record.updated_at"
    if: "record.status == 'completed'"
    output: "state.last_completed_timestamp"
    aggregation: "maximum"
```

### How It Works

* **Evaluation**: The `if` condition is evaluated **before** the expression
* **Skip on False**: If false, the entire processor is skipped for that record
* **Access**: Has access to `record`, `state`, `response`, `env`, `secrets`

### Common Patterns

```yaml
processors:
  # Null/empty checks
  - expression: 'cast(record.age, "int")'
    if: "!is_null(record.age)"
    output: "record.age_int"

  # Type validation with try_cast
  - expression: 'cast(record.value, "int")'
    if: "is_null(try_cast(record.value, 'int')) == false"
    output: "record.value_int"

  # Date filtering
  - expression: "record.id"
    if: "date_parse(record.created_at, 'auto') > date_add(now(), -7, 'day')"
    output: "queue.recent_ids"

  # Response-based conditions
  - expression: "response.json.request_id"
    if: "response.status == 200"
    output: "record.api_request_id"
```

> 💡 **Tip:** Always check for null before accessing field properties to avoid errors.

> ⚠️ **Warning:** IF conditions are evaluated for every record. Avoid expensive operations.

## Overwriting Records with `output: "record"`

Setting `output: "record"` completely replaces the entire record with the result of the expression. All existing fields are discarded unless explicitly included.

### Common Use Cases

**1. Select Specific Fields**

Keep only essential fields from large API responses:

```yaml
processors:
  - expression: >
      object(
        "user_id", record.id,
        "username", record.username,
        "email", record.email
      )
    output: "record"
```

**2. Rename Fields**

Transform field names to match your schema:

```yaml
processors:
  - expression: >
      object(
        "customer_id", record.id,
        "full_name", record.name,
        "contact_email", record.email
      )
    output: "record"
```

**3. Flatten Nested Data**

Convert nested structures into flat records using JMESPath or jq:

```yaml
processors:
  # Using JMESPath:
  - expression: >
      jmespath(record, "{
        id: id,
        name: user.profile.name,
        email: user.contact.email,
        country: user.address.country,
        plan_type: subscription.plan.type
      }")
    output: "record"

  # Or using jq:
  - expression: >
      jq(record, "{id, name: .user.profile.name, email: .user.contact.email, country: .user.address.country, plan_type: .subscription.plan.type}")
    output: "record"
```

**4. Add Computed Fields**

Create records with derived values:

```yaml
processors:
  - expression: >
      object(
        "order_id", record.id,
        "subtotal", record.subtotal,
        "tax", record.subtotal * 0.08,
        "total", record.subtotal * 1.08
      )
    output: "record"
```

### Important Warnings

⚠️ **All previous fields are discarded** - Must explicitly include every field you want to keep

⚠️ **Order matters** - If you overwrite the record, then add fields afterward:

```yaml
processors:
  # First: Overwrite to simplify
  - expression: 'object("id", record.id, "name", record.name)'
    output: "record"

  # Then: Add new fields to simplified record
  - expression: "upper(record.name)"
    output: "record.name_upper"
```

⚠️ **Include primary keys** - For deduplication to work, primary key fields must be in the new record

> 💡 **Tip:** Use JMESPath projection syntax for cleaner nested data transformations.

## Error Handling

### Invalid Response Format

When Sling cannot parse the response in the expected format:

```yaml
rules:
  - action: fail
    condition: "response.status >= 400"
    message: "API returned error: {response.status}"
```

### Empty or Missing Records

Handle cases where no records are found:

```yaml
pagination:
  # Stop if no records returned
  stop_condition: "length(response.records) == 0"
```

### Partial Responses

Some APIs return partial data on errors:

```yaml
rules:
  # Continue processing partial results
  - action: continue
    condition: "response.status == 206"
    message: "Partial content received, processing available data"
```

## Complete Example

Here's a comprehensive example showing all response processing features:

```yaml
endpoints:
  user_activity:
    request:
      url: "{state.base_url}/users/activity"
      parameters:
        limit: 100

    response:
      # Explicitly set format (usually auto-detected)
      format: json

      records:
        # Extract nested records
        jmespath: "data.activities[]"

        # Deduplicate by composite key
        primary_key: ["user_id", "activity_id"]

        # Limit total records for testing
        limit: 5000

        # Use Bloom filter for large datasets
        duplicate_tolerance: "1000000,0.001"

      processors:
        # Transform timestamp field
        - expression: 'date_parse(record.timestamp, "auto")'
          output: "record.activity_date"

        # Add response metadata
        - expression: "response.json.request_id"
          output: "record.api_request_id"

        # Track max timestamp for incremental sync
        - expression: "record.timestamp"
          output: "state.last_activity_timestamp"
          aggregation: maximum

        # Send user IDs to queue for detail lookup
        - expression: "record.user_id"
          output: "queue.user_ids"

      rules:
        # Retry on rate limit
        - action: retry
          condition: "response.status == 429"
          max_attempts: 5
          backoff: exponential

        # Continue on not found (user may have been deleted)
        - action: continue
          condition: "response.status == 404"
          message: "Resource not found, continuing"

        # Fail on auth errors
        - action: fail
          condition: "response.status == 401 || response.status == 403"
          message: "Authentication failed"

    pagination:
      next_state:
        cursor: '{jmespath(response.json, "pagination.next_cursor")}'
      stop_condition: 'is_null(jmespath(response.json, "pagination.next_cursor")) || length(response.records) == 0'
```

## Best Practices

### 1. Always Define Primary Keys

Even if the API doesn't explicitly require deduplication, defining primary keys helps ensure data quality:

```yaml
response:
  records:
    primary_key: ["id"]  # Prevents accidental duplicates
```

### 2. Use Appropriate Deduplication

Choose the right strategy based on your dataset size:

```yaml
# For < 1M records (default)
primary_key: ["id"]

# For > 1M records
primary_key: ["id"]
duplicate_tolerance: "10000000,0.001"
```

### 3. Handle Multiple Content Types

If your API might return different formats:

```yaml
rules:
  # Handle JSON errors
  - action: fail
    condition: 'response.status >= 400 && response.headers["content-type"] == "application/json"'
    message: "API error: {response.json.error}"

  # Handle HTML errors (often 500 errors)
  - action: fail
    condition: 'response.status >= 400 && jmespath(response.headers, "\"content-type\"") == "text/html"'
    message: "Server error (HTML response)"
```

### 4. Validate Records Structure

Use processors to validate critical fields:

```yaml
processors:
  # Ensure required field exists
  - expression: 'require(record.id, "Record missing required id field")'
    output: "record.id_validated"
```

### 5. Log Response Details for Debugging

During development, use processors to log response information:

```yaml
processors:
  # Log response summary
  - expression: >
      log("Response status: " + string(response.status) +
          ", Records: " + string(length(response.records)))
    output: ""  # Empty output means don't store anywhere
```

## Troubleshooting

### No Records Extracted

If you're not getting any records:

1. Check your JMESPath or jq expression:

```bash
sling conns test API_NAME --endpoints ENDPOINT_NAME --trace
```

2. Look at the raw response in trace output
3. Verify the path to your records array
4. Test expressions using online tools: [jmespath.org](https://jmespath.org/) for JMESPath, [jqplay.org](https://jqplay.org/) for jq

### CSV Parsing Errors

Common CSV issues:

```yaml
# Error: "need at least 2 lines to build records from csv"
# Solution: Ensure API returns header + at least one data row
```

### Deduplication Not Working

Verify your primary key fields exist:

```yaml
processors:
  # Log primary key values
  - if: "!is_null(record.id)"
    expression: 'log("Found ID: " + string(record.id))'
    output: ""
```

> 💡 **Tip:** Use `--trace` flag to see detailed response processing including format detection, record extraction, and deduplication results.
