Response Processing

This document explains how Sling processes API responses, including format handling, record extraction, and data transformations.

Response Flow Overview

Response Formats

Sling can automatically handle multiple response formats based on the API's Content-Type header or explicit configuration.

Automatic Format Detection

By default, Sling detects the format from the Content-Type response header:

| Content-Type Header | Format Detected | Processing |
| --- | --- | --- |
| application/json | JSON | Direct JSON parsing |
| application/xml or text/xml | XML | Converted to JSON structure |
| text/csv | CSV | Converted to JSON records |
| Others | JSON (default) | Attempts JSON parsing |
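The detection rules above can be sketched in a few lines of Python. This is only an illustration of the mapping, not Sling's actual implementation; the function name detect_format is hypothetical, and stripping parameters such as "; charset=utf-8" before matching is an assumption.

```python
from typing import Optional


def detect_format(content_type: Optional[str]) -> str:
    """Map a Content-Type header value to a format name (illustrative only)."""
    # Assumption: parameters such as "; charset=utf-8" are ignored
    # and the media type is compared case-insensitively.
    media_type = (content_type or "").split(";", 1)[0].strip().lower()
    if media_type in ("application/xml", "text/xml"):
        return "xml"
    if media_type == "text/csv":
        return "csv"
    # application/json and every other type fall back to JSON.
    return "json"


for header in ("application/json", "text/xml; charset=utf-8", "text/csv", "text/html"):
    print(header, "->", detect_format(header))
```

An unexpected type such as text/html still goes through JSON parsing under these rules, which is one reason to set format explicitly when an API reports its Content-Type unreliably.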

Explicit Format Configuration

You can override automatic detection by specifying the format explicitly:

response:
  format: json  # Force format interpretation
  records:
    jmespath: "data[]"

Supported format values:

json - Standard JSON response
csv - Comma-separated values
xml - XML response
jsonl or jsonlines - JSON Lines (one JSON object per line)

Format-Specific Processing

JSON Responses

JSON is the most common API response format. Sling parses the body and extracts records using JMESPath:

response:
  format: json  # Optional, auto-detected
  records:
    # Extract the array of user objects
    jmespath: "data.users[]"

Example JSON response:

{
  "data": {
    "users": [
      {"id": 1, "name": "Alice"},
      {"id": 2, "name": "Bob"}
    ]
  },
  "meta": {
    "total": 2
  }
}

CSV Responses

CSV responses are automatically converted to JSON records:

response:
  format: csv
  records:
    # For CSV, jmespath typically extracts all records
    jmespath: "[*]"
    primary_key: ["id"]

CSV Processing Rules:

First row is treated as the header row (column names)
Subsequent rows become records
Minimum 2 rows required (header + at least one data row)
Each row is converted to a JSON object with header names as keys

Example CSV response:

id,name,email
1,Alice,alice@example.com
2,Bob,bob@example.com

Becomes:

[
  {"id": "1", "name": "Alice", "email": "alice@example.com"},
  {"id": "2", "name": "Bob", "email": "bob@example.com"}
]
]

📝 Note: CSV values are always strings. Use processors to convert them to other types if needed.
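As a rough illustration of the CSV processing rules above, the following Python sketch builds records from a CSV body using the standard csv module. It is not Sling's code; the function name csv_to_records is hypothetical, and the error text simply mirrors the message quoted in the Troubleshooting section.

```python
import csv
import io


def csv_to_records(text):
    """Turn CSV text into a list of dicts keyed by the header row."""
    rows = list(csv.reader(io.StringIO(text)))
    if len(rows) < 2:
        # Header plus at least one data row is required.
        raise ValueError("need at least 2 lines to build records from csv")
    header, data_rows = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in data_rows]


body = "id,name\n1,Alice\n2,Bob\n"
records = csv_to_records(body)

# Every value is a string, so numeric fields need an explicit conversion,
# which is the job a cast processor does in a Sling spec.
typed = [{**record, "id": int(record["id"])} for record in records]
print(records)
print(typed)
```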

XML Responses

XML responses are automatically converted to JSON before record extraction:

response:
  format: xml
  records:
    jmespath: "root.users.user[]"

Example XML response:

<root>
  <users>
    <user>
      <id>1</id>
      <name>Alice</name>
    </user>
    <user>
      <id>2</id>
      <name>Bob</name>
    </user>
  </users>
</root>

Becomes JSON:

{
  "root": {
    "users": {
      "user": [
        {"id": "1", "name": "Alice"},
        {"id": "2", "name": "Bob"}
      ]
    }
  }
}

⚠️ Warning: The XML-to-JSON conversion follows a common convention: attributes become fields with an @ prefix, and text content becomes a #text field. Element values arrive as strings, as in the example above.
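To make the @ and #text convention concrete, here is a minimal Python sketch of such a conversion built on xml.etree.ElementTree. It is an assumption-laden illustration, not Sling's converter: the helper names are hypothetical, and details such as how a single repeated element or whitespace is treated may differ in Sling.

```python
import xml.etree.ElementTree as ET


def element_to_value(elem):
    """Convert one element to a string (leaf) or a dict (attributes/children)."""
    children = list(elem)
    text = (elem.text or "").strip()
    if not children and not elem.attrib:
        return text  # plain leaf: just its text
    node = {f"@{name}": value for name, value in elem.attrib.items()}
    for child in children:
        value = element_to_value(child)
        if child.tag in node:
            # Repeated tags are collected into a list.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    if text:
        node["#text"] = text
    return node


def xml_to_dict(xml_text):
    root = ET.fromstring(xml_text)
    return {root.tag: element_to_value(root)}


sample = (
    '<root><users>'
    '<user status="active"><id>1</id><name>Alice</name></user>'
    '<user><id>2</id><name>Bob</name></user>'
    '</users></root>'
)
print(xml_to_dict(sample))
```

In this sketch a tag that occurs only once becomes a single object rather than a one-element list, so it is worth checking the trace output for responses that may contain exactly one record.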

JSON Lines (JSONL)

For streaming JSON responses where each line is a complete JSON object:

response:
  format: jsonl
  records:
    # Each line is already a record
    jmespath: "[*]"

Example JSONL response:

{"id": 1, "name": "Alice", "email": "alice@example.com"}
{"id": 2, "name": "Bob", "email": "bob@example.com"}
{"id": 3, "name": "Charlie", "email": "charlie@example.com"}
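Reading JSON Lines amounts to parsing each non-empty line on its own. The Python sketch below shows the idea; the function name parse_jsonl is hypothetical, and skipping blank lines is an assumption rather than documented Sling behavior.

```python
import json


def parse_jsonl(text):
    """Parse JSON Lines text into a list of records, one per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


body = '{"id": 1, "name": "Alice"}\n{"id": 2, "name": "Bob"}\n\n{"id": 3, "name": "Charlie"}\n'
records = parse_jsonl(body)
print(len(records), "records")
```

Because the result is already a top-level array of records, the "[*]" expression is all that is needed to select them.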

Record Extraction

After format conversion, records are extracted using JMESPath expressions.

Basic Extraction

response:
  records:
    # Extract top-level array
    jmespath: "[*]"

Nested Extraction

response:
  records:
    # Extract nested array
    jmespath: "response.data.items[]"

Conditional Extraction

response:
  records:
    # Extract only active users
    jmespath: "users[?status=='active']"

Projection and Transformation

response:
  records:
    # Extract and reshape data
    jmespath: "data[].{user_id: id, full_name: name, contact: email}"
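For readers less familiar with JMESPath, the filter and projection expressions above behave roughly like the Python comprehensions below. This is only an analogy written in plain Python; the function names and sample data are made up for illustration.

```python
def filter_active(users):
    """Plain-Python counterpart of users[?status=='active']."""
    return [user for user in users if user.get("status") == "active"]


def reshape(items):
    """Counterpart of data[].{user_id: id, full_name: name, contact: email}."""
    return [
        {
            "user_id": item.get("id"),
            "full_name": item.get("name"),
            # A missing key yields None, much like JMESPath yields null.
            "contact": item.get("email"),
        }
        for item in items
    ]


users = [
    {"id": 1, "status": "active"},
    {"id": 2, "status": "disabled"},
]
data = [{"id": 1, "name": "Alice", "email": "alice@example.com"}, {"id": 2, "name": "Bob"}]
print(filter_active(users))
print(reshape(data))
```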

Deduplication

When primary_key is defined, Sling automatically deduplicates records:

response:
  records:
    jmespath: "data[]"
    primary_key: ["id"]  # Single field
    # OR
    # primary_key: ["id", "location_id"]  # Composite key

Deduplication Strategies

1. In-Memory Deduplication (Default)

For datasets whose unique keys fit comfortably in memory:

response:
  records:
    primary_key: ["id"]
    # Uses hash map in memory

Characteristics:

Fast and accurate
Memory usage grows with unique record count
Suitable for datasets up to ~1 million records
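The in-memory strategy can be pictured as a set of keys that have already been seen. The Python sketch below is a simplified illustration, not Sling's implementation; the function name deduplicate is hypothetical, and keeping the first occurrence of each key is an assumption.

```python
def deduplicate(records, primary_key):
    """Keep the first record seen for each primary key value (illustrative)."""
    seen = set()
    unique = []
    for record in records:
        # A composite key becomes a tuple of its field values.
        key = tuple(record.get(field) for field in primary_key)
        if key in seen:
            continue
        seen.add(key)
        unique.append(record)
    return unique


rows = [
    {"id": 1, "location_id": "A", "qty": 5},
    {"id": 1, "location_id": "B", "qty": 7},
    {"id": 1, "location_id": "A", "qty": 9},  # same composite key as the first row
]
print(deduplicate(rows, ["id"]))
print(deduplicate(rows, ["id", "location_id"]))
```

The set holds one entry per unique key, which is why memory grows with the number of unique records.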

2. Bloom Filter Deduplication

For very large datasets where memory is constrained:

response:
  records:
    primary_key: ["id"]
    duplicate_tolerance: "10000000,0.001"  # capacity,error_rate

Characteristics:

Probabilistic deduplication: a small fraction of unique records (the false positive rate) may be mistaken for duplicates and dropped
Fixed memory footprint
Suitable for datasets with millions of records

Format: "capacity,error_rate"

capacity: Expected number of unique records
error_rate: Acceptable false positive rate (e.g., 0.001 = 0.1%)

💡 Tip: Use a Bloom filter for datasets over 1 million records or when memory is limited. Memory usage is determined by both the capacity and the error rate; lower error rates use more memory.
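To get a feel for how capacity and error rate translate into memory, the sketch below applies the textbook Bloom filter sizing formulas in Python. Sling's actual filter may be sized differently; the function names are hypothetical and the numbers are estimates only.

```python
import math


def parse_duplicate_tolerance(value):
    """Split a "capacity,error_rate" string into (int, float)."""
    capacity_text, error_rate_text = value.split(",")
    return int(capacity_text), float(error_rate_text)


def bloom_filter_size(capacity, error_rate):
    """Return (bits, hash_count) from the standard Bloom filter formulas."""
    # m = -n * ln(p) / (ln 2)^2
    bits = math.ceil(-capacity * math.log(error_rate) / (math.log(2) ** 2))
    # k = (m / n) * ln 2
    hash_count = max(1, round(bits / capacity * math.log(2)))
    return bits, hash_count


capacity, error_rate = parse_duplicate_tolerance("10000000,0.001")
bits, hash_count = bloom_filter_size(capacity, error_rate)
print(f"{bits / 8 / 1_000_000:.1f} MB, {hash_count} hash functions")
```

Under these formulas, ten million keys at a 0.1% error rate need on the order of 18 MB regardless of how large each key is, whereas an in-memory set has to store every key.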

Response State

All response data is accessible in the response state variable for use in expressions:

| Response Property | Description | Example Usage |
| --- | --- | --- |
| response.status | HTTP status code | response.status == 200 |
| response.headers | Response headers | response.headers.link |
| response.text | Raw response body | length(response.text) > 0 |
| response.json | Parsed JSON response | response.json.has_more |
| response.records | Extracted records array | length(response.records) |

Using Response State in Pagination

pagination:
  next_state:
    cursor: '{jmespath(response.json, "pagination.next_cursor")}'
  stop_condition: 'jmespath(response.json, "has_more") == false'
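The pagination block above describes a loop: read the page, check the stop condition, and carry the next cursor into the following request. The Python sketch below mimics that loop against an in-memory stand-in for an API. The fetch_page callable, the page shape, and the extra guard on a missing cursor are all assumptions made for illustration.

```python
def fetch_all(fetch_page):
    """Follow cursors until the stop condition is met (illustrative loop)."""
    cursor = None
    records = []
    while True:
        page = fetch_page(cursor)
        records.extend(page.get("data", []))
        next_cursor = (page.get("pagination") or {}).get("next_cursor")
        # stop_condition: has_more == false. The missing-cursor check is an
        # added safety guard so the sketch cannot loop forever.
        if page.get("has_more") is False or next_cursor is None:
            break
        cursor = next_cursor  # next_state.cursor
    return records


# Hypothetical two-page API response keyed by cursor.
PAGES = {
    None: {"data": [{"id": 1}, {"id": 2}], "has_more": True,
           "pagination": {"next_cursor": "c2"}},
    "c2": {"data": [{"id": 3}], "has_more": False,
           "pagination": {"next_cursor": None}},
}
all_records = fetch_all(lambda cursor: PAGES[cursor])
print([record["id"] for record in all_records])
```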

Using Response State in Rules

rules:
  - action: retry
    condition: "response.status == 429"
    max_attempts: 5

  - action: stop
    condition: "length(response.records) == 0"
    message: "No more records available"

Using Response State in Processors

processors:
  # Add metadata from response to each record
  - expression: "response.json.request_id"
    output: "record.api_request_id"

  # Conditional processing based on response
  - if: "response.status == 206"  # Partial content
    expression: "record.id"
    output: "queue.incomplete_records"

Conditional Processing with IF Conditions

Processors support an optional if field to conditionally execute based on runtime conditions.

Basic Syntax

processors:
  # Only process non-null values
  - expression: "lower(record.email)"
    if: "!is_null(record.email) && record.email != ''"
    output: "record.email_normalized"

  # Only queue US customers
  - expression: "record.id"
    if: "record.country == 'US'"
    output: "queue.us_customer_ids"

  # Track max timestamp only for completed records
  - expression: "record.updated_at"
    if: "record.status == 'completed'"
    output: "state.last_completed_timestamp"
    aggregation: "maximum"

How It Works

Evaluation: The if condition is evaluated before the expression
Skip on False: If false, the entire processor is skipped for that record
Access: The condition can reference record, state, response, env, and secrets
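The evaluation order described above can be sketched as a loop that checks the condition first and only then computes the expression. This Python illustration uses plain callables in place of Sling expressions; run_processor and its parameters are hypothetical names.

```python
def run_processor(records, expression, output_field, condition=None):
    """Apply expression to each record, skipping records where condition is false."""
    for record in records:
        # The if condition is evaluated before the expression.
        if condition is not None and not condition(record):
            continue  # the whole processor is skipped for this record
        record[output_field] = expression(record)
    return records


customers = [
    {"email": "Alice@Example.COM"},
    {"email": None},
    {"email": ""},
]
run_processor(
    customers,
    expression=lambda record: record["email"].lower(),
    output_field="email_normalized",
    condition=lambda record: record.get("email") not in (None, ""),
)
print(customers)
```

Without the condition, the lower-casing step would raise an error on the record whose email is null, which is the failure the if field is meant to prevent.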

Common Patterns

processors:
  # Null/empty checks
  - expression: 'cast(record.age, "int")'
    if: "!is_null(record.age)"
    output: "record.age_int"

  # Type validation with try_cast
  - expression: 'cast(record.value, "int")'
    if: "is_null(try_cast(record.value, 'int')) == false"
    output: "record.value_int"

  # Date filtering
  - expression: "record.id"
    if: "date_parse(record.created_at, 'auto') > date_add(now(), -7, 'day')"
    output: "queue.recent_ids"

  # Response-based conditions
  - expression: "response.json.request_id"
    if: "response.status == 200"
    output: "record.api_request_id"

💡 Tip: Always check for null before accessing field properties to avoid errors.

⚠️ Warning: IF conditions are evaluated for every record. Avoid expensive operations.

Overwriting Records with `output: "record"`

Setting output: "record" completely replaces the entire record with the result of the expression. All existing fields are discarded unless explicitly included.

Common Use Cases

1. Select Specific Fields

Keep only essential fields from large API responses:

processors:
  - expression: >
      object(
        "user_id", record.id,
        "username", record.username,
        "email", record.email
      )
    output: "record"

2. Rename Fields

Transform field names to match your schema:

processors:
  - expression: >
      object(
        "customer_id", record.id,
        "full_name", record.name,
        "contact_email", record.email
      )
    output: "record"

3. Flatten Nested Data

Convert nested structures into flat records using JMESPath:

processors:
  - expression: >
      jmespath(record, "{
        id: id,
        name: user.profile.name,
        email: user.contact.email,
        country: user.address.country,
        plan_type: subscription.plan.type
      }")
    output: "record"
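The flattening step above maps nested paths to top-level fields and drops everything else. A rough Python equivalent is shown below; get_path and flatten are hypothetical helpers, and returning None for a missing path is modeled on JMESPath returning null.

```python
def get_path(obj, path):
    """Follow a dotted path through nested dicts, returning None if it breaks."""
    current = obj
    for key in path.split("."):
        if not isinstance(current, dict):
            return None
        current = current.get(key)
    return current


def flatten(record, mapping):
    """Build a new flat record; fields not listed in mapping are discarded."""
    return {field: get_path(record, path) for field, path in mapping.items()}


nested = {
    "id": 42,
    "user": {"profile": {"name": "Alice"}, "contact": {"email": "alice@example.com"}},
    "internal_notes": "not needed downstream",
}
flat = flatten(nested, {
    "id": "id",
    "name": "user.profile.name",
    "email": "user.contact.email",
    "country": "user.address.country",  # missing in this record
})
print(flat)
```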

4. Add Computed Fields

Create records with derived values:

processors:
  - expression: >
      object(
        "order_id", record.id,
        "subtotal", record.subtotal,
        "tax", record.subtotal * 0.08,
        "total", record.subtotal * 1.08
      )
    output: "record"

Important Warnings

⚠️ All previous fields are discarded - you must explicitly include every field you want to keep

⚠️ Order matters - processors run in sequence, so overwrite the record first and add derived fields afterward:

processors:
  # First: Overwrite to simplify
  - expression: 'object("id", record.id, "name", record.name)'
    output: "record"

  # Then: Add new fields to simplified record
  - expression: "upper(record.name)"
    output: "record.name_upper"

⚠️ Include primary keys - For deduplication to work, primary key fields must be in the new record

💡 Tip: Use JMESPath projection syntax for cleaner nested data transformations.

Error Handling

Invalid Response Format

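A response that cannot be parsed in the expected format often accompanies an error status (for example, an HTML error page where JSON was expected). Fail fast on error statuses so the problem surfaces with a clear message: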

rules:
  - action: fail
    condition: "response.status >= 400"
    message: "API returned error: {response.status}"

Empty or Missing Records

Handle cases where no records are found:

pagination:
  # Stop if no records returned
  stop_condition: "length(response.records) == 0"

Partial Responses

Some APIs return partial data with a 206 (Partial Content) status:

rules:
  # Continue processing partial results
  - action: continue
    condition: "response.status == 206"
    message: "Partial content received, processing available data"

Complete Example

Here's a comprehensive example that combines most of the response processing features described above:

endpoints:
  user_activity:
    request:
      url: "{state.base_url}/users/activity"
      parameters:
        limit: 100

    response:
      # Explicitly set format (usually auto-detected)
      format: json

      records:
        # Extract nested records
        jmespath: "data.activities[]"

        # Deduplicate by composite key
        primary_key: ["user_id", "activity_id"]

        # Limit total records for testing
        limit: 5000

        # Use Bloom filter for large datasets
        duplicate_tolerance: "1000000,0.001"

      processors:
        # Transform timestamp field
        - expression: 'date_parse(record.timestamp, "auto")'
          output: "record.activity_date"

        # Add response metadata
        - expression: "response.json.request_id"
          output: "record.api_request_id"

        # Track max timestamp for incremental sync
        - expression: "record.timestamp"
          output: "state.last_activity_timestamp"
          aggregation: maximum

        # Send user IDs to queue for detail lookup
        - expression: "record.user_id"
          output: "queue.user_ids"

      rules:
        # Retry on rate limit
        - action: retry
          condition: "response.status == 429"
          max_attempts: 5
          backoff: exponential

        # Continue on not found (user may have been deleted)
        - action: continue
          condition: "response.status == 404"
          message: "Resource not found, continuing"

        # Fail on auth errors
        - action: fail
          condition: "response.status == 401 || response.status == 403"
          message: "Authentication failed"

    pagination:
      next_state:
        cursor: '{jmespath(response.json, "pagination.next_cursor")}'
      stop_condition: 'is_null(jmespath(response.json, "pagination.next_cursor")) || length(response.records) == 0'

Best Practices

1. Always Define Primary Keys

Even if the API doesn't explicitly require deduplication, defining primary keys helps ensure data quality:

response:
  records:
    primary_key: ["id"]  # Prevents accidental duplicates

2. Use Appropriate Deduplication

Choose the right strategy based on your dataset size:

# For < 1M records (default)
primary_key: ["id"]

# For > 1M records
primary_key: ["id"]
duplicate_tolerance: "10000000,0.001"

3. Handle Multiple Content Types

If your API might return different formats, for example JSON for API errors but HTML for server errors, branch on the Content-Type header:

rules:
  # Handle JSON errors
  - action: fail
    condition: 'response.status >= 400 && response.headers["content-type"] == "application/json"'
    message: "API error: {response.json.error}"

  # Handle HTML errors (often 500 errors)
  - action: fail
    condition: 'response.status >= 400 && jmespath(response.headers, "\"content-type\"") == "text/html"'
    message: "Server error (HTML response)"

4. Validate Records Structure

Use processors to validate critical fields:

processors:
  # Ensure required field exists
  - expression: 'require(record.id, "Record missing required id field")'
    output: "record.id_validated"

5. Log Response Details for Debugging

During development, use processors to log response information:

processors:
  # Log response summary
  - expression: >
      log("Response status: " + string(response.status) +
          ", Records: " + string(length(response.records)))
    output: ""  # Empty output means don't store anywhere

Troubleshooting

No Records Extracted

If you're not getting any records, check your JMESPath expression by running the endpoint with tracing enabled:

sling conns test API_NAME --endpoints ENDPOINT_NAME --trace

Look at the raw response in trace output
Verify the path to your records array
Test JMESPath expressions using online tools

CSV Parsing Errors

Common CSV issues:

# Error: "need at least 2 lines to build records from csv"
# Solution: Ensure API returns header + at least one data row

Deduplication Not Working

Verify your primary key fields exist:

processors:
  # Log primary key values
  - if: "!is_null(record.id)"
    expression: 'log("Found ID: " + string(record.id))'
    output: ""

💡 Tip: Use --trace flag to see detailed response processing including format detection, record extraction, and deduplication results.

