# Advanced Features

This document covers advanced capabilities within Sling API specifications: pagination strategies, expression functions, incremental sync, and rules for error handling with retry logic, including backoff strategies and rate limit handling.

For response processing basics (format handling, record extraction, deduplication, processors), see [Response Processing](/concepts/api-specs/response.md).

## Content Overview

* [Pagination](#pagination)
* [Functions](#functions)
* [Sync State for Incremental Loads](#sync-state-for-incremental-loads)
* [Rules & Retries](#rules-and-retries)
* [Backoff Strategies](#backoff-strategies)
* [Rate Limit Handling](#rate-limit-handling)

## Pagination

Pagination controls how Sling navigates through multiple pages of results for *each iteration* (if `iterate` is used) or for the single endpoint execution (if `iterate` is not used).

### Pagination Flow

{% @mermaid/diagram content="graph TD
A\[Make Request] --> B\[Process Response]
B --> C{Check stop\_condition}
C -->|True| D\[Stop Pagination]
C -->|False| E\[Evaluate next\_state]
E --> F\[Update state variables]
F --> A

style A fill:#ff8c42,stroke:#ffffff,stroke-width:2px,color:#ffffff
style B fill:#7cb342,stroke:#ffffff,stroke-width:2px,color:#ffffff
style C fill:#ffd54f,stroke:#ffffff,stroke-width:2px,color:#000000
style D fill:#ef5350,stroke:#ffffff,stroke-width:2px,color:#ffffff
style E fill:#5c6bc0,stroke:#ffffff,stroke-width:2px,color:#ffffff
style F fill:#5c6bc0,stroke:#ffffff,stroke-width:2px,color:#ffffff" %}
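
As a rough mental model, the loop above can be sketched in Python. This is a minimal illustration, not Sling's implementation: `paginate`, `fake_fetch`, and the lambdas standing in for `stop_condition` and `next_state` are hypothetical, and the fake API simply returns three fixed pages.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Response = Dict[str, Any]


def paginate(
    fetch_page: Callable[[State], Response],
    state: State,
    stop_condition: Callable[[State, Response], bool],
    next_state: Callable[[State, Response], Dict[str, Any]],
) -> List[Any]:
    """Loop: request, process, check the stop condition, update state."""
    records: List[Any] = []
    while True:
        response = fetch_page(state)                 # Make Request
        records.extend(response["records"])          # Process Response
        if stop_condition(state, response):          # Check stop_condition
            break                                    # Stop Pagination
        state = {**state, **next_state(state, response)}  # Evaluate next_state, update state
    return records


# Hypothetical API with three fixed pages.
PAGES = {1: [1, 2], 2: [3, 4], 3: [5, 6]}


def fake_fetch(state: State) -> Response:
    return {"records": PAGES.get(state["page"], []), "total_pages": 3}


result = paginate(
    fake_fetch,
    {"page": 1},
    stop_condition=lambda s, r: s["page"] >= r["total_pages"] or len(r["records"]) == 0,
    next_state=lambda s, r: {"page": s["page"] + 1},
)
print(result)  # [1, 2, 3, 4, 5, 6]
```

In this sketch the stop check runs after each response has been processed, so the records of the final page are still kept.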

### Common Pagination Patterns

#### 1. Cursor-based Pagination

Uses the ID of the last record to fetch the next page.

```yaml
pagination:
  next_state:
    # Use ID of last record for next request
    starting_after: "{response.records[-1].id}"
  stop_condition: 'jmespath(response.json, "has_more") == false || length(response.records) == 0'
```
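
The following Python sketch simulates this pattern against a hypothetical in-memory API (`fake_api`, `ALL_IDS`, and the `data`/`has_more` body shape are assumptions for illustration). Each request passes the ID of the last record received, and the loop ends when `has_more` is false or a page comes back empty.

```python
from typing import Any, Dict, List, Optional

ALL_IDS = ["a1", "a2", "a3", "a4", "a5"]  # hypothetical dataset, ordered by ID


def fake_api(starting_after: Optional[str], limit: int = 2) -> Dict[str, Any]:
    """Hypothetical cursor API: records after the given ID plus a has_more flag."""
    start = 0 if starting_after is None else ALL_IDS.index(starting_after) + 1
    page = ALL_IDS[start:start + limit]
    return {"data": [{"id": i} for i in page], "has_more": start + limit < len(ALL_IDS)}


def fetch_all() -> List[str]:
    seen: List[str] = []
    starting_after: Optional[str] = None
    while True:
        body = fake_api(starting_after)
        records = body["data"]
        seen.extend(r["id"] for r in records)
        # stop_condition: has_more == false || length(response.records) == 0
        if not body["has_more"] or len(records) == 0:
            break
        # next_state: starting_after = response.records[-1].id
        starting_after = records[-1]["id"]
    return seen


print(fetch_all())  # ['a1', 'a2', 'a3', 'a4', 'a5']
```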

#### 2. Page Number Pagination

Increments a page number for each request.

```yaml
pagination:
  next_state:
    # Increment page number
    page: "{state.page + 1}"
  stop_condition: 'state.page >= jmespath(response.json, "total_pages") || length(response.records) == 0'
```

#### 3. Offset Pagination

Increments an offset value based on records received.

```yaml
pagination:
  next_state:
    # Increase offset by limit
    offset: "{state.offset + state.limit}"
  stop_condition: "length(response.records) < state.limit"
```
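
A small Python simulation (with a hypothetical `fake_api` over six records) shows how the `length(response.records) < state.limit` stop condition behaves. When the total is an exact multiple of the limit, the last full page does not trigger the stop, so one extra request returns an empty page before pagination ends.

```python
from typing import List, Tuple

TOTAL_RECORDS = 6  # size of the hypothetical dataset


def fake_api(offset: int, limit: int) -> List[int]:
    """Hypothetical offset API: up to `limit` record IDs starting at `offset`."""
    return list(range(offset, min(offset + limit, TOTAL_RECORDS)))


def fetch_all(limit: int) -> Tuple[List[int], int]:
    """Return (all records, number of requests made)."""
    records: List[int] = []
    offset, requests = 0, 0
    while True:
        page = fake_api(offset, limit)
        requests += 1
        records.extend(page)
        # stop_condition: length(response.records) < state.limit
        if len(page) < limit:
            break
        # next_state: offset = state.offset + state.limit
        offset += limit
    return records, requests


print(fetch_all(4))  # ([0, 1, 2, 3, 4, 5], 2)
print(fetch_all(3))  # ([0, 1, 2, 3, 4, 5], 3)
```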

#### 4. Link Header Pagination

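Extracts the next page URL from the `Link` response header. The expression below takes the first comma-separated entry, so it assumes the API lists the `rel="next"` link first.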

```yaml
pagination:
  next_state:
    # Extract URL from Link header
    url: >
      {
        if(
          contains(response.headers.link, "rel=\"next\""),
          trim(split_part(split(response.headers.link, ",")[0], ";", 0), "<>"),
          null
        )
      }
  stop_condition: '!contains(response.headers.link, "rel=\"next\"")'
```
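
The YAML expression above reads only the first comma-separated entry of the `Link` header. For comparison, this Python sketch scans every entry and returns the one whose `rel` includes `next`, which also works when the `next` link is not listed first. `next_link` is a hypothetical helper, and it assumes the URLs themselves contain no commas.

```python
from typing import Optional


def next_link(link_header: Optional[str]) -> Optional[str]:
    """Return the URL of the entry whose rel includes "next", or None."""
    if not link_header:
        return None
    # Simplification: assumes no commas inside the URLs themselves.
    for entry in link_header.split(","):
        segments = [s.strip() for s in entry.split(";")]
        target = segments[0]
        if not (target.startswith("<") and target.endswith(">")):
            continue
        for param in segments[1:]:
            name, _, value = param.partition("=")
            # rel may hold several space-separated values, e.g. rel="next last"
            if name.strip().lower() == "rel" and "next" in value.strip().strip('"').split():
                return target[1:-1]
    return None


header = '<https://api.example.com/items?page=1>; rel="prev", <https://api.example.com/items?page=3>; rel="next"'
print(next_link(header))  # https://api.example.com/items?page=3
```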

> 💡 **Tip:** For better performance, avoid using `response` variables in `next_state` expressions when possible. This allows Sling to prepare the next request before the current one finishes, increasing parallelism.

## Functions

Functions are the building blocks of dynamic expressions in Sling API specifications. They enable sophisticated data transformations, validations, and manipulations within your API configurations.

### Using Functions

Functions can be used throughout your API specification wherever expressions are supported, including:

* **Request Configuration**: Dynamic URLs, headers, parameters, and payloads
* **Response Processing**: Data transformation and extraction
* **Pagination Logic**: Computing next page parameters
* **Conditional Logic**: Rules, iteration conditions, and stop conditions
* **State Management**: Transforming and aggregating state variables

### Common Function Patterns

#### Dynamic Request Construction

```yaml
request:
  url: '{state.base_url}/users/{state.user_id}'
  headers:
    Authorization: "Bearer {auth.token}"
    X-Request-ID: "{uuid()}"
  parameters:
    updated_since: '{date_format(date_add(now(), -1, "day"), "%Y-%m-%dT%H:%M:%SZ")}'
    limit: '{coalesce(state.page_size, 100)}'
```
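
To make the expressions concrete, here is roughly what they resolve to, written as a Python sketch. It assumes `now()` returns the current UTC time and that `uuid()` produces a random (version 4) UUID; `build_request` and its arguments are hypothetical names, not part of Sling.

```python
import uuid
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional


def build_request(state: Dict[str, Any], token: str, now: Optional[datetime] = None) -> Dict[str, Any]:
    """Approximate what the request expressions above resolve to (assumed semantics)."""
    now = now or datetime.now(timezone.utc)  # assumption: now() is UTC
    page_size = state.get("page_size")
    return {
        "url": f"{state['base_url']}/users/{state['user_id']}",
        "headers": {
            "Authorization": f"Bearer {token}",
            "X-Request-ID": str(uuid.uuid4()),  # stand-in for uuid()
        },
        "parameters": {
            # date_format(date_add(now(), -1, "day"), "%Y-%m-%dT%H:%M:%SZ")
            "updated_since": (now - timedelta(days=1)).strftime("%Y-%m-%dT%H:%M:%SZ"),
            # coalesce(state.page_size, 100)
            "limit": page_size if page_size is not None else 100,
        },
    }
```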

#### Data Transformation in Processors

```yaml
processors:
  # Parse and format timestamps
  - expression: 'date_parse(record.created_at, "auto")'
    output: "record.created_timestamp"
  
  # Clean and validate data
  - expression: "trim(upper(record.status))"
    output: "record.status_clean"
  
  # Extract nested values
  - expression: 'get_path(record, "user.profile.email")'
    output: "record.user_email"
  
  # Conditional processing
  - expression: 'if(record.active, "ACTIVE", "INACTIVE")'
    output: "record.status_label"
```
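
The same transformations, approximated in Python for a single record. This sketch reflects assumed semantics, not Sling's function implementations: `get_path` is assumed to walk a dotted path and return null when a key is missing, and `datetime.fromisoformat` only handles ISO-8601 strings, whereas the `"auto"` format of `date_parse` presumably accepts more formats.

```python
from datetime import datetime
from typing import Any, Dict, Optional


def get_path(obj: Any, path: str) -> Optional[Any]:
    """Follow a dotted path through nested dicts; None if any key is missing."""
    for key in path.split("."):
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def process(record: Dict[str, Any]) -> Dict[str, Any]:
    out = dict(record)
    # Narrow stand-in for date_parse(record.created_at, "auto"): ISO-8601 only
    out["created_timestamp"] = datetime.fromisoformat(record["created_at"])
    # trim(upper(record.status))
    out["status_clean"] = record["status"].upper().strip()
    # get_path(record, "user.profile.email")
    out["user_email"] = get_path(record, "user.profile.email")
    # if(record.active, "ACTIVE", "INACTIVE")
    out["status_label"] = "ACTIVE" if record["active"] else "INACTIVE"
    return out
```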

#### Pagination with Functions

```yaml
pagination:
  next_state:
    # Extract cursor from response
    cursor: "{get_path(response.json, 'pagination.next_cursor')}"
    
    # Increment page number
    page: "{coalesce(state.page, 0) + 1}"
    
    # Dynamic limit based on response size
    limit: "{if(length(response.records) < 100, 50, state.limit)}"

  # jmespath: is_null(jmespath(response.json, "pagination.next_cursor"))
  # jq:      is_null(jq(response.json, ".pagination.next_cursor")[0])
  stop_condition: 'is_null(jmespath(response.json, "pagination.next_cursor")) || length(response.records) == 0'
```

#### Advanced Data Processing

```yaml
processors:
  # Filter and transform arrays
  - expression: 'filter(record.tags, "length(value) > 0")'
    output: "record.valid_tags"
  
  # Extract specific fields using jq
  - expression: 'jq(record, "[.items[] | select(.price > 100) | {name, price}]")[0]'
    output: "record.expensive_items"
  
  # Hash sensitive data
  - expression: 'hash(record.email, "sha256")'
    output: "record.email_hash"
  
  # Generate derived fields
  - expression: 'join([record.first_name, record.last_name], " ")'
    output: "record.full_name"
```
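
A Python approximation of these four processors, for one record with hypothetical field values. It assumes `hash(..., "sha256")` yields a lowercase hex digest and that `filter` keeps the elements for which the condition holds; check the Functions reference for the exact output formats.

```python
import hashlib
from typing import Any, Dict


def process(record: Dict[str, Any]) -> Dict[str, Any]:
    out = dict(record)
    # filter(record.tags, "length(value) > 0")
    out["valid_tags"] = [t for t in record["tags"] if len(t) > 0]
    # jq: [.items[] | select(.price > 100) | {name, price}]
    out["expensive_items"] = [
        {"name": i["name"], "price": i["price"]} for i in record["items"] if i["price"] > 100
    ]
    # hash(record.email, "sha256"), assumed to be a hex digest
    out["email_hash"] = hashlib.sha256(record["email"].encode("utf-8")).hexdigest()
    # join([record.first_name, record.last_name], " ")
    out["full_name"] = " ".join([record["first_name"], record["last_name"]])
    return out
```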

### Function Error Handling

Functions can help with graceful error handling and fallback values:

```yaml
processors:
  # Safe parsing with fallback
  - expression: 'coalesce(try_cast(record.age, "int"), 0)'
    output: "record.age_int"
  
  # Required field validation
  - expression: 'require(record.user_id, "User ID is required")'
    output: "record.validated_user_id"
  
  # Conditional field access
  - expression: 'if(is_null(record.metadata), "", get_path(record.metadata, "source"))'
    output: "record.source"
```
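
The fallback pattern can be modeled in Python as follows. `try_cast_int`, `coalesce`, and `require` are hypothetical stand-ins that assume null-in/null-out semantics: a failed cast yields `None`, `coalesce` returns the first non-`None` argument, and `require` raises when its value is `None`.

```python
from typing import Any, Optional


def try_cast_int(value: Any) -> Optional[int]:
    """Like try_cast(value, "int"): None instead of an error on bad input."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def coalesce(*values: Any) -> Any:
    """Return the first argument that is not None."""
    for v in values:
        if v is not None:
            return v
    return None


def require(value: Any, message: str) -> Any:
    """Fail loudly when a required value is missing."""
    if value is None:
        raise ValueError(message)
    return value


print(coalesce(try_cast_int("n/a"), 0))  # 0
```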

### Best Practices

1. **Use `coalesce()` for Defaults**: Always provide fallback values for optional fields
2. **Validate Required Fields**: Use `require()` to ensure critical data is present
3. **Handle Date Formats**: Use `date_parse()` with "auto" format when possible
4. **Escape Special Characters**: Use encoding functions for URLs and other special contexts
5. **Test Complex Expressions**: Break down complex expressions into smaller, testable parts

For a complete reference of all available functions, see the [Functions documentation](/concepts/functions.md).

> 💡 **Tip**: Use the `log()` function during development to debug complex expressions: `log("Processing record: " + record.id)`

> ⚠️ **Warning**: Functions are evaluated for each record or iteration. Avoid expensive operations in frequently-called expressions.

## Sync State for Incremental Loads

The `sync` key allows persisting state variables between runs, enabling incremental data loading (fetching only new or updated data).

### Incremental Sync Workflow

{% @mermaid/diagram content="sequenceDiagram
participant Previous as Previous Run
participant Current as Current Run
participant API as API
participant Next as Next Run

Note over Previous: Stores state.last_sync_ts
Previous->>Current: sync.last_sync_ts

Note over Current: Init state variables
Current->>Current: state.start_timestamp = sync.last_sync_ts or default

Current->>API: Request with updated_since=state.start_timestamp
API->>Current: Response with records

Note over Current: Process & track max timestamp
Current->>Current: Find max of record.updated_at → state.last_sync_ts

Current->>Next: Persist state.last_sync_ts as sync.last_sync_ts" %}

### Example: Timestamp-Based Incremental Sync

```yaml
endpoints:
  incremental_data:
    state:
      # Get previous timestamp or default to 7 days ago
      start_timestamp: >
        {
          coalesce(
            sync.last_sync_ts,
            date_format(date_add(now(), -7, 'day'), '%Y-%m-%dT%H:%M:%SZ')
          )
        }
      
      # Initialize tracking variable with start timestamp
      last_sync_ts: '{state.start_timestamp}'

    # List of state variables to persist for next run
    sync: [last_sync_ts]

    request:
      parameters:
        # Filter by timestamp from last run
        updated_since: '{state.start_timestamp}'

    response:
      processors:
        # Track maximum timestamp seen
        - expression: "record.updated_at"
          output: "state.last_sync_ts"
          aggregation: "maximum"
```
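
The two-run behavior can be simulated in Python. This sketch assumes the API filters with `updated_at >= updated_since` (inclusive) and that timestamps share the fixed `%Y-%m-%dT%H:%M:%SZ` format, so string comparison matches time order; `run` and the sample records are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional, Tuple

FMT = "%Y-%m-%dT%H:%M:%SZ"


def run(sync: Dict[str, str], records: List[Dict[str, str]], now: datetime) -> Tuple[Dict[str, str], int]:
    """Simulate one run; return (values to persist as sync, number of records fetched)."""
    # state.start_timestamp: coalesce(sync.last_sync_ts, now - 7 days)
    previous: Optional[str] = sync.get("last_sync_ts")
    start_timestamp = previous if previous is not None else (now - timedelta(days=7)).strftime(FMT)
    # state.last_sync_ts starts at the start timestamp
    last_sync_ts = start_timestamp
    # Stand-in for the server-side updated_since filter (assumed inclusive)
    fetched = [r for r in records if r["updated_at"] >= start_timestamp]
    for r in fetched:
        # Processor with aggregation: maximum
        last_sync_ts = max(last_sync_ts, r["updated_at"])
    # sync: [last_sync_ts]
    return {"last_sync_ts": last_sync_ts}, len(fetched)
```

Because the filter is inclusive in this sketch, the second run fetches the boundary record again, which is one reason to declare a `primary_key` on the endpoint.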

> 💡 **Tip:** Always use `coalesce()` with sync variables to handle the first run when no previous state exists.

### Combining Incremental Sync with Context Variables

For advanced scenarios, you can combine sync state with **context variables** to support both incremental loading and backfilling. Context variables are runtime values passed from the replication configuration.

**Key context variables for incremental sync:**

* `context.range_start` - Start of backfill range (from `source_options.range`)
* `context.range_end` - End of backfill range (from `source_options.range`)
* `context.mode` - Replication mode (`incremental`, `full-refresh`, `backfill`)
* `context.limit` - Maximum records to fetch (from `source_options.limit`)

**Example: Incremental with Backfill Support**

```yaml
endpoints:
  events:
    sync: [last_date]

    iterate:
      # Backfill mode: Use context.range_start/range_end
      # Incremental mode: Use sync.last_date
      over: >
        range(
          coalesce(context.range_start, sync.last_date, date_format(date_add(now(), -7, "day"), "%Y-%m-%d")),
          coalesce(context.range_end, date_format(now(), "%Y-%m-%d")),
          "1d"
        )
      into: "state.current_date"

    request:
      url: "{state.base_url}/events"
      parameters:
        date: "{state.current_date}"

    response:
      records:
        jmespath: "events[]"
        primary_key: ["event_id"]

      processors:
        # Track last processed date for next incremental run
        - expression: "state.current_date"
          output: "state.last_date"
          aggregation: "maximum"
```
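
The precedence in the `over` expression can be sketched in Python: an explicit range wins, then the synced `last_date`, then a 7-day default. `dates_to_fetch` and `day_range` are hypothetical helpers, and this sketch assumes `range(start, end, "1d")` includes both endpoints.

```python
from datetime import date, timedelta
from typing import Any, List, Optional


def coalesce(*values: Any) -> Any:
    """Return the first argument that is not None."""
    return next((v for v in values if v is not None), None)


def day_range(start: str, end: str) -> List[str]:
    """Daily steps from start to end, inclusive (assumed to match range(..., "1d"))."""
    d, stop = date.fromisoformat(start), date.fromisoformat(end)
    out: List[str] = []
    while d <= stop:
        out.append(d.isoformat())
        d += timedelta(days=1)
    return out


def dates_to_fetch(range_start: Optional[str], range_end: Optional[str],
                   last_date: Optional[str], today: date) -> List[str]:
    # coalesce(context.range_start, sync.last_date, today - 7 days)
    start = coalesce(range_start, last_date, (today - timedelta(days=7)).isoformat())
    # coalesce(context.range_end, today)
    end = coalesce(range_end, today.isoformat())
    return day_range(start, end)
```

In incremental mode the window starts at `last_date` itself, so in this sketch the last processed day is fetched again on the next run.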

**Backfill Usage:**

```yaml
# replication.yaml
source: MY_API
target: MY_TARGET_DB

streams:
  events:
    object: analytics.events
    source_options:
      # Backfill January 2024
      range: '2024-01-01,2024-01-31'
```

**Incremental Usage:**

```yaml
# replication.yaml (without range)
source: MY_API
target: MY_TARGET_DB

streams:
  events:
    object: analytics.events
    # No range - uses sync.last_date
```

This pattern allows the same endpoint to handle both historical backfills and ongoing incremental updates. See [Context Variables](/concepts/api-specs/structure.md#context-variables) for full details.

## Rules & Retries

Rules define actions based on response conditions (status codes, headers, body content), providing fine-grained control over error handling and retries.

### Rules Evaluation Flow

{% @mermaid/diagram content="graph TD
A\[Receive Response] --> B\[Evaluate Rules in Order]
B --> C{Rule Condition Match?}
C -->|No| D\[Try Next Rule]
D --> C
C -->|Yes| E{What Action?}
E -->|retry| F\[Wait Based on Backoff]
F --> G\[Retry Request]
E -->|continue| H\[Process Response]
E -->|skip| I\[Skip This Request]
E -->|break| J\[Stop Iteration Gracefully]
E -->|stop| K\[Stop This Endpoint]
E -->|fail| L\[Fail with Error]

style A fill:#ff8c42,stroke:#ffffff,stroke-width:2px,color:#ffffff
style B fill:#5c6bc0,stroke:#ffffff,stroke-width:2px,color:#ffffff
style C fill:#ffd54f,stroke:#ffffff,stroke-width:2px,color:#000000
style E fill:#ffd54f,stroke:#ffffff,stroke-width:2px,color:#000000
style F fill:#7cb342,stroke:#ffffff,stroke-width:2px,color:#ffffff
style H fill:#7cb342,stroke:#ffffff,stroke-width:2px,color:#ffffff
style J fill:#ef5350,stroke:#ffffff,stroke-width:2px,color:#ffffff
style K fill:#ef5350,stroke:#ffffff,stroke-width:2px,color:#ffffff
style L fill:#ef5350,stroke:#ffffff,stroke-width:2px,color:#ffffff" %}

### Rule Properties

| Property       | Required       | Description                                | Example                                                          |
| -------------- | -------------- | ------------------------------------------ | ---------------------------------------------------------------- |
| `action`       | Yes            | Action to take when condition is true      | `"retry"`, `"continue"`, `"stop"`, `"break"`, `"skip"`, `"fail"` |
| `condition`    | Yes            | Expression that triggers the action        | `"response.status == 429"`                                       |
| `max_attempts` | No (for retry) | Max number of retry attempts               | `5` (default: 3)                                                 |
| `backoff`      | No (for retry) | Strategy for delay between retries         | `"exponential"`, `"linear"`, `"constant"`, `"jitter"`, `"none"`  |
| `backoff_base` | No (for retry) | Initial delay in seconds                   | `2` (default: 1)                                                 |
| `message`      | No             | Message for logging (supports expressions) | `"Rate limit hit, retrying..."`                                  |

### Rule Actions

| Action     | Description                                                 | Use Case                                               |
| ---------- | ----------------------------------------------------------- | ------------------------------------------------------ |
| `retry`    | Retry the request after delay                               | Rate limits (429), server errors (>=500)               |
| `continue` | Process response, ignore error                              | Non-critical errors (e.g., 404 for optional resources) |
| `skip`     | Break out of the rule evaluation loop and skip this request | When a request should not be processed                 |
| `break`    | Stop the current iteration gracefully without error         | Stop iteration within loops when processing complete   |
| `stop`     | Stop current endpoint/iteration                             | When further requests would be useless                 |
| `fail`     | Stop Sling run with error                                   | Critical errors (auth failure, invalid parameters)     |

### Example Rules

```yaml
rules:
  # Rule 1: Retry rate limits and server errors
  - action: "retry"
    condition: "response.status == 429 || response.status >= 500"
    max_attempts: 5
    backoff: "exponential"
    backoff_base: 2
    message: "Server error or rate limit hit, retrying..."

  # Rule 2: Fail on authentication errors
  - action: "fail"
    condition: "response.status == 401 || response.status == 403"
    message: "Authentication failed"

  # Rule 3: Ignore 404 errors
  - action: "continue"
    condition: "response.status == 404"
    message: "Resource not found, continuing"

  # Rule 4: Skip invalid records in iteration
  - action: "skip"
    condition: "is_null(record.id)"
    message: "Skipping record without ID"

  # Rule 5: Stop iteration when reaching limit
  - action: "break"
    condition: "state.records_processed >= state.limit"
    message: "Processed limit reached, breaking iteration"
```

> 📝 **Note:** Rules are evaluated in order. The first matching rule's action is executed.
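
First-match evaluation can be sketched in Python. `Rule` and `evaluate` are hypothetical names, conditions are plain lambdas rather than Sling expressions, and `None` stands for "no rule matched"; what Sling does in that case is outside this sketch.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

Response = Dict[str, Any]


@dataclass
class Rule:
    action: str
    condition: Callable[[Response], bool]


def evaluate(rules: List[Rule], response: Response) -> Optional[str]:
    """Return the action of the first matching rule; later rules are not checked."""
    for rule in rules:
        if rule.condition(response):
            return rule.action
    return None  # no rule matched


RULES = [
    Rule("retry", lambda r: r["status"] == 429 or r["status"] >= 500),
    Rule("fail", lambda r: r["status"] in (401, 403)),
    Rule("continue", lambda r: r["status"] == 404),
]

# Order matters: a broad rule listed first shadows more specific ones.
SHADOWED = [
    Rule("retry", lambda r: r["status"] >= 400),
    Rule("fail", lambda r: r["status"] == 401),  # never reached for 401
]
```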

## Backoff Strategies

When a rule uses the `retry` action, the backoff strategy determines how long to wait between retry attempts.

### Backoff Types

| Type          | Calculation                 | Use Case                           | Example Delays (backoff\_base=1) |
| ------------- | --------------------------- | ---------------------------------- | -------------------------------- |
| `none`        | No delay                    | Immediate retries (use cautiously) | 0s, 0s, 0s, ...                  |
| `constant`    | Fixed delay                 | Predictable retry timing           | 1s, 1s, 1s, ...                  |
| `linear`      | base × attempt              | Gradual backoff                    | 1s, 2s, 3s, 4s, 5s, ...          |
| `exponential` | base × 2^(attempt-1)        | Aggressive backoff (recommended)   | 1s, 2s, 4s, 8s, 16s, ...         |
| `jitter`      | exponential + random(0-50%) | Avoid thundering herd              | 1s, 3s, 5s, 10s, 20s, ...        |
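
The table's formulas can be written as a small Python function. This is a sketch, not Sling's code: it models jitter as the exponential delay multiplied by a random factor between 1.0 and 1.5, and `schedule` follows the convention of the retry timelines in this section, where `max_attempts` requests produce `max_attempts - 1` waits.

```python
import random
from typing import List, Optional


def backoff_delay(strategy: str, attempt: int, base: float = 1.0,
                  rng: Optional[random.Random] = None) -> float:
    """Seconds to wait after failed attempt number `attempt` (1-based)."""
    if strategy == "none":
        return 0.0
    if strategy == "constant":
        return base
    if strategy == "linear":
        return base * attempt
    exp = base * 2 ** (attempt - 1)
    if strategy == "exponential":
        return exp
    if strategy == "jitter":
        rng = rng or random.Random()
        return exp * (1 + rng.uniform(0, 0.5))  # exponential + 0-50%
    raise ValueError(f"unknown backoff strategy: {strategy}")


def schedule(strategy: str, max_attempts: int, base: float = 1.0,
             rng: Optional[random.Random] = None) -> List[float]:
    """Waits between attempts: no wait follows the final attempt."""
    return [backoff_delay(strategy, a, base, rng) for a in range(1, max_attempts)]


print(schedule("exponential", 5, 2))  # [2, 4, 8, 16]
```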

### Backoff Examples with Timing

#### None (No Backoff)

```yaml
rules:
  - action: retry
    condition: "response.status >= 500"
    max_attempts: 3
    backoff: none
```

**Retry Timeline:**

* Request 1 (fails) → 0s wait
* Request 2 (fails) → 0s wait
* Request 3 (fails) → Give up

> ⚠️ **Warning:** No backoff can overwhelm failing services. Use only when retries must be immediate.

#### Constant Backoff

```yaml
rules:
  - action: retry
    condition: "response.status >= 500"
    max_attempts: 5
    backoff: constant
    backoff_base: 2  # 2 seconds between each retry
```

**Retry Timeline:**

* Request 1 (fails) → Wait 2s
* Request 2 (fails) → Wait 2s
* Request 3 (fails) → Wait 2s
* Request 4 (fails) → Wait 2s
* Request 5 (fails) → Give up

**Total time:** \~8 seconds

#### Linear Backoff

```yaml
rules:
  - action: retry
    condition: "response.status == 429"
    max_attempts: 5
    backoff: linear
    backoff_base: 3  # Base delay of 3 seconds
```

**Retry Timeline:**

* Request 1 (fails) → Wait 3s (3 × 1)
* Request 2 (fails) → Wait 6s (3 × 2)
* Request 3 (fails) → Wait 9s (3 × 3)
* Request 4 (fails) → Wait 12s (3 × 4)
* Request 5 (fails) → Give up

**Total time:** \~30 seconds

#### Exponential Backoff (Recommended)

```yaml
rules:
  - action: retry
    condition: "response.status == 429 || response.status >= 500"
    max_attempts: 5
    backoff: exponential
    backoff_base: 2  # Base delay of 2 seconds
```

**Retry Timeline:**

* Request 1 (fails) → Wait 2s (2 × 2⁰ = 2)
* Request 2 (fails) → Wait 4s (2 × 2¹ = 4)
* Request 3 (fails) → Wait 8s (2 × 2² = 8)
* Request 4 (fails) → Wait 16s (2 × 2³ = 16)
* Request 5 (fails) → Give up

**Total time:** \~30 seconds

> 💡 **Tip:** Exponential backoff is the most common choice for API retries. Delays grow quickly after repeated failures, which reduces load on a struggling service while still retrying transient errors promptly.

#### Jitter Backoff (Best for High Concurrency)

```yaml
rules:
  - action: retry
    condition: "response.status == 429"
    max_attempts: 5
    backoff: jitter
    backoff_base: 2
```

**Retry Timeline (example with random jitter):**

* Request 1 (fails) → Wait 2.3s (2s + 15% jitter)
* Request 2 (fails) → Wait 5.1s (4s + 28% jitter)
* Request 3 (fails) → Wait 10.4s (8s + 30% jitter)
* Request 4 (fails) → Wait 20.8s (16s + 30% jitter)
* Request 5 (fails) → Give up

**Total time:** \~39 seconds (varies due to randomness)

> 📝 **Note:** Jitter adds 0-50% random delay to exponential backoff. This prevents multiple clients from retrying simultaneously (thundering herd problem).

### Choosing the Right Backoff Strategy

| Scenario                   | Recommended Strategy      | Reasoning                       |
| -------------------------- | ------------------------- | ------------------------------- |
| Rate limits (429)          | `exponential` or `jitter` | Gives API time to recover quota |
| Server errors (5xx)        | `exponential`             | Allows server recovery time     |
| Temporary network issues   | `linear`                  | Moderate, predictable backoff   |
| Quick retries needed       | `constant` with low base  | Fast retries, simple timing     |
| High-concurrency scenarios | `jitter`                  | Prevents retry storms           |

## Rate Limit Handling

Sling automatically detects and respects rate limit headers from API responses. This works in conjunction with backoff strategies to optimize retry timing.

### Automatic Rate Limit Detection

When a `retry` rule triggers on a 429 status, Sling automatically checks for rate limit headers:

```yaml
rules:
  - action: retry
    condition: "response.status == 429"
    max_attempts: 5
    backoff: exponential  # Fallback if no rate limit headers
    backoff_base: 2
```

### Supported Rate Limit Headers

Sling checks for these headers in order of priority:

#### 1. IETF Standard Headers (Preferred)

| Header                | Description                  | Example                |
| --------------------- | ---------------------------- | ---------------------- |
| `RateLimit-Reset`     | Seconds until quota resets   | `60` (wait 60 seconds) |
| `RateLimit-Remaining` | Requests remaining in window | `0`                    |
| `RateLimit-Policy`    | Rate limit window and quota  | `"minute";q=100;w=60`  |

```http
HTTP/1.1 429 Too Many Requests
RateLimit-Reset: 30
RateLimit-Remaining: 0
RateLimit-Policy: "minute";q=60;w=60
```

**Behavior:** Sling will wait 30 seconds before retrying.

#### 2. Legacy/Alternative Headers

| Header              | Description                         | Example                                  |
| ------------------- | ----------------------------------- | ---------------------------------------- |
| `Retry-After`       | Seconds or HTTP date to retry after | `120` or `Wed, 21 Oct 2025 07:28:00 GMT` |
| `X-RateLimit-Reset` | Unix timestamp when quota resets    | `1743158739`                             |

```http
HTTP/1.1 429 Too Many Requests
Retry-After: 60
```

**Behavior:** Sling will wait 60 seconds before retrying.
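
Both legacy formats can be converted to a wait in seconds with the Python standard library. This is a sketch under assumptions: `retry_after_seconds` and `x_ratelimit_reset_seconds` are hypothetical helpers, `now` is injectable for testing, and dates in the past are clamped to zero.

```python
import time
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: str, now: Optional[float] = None) -> Optional[float]:
    """Retry-After is either delay-seconds or an HTTP-date (RFC 9110)."""
    now = time.time() if now is None else now
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):  # unparseable date
        return None
    return max(0.0, when.timestamp() - now)


def x_ratelimit_reset_seconds(value: str, now: Optional[float] = None) -> Optional[float]:
    """X-RateLimit-Reset as a Unix timestamp: seconds left until that moment."""
    now = time.time() if now is None else now
    try:
        return max(0.0, float(value) - now)
    except ValueError:
        return None


print(retry_after_seconds("60"))  # 60.0
```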

### Rate Limit Header Processing

When rate limit headers are detected, they **override** the backoff calculation:

{% @mermaid/diagram content="graph TD
A\[Retry Triggered] --> B{Status 429?}
B -->|No| C\[Use Backoff Strategy]
B -->|Yes| D{RateLimit Headers?}
D -->|Yes| E\[Extract Reset Time]
D -->|No| F{Retry-After Header?}
F -->|Yes| G\[Parse Retry-After]
F -->|No| C
E --> H\[Wait for Reset Time]
G --> H
C --> I\[Wait for Backoff Duration]
H --> J\[Retry Request]
I --> J

style A fill:#ff8c42,stroke:#ffffff,stroke-width:2px,color:#ffffff
style B fill:#ffd54f,stroke:#ffffff,stroke-width:2px,color:#000000
style D fill:#ffd54f,stroke:#ffffff,stroke-width:2px,color:#000000
style H fill:#7cb342,stroke:#ffffff,stroke-width:2px,color:#ffffff
style I fill:#5c6bc0,stroke:#ffffff,stroke-width:2px,color:#ffffff" %}
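
The decision flow above translates to a few lines of Python. This sketch is simplified: it only accepts whole-second values (an HTTP-date `Retry-After` falls through to backoff here), header names are matched case-insensitively, and `retry_wait` is a hypothetical name.

```python
from typing import Dict


def retry_wait(status: int, headers: Dict[str, str], backoff_seconds: float) -> float:
    """Pick the wait before a retry, following the flow in the diagram."""
    h = {k.lower(): v.strip() for k, v in headers.items()}  # header names are case-insensitive
    if status == 429:
        reset = h.get("ratelimit-reset")
        if reset is not None and reset.isdigit():
            return float(reset)        # RateLimit headers take precedence
        retry_after = h.get("retry-after")
        if retry_after is not None and retry_after.isdigit():
            return float(retry_after)  # then Retry-After (seconds form only here)
    return backoff_seconds             # otherwise the rule's backoff strategy


print(retry_wait(429, {"RateLimit-Reset": "30", "Retry-After": "60"}, 4.0))  # 30.0
```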

### Rate Limit Policy Parsing

For APIs using the IETF `RateLimit-Policy` header:

```http
RateLimit-Policy: "hour";q=1000;w=3600, "day";q=5000;w=86400
RateLimit-Remaining: 0
```

Format: `"name";q=quota;w=window`

* `q`: Quota (number of requests)
* `w`: Window duration (seconds)

**Sling's behavior:**

* If `RateLimit-Remaining` is 0, waits for the full window
* Otherwise, calculates proportional wait: `window × (1 - remaining/quota)`
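
For example, with the `"hour";q=1000;w=3600` policy and `RateLimit-Remaining: 250`, the proportional wait is 3600 × (1 - 250/1000) = 2700 seconds. The Python sketch below parses the policy list and applies both rules. Which policy Sling uses when several are listed is not stated here, so the sketch simply parses all of them; `parse_policies` and `policy_wait` are hypothetical helpers.

```python
from typing import List, Tuple


def parse_policies(header: str) -> List[Tuple[str, int, int]]:
    """Parse '"name";q=quota;w=window, ...' into (name, quota, window) tuples."""
    policies: List[Tuple[str, int, int]] = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        name = parts[0].strip('"')
        params = dict(p.split("=", 1) for p in parts[1:] if "=" in p)
        policies.append((name, int(params["q"]), int(params["w"])))
    return policies


def policy_wait(quota: int, window: int, remaining: int) -> float:
    """Full window when exhausted, else window * (1 - remaining / quota)."""
    if remaining <= 0:
        return float(window)
    return window * (1 - remaining / quota)


print(policy_wait(1000, 3600, 250))  # 2700.0
```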

### Complete Rate Limit Example

```yaml
endpoints:
  api_data:
    request:
      url: "{state.base_url}/data"
      rate: 10  # Max 10 requests per second normally

    response:
      rules:
        # Rule 1: Handle rate limits with header-aware retry
        - action: retry
          condition: "response.status == 429"
          max_attempts: 5
          backoff: exponential  # Fallback strategy
          backoff_base: 2
          message: "Rate limited - waiting {response.headers['ratelimit-reset']}s"

        # Rule 2: Fail on repeated rate limits
        - action: fail
          condition: "response.status == 429 && request.attempts >= 5"
          message: "Rate limit exceeded after 5 retries"

        # Rule 3: Handle server errors differently
        - action: retry
          condition: "response.status >= 500"
          max_attempts: 3
          backoff: jitter
          backoff_base: 5
```

**What happens:**

1. On first 429, checks for `RateLimit-Reset` header
2. If found, waits that duration (ignoring backoff calculation)
3. If not found, uses exponential backoff (2s, 4s, 8s, ...)
4. Retries up to 5 times
5. Fails if still getting 429 after all retries

### Testing Rate Limits

Use the trace flag to see rate limit handling in action:

```bash
sling conns test MY_API --endpoints data_endpoint --trace
```

Look for output like:

```
DBG r.0001.abc   response code=429 duration=234ms
DBG r.0001.abc   using rate limit headers for backoff: 30s
DBG r.0001.abc   rule met to retry (attempt=2) with backoff=30s: response.status == 429
```

> 💡 **Tip:** Most well-designed APIs include rate limit headers. Always use `exponential` or `jitter` backoff as a fallback for APIs that don't.


