# Advanced Features

This document covers advanced capabilities within Sling API specifications: pagination strategies, expression functions, incremental sync, rules for error handling and retries, backoff strategies, and rate limit handling.

For response processing basics (format handling, record extraction, deduplication, processors), see [Response Processing](https://docs.slingdata.io/concepts/api-specs/response).

## Content Overview

* [Pagination](#pagination)
* [Functions](#functions)
* [Sync State for Incremental Loads](#sync-state-for-incremental-loads)
* [Rules & Retries](#rules-and-retries)
* [Backoff Strategies](#backoff-strategies)
* [Rate Limit Handling](#rate-limit-handling)

## Pagination

Pagination controls how Sling navigates through multiple pages of results for *each iteration* (if `iterate` is used) or for the single endpoint execution (if `iterate` is not used).

### Pagination Flow

{% @mermaid/diagram content="graph TD
A[Make Request] --> B[Process Response]
B --> C{Check stop_condition}
C -->|True| D[Stop Pagination]
C -->|False| E[Evaluate next_state]
E --> F[Update state variables]
F --> A

style A fill:#ff8c42,stroke:#ffffff,stroke-width:2px,color:#ffffff
style B fill:#7cb342,stroke:#ffffff,stroke-width:2px,color:#ffffff
style C fill:#ffd54f,stroke:#ffffff,stroke-width:2px,color:#000000
style D fill:#ef5350,stroke:#ffffff,stroke-width:2px,color:#ffffff
style E fill:#5c6bc0,stroke:#ffffff,stroke-width:2px,color:#ffffff
style F fill:#5c6bc0,stroke:#ffffff,stroke-width:2px,color:#ffffff" %}
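
The loop in the diagram can be sketched in a few lines of Python. This is an illustration of the control flow only, not Sling's implementation: `DATA`, `fetch`, and `paginate` are hypothetical names, and the "API" is an in-memory list served by offset and limit.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory "API": 7 records served through offset/limit.
DATA = [{"id": i} for i in range(1, 8)]


def fetch(state: Dict[str, int]) -> List[dict]:
    """Stand-in for an HTTP request; returns one page of records."""
    return DATA[state["offset"]: state["offset"] + state["limit"]]


def paginate(state: Dict[str, int]) -> Tuple[List[dict], int]:
    """Run the request / stop_condition / next_state loop until it stops."""
    records: List[dict] = []
    requests_made = 0
    while True:
        page = fetch(state)                 # Make Request
        requests_made += 1
        records.extend(page)                # Process Response
        if len(page) < state["limit"]:      # Check stop_condition
            break                           # Stop Pagination
        # Evaluate next_state, then update the state variables
        state = {**state, "offset": state["offset"] + state["limit"]}
    return records, requests_made


all_records, request_count = paginate({"offset": 0, "limit": 3})
print(len(all_records), request_count)
```

The stop condition is checked after the response is processed, so the short final page is still collected before the loop ends.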

### Common Pagination Patterns

#### 1. Cursor-based Pagination

Uses the ID of the last record to fetch the next page.

```yaml
pagination:
  next_state:
    # Use ID of last record for next request
    starting_after: "{response.records[-1].id}"
  stop_condition: 'jmespath(response.json, "has_more") == false || length(response.records) == 0'
```

#### 2. Page Number Pagination

Increments a page number for each request.

```yaml
pagination:
  next_state:
    # Increment page number
    page: "{state.page + 1}"
  stop_condition: 'state.page >= jmespath(response.json, "total_pages") || length(response.records) == 0'
```

#### 3. Offset Pagination

Advances the offset by `limit` after each request and stops when a page returns fewer records than `limit`.

```yaml
pagination:
  next_state:
    # Increase offset by limit
    offset: "{state.offset + state.limit}"
  stop_condition: "length(response.records) < state.limit"
```

#### 4. Link Header Pagination

Extracts the next page URL from the response headers.

```yaml
pagination:
  next_state:
    # Extract URL from Link header (uses the first entry, so assumes rel="next" is listed first)
    url: >
      {
        if(
          contains(response.headers.link, "rel=\"next\""),
          trim(split_part(split(response.headers.link, ",")[0], ";", 0), "<>"),
          null
        )
      }
  stop_condition: '!contains(response.headers.link, "rel=\"next\"")'
```
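
To make the Link header format concrete, here is a minimal Python sketch of finding the `rel="next"` URL regardless of its position in the header. It is independent of Sling's expression functions; `next_link` is a hypothetical helper and `api.example.com` is a placeholder host.

```python
from typing import Optional


def next_link(link_header: Optional[str]) -> Optional[str]:
    """Return the URL tagged rel="next" in a Link header, or None."""
    if not link_header:
        return None
    # Entries look like: <https://host/path?page=2>; rel="next"
    # Simplification: splitting on "," breaks on URLs that contain commas.
    for entry in link_header.split(","):
        parts = [p.strip() for p in entry.split(";")]
        url, params = parts[0], parts[1:]
        if any(p.replace(" ", "") in ('rel="next"', "rel=next") for p in params):
            return url.strip("<>")
    return None


header = (
    '<https://api.example.com/items?page=1>; rel="prev", '
    '<https://api.example.com/items?page=3>; rel="next"'
)
print(next_link(header))
```

Returning `None` when no `next` entry exists mirrors the stop condition above: pagination ends once the header no longer advertises a next page.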

> 💡 **Tip:** For better performance, avoid using `response` variables in `next_state` expressions when possible. This allows Sling to prepare the next request before the current one finishes, increasing parallelism.

## Functions

Functions are the building blocks of dynamic expressions in Sling API specifications. Use them to transform, validate, and manipulate data within your API configurations.

### Using Functions

Functions can be used throughout your API specification wherever expressions are supported, including:

* **Request Configuration**: Dynamic URLs, headers, parameters, and payloads
* **Response Processing**: Data transformation and extraction
* **Pagination Logic**: Computing next page parameters
* **Conditional Logic**: Rules, iteration conditions, and stop conditions
* **State Management**: Transforming and aggregating state variables

### Common Function Patterns

#### Dynamic Request Construction

```yaml
request:
  url: '{state.base_url}/users/{state.user_id}'
  headers:
    Authorization: "Bearer {auth.token}"
    X-Request-ID: "{uuid()}"
  parameters:
    updated_since: '{date_format(date_add(now(), -1, "day"), "%Y-%m-%dT%H:%M:%SZ")}'
    limit: '{coalesce(state.page_size, 100)}'
```

#### Data Transformation in Processors

```yaml
processors:
  # Parse and format timestamps
  - expression: 'date_parse(record.created_at, "auto")'
    output: "record.created_timestamp"
  
  # Clean and validate data
  - expression: "trim(upper(record.status))"
    output: "record.status_clean"
  
  # Extract nested values
  - expression: 'get_path(record, "user.profile.email")'
    output: "record.user_email"
  
  # Conditional processing
  - expression: 'if(record.active, "ACTIVE", "INACTIVE")'
    output: "record.status_label"
```

#### Pagination with Functions

```yaml
pagination:
  next_state:
    # Extract cursor from response
    cursor: '{get_path(response.json, "pagination.next_cursor")}'
    
    # Increment page number
    page: "{coalesce(state.page, 0) + 1}"
    
    # Dynamic limit based on response size
    limit: "{if(length(response.records) < 100, 50, state.limit)}"

  # jmespath: is_null(jmespath(response.json, "pagination.next_cursor"))
  # jq:      is_null(jq(response.json, ".pagination.next_cursor")[0])
  stop_condition: 'is_null(jmespath(response.json, "pagination.next_cursor")) || length(response.records) == 0'
```

#### Advanced Data Processing

```yaml
processors:
  # Filter and transform arrays
  - expression: 'filter(record.tags, "length(value) > 0")'
    output: "record.valid_tags"
  
  # Extract specific fields using jq
  - expression: 'jq(record, "[.items[] | select(.price > 100) | {name, price}]")[0]'
    output: "record.expensive_items"
  
  # Hash sensitive data
  - expression: 'hash(record.email, "sha256")'
    output: "record.email_hash"
  
  # Generate derived fields
  - expression: 'join([record.first_name, record.last_name], " ")'
    output: "record.full_name"
```

### Function Error Handling

Functions can help with graceful error handling and fallback values:

```yaml
processors:
  # Safe parsing with fallback
  - expression: 'coalesce(try_cast(record.age, "int"), 0)'
    output: "record.age_int"
  
  # Required field validation
  - expression: 'require(record.user_id, "User ID is required")'
    output: "record.validated_user_id"
  
  # Conditional field access
  - expression: 'if(is_null(record.metadata), "", get_path(record.metadata, "source"))'
    output: "record.source"
```
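
The "safe parsing with fallback" pattern can be illustrated in Python. This sketch assumes that a failed cast yields a null value which `coalesce` then replaces; that reading follows from the example above but is not confirmed Sling behavior. `try_cast_int` and `coalesce` here are hypothetical Python stand-ins, not Sling's functions.

```python
from typing import Any, Optional


def try_cast_int(value: Any) -> Optional[int]:
    """Return value as an int, or None when it cannot be converted (assumed semantics)."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def coalesce(*values: Any) -> Any:
    """Return the first argument that is not None."""
    for value in values:
        if value is not None:
            return value
    return None


ages = ["42", "n/a", None, 7]
cleaned = [coalesce(try_cast_int(age), 0) for age in ages]
print(cleaned)
```

Note that a fallback of `0` is indistinguishable from a real zero downstream; a sentinel or a separate validity flag may suit some datasets better.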

### Best Practices

1. **Use `coalesce()` for Defaults**: Always provide fallback values for optional fields
2. **Validate Required Fields**: Use `require()` to ensure critical data is present
3. **Handle Date Formats**: Use `date_parse()` with "auto" format when possible
4. **Escape Special Characters**: Use encoding functions for URLs and other special contexts
5. **Test Complex Expressions**: Break down complex expressions into smaller, testable parts

For a complete reference of all available functions, see the [Functions documentation](https://docs.slingdata.io/concepts/functions).

> 💡 **Tip**: Use the `log()` function during development to debug complex expressions: `log("Processing record: " + record.id)`

> ⚠️ **Warning**: Functions are evaluated for each record or iteration. Avoid expensive operations in frequently-called expressions.

## Sync State for Incremental Loads

The `sync` key allows persisting state variables between runs, enabling incremental data loading (fetching only new or updated data).

### Incremental Sync Workflow

{% @mermaid/diagram content="sequenceDiagram
participant Previous as Previous Run
participant Current as Current Run
participant API as API
participant Next as Next Run

Note over Previous: Stores state.last_sync_ts
Previous->>Current: sync.last_sync_ts

Note over Current: Init state variables
Current->>Current: state.start_timestamp = sync.last_sync_ts or default

Current->>API: Request with updated_since=state.start_timestamp
API->>Current: Response with records

Note over Current: Process & track max timestamp
Current->>Current: Find max of record.updated_at → state.last_sync_ts

Current->>Next: Persist state.last_sync_ts as sync.last_sync_ts" %}

### Example: Timestamp-Based Incremental Sync

```yaml
endpoints:
  incremental_data:
    state:
      # Get previous timestamp or default to 7 days ago
      start_timestamp: >
        {
          coalesce(
            sync.last_sync_ts,
            date_format(date_add(now(), -7, 'day'), '%Y-%m-%dT%H:%M:%SZ')
          )
        }
      
      # Initialize tracking variable with start timestamp
      last_sync_ts: '{state.start_timestamp}'

    # List of state variables to persist for next run
    sync: [last_sync_ts]

    request:
      parameters:
        # Filter by timestamp from last run
        updated_since: '{state.start_timestamp}'

    response:
      processors:
        # Track maximum timestamp seen
        - expression: "record.updated_at"
          output: "state.last_sync_ts"
          aggregation: "maximum"
```

> 💡 **Tip:** Always use `coalesce()` with sync variables to handle the first run when no previous state exists.
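
The workflow above can be simulated in Python to show how the persisted value moves between runs. This is a sketch under assumptions, not Sling's implementation: `run_once` is a hypothetical function, the records are invented sample data, and the `updated_since` filter is applied locally in place of the API.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

FMT = "%Y-%m-%dT%H:%M:%SZ"


def run_once(sync: Dict[str, str], records: List[dict], now: datetime) -> Dict[str, str]:
    """Simulate one run and return the sync values to persist for the next run."""
    default_start = (now - timedelta(days=7)).strftime(FMT)
    # coalesce(sync.last_sync_ts, <7 days ago>)
    start_timestamp = sync.get("last_sync_ts") or default_start
    last_sync_ts = start_timestamp  # initialize the tracking variable

    for record in records:
        # Stand-in for the API's updated_since filter
        if record["updated_at"] < start_timestamp:
            continue
        # aggregation: "maximum" (fixed-width UTC timestamps sort as strings)
        last_sync_ts = max(last_sync_ts, record["updated_at"])

    return {"last_sync_ts": last_sync_ts}


NOW = datetime(2024, 1, 10, tzinfo=timezone.utc)
RECORDS = [
    {"id": 1, "updated_at": "2024-01-05T00:00:00Z"},
    {"id": 2, "updated_at": "2024-01-08T12:00:00Z"},
]

first_run = run_once({}, RECORDS, NOW)       # no previous state
second_run = run_once(first_run, [], NOW)    # nothing new since the first run
print(first_run, second_run)
```

Because the tracking variable starts at `start_timestamp`, a run that returns no records persists the same value rather than resetting it.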

### Combining Incremental Sync with Context Variables

For advanced scenarios, you can combine sync state with **context variables** to support both incremental loading and backfilling. Context variables are runtime values passed from the replication configuration.

**Key context variables for incremental sync:**

* `context.range_start` - Start of backfill range (from `source_options.range`)
* `context.range_end` - End of backfill range (from `source_options.range`)
* `context.mode` - Replication mode (`incremental`, `full-refresh`, `backfill`)
* `context.limit` - Maximum records to fetch (from `source_options.limit`)

**Example: Incremental with Backfill Support**

```yaml
endpoints:
  events:
    sync: [last_date]

    iterate:
      # Backfill mode: Use context.range_start/range_end
      # Incremental mode: Use sync.last_date
      over: >
        range(
          coalesce(context.range_start, sync.last_date, date_format(date_add(now(), -7, "day"), "%Y-%m-%d")),
          coalesce(context.range_end, date_format(now(), "%Y-%m-%d")),
          "1d"
        )
      into: "state.current_date"

    request:
      url: "{state.base_url}/events"
      parameters:
        date: "{state.current_date}"

    response:
      records:
        jmespath: "events[]"
        primary_key: ["event_id"]

      processors:
        # Track last processed date for next incremental run
        - expression: "state.current_date"
          output: "state.last_date"
          aggregation: "maximum"
```

**Backfill Usage:**

```yaml
# replication.yaml
source: MY_API
target: MY_TARGET_DB

streams:
  events:
    object: analytics.events
    source_options:
      # Backfill January 2024
      range: '2024-01-01,2024-01-31'
```

**Incremental Usage:**

```yaml
# replication.yaml (without range)
source: MY_API
target: MY_TARGET_DB

streams:
  events:
    object: analytics.events
    # No range - uses sync.last_date
```

This pattern allows the same endpoint to handle both historical backfills and ongoing incremental updates. See [Context Variables](https://docs.slingdata.io/concepts/structure#context-variables) for full details.
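
The precedence encoded in the `coalesce()` calls (backfill range first, then sync state, then a default) can be sketched in Python. The helper names are hypothetical, and the sketch assumes the generated range includes its end date, which the source does not state.

```python
from datetime import date, timedelta
from typing import Any, List


def coalesce(*values: Any) -> Any:
    """Return the first argument that is not None."""
    for value in values:
        if value is not None:
            return value
    return None


def dates_to_fetch(context: dict, sync: dict, today: date) -> List[str]:
    """Pick the start/end dates by precedence, then expand them day by day."""
    start = coalesce(
        context.get("range_start"),                 # backfill mode
        sync.get("last_date"),                      # incremental mode
        (today - timedelta(days=7)).isoformat(),    # first run default
    )
    end = coalesce(context.get("range_end"), today.isoformat())

    current, stop = date.fromisoformat(start), date.fromisoformat(end)
    days = []
    while current <= stop:  # assumption: the end date is inclusive
        days.append(current.isoformat())
        current += timedelta(days=1)
    return days


TODAY = date(2024, 2, 10)
backfill = dates_to_fetch({"range_start": "2024-01-01", "range_end": "2024-01-31"}, {}, TODAY)
incremental = dates_to_fetch({}, {"last_date": "2024-02-08"}, TODAY)
print(len(backfill), incremental)
```

In this sketch a backfill range takes priority even when sync state exists, which matches the argument order of the `coalesce()` calls in the spec above.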

## Rules & Retries

Rules define actions based on response conditions (status codes, headers, body content), providing fine-grained control over error handling and retries.

### Rules Evaluation Flow

{% @mermaid/diagram content="graph TD
A[Receive Response] --> B[Evaluate Rules in Order]
B --> C{Rule Condition Match?}
C -->|No| D[Try Next Rule]
D --> C
C -->|Yes| E{What Action?}
E -->|retry| F[Wait Based on Backoff]
F --> G[Retry Request]
E -->|continue| H[Process Response]
E -->|skip| I[Skip This Request]
E -->|break| J[Stop Iteration Gracefully]
E -->|stop| K[Stop This Endpoint]
E -->|fail| L[Fail with Error]

style A fill:#ff8c42,stroke:#ffffff,stroke-width:2px,color:#ffffff
style B fill:#5c6bc0,stroke:#ffffff,stroke-width:2px,color:#ffffff
style C fill:#ffd54f,stroke:#ffffff,stroke-width:2px,color:#000000
style E fill:#ffd54f,stroke:#ffffff,stroke-width:2px,color:#000000
style F fill:#7cb342,stroke:#ffffff,stroke-width:2px,color:#ffffff
style H fill:#7cb342,stroke:#ffffff,stroke-width:2px,color:#ffffff
style J fill:#ef5350,stroke:#ffffff,stroke-width:2px,color:#ffffff
style K fill:#ef5350,stroke:#ffffff,stroke-width:2px,color:#ffffff
style L fill:#ef5350,stroke:#ffffff,stroke-width:2px,color:#ffffff" %}

### Rule Properties

| Property       | Required       | Description                                | Example                                                          |
| -------------- | -------------- | ------------------------------------------ | ---------------------------------------------------------------- |
| `action`       | Yes            | Action to take when condition is true      | `"retry"`, `"continue"`, `"stop"`, `"break"`, `"skip"`, `"fail"` |
| `condition`    | Yes            | Expression that triggers the action        | `"response.status == 429"`                                       |
| `max_attempts` | No (for retry) | Max number of retry attempts               | `5` (default: 3)                                                 |
| `backoff`      | No (for retry) | Strategy for delay between retries         | `"exponential"`, `"linear"`, `"constant"`, `"jitter"`, `"none"`  |
| `backoff_base` | No (for retry) | Initial delay in seconds                   | `2` (default: 1)                                                 |
| `message`      | No             | Message for logging (supports expressions) | `"Rate limit hit, retrying..."`                                  |

### Rule Actions

| Action     | Description                                                 | Use Case                                               |
| ---------- | ----------------------------------------------------------- | ------------------------------------------------------ |
| `retry`    | Retry the request after delay                               | Rate limits (429), server errors (>=500)               |
| `continue` | Process response, ignore error                              | Non-critical errors (e.g., 404 for optional resources) |
| `skip`     | Break out of the rule evaluation loop and skip this request | When a request should not be processed                 |
| `break`    | Stop the current iteration gracefully without error         | Stop iteration within loops when processing complete   |
| `stop`     | Stop current endpoint/iteration                             | When further requests would be useless                 |
| `fail`     | Stop Sling run with error                                   | Critical errors (auth failure, invalid parameters)     |

### Example Rules

```yaml
rules:
  # Rule 1: Retry rate limits and server errors
  - action: "retry"
    condition: "response.status == 429 || response.status >= 500"
    max_attempts: 5
    backoff: "exponential"
    backoff_base: 2
    message: "Server error or rate limit hit, retrying..."

  # Rule 2: Fail on authentication errors
  - action: "fail"
    condition: "response.status == 401 || response.status == 403"
    message: "Authentication failed"

  # Rule 3: Ignore 404 errors
  - action: "continue"
    condition: "response.status == 404"
    message: "Resource not found, continuing"

  # Rule 4: Skip invalid records in iteration
  - action: "skip"
    condition: "is_null(record.id)"
    message: "Skipping record without ID"

  # Rule 5: Stop iteration when reaching limit
  - action: "break"
    condition: "state.records_processed >= state.limit"
    message: "Processed limit reached, breaking iteration"
```

> 📝 **Note:** Rules are evaluated in order. The first matching rule's action is executed.
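
Because only the first match is applied, rule order changes the outcome. The Python sketch below illustrates first-match evaluation on status codes alone. `Rule` and `first_matching_action` are hypothetical names, and returning `None` when nothing matches is an assumption of this sketch, not a statement about Sling's defaults.

```python
from typing import Callable, List, NamedTuple, Optional


class Rule(NamedTuple):
    action: str
    condition: Callable[[int], bool]


def first_matching_action(status: int, rules: List[Rule]) -> Optional[str]:
    """Return the action of the first rule whose condition holds, else None."""
    for rule in rules:
        if rule.condition(status):
            return rule.action
    return None  # assumption: no rule matched, caller decides what happens


ORDERED = [
    Rule("retry", lambda s: s == 429 or s >= 500),
    Rule("fail", lambda s: s in (401, 403)),
    Rule("continue", lambda s: s == 404),
]

# Same intent, but a broad "fail" rule placed first shadows the retry rule.
SHADOWED = [Rule("fail", lambda s: s >= 400)] + ORDERED

print(first_matching_action(429, ORDERED), first_matching_action(429, SHADOWED))
```

Placing narrow conditions before broad ones avoids the shadowing shown in the second list.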

## Backoff Strategies

When a rule uses the `retry` action, the backoff strategy determines how long to wait between retry attempts.

### Backoff Types

| Type          | Calculation                 | Use Case                           | Example Delays (backoff\_base=1) |
| ------------- | --------------------------- | ---------------------------------- | -------------------------------- |
| `none`        | No delay                    | Immediate retries (use cautiously) | 0s, 0s, 0s, ...                  |
| `constant`    | Fixed delay                 | Predictable retry timing           | 1s, 1s, 1s, ...                  |
| `linear`      | base × attempt              | Gradual backoff                    | 1s, 2s, 3s, 4s, 5s, ...          |
| `exponential` | base × 2^(attempt-1)        | Aggressive backoff (recommended)   | 1s, 2s, 4s, 8s, 16s, ...         |
| `jitter`      | exponential + random(0-50%) | Avoid thundering herd              | 1s, 3s, 5s, 10s, 20s, ...        |
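
The formulas in the table can be written out in Python to check the delays by hand. This is a sketch of the arithmetic only, not Sling's code: `backoff_delay` and `schedule` are hypothetical names, and the sketch assumes attempts are counted from 1 and that `max_attempts` counts total requests, so `max_attempts: 5` produces four waits.

```python
import random
from typing import List, Optional


def backoff_delay(strategy: str, base: float, attempt: int, rng: random.Random) -> float:
    """Delay in seconds after failed request number `attempt` (1-based)."""
    if strategy == "none":
        return 0.0
    if strategy == "constant":
        return base
    if strategy == "linear":
        return base * attempt
    exponential = base * 2 ** (attempt - 1)
    if strategy == "exponential":
        return exponential
    if strategy == "jitter":
        # exponential delay plus a random 0-50% on top
        return exponential * (1 + rng.uniform(0.0, 0.5))
    raise ValueError(f"unknown backoff strategy: {strategy}")


def schedule(strategy: str, base: float, max_attempts: int,
             rng: Optional[random.Random] = None) -> List[float]:
    """Waits between requests; the last failed request gives up without waiting."""
    rng = rng or random.Random(0)
    return [backoff_delay(strategy, base, n, rng) for n in range(1, max_attempts)]


for name, base in [("constant", 2), ("linear", 3), ("exponential", 2)]:
    waits = schedule(name, base, max_attempts=5)
    print(name, waits, "total:", sum(waits))
```

Summing a schedule gives the worst-case time spent waiting before a request is abandoned, which helps when sizing `max_attempts` against job timeouts.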

### Backoff Examples with Timing

#### None (No Backoff)

```yaml
rules:
  - action: retry
    condition: "response.status >= 500"
    max_attempts: 3
    backoff: none
```

**Retry Timeline:**

* Request 1 (fails) → 0s wait
* Request 2 (fails) → 0s wait
* Request 3 (fails) → Give up

> ⚠️ **Warning:** No backoff can overwhelm failing services. Use only when retries must be immediate.

#### Constant Backoff

```yaml
rules:
  - action: retry
    condition: "response.status >= 500"
    max_attempts: 5
    backoff: constant
    backoff_base: 2  # 2 seconds between each retry
```

**Retry Timeline:**

* Request 1 (fails) → Wait 2s
* Request 2 (fails) → Wait 2s
* Request 3 (fails) → Wait 2s
* Request 4 (fails) → Wait 2s
* Request 5 (fails) → Give up

**Total time:** \~8 seconds

#### Linear Backoff

```yaml
rules:
  - action: retry
    condition: "response.status == 429"
    max_attempts: 5
    backoff: linear
    backoff_base: 3  # Base delay of 3 seconds
```

**Retry Timeline:**

* Request 1 (fails) → Wait 3s (3 × 1)
* Request 2 (fails) → Wait 6s (3 × 2)
* Request 3 (fails) → Wait 9s (3 × 3)
* Request 4 (fails) → Wait 12s (3 × 4)
* Request 5 (fails) → Give up

**Total time:** \~30 seconds

#### Exponential Backoff (Recommended)

```yaml
rules:
  - action: retry
    condition: "response.status == 429 || response.status >= 500"
    max_attempts: 5
    backoff: exponential
    backoff_base: 2  # Base delay of 2 seconds
```

**Retry Timeline:**

* Request 1 (fails) → Wait 2s (2 × 2⁰ = 2)
* Request 2 (fails) → Wait 4s (2 × 2¹ = 4)
* Request 3 (fails) → Wait 8s (2 × 2² = 8)
* Request 4 (fails) → Wait 16s (2 × 2³ = 16)
* Request 5 (fails) → Give up

**Total time:** \~30 seconds

> 💡 **Tip:** Exponential backoff is a widely used default for API retries. The first retries come quickly, which suits transient failures, and later waits grow long enough to give a struggling service time to recover.

#### Jitter Backoff (Best for High Concurrency)

```yaml
rules:
  - action: retry
    condition: "response.status == 429"
    max_attempts: 5
    backoff: jitter
    backoff_base: 2
```

**Retry Timeline (example with random jitter):**

* Request 1 (fails) → Wait 2.3s (2s + 15% jitter)
* Request 2 (fails) → Wait 5.1s (4s + 28% jitter)
* Request 3 (fails) → Wait 10.4s (8s + 30% jitter)
* Request 4 (fails) → Wait 20.8s (16s + 30% jitter)
* Request 5 (fails) → Give up

**Total time:** \~39 seconds (varies due to randomness)

> 📝 **Note:** Jitter adds 0-50% random delay to exponential backoff. This prevents multiple clients from retrying simultaneously (thundering herd problem).

### Choosing the Right Backoff Strategy

| Scenario                   | Recommended Strategy      | Reasoning                       |
| -------------------------- | ------------------------- | ------------------------------- |
| Rate limits (429)          | `exponential` or `jitter` | Gives API time to recover quota |
| Server errors (5xx)        | `exponential`             | Allows server recovery time     |
| Temporary network issues   | `linear`                  | Moderate, predictable backoff   |
| Must retry immediately     | `none`, or `constant` with low base | Fast retries, simple timing     |
| High-concurrency scenarios | `jitter`                  | Prevents retry storms           |

## Rate Limit Handling

Sling automatically detects and respects rate limit headers from API responses. This works in conjunction with backoff strategies to optimize retry timing.

### Automatic Rate Limit Detection

When a `retry` rule triggers on a 429 status, Sling automatically checks for rate limit headers:

```yaml
rules:
  - action: retry
    condition: "response.status == 429"
    max_attempts: 5
    backoff: exponential  # Fallback if no rate limit headers
    backoff_base: 2
```

### Supported Rate Limit Headers

Sling checks for these headers in order of priority:

#### 1. IETF Standard Headers (Preferred)

| Header                | Description                  | Example                |
| --------------------- | ---------------------------- | ---------------------- |
| `RateLimit-Reset`     | Seconds until quota resets   | `60` (wait 60 seconds) |
| `RateLimit-Remaining` | Requests remaining in window | `0`                    |
| `RateLimit-Policy`    | Rate limit window and quota  | `"minute";q=100;w=60`  |

```http
HTTP/1.1 429 Too Many Requests
RateLimit-Reset: 30
RateLimit-Remaining: 0
RateLimit-Policy: "minute";q=60;w=60
```

**Behavior:** Sling will wait 30 seconds before retrying.

#### 2. Legacy/Alternative Headers

| Header              | Description                         | Example                                  |
| ------------------- | ----------------------------------- | ---------------------------------------- |
| `Retry-After`       | Seconds or HTTP date to retry after | `120` or `Tue, 21 Oct 2025 07:28:00 GMT` |
| `X-RateLimit-Reset` | Unix timestamp when quota resets    | `1743158739`                             |

```http
HTTP/1.1 429 Too Many Requests
Retry-After: 60
```

**Behavior:** Sling will wait 60 seconds before retrying.

### Rate Limit Header Processing

When rate limit headers are detected, they **override** the backoff calculation:

{% @mermaid/diagram content="graph TD
A[Retry Triggered] --> B{Status 429?}
B -->|No| C[Use Backoff Strategy]
B -->|Yes| D{RateLimit Headers?}
D -->|Yes| E[Extract Reset Time]
D -->|No| F{Retry-After Header?}
F -->|Yes| G[Parse Retry-After]
F -->|No| C
E --> H[Wait for Reset Time]
G --> H
C --> I[Wait for Backoff Duration]
H --> J[Retry Request]
I --> J

style A fill:#ff8c42,stroke:#ffffff,stroke-width:2px,color:#ffffff
style B fill:#ffd54f,stroke:#ffffff,stroke-width:2px,color:#000000
style D fill:#ffd54f,stroke:#ffffff,stroke-width:2px,color:#000000
style H fill:#7cb342,stroke:#ffffff,stroke-width:2px,color:#ffffff
style I fill:#5c6bc0,stroke:#ffffff,stroke-width:2px,color:#ffffff" %}
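
The decision flow can be sketched in Python as follows. This illustrates the logic in the diagram and is not Sling's implementation: `header_wait_seconds` and `retry_wait` are hypothetical names, and checking `X-RateLimit-Reset` last is an assumption, since the diagram does not show where it falls.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def header_wait_seconds(headers: Mapping[str, str], now: datetime) -> Optional[float]:
    """Return the wait implied by rate limit headers, or None if none are present."""
    h = {key.lower(): value.strip() for key, value in headers.items()}

    if "ratelimit-reset" in h:                    # seconds until the quota resets
        return float(h["ratelimit-reset"])

    if "retry-after" in h:
        value = h["retry-after"]
        if value.isdigit():                       # delay given in seconds
            return float(value)
        retry_at = parsedate_to_datetime(value)   # delay given as an HTTP date
        return max(0.0, (retry_at - now).total_seconds())

    if "x-ratelimit-reset" in h:                  # Unix timestamp (assumed to be checked last)
        return max(0.0, float(h["x-ratelimit-reset"]) - now.timestamp())

    return None


def retry_wait(status: int, headers: Mapping[str, str],
               fallback_backoff: float, now: datetime) -> float:
    """Headers override the backoff delay on a 429; otherwise use the backoff."""
    if status == 429:
        wait = header_wait_seconds(headers, now)
        if wait is not None:
            return wait
    return fallback_backoff


now = datetime.now(timezone.utc)
print(retry_wait(429, {"RateLimit-Reset": "30"}, 2.0, now))
print(retry_wait(429, {}, 2.0, now))
```

`fallback_backoff` stands for whatever delay the rule's `backoff` and `backoff_base` settings would have produced for the current attempt.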

### Rate Limit Policy Parsing

For APIs using the IETF `RateLimit-Policy` header:

```http
RateLimit-Policy: "hour";q=1000;w=3600, "day";q=5000;w=86400
RateLimit-Remaining: 0
```

Format: `"name";q=quota;w=window`

* `q`: Quota (number of requests)
* `w`: Window duration (seconds)

**Sling's behavior:**

* If `RateLimit-Remaining` is 0, waits for the full window
* Otherwise, calculates proportional wait: `window × (1 - remaining/quota)`
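
The policy format and the proportional wait formula can be worked through in Python. This is a sketch of the parsing and arithmetic described above, with hypothetical names (`Policy`, `parse_policies`, `policy_wait`); it does not show which policy Sling selects when the header lists several.

```python
from typing import List, NamedTuple


class Policy(NamedTuple):
    name: str
    quota: int    # q: number of requests allowed per window
    window: int   # w: window duration in seconds


def parse_policies(header: str) -> List[Policy]:
    """Parse a header such as: "hour";q=1000;w=3600, "day";q=5000;w=86400"""
    policies = []
    for item in header.split(","):
        name, *params = [part.strip() for part in item.split(";")]
        values = dict(param.split("=", 1) for param in params)
        policies.append(Policy(name.strip('"'), int(values["q"]), int(values["w"])))
    return policies


def policy_wait(policy: Policy, remaining: int) -> float:
    """Full window when the quota is exhausted, else window * (1 - remaining/quota)."""
    if remaining <= 0:
        return float(policy.window)
    return policy.window * (1 - remaining / policy.quota)


policies = parse_policies('"hour";q=1000;w=3600, "day";q=5000;w=86400')
hourly = policies[0]
print(policy_wait(hourly, 0), policy_wait(hourly, 250))
```

With 250 of 1000 hourly requests remaining, the formula gives 3600 × (1 - 0.25) = 2700 seconds; with none remaining it reduces to the full 3600-second window.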

### Complete Rate Limit Example

```yaml
endpoints:
  api_data:
    request:
      url: "{state.base_url}/data"
      rate: 10  # Max 10 requests per second normally

    response:
      rules:
        # Rule 1: Handle rate limits with header-aware retry
        - action: retry
          condition: "response.status == 429"
          max_attempts: 5
          backoff: exponential  # Fallback strategy
          backoff_base: 2
          message: "Rate limited - waiting {response.headers['ratelimit-reset']}s"

        # Rule 2: Fail on repeated rate limits
        - action: fail
          condition: "response.status == 429 && request.attempts >= 5"
          message: "Rate limit exceeded after 5 retries"

        # Rule 3: Handle server errors differently
        - action: retry
          condition: "response.status >= 500"
          max_attempts: 3
          backoff: jitter
          backoff_base: 5
```

**What happens:**

1. On first 429, checks for `RateLimit-Reset` header
2. If found, waits that duration (ignoring backoff calculation)
3. If not found, uses exponential backoff (2s, 4s, 8s, ...)
4. Retries up to 5 times
5. Fails if still getting 429 after all retries

### Testing Rate Limits

Use the trace flag to see rate limit handling in action:

```bash
sling conns test MY_API --endpoints api_data --trace
```

Look for output like:

```
DBG r.0001.abc   response code=429 duration=234ms
DBG r.0001.abc   using rate limit headers for backoff: 30s
DBG r.0001.abc   rule met to retry (attempt=2) with backoff=30s: response.status == 429
```

> 💡 **Tip:** Most well-designed APIs include rate limit headers. Always use `exponential` or `jitter` backoff as a fallback for APIs that don't.
