# Structure

This document covers the fundamental structure of a Sling API specification file.

## Root Level

At the root level, we have the following keys:

```yaml
# 'name', 'description' and 'endpoints' keys are required
name: <API display name>
description: <API description>

queues: [<array of queue names>]

defaults: <endpoint configuration map>

authentication: <authentication configuration map>

endpoints:
  <endpoint name>: <endpoint configuration map>

dynamic_endpoints: [<array of dynamic endpoint configurations>]
```

## Endpoint Level

The `<endpoint name>` identifies the API endpoint to interact with. This can be any descriptive name for the endpoint.

The `<endpoint configuration map>` is a map object which accepts the following keys:

```yaml
name: <endpoint name>
description: <endpoint description>
docs: <documentation URL>
disabled: true | false

state: {<map of state variables>}
sync: [<array of state variable names to persist>]

request: <request configuration map>
pagination: <pagination configuration map>
response: <response configuration map>

iterate: <iteration configuration map>
setup: [<array of setup calls>]
teardown: [<array of teardown calls>]

depends_on: [<array of upstream endpoint names>]
overrides: <stream processor configuration overrides>
```

## Request Configuration

The `<request configuration map>` accepts the keys below:

```yaml
url: <endpoint URL>
method: GET | POST | PUT | PATCH | DELETE | HEAD | OPTIONS | TRACE | CONNECT
timeout: <timeout in seconds>
headers: {<map of header name to value>}
parameters: {<map of parameter name to value>}
payload: <request body data>
rate: <maximum requests per second>
concurrency: <maximum concurrent requests>
```
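As an illustration, a request block using these keys might look like the sketch below. The path, the `status` parameter, and the `page_size` state variable are hypothetical, and the numeric limits are arbitrary values rather than recommendations.

```yaml
request:
  url: "{state.base_url}/orders"   # hypothetical path
  method: GET
  timeout: 30                      # seconds
  headers:
    Accept: "application/json"
  parameters:
    status: "open"                 # hypothetical query parameter
    limit: "{state.page_size}"     # assumes page_size is defined in state
  rate: 5                          # at most 5 requests per second
  concurrency: 2                   # at most 2 concurrent requests
```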

## Pagination Configuration

The `<pagination configuration map>` accepts the keys below:

```yaml
next_state: {<map of state variables to update for next page>}
stop_condition: <expression to determine when to stop paginating>
```
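A minimal page-number sketch of these two keys is shown below. It assumes that the parsed response body is available to expressions as `response.json` and that the API returns a `next_page` field; both are assumptions for illustration, not something this page confirms. The `page` state variable and the `/items` path are also hypothetical.

```yaml
state:
  page: 1                                # hypothetical state variable

request:
  url: "{state.base_url}/items"          # hypothetical path
  parameters:
    page: "{state.page}"

pagination:
  # Advance the page counter before the next request
  next_state:
    page: "{state.page + 1}"
  # Assumption: the body is exposed as response.json and carries `next_page`
  stop_condition: "is_null(response.json.next_page)"
```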

## Response Configuration

The `<response configuration map>` accepts the keys below:

```yaml
format: json | csv | xml
records: <records extraction configuration map>
processors: [<array of processor configurations>]
rules: [<array of response rule configurations>]
```

## Records Configuration

The `<records extraction configuration map>` accepts the keys below:

```yaml
jmespath: <JMESPath expression to extract records>
jq: <jq expression to extract records>
primary_key: [<array of column names for primary key>]
update_key: <column name for incremental updates>
limit: <maximum number of records to process>
duplicate_tolerance: <bloom filter settings: "capacity,error_rate">
```

> ⚠️ **Note:** `jmespath` and `jq` are mutually exclusive — use one or the other, not both. See [Functions](https://docs.slingdata.io/concepts/functions) for syntax differences between the two.

> 💡 **Primary Key Priority:** When using API specs in replications, the primary key defined in the replication stream configuration takes priority over the primary key defined in the API spec. If no primary key is specified in the stream, the primary key from the spec will be used.
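
A records block combining these keys could look like the following sketch. The `data[]` path and the column names are placeholders for whatever the API actually returns, and the `limit` and `duplicate_tolerance` values are arbitrary illustrations of the `"capacity,error_rate"` format.

```yaml
response:
  format: json
  records:
    jmespath: "data[]"                     # extract the array under `data`
    primary_key: ["id"]
    update_key: "updated_at"
    limit: 5000                            # stop after 5000 records
    duplicate_tolerance: "1000000,0.001"   # capacity of 1,000,000 and error rate of 0.001
```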

## Processor Configuration

Each processor in the `processors` array accepts:

```yaml
aggregation: none | maximum | minimum | collect | first | last
expression: <transformation expression>
output: <output destination (record field, state variable, queue, environment variable, or store)>
# Examples:
# - record.field_name (add/update field in record)
# - record (replace entire record)
# - state.variable_name (store in state, requires aggregation)
# - queue.queue_name (send to queue)
# - env.VAR_NAME (set environment variable, requires aggregation)
# - context.store.key_name (store in replication store, requires aggregation)
```
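To make the output destinations concrete, here is a small sketch with three processors. The field names (`status`, `updated_at`, `id`), the `last_updated` state variable, and the `item_ids` queue are hypothetical; the functions and output prefixes are the ones used elsewhere on this page.

```yaml
response:
  processors:
    # Fill in a default for a missing field on each record
    - expression: "coalesce(record.status, 'unknown')"
      output: "record.status"

    # Keep the largest timestamp seen (state outputs require an aggregation)
    - expression: "record.updated_at"
      output: "state.last_updated"
      aggregation: maximum

    # Send each record ID to a queue for a downstream endpoint
    - expression: "record.id"
      output: "queue.item_ids"
```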

## Response Rules

Each rule in the `rules` array accepts:

```yaml
action: retry | continue | stop | fail
condition: <boolean expression>
max_attempts: <maximum retry attempts>
backoff: none | constant | linear | exponential | jitter
backoff_base: <base duration in seconds for backoff>
message: <custom message for rule execution>
```
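As a sketch, the rules below retry rate-limited requests and skip missing resources. This assumes the HTTP status code is exposed to expressions as `response.status`, which is not shown on this page; the attempt count and backoff base are arbitrary.

```yaml
response:
  rules:
    # Assumption: the HTTP status code is available as response.status
    - action: retry
      condition: "response.status == 429"
      max_attempts: 5
      backoff: exponential
      backoff_base: 2            # seconds
      message: "Rate limited, retrying with exponential backoff"

    - action: continue
      condition: "response.status == 404"
      message: "Resource not found, skipping"
```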

## Authentication Configuration

The `<authentication configuration map>` accepts the keys below:

```yaml
type: none | static | basic | oauth2 | aws-sigv4 | hmac | sequence
expires: <re-authentication interval in seconds>

# Static header authentication
headers: {<map of header name to value>}

# Basic authentication
username: <username>
password: <password>

# OAuth2 authentication
flow: client_credentials | authorization_code | device_code
authentication_url: <OAuth token URL>
authorization_url: <OAuth authorization URL>
device_auth_url: <OAuth device auth URL>
client_id: <OAuth client ID>
client_secret: <OAuth client secret>
scopes: [<array of OAuth scopes>]
redirect_uri: <OAuth redirect URI>

# AWS Signature V4 authentication
aws_service: <AWS service name>
aws_access_key_id: <AWS access key>
aws_secret_access_key: <AWS secret key>
aws_session_token: <AWS session token>
aws_region: <AWS region>
aws_profile: <AWS profile>

# HMAC authentication
algorithm: sha256 | sha512
secret: <HMAC secret key>
signing_string: <template for string to sign>
request_headers: {<map of header name to value template>}
nonce_length: <random nonce length in bytes>

# Sequence authentication (custom calls)
sequence: [<array of authentication calls>]
```
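Only the keys for the chosen `type` are needed. As a sketch, a static bearer-token configuration might look like this, assuming the connection defines a secret named `api_token` (a hypothetical name):

```yaml
authentication:
  type: static
  headers:
    Authorization: "Bearer {secrets.api_token}"   # api_token is a hypothetical secret
```

An OAuth2 client-credentials configuration could look like the following. The token URL, scope, and secret names are placeholders, and the one-hour `expires` value is arbitrary:

```yaml
authentication:
  type: oauth2
  flow: client_credentials
  authentication_url: "https://auth.example.com/oauth/token"   # placeholder URL
  client_id: "{secrets.client_id}"
  client_secret: "{secrets.client_secret}"
  scopes: ["read"]
  expires: 3600   # re-authenticate every hour
```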

## Iteration Configuration

The `<iteration configuration map>` accepts the keys below:

```yaml
over: <expression that evaluates to an array or queue>
into: <state variable name to store current iteration value>
if: <condition expression to evaluate before iteration>
concurrency: <maximum parallel iterations>
```

## Endpoint Dependencies

The `depends_on` field explicitly declares that an endpoint depends on other endpoints completing first. This is useful for controlling execution order.

```yaml
endpoints:
  # First endpoint: Collects customer IDs
  customers:
    request:
      url: "{state.base_url}/customers"
    response:
      processors:
        - expression: "record.id"
          output: "queue.customer_ids"

  # Second endpoint: Depends on customers endpoint
  customer_orders:
    depends_on: ["customers"]  # Wait for customers to complete first
    iterate:
      over: "queue.customer_ids"
      into: "state.customer_id"
    request:
      url: "{state.base_url}/customers/{state.customer_id}/orders"
```

> 📝 **Note:** When using queues with `iterate.over`, Sling automatically infers dependencies. The `depends_on` field is optional but can make dependencies explicit.

## Stream Overrides

The `overrides` field allows you to configure how the endpoint's data is processed when writing to a destination. This is used during replication to control stream-specific behavior.

### Basic Overrides

Control the replication mode for specific endpoints:

```yaml
endpoints:
  # Full refresh for dimension tables
  customers:
    request:
      url: "{state.base_url}/customers"
    response:
      records:
        jmespath: "data[]"
        primary_key: ["id"]

    overrides:
      mode: full-refresh  # Always replace all data

  # Incremental for fact tables
  transactions:
    request:
      url: "{state.base_url}/transactions"
      parameters:
        updated_since: "{state.last_sync_timestamp}"
    response:
      records:
        jmespath: "data[]"
        primary_key: ["id"]
        update_key: "updated_at"

    overrides:
      mode: incremental  # Only new/updated records; to reload from scratch, manually drop or truncate the table first
```

Available modes:

* `full-refresh`: Replace all data (truncate and load)
* `incremental`: Load only new or updated records
* `snapshot`: Create versioned snapshots
* `backfill`: Historical data loading

### Hooks Override

Add post-processing hooks for specific endpoints. This is useful for merge operations, data cleanup, or custom transformations:

```yaml
endpoints:
  customer_balance_transaction:
    request:
      url: "{state.base_url}/customers/{state.customer_id}/balance_transactions"

    iterate:
      over: "queue.customer_ids"
      into: "state.customer_id"

    response:
      records:
        jmespath: "data[]"
        primary_key: ["id"]

    overrides:
      mode: full-refresh
      hooks:
        post:
          # Check that parent customer data exists
          - type: check
            check: '!is_null(runs["customer"]) && run.total_rows > 0'
            failure_message: no customer records to merge with
            on_failure: break

          # Merge balance transactions into customer table
          - type: query
            id: customer-update-merge
            connection: '{target.name}'
            operation: merge
            on_failure: abort
            params:
              strategy: update
              source_table: '{run.object.full_name}'
              target_table: '{runs["customer"].object.full_name}'
              primary_key: [id]

          # Clean up temporary staging table
          - type: query
            connection: '{target.name}'
            operation: drop_table
            params:
              table: '{run.object.full_name}'
```

**Hook Types Available:**

* `check`: Validate conditions before proceeding
* `query`: Execute SQL operations (merge, drop, etc.)
* `log`: Log messages for debugging
* `http`: Call external APIs
* `command`: Run shell commands

See [Hooks documentation](https://docs.slingdata.io/concepts/hooks) for complete details.

> 💡 **Tip:** Overrides are most useful when extracting large datasets that need special handling during the write phase, or when implementing complex merge/upsert logic.

## State vs. Sync

The `state` and `sync` fields are related but serve different purposes:

### State Variables

The `state` field defines variables available during endpoint execution. State is:

* **Temporary**: Exists only during the current run
* **Per-endpoint**: Each endpoint has its own state
* **Per-iteration**: Each iteration (if using `iterate`) gets its own state copy

```yaml
endpoints:
  daily_data:
    state:
      start_date: "{date_format(date_add(now(), -1, 'day'), '%Y-%m-%d')}"
      end_date: "{date_format(now(), '%Y-%m-%d')}"
      page_size: 100

    request:
      url: "{state.base_url}/data"
      parameters:
        from: "{state.start_date}"
        to: "{state.end_date}"
        limit: "{state.page_size}"
```

### Sync Variables

The `sync` field lists which state variables should **persist between runs**. This enables incremental data loading:

```yaml
endpoints:
  incremental_data:
    state:
      # Initialize from previous run, or default to 7 days ago
      last_sync_timestamp: >
        {
          coalesce(
            sync.last_sync_timestamp,
            date_format(date_add(now(), -7, 'day'), '%Y-%m-%dT%H:%M:%SZ')
          )
        }

    # Persist this variable for next run
    sync: [last_sync_timestamp]

    request:
      url: "{state.base_url}/data"
      parameters:
        updated_since: "{state.last_sync_timestamp}"

    response:
      processors:
        # Track the maximum timestamp seen
        - expression: "record.updated_at"
          output: "state.last_sync_timestamp"
          aggregation: maximum
```

**Key Differences:**

| Feature         | State                       | Sync                                            |
| --------------- | --------------------------- | ----------------------------------------------- |
| **Scope**       | Current run only            | Persisted between runs                          |
| **Purpose**     | Runtime variables           | Incremental tracking                            |
| **Declaration** | `state: {key: value}`       | `sync: [key]`                                   |
| **Access**      | `state.key`                 | `sync.key` (on load) → `state.key` (during run) |
| **Use Case**    | Configuration, calculations | Timestamps, cursors, offsets                    |

## Context Variables

Context variables are **read-only runtime values** passed from the replication configuration to the API spec. They enable endpoints to support both backfill and incremental modes with a single configuration.

**Available Context Variables:**

| Variable              | Type    | Description                                                                     | Set From                                                 |
| --------------------- | ------- | ------------------------------------------------------------------------------- | -------------------------------------------------------- |
| `context.mode`        | string  | Replication mode                                                                | Replication config `mode` field                          |
| `context.store`       | map     | [Store](https://docs.slingdata.io/concepts/hooks/store) values from replication | Replication `store` variable                             |
| `context.limit`       | integer | Maximum records to fetch                                                        | Replication config `source_options.limit`                |
| `context.range_start` | string  | Backfill range start                                                            | Replication config `source_options.range` (first value)  |
| `context.range_end`   | string  | Backfill range end                                                              | Replication config `source_options.range` (second value) |

**Context vs. State vs. Sync:**

| Feature        | Context            | State       | Sync              |
| -------------- | ------------------ | ----------- | ----------------- |
| **Source**     | Replication config | API spec    | Persisted storage |
| **Scope**      | Current run        | Current run | Between runs      |
| **Modifiable** | No (read-only)     | Yes         | Yes (via state)   |

**Common Pattern: Backfill with Incremental Fallback**

This pattern supports backfill (with range), incremental (with sync state), and first run (with default):

```yaml
endpoints:
  daily_events:
    sync: [last_date]  # Persist for incremental runs

    iterate:
      # Priority: context.range_start → sync.last_date → default
      over: >
        range(
          coalesce(context.range_start, sync.last_date, date_format(date_add(now(), -7, "day"), "%Y-%m-%d")),
          coalesce(context.range_end, date_format(now(), "%Y-%m-%d")),
          "1d"
        )
      into: "state.current_date"

    request:
      url: "{state.base_url}/events/daily/{state.current_date}"

    response:
      records:
        jmespath: "events[]"
        primary_key: ["event_id"]
      processors:
        - expression: "state.current_date"
          output: "state.last_date"
          aggregation: "maximum"
```

**Replication Configs:**

```yaml
# The snippets below are independent alternatives, not one combined config.

# Backfill mode: Process specific date range
source_options:
  range: '2024-01-01,2024-01-31'  # Sets context.range_start and context.range_end

# Incremental mode: Use sync state (no range specified)
# Falls back to sync.last_date from previous run

# Testing mode: Limit records
source_options:
  limit: 100  # Sets context.limit
```
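
For orientation, the sketch below shows where `source_options` could sit in a full replication file for the backfill case. It assumes the usual replication layout (`source`, `target`, `streams`); the connection names and the target table are hypothetical.

```yaml
# replication.yaml (connection names and target table are hypothetical)
source: MY_API
target: MY_POSTGRES

streams:
  daily_events:                          # endpoint name from the API spec
    object: public.daily_events
    mode: backfill                       # read in the spec as context.mode
    source_options:
      range: '2024-01-01,2024-01-31'     # read as context.range_start / context.range_end
```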

**Other Common Uses:**

```yaml
# Mode-specific behavior
state:
  batch_size: '{if(context.mode == "backfill", 1000, 100)}'

# Limit for testing/development
response:
  records:
    limit: '{coalesce(context.limit, null)}'

# Numeric ID ranges
iterate:
  over: >
    range(
      coalesce(context.range_start, sync.last_id, "1"),
      coalesce(context.range_end, "999999"),
      "1000"
    )
```

> 💡 **Best Practice:** Always use `coalesce()` with context variables to provide fallback values for when they're not set.

## Using Inputs

Inputs are custom configuration values passed from the connection definition to the API spec. Unlike secrets (which are for credentials), inputs are for non-sensitive options like field mappings, account IDs, or feature flags. Inputs are accessed via `{inputs.var_name}`, similar to `secrets` and `env`.

**Defining inputs in env.yaml:**

```yaml
# ~/.sling/env.yaml
connections:
  AIRTABLE:
    type: api
    spec: airtable
    secrets:
      api_key: "patXXXXXXXXXXXXXX"
    inputs:
      last_modified_field_map:
        'My Base Name':
          'My Table Name': 'Updated At'
        'Another Base':
          'Customers': 'Last Modified'
```

**Accessing inputs in your API spec:**

```yaml
# In your API spec
state:
  modified_field: >
    {
      jmespath(
        coalesce(inputs.last_modified_field_map, object()),
        "\"" + state.base_name + "\".\"" + state.table_name + "\""
      )
    }
```

**When to use inputs vs. secrets:**

| Use Case                     | Use `secrets` | Use `inputs` |
| ---------------------------- | ------------- | ------------ |
| API keys, tokens, passwords  | ✅             |              |
| Client IDs/secrets           | ✅             |              |
| Account IDs (non-sensitive)  |               | ✅            |
| Field name mappings          |               | ✅            |
| Feature flags                |               | ✅            |
| Custom configuration options |               | ✅            |

> 📝 **Note:** Inputs are defined by the API spec author. Check the specific API connector documentation to see what inputs are available.
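
Inputs do not have to be nested maps. As a simpler sketch, a single scalar input can be interpolated straight into a URL. The connection name, spec name, `account_id` input, and `/reports` path below are all hypothetical.

```yaml
# ~/.sling/env.yaml
connections:
  MY_API:
    type: api
    spec: my_api
    inputs:
      account_id: "12345"   # non-sensitive, so it belongs in inputs rather than secrets
```

```yaml
# In the API spec
endpoints:
  reports:
    request:
      url: "{state.base_url}/accounts/{inputs.account_id}/reports"
```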

## Queues

Queues allow you to pass data from one endpoint to another in a multi-step workflow:

```yaml
queues:
  - order_ids
  - customer_ids

endpoints:
  list_orders:
    response:
      processors:
        - expression: "record.id"
          output: "queue.order_ids"

  get_order_details:
    iterate:
      over: "queue.order_ids"
      into: "state.current_order_id"
    request:
      url: "{state.base_url}/orders/{state.current_order_id}"
```

For detailed information on queues, see [Queues](https://docs.slingdata.io/concepts/api-specs/queues).

## Sequence of Calls

A sequence is an ordered array of API calls that can be executed in workflows, authentication processes, and lifecycle hooks. Sequences suit multi-step operations such as async job workflows, custom authentication flows, or complex setup/teardown processes.

For detailed information on sequences, see [Sequences: Setup and Teardown](https://docs.slingdata.io/concepts/request#sequences-setup-and-teardown). Each call in a sequence accepts the keys below:

```yaml
if: <condition expression to evaluate before executing the call>
request: <request configuration map>
pagination: <pagination configuration map>
response: <response configuration map>
```
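
As a rough sketch of an async job workflow, the `setup` sequence below creates an export job and stores its ID in state before the main request runs. It assumes the parsed response body is available as `response.json`, which this page does not confirm; the paths, payload, and `job_id` field are hypothetical.

```yaml
endpoints:
  export_report:
    setup:
      # Step 1: create the export job (hypothetical API)
      - request:
          url: "{state.base_url}/exports"
          method: POST
          payload:
            report: "daily"
        response:
          processors:
            # Assumption: the body is exposed as response.json
            - expression: "response.json.job_id"
              output: "state.job_id"
              aggregation: last

    # Main request: download the finished export
    request:
      url: "{state.base_url}/exports/{state.job_id}/download"
```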

## Component Relationships

The following diagram shows how the major components relate to each other:

{% @mermaid/diagram content="graph TD
    A[API Spec] --> B[Authentication]
    A --> C[Defaults]
    A --> D[Endpoints]
    A --> E[Queues]
    A --> F[Dynamic Endpoints]

    C --> C1[Default State]
    C --> C2[Default Request]
    C --> C3[Default Pagination]
    C --> C4[Default Response]

    D --> D1[Endpoint 1]
    D --> D2[Endpoint 2]
    D --> D3[Endpoint 3]

    D1 --> F1[State]
    D1 --> F2[Request]
    D1 --> F3[Pagination]
    D1 --> F4[Response]
    D1 --> F5[Iterate]
    D1 --> F6[Setup/Teardown]

    E -.-> F5
    F4 -.-> E

    classDef main fill:#4a9eff,stroke:#ffffff,stroke-width:2px,color:#ffffff
    classDef section fill:#ff8c42,stroke:#ffffff,stroke-width:2px,color:#ffffff
    classDef endpoint fill:#7cb342,stroke:#ffffff,stroke-width:2px,color:#ffffff

    class A main
    class B,C,D,E,F section
    class D1,D2,D3 endpoint" %}

## Basic Example

Here's a minimal example showing the essential components:

```yaml
name: "GitHub API"
description: "API for accessing GitHub repositories and issues"

defaults:
  state:
    base_url: "https://api.github.com"
  request:
    headers:
      Accept: "application/vnd.github.v3+json"

endpoints:
  repos:
    description: "List repositories for a user"
    request:
      url: "{state.base_url}/users/{env.GITHUB_USERNAME}/repos"
    response:
      records:
        jmespath: "[*]"
```
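
Building on the example above, the sketch below adds authentication and moves the shared response settings into `defaults`, so a second endpoint inherits them without repetition. The `token` secret and the `repo` input are hypothetical names that would need to be defined on the connection.

```yaml
name: "GitHub API"
description: "API for accessing GitHub repositories and issues"

authentication:
  type: static
  headers:
    Authorization: "Bearer {secrets.token}"   # assumes a `token` secret on the connection

defaults:
  state:
    base_url: "https://api.github.com"
  request:
    headers:
      Accept: "application/vnd.github.v3+json"
  response:
    records:
      jmespath: "[*]"
      primary_key: ["id"]

endpoints:
  repos:
    description: "List repositories for a user"
    request:
      url: "{state.base_url}/users/{env.GITHUB_USERNAME}/repos"

  issues:
    description: "List issues for one repository"
    request:
      # `repo` is a hypothetical input
      url: "{state.base_url}/repos/{env.GITHUB_USERNAME}/{inputs.repo}/issues"
```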

## API Specification

The table below defines the accepted keys.

<table data-full-width="false"><thead><tr><th width="328">API Config Key</th><th>Description</th></tr></thead><tbody><tr><td><code>name</code></td><td>The display name of the API specification.</td></tr><tr><td><code>description</code></td><td>Brief description of what the API does.</td></tr><tr><td><code>queues</code></td><td>Array of queue names for passing data between endpoints.</td></tr><tr><td><code>defaults</code></td><td>Default endpoint configuration applied to all endpoints.</td></tr><tr><td><code>authentication</code></td><td>Authentication configuration for the API. See <a href="authentication">Authentication</a> for details.</td></tr><tr><td><code>endpoints.&#x3C;key></code></td><td>Named endpoints that define API interactions.</td></tr><tr><td><code>dynamic_endpoints</code></td><td>Array of endpoint configurations for dynamic endpoint generation. See <a href="dynamic-endpoints">Dynamic Endpoints</a> for details.</td></tr><tr><td><p><code>endpoints.&#x3C;key>.name</code></p><p>or <code>defaults.name</code></p></td><td>The endpoint name (defaults to the key).</td></tr><tr><td><p><code>endpoints.&#x3C;key>.description</code></p><p>or <code>defaults.description</code></p></td><td>Description of what the endpoint does.</td></tr><tr><td><p><code>endpoints.&#x3C;key>.docs</code></p><p>or <code>defaults.docs</code></p></td><td>URL to endpoint documentation.</td></tr><tr><td><p><code>endpoints.&#x3C;key>.disabled</code></p><p>or <code>defaults.disabled</code></p></td><td>Whether the endpoint is disabled (default: false).</td></tr><tr><td><p><code>endpoints.&#x3C;key>.state</code></p><p>or <code>defaults.state</code></p></td><td>Map of state variables available to the endpoint. See State vs. Sync section above.</td></tr><tr><td><p><code>endpoints.&#x3C;key>.sync</code></p><p>or <code>defaults.sync</code></p></td><td>Array of state variable names to persist between runs. See State vs. 
Sync section above.</td></tr><tr><td><p><code>endpoints.&#x3C;key>.request</code></p><p>or <code>defaults.request</code></p></td><td>HTTP request configuration. See <a href="https://github.com/slingdata-io/sling-docs/blob/master/concepts/api/requests.md">Requests</a> for details.</td></tr><tr><td><p><code>endpoints.&#x3C;key>.pagination</code></p><p>or <code>defaults.pagination</code></p></td><td>Pagination configuration. See <a href="../advanced#pagination">Pagination</a> for details.</td></tr><tr><td><p><code>endpoints.&#x3C;key>.response</code></p><p>or <code>defaults.response</code></p></td><td>Response processing configuration. See <a href="https://github.com/slingdata-io/sling-docs/blob/master/concepts/api/response-processing.md">Response Processing</a> for details.</td></tr><tr><td><p><code>endpoints.&#x3C;key>.iterate</code></p><p>or <code>defaults.iterate</code></p></td><td>Iteration configuration for looping over data. See <a href="https://github.com/slingdata-io/sling-docs/blob/master/concepts/api/requests.md#iteration-looping-requests">Iteration</a> for details.</td></tr><tr><td><p><code>endpoints.&#x3C;key>.setup</code></p><p>or <code>defaults.setup</code></p></td><td>Array of calls to execute before the main request. See <a href="https://github.com/slingdata-io/sling-docs/blob/master/concepts/api/requests.md#sequences-setup-and-teardown">Sequences</a> for details.</td></tr><tr><td><p><code>endpoints.&#x3C;key>.teardown</code></p><p>or <code>defaults.teardown</code></p></td><td>Array of calls to execute after the main request. See <a href="https://github.com/slingdata-io/sling-docs/blob/master/concepts/api/requests.md#sequences-setup-and-teardown">Sequences</a> for details.</td></tr><tr><td><p><code>endpoints.&#x3C;key>.depends_on</code></p><p>or <code>defaults.depends_on</code></p></td><td>Array of endpoint names this endpoint depends on. 
See Endpoint Dependencies section above.</td></tr><tr><td><p><code>endpoints.&#x3C;key>.overrides</code></p><p>or <code>defaults.overrides</code></p></td><td>Stream processing overrides for destination writing. See Stream Overrides section above.</td></tr></tbody></table>

> 💡 **Tip:** Start with the basic example and gradually add complexity as needed. Use the defaults section to avoid repetition across endpoints.
