Structure

This document covers the fundamental structure of a Sling API specification file.

Root Level

At the root level, we have the following keys:

# 'name', 'description' and 'endpoints' keys are required
name: <API display name>
description: <API description>

queues: [<array of queue names>]

defaults: <endpoint configuration map>

authentication: <authentication configuration map>

endpoints:
  <endpoint name>: <endpoint configuration map>

Endpoint Level

The <endpoint name> identifies the API endpoint to interact with. This can be any descriptive name for the endpoint.

The <endpoint configuration map> is a map object which accepts the following keys:

name: <endpoint name>
description: <endpoint description>
docs: <documentation URL>
disabled: true | false

state: {<map of state variables>}
sync: [<array of state variable names to persist>]

request: <request configuration map>
pagination: <pagination configuration map>
response: <response configuration map>

iterate: <iteration configuration map>
setup: [<array of setup calls>]
teardown: [<array of teardown calls>]

depends_on: [<array of upstream endpoint names>]
overrides: <stream processor configuration overrides>

Request Configuration

The <request configuration map> accepts the keys below:

url: <endpoint URL>
method: GET | POST | PUT | PATCH | DELETE | HEAD | OPTIONS | TRACE | CONNECT
timeout: <timeout in seconds>
headers: {<map of header name to value>}
parameters: {<map of parameter name to value>}
payload: <request body data>
rate: <maximum requests per second>
concurrency: <maximum concurrent requests>
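
As a rough illustration, a request block using these keys might look like the sketch below. The endpoint name, header values, parameter names, and the secrets.api_token reference are hypothetical placeholders, not taken from a real connector.

```yaml
# Hypothetical endpoint; every value here is illustrative only
endpoints:
  orders:
    request:
      url: "{state.base_url}/orders"
      method: GET
      timeout: 30            # seconds
      headers:
        Accept: "application/json"
        Authorization: "Bearer {secrets.api_token}"  # assumed secret name
      parameters:
        status: "open"
        limit: 100
      rate: 5                # at most 5 requests per second
      concurrency: 2         # at most 2 requests in flight
```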

Pagination Configuration

The <pagination configuration map> accepts the keys below:

next_state: {<map of state variables to update for next page>}
stop_condition: <expression to determine when to stop paginating>
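
A minimal sketch of offset-based pagination with these two keys follows. It assumes that arithmetic is allowed inside expressions and that response.records and length() are available in stop_condition; the endpoint and variable names are hypothetical.

```yaml
# Offset pagination sketch; expression details are assumptions
endpoints:
  orders:
    state:
      offset: 0
      limit: 100
    request:
      url: "{state.base_url}/orders"
      parameters:
        offset: "{state.offset}"
        limit: "{state.limit}"
    pagination:
      next_state:
        # Advance the offset by one page for the next request
        offset: "{state.offset + state.limit}"
      # A page shorter than the limit is treated as the last page
      stop_condition: "length(response.records) < state.limit"
```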

Response Configuration

The <response configuration map> accepts the keys below:

format: json | csv | xml
records: <records extraction configuration map>
processors: [<array of processor configurations>]
rules: [<array of response rule configurations>]

Records Configuration

The <records extraction configuration map> accepts the keys below:

jmespath: <JMESPath expression to extract records>
primary_key: [<array of column names for primary key>]
update_key: <column name for incremental updates>
limit: <maximum number of records to process>
duplicate_tolerance: <bloom filter settings: "capacity,error_rate">

💡 Primary Key Priority: When using API specs in replications, the primary key defined in the replication stream configuration takes priority over the primary key defined in the API spec. If no primary key is specified in the stream, the primary key from the spec will be used.
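
For orientation, a records block combining these keys could look like the following sketch. The response shape, column names, and numeric values are assumptions chosen for illustration.

```yaml
response:
  records:
    # Assumes a response body shaped like {"data": {"items": [...]}}
    jmespath: "data.items[]"
    primary_key: ["id"]
    update_key: "updated_at"
    limit: 5000                            # stop after 5,000 records
    duplicate_tolerance: "1000000,0.001"   # "capacity,error_rate"
```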

Processor Configuration

Each processor in the processors array accepts:

aggregation: none | maximum | minimum | collect | first | last
expression: <transformation expression>
output: <output destination (record field, state variable, queue, environment variable, or store)>
# Examples:
# - record.field_name (add/update field in record)
# - record (replace entire record)
# - state.variable_name (store in state, requires aggregation)
# - queue.queue_name (send to queue)
# - env.VAR_NAME (set environment variable, requires aggregation)
# - context.store.key_name (store in replication store, requires aggregation)
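
The output destinations above can be sketched as a processors array like the one below. The field, state variable, and queue names are hypothetical; only functions that appear elsewhere in this document are used.

```yaml
response:
  processors:
    # Add or update a field on each record
    - expression: "coalesce(record.status, 'unknown')"
      output: "record.status"

    # Keep the largest updated_at value in state (aggregation is required)
    - expression: "record.updated_at"
      output: "state.last_updated_at"
      aggregation: maximum

    # Send each record's id to a queue for a downstream endpoint
    - expression: "record.id"
      output: "queue.order_ids"
```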

Response Rules

Each rule in the rules array accepts:

action: retry | continue | stop | fail
condition: <boolean expression>
max_attempts: <maximum retry attempts>
backoff: none | constant | linear | exponential | jitter
backoff_base: <base duration in seconds for backoff>
message: <custom message for rule execution>
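
A hedged sketch of two rules follows: one that retries rate-limited responses and one that fails on other errors. It assumes the HTTP status code is exposed as response.status and that rules are evaluated in the order listed; the messages and numeric values are illustrative.

```yaml
response:
  rules:
    # Retry rate-limited responses with exponential backoff
    - action: retry
      condition: "response.status == 429"   # assumed variable name
      max_attempts: 5
      backoff: exponential
      backoff_base: 2                       # seconds
      message: "Rate limited, retrying"

    # Fail on any other client or server error
    - action: fail
      condition: "response.status >= 400"
      message: "Request failed"
```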

Authentication Configuration

The <authentication configuration map> accepts the keys below:

type: none | static | basic | oauth2 | aws-sigv4 | hmac | sequence
expires: <re-authentication interval in seconds>

# Static header authentication
headers: {<map of header name to value>}

# Basic authentication
username: <username>
password: <password>

# OAuth2 authentication
flow: client_credentials | authorization_code | device_code
authentication_url: <OAuth token URL>
authorization_url: <OAuth authorization URL>
device_auth_url: <OAuth device auth URL>
client_id: <OAuth client ID>
client_secret: <OAuth client secret>
scopes: [<array of OAuth scopes>]
redirect_uri: <OAuth redirect URI>

# AWS Signature V4 authentication
aws_service: <AWS service name>
aws_access_key_id: <AWS access key>
aws_secret_access_key: <AWS secret key>
aws_session_token: <AWS session token>
aws_region: <AWS region>
aws_profile: <AWS profile>

# HMAC authentication
algorithm: sha256 | sha512
secret: <HMAC secret key>
signing_string: <template for string to sign>
request_headers: {<map of header name to value template>}
nonce_length: <random nonce length in bytes>

# Sequence authentication (custom calls)
sequence: [<array of authentication calls>]
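
Only the keys for the chosen type are needed. As a sketch, an OAuth2 client-credentials configuration might look like this; the token URL, scope, and secret names are hypothetical.

```yaml
authentication:
  type: oauth2
  flow: client_credentials
  authentication_url: "https://auth.example.com/oauth/token"  # placeholder URL
  client_id: "{secrets.client_id}"          # assumed secret names
  client_secret: "{secrets.client_secret}"
  scopes: ["read"]
  expires: 3600                             # re-authenticate every hour
```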

Iteration Configuration

The <iteration configuration map> accepts the keys below:

over: <expression that evaluates to an array or queue>
into: <state variable name to store current iteration value>
if: <condition expression to evaluate before iteration>
concurrency: <maximum parallel iterations>

Endpoint Dependencies

The depends_on field explicitly declares that an endpoint depends on other endpoints completing first. This is useful for controlling execution order.

endpoints:
  # First endpoint: Collects customer IDs
  customers:
    request:
      url: "{state.base_url}/customers"
    response:
      processors:
        - expression: "record.id"
          output: "queue.customer_ids"

  # Second endpoint: Depends on customers endpoint
  customer_orders:
    depends_on: ["customers"]  # Wait for customers to complete first
    iterate:
      over: "queue.customer_ids"
      into: "state.customer_id"
    request:
      url: "{state.base_url}/customers/{state.customer_id}/orders"

📝 Note: When using queues with iterate.over, Sling automatically infers dependencies. The depends_on field is optional but can make dependencies explicit.

Stream Overrides

The overrides field allows you to configure how the endpoint's data is processed when writing to a destination. This is used during replication to control stream-specific behavior.

Basic Overrides

Control the replication mode for specific endpoints:

endpoints:
  # Full refresh for dimension tables
  customers:
    request:
      url: "{state.base_url}/customers"
    response:
      records:
        jmespath: "data[]"
        primary_key: ["id"]

    overrides:
      mode: full-refresh  # Always replace all data

  # Incremental for fact tables
  transactions:
    request:
      url: "{state.base_url}/transactions"
      parameters:
        updated_since: "{state.last_sync_timestamp}"
    response:
      records:
        jmespath: "data[]"
        primary_key: ["id"]
        update_key: "updated_at"

    overrides:
      mode: incremental  # Only new/updated records; to reload from scratch, drop or truncate the target table manually

Available modes:

  • full-refresh: Replace all data (truncate and load)

  • incremental: Load only new or updated records

  • snapshot: Create versioned snapshots

  • backfill: Historical data loading

Hooks Override

Add post-processing hooks for specific endpoints. This is useful for merge operations, data cleanup, or custom transformations:

endpoints:
  customer_balance_transaction:
    request:
      url: "{state.base_url}/customers/{state.customer_id}/balance_transactions"

    iterate:
      over: "queue.customer_ids"
      into: "state.customer_id"

    response:
      records:
        jmespath: "data[]"
        primary_key: ["id"]

    overrides:
      mode: full-refresh
      hooks:
        post:
          # Check that parent customer data exists
          - type: check
            check: '!is_null(runs["customer"]) && run.total_rows > 0'
            failure_message: no customer records to merge with
            on_failure: break

          # Merge balance transactions into customer table
          - type: query
            id: customer-update-merge
            connection: '{target.name}'
            operation: merge
            on_failure: abort
            params:
              strategy: update
              source_table: '{run.object.full_name}'
              target_table: '{runs["customer"].object.full_name}'
              primary_key: [id]

          # Clean up temporary staging table
          - type: query
            connection: '{target.name}'
            operation: drop_table
            params:
              table: '{run.object.full_name}'

Hook Types Available:

  • check: Validate conditions before proceeding

  • query: Execute SQL operations (merge, drop, etc.)

  • log: Log messages for debugging

  • http: Call external APIs

  • command: Run shell commands

See Hooks documentation for complete details.

💡 Tip: Overrides are most useful when extracting large datasets that need special handling during the write phase, or when implementing complex merge/upsert logic.

State vs. Sync

State and sync serve different purposes: state holds the variables used during a run, while sync selects which of those variables persist between runs.

State Variables

The state field defines variables available during endpoint execution. State is:

  • Temporary: Exists only during current run

  • Per-endpoint: Each endpoint has its own state

  • Per-iteration: Each iteration (if using iterate) gets its own state copy

endpoints:
  daily_data:
    state:
      start_date: "{date_format(date_add(now(), -1, 'day'), '%Y-%m-%d')}"
      end_date: "{date_format(now(), '%Y-%m-%d')}"
      page_size: 100

    request:
      url: "{state.base_url}/data"
      parameters:
        from: "{state.start_date}"
        to: "{state.end_date}"
        limit: "{state.page_size}"

Sync Variables

The sync field lists which state variables should persist between runs. This enables incremental data loading:

endpoints:
  incremental_data:
    state:
      # Initialize from previous run, or default to 7 days ago
      last_sync_timestamp: >
        {
          coalesce(
            sync.last_sync_timestamp,
            date_format(date_add(now(), -7, 'day'), '%Y-%m-%dT%H:%M:%SZ')
          )
        }

    # Persist this variable for next run
    sync: [last_sync_timestamp]

    request:
      url: "{state.base_url}/data"
      parameters:
        updated_since: "{state.last_sync_timestamp}"

    response:
      processors:
        # Track the maximum timestamp seen
        - expression: "record.updated_at"
          output: "state.last_sync_timestamp"
          aggregation: maximum

Key Differences:

| Feature     | State                       | Sync                                        |
|-------------|-----------------------------|---------------------------------------------|
| Scope       | Current run only            | Persisted between runs                      |
| Purpose     | Runtime variables           | Incremental tracking                        |
| Declaration | state: {key: value}         | sync: [key]                                 |
| Access      | state.key                   | sync.key (on load) → state.key (during run) |
| Use Case    | Configuration, calculations | Timestamps, cursors, offsets                |

Context Variables

Context variables are read-only runtime values passed from the replication configuration to the API spec. They enable endpoints to support both backfill and incremental modes with a single configuration.

Available Context Variables:

| Variable            | Type    | Description                   | Set From                                               |
|---------------------|---------|-------------------------------|--------------------------------------------------------|
| context.mode        | string  | Replication mode              | Replication config mode field                          |
| context.store       | map     | Store values from replication | Replication store variable                             |
| context.limit       | integer | Maximum records to fetch      | Replication config source_options.limit                |
| context.range_start | string  | Backfill range start          | Replication config source_options.range (first value)  |
| context.range_end   | string  | Backfill range end            | Replication config source_options.range (second value) |

Context vs. State vs. Sync:

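| Feature    | Context            | State       | Sync              |
|------------|--------------------|-------------|-------------------|
| Source     | Replication config | API spec    | Persisted storage |
| Scope      | Current run        | Current run | Between runs      |
| Modifiable | No (read-only)     | Yes         | Yes (via state)   |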

Common Pattern: Backfill with Incremental Fallback

This pattern supports backfill (with range), incremental (with sync state), and first run (with default):

endpoints:
  daily_events:
    sync: [last_date]  # Persist for incremental runs

    iterate:
      # Priority: context.range_start → sync.last_date → default
      over: >
        range(
          coalesce(context.range_start, sync.last_date, date_format(date_add(now(), -7, "day"), "%Y-%m-%d")),
          coalesce(context.range_end, date_format(now(), "%Y-%m-%d")),
          "1d"
        )
      into: "state.current_date"

    request:
      url: "{state.base_url}/events/daily/{state.current_date}"

    response:
      records:
        jmespath: "events[]"
        primary_key: ["event_id"]
      processors:
        - expression: "state.current_date"
          output: "state.last_date"
          aggregation: "maximum"

Replication Configs:

# Backfill mode: Process specific date range
source_options:
  range: '2024-01-01,2024-01-31'  # Sets context.range_start and context.range_end

# Incremental mode: Use sync state (no range specified)
# Falls back to sync.last_date from previous run

# Testing mode: Limit records
source_options:
  limit: 100  # Sets context.limit

Other Common Uses:

# Mode-specific behavior
state:
  batch_size: '{if(context.mode == "backfill", 1000, 100)}'

# Limit for testing/development
response:
  records:
    limit: '{coalesce(context.limit, null)}'

# Numeric ID ranges
iterate:
  over: >
    range(
      coalesce(context.range_start, sync.last_id, "1"),
      coalesce(context.range_end, "999999"),
      "1000"
    )

💡 Best Practice: Always use coalesce() with context variables to provide fallback values for when they're not set.

Using Inputs

Inputs are custom configuration values passed from the connection definition to the API spec. Unlike secrets (which are for credentials), inputs are for non-sensitive options like field mappings, account IDs, or feature flags. Inputs are accessed via {inputs.var_name}, similar to secrets and env.

Defining inputs in env.yaml:

# ~/.sling/env.yaml
connections:
  AIRTABLE:
    type: api
    spec: airtable
    secrets:
      api_key: "patXXXXXXXXXXXXXX"
    inputs:
      last_modified_field_map:
        'My Base Name':
          'My Table Name': 'Updated At'
        'Another Base':
          'Customers': 'Last Modified'

Accessing inputs in your API spec:

# In your API spec
state:
  modified_field: >
    {
      jmespath(
        coalesce(inputs.last_modified_field_map, object()),
        "\"" + state.base_name + "\".\"" + state.table_name + "\""
      )
    }

When to use inputs vs. secrets:

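| Use Case                     | Use secrets | Use inputs |
|------------------------------|-------------|------------|
| API keys, tokens, passwords  | ✓           |            |
| Client IDs/secrets           | ✓           |            |
| Account IDs (non-sensitive)  |             | ✓          |
| Field name mappings          |             | ✓          |
| Feature flags                |             | ✓          |
| Custom configuration options |             | ✓          |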

📝 Note: Inputs are defined by the API spec author. Check the specific API connector documentation to see what inputs are available.

Queues

Queues allow you to pass data from one endpoint to another in a multi-step workflow:

queues:
  - order_ids
  - customer_ids

endpoints:
  list_orders:
    response:
      processors:
        - expression: "record.id"
          output: "queue.order_ids"

  get_order_details:
    iterate:
      over: "queue.order_ids"
      into: "state.current_order_id"
    request:
      url: "{state.base_url}/orders/{state.current_order_id}"

For detailed information on queues, see Queues.

Sequence of Calls

A sequence is an ordered array of API calls that can be executed in workflows, authentication processes, and lifecycle hooks. Sequences are perfect for multi-step operations like async job workflows, custom authentication flows, or complex setup/teardown processes.

For detailed information on sequences, see Sequences: Setup and Teardown. Each call in a sequence accepts the keys below:

if: <condition expression to evaluate before executing the call>
request: <request configuration map>
pagination: <pagination configuration map>
response: <response configuration map>
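
To make the shape concrete, here is a sketch of an async job workflow that uses a setup sequence to start an export before the main request fetches the results. The URLs, the response.json.job_id reference, and the assumption that state written during setup is visible to the main request are all hypothetical.

```yaml
# Hypothetical async export workflow; names and paths are illustrative
endpoints:
  export_report:
    setup:
      # Call 1: start the export job and remember its id
      - request:
          url: "{state.base_url}/exports"
          method: POST
        response:
          processors:
            - expression: "response.json.job_id"   # assumed response field
              output: "state.job_id"
              aggregation: last

    # Main request: fetch the results of the job started in setup
    request:
      url: "{state.base_url}/exports/{state.job_id}/results"
    response:
      records:
        jmespath: "rows[]"
```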

Component Relationships

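The major components relate to each other as follows. The spec root holds name, description, queues, defaults, authentication, and endpoints. Values in defaults apply to every endpoint. Each endpoint combines state and sync with request, pagination, and response configuration, and may add iterate, setup, teardown, depends_on, and overrides. Response processors write to record fields, state variables, or queues, and a queue filled by one endpoint can feed iterate.over in another.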

Basic Example

Here's a minimal example showing the essential components:

name: "GitHub API"
description: "API for accessing GitHub repositories and issues"

defaults:
  state:
    base_url: "https://api.github.com"
  request:
    headers:
      Accept: "application/vnd.github.v3+json"

endpoints:
  repos:
    description: "List repositories for a user"
    request:
      url: "{state.base_url}/users/{env.GITHUB_USERNAME}/repos"
    response:
      records:
        jmespath: "[*]"

API Specification

Here we have the definitions for the accepted keys.

| API Config Key                                       | Description                                                                                           |
|------------------------------------------------------|-------------------------------------------------------------------------------------------------------|
| name                                                 | The display name of the API specification.                                                            |
| description                                          | Brief description of what the API does.                                                               |
| queues                                               | Array of queue names for passing data between endpoints.                                              |
| defaults                                             | Default endpoint configuration applied to all endpoints.                                              |
| authentication                                       | Authentication configuration for the API. See Authentication for details.                             |
| endpoints.<key>                                      | Named endpoints that define API interactions.                                                         |
| dynamic_endpoints                                    | Array of endpoint configurations for dynamic endpoint generation. See Dynamic Endpoints for details.  |
| endpoints.<key>.name or defaults.name                | The endpoint name (defaults to the key).                                                              |
| endpoints.<key>.description or defaults.description  | Description of what the endpoint does.                                                                |
| endpoints.<key>.docs or defaults.docs                | URL to endpoint documentation.                                                                        |
| endpoints.<key>.disabled or defaults.disabled        | Whether the endpoint is disabled (default: false).                                                    |
| endpoints.<key>.state or defaults.state              | Map of state variables available to the endpoint. See State vs. Sync section above.                   |
| endpoints.<key>.sync or defaults.sync                | Array of state variable names to persist between runs. See State vs. Sync section above.              |
| endpoints.<key>.request or defaults.request          | HTTP request configuration. See Requests for details.                                                 |
| endpoints.<key>.pagination or defaults.pagination    | Pagination configuration. See Pagination for details.                                                 |
| endpoints.<key>.response or defaults.response        | Response processing configuration. See Response Processing for details.                               |
| endpoints.<key>.iterate or defaults.iterate          | Iteration configuration for looping over data. See Iteration for details.                             |
| endpoints.<key>.setup or defaults.setup              | Array of calls to execute before the main request. See Sequences for details.                         |
| endpoints.<key>.teardown or defaults.teardown        | Array of calls to execute after the main request. See Sequences for details.                          |
| endpoints.<key>.depends_on or defaults.depends_on    | Array of endpoint names this endpoint depends on. See Endpoint Dependencies section above.            |
| endpoints.<key>.overrides or defaults.overrides      | Stream processing overrides for destination writing. See Stream Overrides section above.              |

💡 Tip: Start with the basic example and gradually add complexity as needed. Use the defaults section to avoid repetition across endpoints.
