Incremental

Examples of incrementally loading data from APIs to databases using sync state

Incremental loading from APIs allows you to fetch only new or updated data since the last run, reducing API calls and improving performance. Sling uses the sync feature to persist state between runs.

Learn more: Incremental Sync " State Variables

How API Incremental Sync Works

API incremental sync uses the sync key to persist values between runs:

  1. Define which state variables to persist using sync: [variable_name]

  2. On first run, use a default value (e.g., 30 days ago)

  3. Track the maximum timestamp/ID in each response using processors

  4. On subsequent runs, use the persisted value from sync.variable_name

The state is stored in the target database by default, or in a location specified by SLING_STATE.

Timestamp-Based Incremental Sync

This is the most common pattern - fetching records updated since the last run.

Spec File (orders_api.yaml)

orders_api.yaml
name: "Orders API"

defaults:
  state:
    base_url: https://api.shop.com/v1
  request:
    headers:
      Authorization: "Bearer {secrets.api_key}"

endpoints:
  orders:
    description: "Get orders updated since last sync"

    # Persist last_updated_at for next run
    sync: [last_updated_at]

    state:
      # Use last sync time if available, otherwise default to 30 days ago
      updated_at_min: >
        {coalesce(
          sync.last_updated_at,
          date_format(date_add(now(), -30, "day"), "%Y-%m-%dT%H:%M:%S%z")
        )}
      limit: 100
      offset: 0

    request:
      url: "{state.base_url}/orders"
      parameters:
        # Filter API to only return records updated after this timestamp
        updated_at_min: "{state.updated_at_min}"
        limit: "{state.limit}"
        offset: "{state.offset}"

    pagination:
      next_state:
        offset: "{state.offset + state.limit}"
      stop_condition: length(response.records) < state.limit

    response:
      records:
        jmespath: "orders[]"
        primary_key: ["order_id"]

      processors:
        # Track the maximum updated_at timestamp across all records
        - expression: "record.updated_at"
          output: "state.last_updated_at"
          aggregation: "maximum"  # Save highest timestamp for next sync

    overrides:
      mode: incremental  # Upsert based on primary_key

Using Replication

Running with Sling: sling run -r /path/to/replication.yaml

On first run, this will fetch orders from the last 30 days. On subsequent runs, it will only fetch orders updated since the last run.


Using Python

ID-Based Incremental Sync

For APIs with monotonically increasing IDs, you can sync based on the highest ID.

Spec File (events_api.yaml)


Using Replication

Date-Based Incremental with Iteration

For APIs that require a date parameter, use iteration with sync state.

Spec File (analytics_api.yaml)


Using Replication

Multiple Sync Variables

You can persist multiple values for complex sync scenarios.

Spec File (multi_sync_api.yaml)

Using SLING_STATE

By default, sync state is stored in the target database. You can store it externally using SLING_STATE.

Using Replication with External State

This creates state files at s3://your-bucket/sling/state/ for each stream.

Incremental with Full Refresh Fallback

You can mix incremental and full-refresh streams based on your needs.

Best Practices

1. Always Provide Sensible Defaults

Use coalesce() to handle the first run gracefully:

2. Use Appropriate Aggregation

Match the aggregation to your sync variable type:

3. Include Primary Keys

Always specify primary keys for proper upserting:

4. Handle Edge Cases

Account for APIs that might return stale data:

5. Use Incremental Mode Override

Set the mode in the spec to ensure consistent behavior:

Combining Incremental with Backfill

You can use both incremental sync and backfill capabilities together.

First: Backfill Historical Data

Then: Switch to Incremental

Last updated

Was this helpful?