Incremental
Examples of incrementally loading data from APIs to databases using sync state
Incremental loading from APIs allows you to fetch only new or updated data since the last run, reducing API calls and improving performance. Sling uses the sync feature to persist state between runs.
Learn more: Incremental Sync " State Variables
How API Incremental Sync Works
API incremental sync uses the sync key to persist values between runs:
Define which state variables to persist using
sync: [variable_name]On first run, use a default value (e.g., 30 days ago)
Track the maximum timestamp/ID in each response using processors
On subsequent runs, use the persisted value from
sync.variable_name
The state is stored in the target database by default, or in a location specified by SLING_STATE.
Timestamp-Based Incremental Sync
This is the most common pattern - fetching records updated since the last run.
Spec File (orders_api.yaml)
name: "Orders API"
defaults:
state:
base_url: https://api.shop.com/v1
request:
headers:
Authorization: "Bearer {secrets.api_key}"
endpoints:
orders:
description: "Get orders updated since last sync"
# Persist last_updated_at for next run
sync: [last_updated_at]
state:
# Use last sync time if available, otherwise default to 30 days ago
updated_at_min: >
{coalesce(
sync.last_updated_at,
date_format(date_add(now(), -30, "day"), "%Y-%m-%dT%H:%M:%S%z")
)}
limit: 100
offset: 0
request:
url: "{state.base_url}/orders"
parameters:
# Filter API to only return records updated after this timestamp
updated_at_min: "{state.updated_at_min}"
limit: "{state.limit}"
offset: "{state.offset}"
pagination:
next_state:
offset: "{state.offset + state.limit}"
stop_condition: length(response.records) < state.limit
response:
records:
jmespath: "orders[]"
primary_key: ["order_id"]
processors:
# Track the maximum updated_at timestamp across all records
- expression: "record.updated_at"
output: "state.last_updated_at"
aggregation: "maximum" # Save highest timestamp for next sync
overrides:
mode: incremental # Upsert based on primary_keyUsing Replication
Running with Sling: sling run -r /path/to/replication.yaml
On first run, this will fetch orders from the last 30 days. On subsequent runs, it will only fetch orders updated since the last run.
Using Python
ID-Based Incremental Sync
For APIs with monotonically increasing IDs, you can sync based on the highest ID.
Spec File (events_api.yaml)
Using Replication
Date-Based Incremental with Iteration
For APIs that require a date parameter, use iteration with sync state.
Spec File (analytics_api.yaml)
Using Replication
Multiple Sync Variables
You can persist multiple values for complex sync scenarios.
Spec File (multi_sync_api.yaml)
Using SLING_STATE
By default, sync state is stored in the target database. You can store it externally using SLING_STATE.
Using Replication with External State
This creates state files at s3://your-bucket/sling/state/ for each stream.
Incremental with Full Refresh Fallback
You can mix incremental and full-refresh streams based on your needs.
Best Practices
1. Always Provide Sensible Defaults
Use coalesce() to handle the first run gracefully:
2. Use Appropriate Aggregation
Match the aggregation to your sync variable type:
3. Include Primary Keys
Always specify primary keys for proper upserting:
4. Handle Edge Cases
Account for APIs that might return stale data:
5. Use Incremental Mode Override
Set the mode in the spec to ensure consistent behavior:
Combining Incremental with Backfill
You can use both incremental sync and backfill capabilities together.
First: Backfill Historical Data
Then: Switch to Incremental
Last updated
Was this helpful?