Backfill
Examples of backfilling historical data from APIs to databases using range parameters
How API Backfill Works
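A backfill is driven from the replication config: the range given there reaches the API spec as context.range_start and context.range_end, which the date range spec on this page reads. The fragment below is a minimal sketch, not a definitive config. It assumes a Sling-style replication file in which the range is set through source_options.range as a comma-separated start and end. The connection names MY_API and MY_POSTGRES and the target object are hypothetical placeholders.

```yaml
# Hypothetical replication config; connection names and target object are placeholders
source: MY_API
target: MY_POSTGRES

streams:
  daily_events:
    object: analytics.daily_events
    source_options:
      # Assumed format: "start,end". These values are expected to surface
      # in the API spec as context.range_start and context.range_end.
      range: '2024-01-01,2024-01-31'
```

When no range is supplied, the coalesce() calls in the spec fall back to sync.last_date (or seven days ago) for the start and to yesterday for the end.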
Date Range Backfill
name: "Analytics API"

defaults:
  state:
    base_url: https://api.analytics.com/v1
  request:
    headers:
      Authorization: "Bearer {secrets.api_key}"

endpoints:
  daily_events:
    description: "Get daily event data for a date range"

    # Persist the last processed date for incremental runs
    sync: [last_date]

    iterate:
      # Generate date range using context values from replication config
      over: >
        range(
          coalesce(context.range_start, sync.last_date, date_format(date_add(now(), -7, "day"), "%Y-%m-%d")),
          coalesce(context.range_end, date_format(date_add(now(), -1, "day"), "%Y-%m-%d")),
          "1d"
        )
      into: "state.current_date"
      concurrency: 5  # Process 5 dates concurrently

    state:
      # Format the date for the API request
      date: '{date_format(state.current_date, "%Y-%m-%d")}'

    request:
      url: "{state.base_url}/events/daily/{state.date}"
      method: GET

    response:
      records:
        jmespath: "data.events[]"
        primary_key: ["event_id"]

      processors:
        # Track the latest date processed for incremental sync
        - expression: "state.date"
          output: "state.last_date"
          aggregation: "maximum"

    overrides:
      mode: incremental  # Use incremental mode for proper upserting

Month Range Backfill
Numeric ID Range Backfill
Multiple Endpoints with Different Ranges
Open-Ended Range
Best Practices
1. Always Use Sync State
2. Set Appropriate Concurrency
3. Use Incremental Mode Override
4. Handle API Limits Gracefully
5. Provide Reasonable Defaults