Backfill
Backfill mode allows you to load historical data within a specific range based on an update key. This is useful when you need to reload data for a particular time period or range of values.
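To make the idea concrete, here is a minimal in-memory model of what a backfill does. It is an illustration, not Sling's implementation. It assumes that the range bounds are inclusive and that selected rows are merged into the target by primary key. The function name `backfill` and the sample data are hypothetical.

```python
from datetime import date

# Illustrative model only; this is NOT how Sling is implemented.
# Assumptions: range bounds are inclusive, and rows in range are
# upserted (inserted or overwritten) in the target by primary key.

def backfill(source_rows, target, primary_key, update_key, start, end):
    """Upsert source rows whose update_key lies in [start, end] into target.

    `target` is a dict mapping primary-key value -> row.
    Returns the number of rows written.
    """
    written = 0
    for row in source_rows:
        if start <= row[update_key] <= end:
            target[row[primary_key]] = dict(row)  # insert or overwrite by key
            written += 1
    return written


source = [
    {"order_id": 1, "order_date": date(2022, 12, 31), "amount": 10},  # outside range
    {"order_id": 2, "order_date": date(2023, 3, 15), "amount": 25},   # updates existing row
    {"order_id": 3, "order_date": date(2023, 12, 31), "amount": 40},  # new row on the end bound
]
target = {2: {"order_id": 2, "order_date": date(2023, 3, 15), "amount": 20}}

n = backfill(source, target, "order_id", "order_date",
             date(2023, 1, 1), date(2023, 12, 31))
print(n, sorted(target))
```

The primary key is what lets a reload overwrite rows that already exist in the target instead of duplicating them.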
Basic Backfill
Basic backfill requires:
A primary key to uniquely identify records
An update key to determine the range
A range specification defining the start and end values
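The range values in the examples on this page are written as a single `'start,end'` string, either dates (`'2023-01-01,2023-12-31'`) or numbers (`'100000,200000'`). The sketch below shows one way such a string could be parsed and sanity-checked. The helper `parse_range` is hypothetical, and the rule "integer-looking values are numeric, everything else is an ISO date" is an assumption, not Sling's documented parsing behavior.

```python
from datetime import date

# Hypothetical helper, not part of Sling.
# Assumption: integer-looking bounds are numeric; anything else is an ISO date.

def parse_range(spec):
    """Parse a 'start,end' string into a (start, end) tuple."""
    parts = [p.strip() for p in spec.split(",")]
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected 'start,end', got {spec!r}")

    def convert(value):
        try:
            return int(value)
        except ValueError:
            return date.fromisoformat(value)  # raises ValueError if not a date

    start, end = (convert(p) for p in parts)
    if type(start) is not type(end):
        raise ValueError("start and end must be the same kind of value")
    if start > end:
        raise ValueError("start must not be after end")
    return start, end


print(parse_range("2023-01-01,2023-12-31"))
print(parse_range("100000,200000"))
```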
Using CLI Flags
```shell
$ sling run --src-conn MY_SOURCE_DB \
  --src-stream 'my_schema.orders' \
  --tgt-conn MY_TARGET_DB \
  --tgt-object 'target_schema.orders' \
  --mode backfill \
  --primary-key order_id \
  --update-key order_date \
  --range '2023-01-01,2023-12-31'
```

Using Replication Config
```yaml
source: MY_SOURCE_DB
target: MY_TARGET_DB

defaults:
  mode: backfill
  primary_key: [id]
  object: target_schema.{stream_table}

streams:
  my_schema.orders:
    update_key: order_date
    source_options:
      range: '2023-01-01,2023-12-31'

  my_schema.transactions:
    update_key: id  # same as primary key
    source_options:
      range: '100000,200000'
```

Backfill Chunking
With chunking, backfill mode loads the historical range in a series of smaller sub-ranges instead of all at once, which helps with large datasets. See the chunking documentation for details.
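The sketch below only illustrates the idea of splitting one date range into consecutive, non-overlapping sub-ranges; it does not reproduce Sling's chunking options, which are covered in the chunking documentation. It assumes inclusive bounds, and `date_chunks` is a hypothetical helper.

```python
from datetime import date, timedelta

# Illustration of the chunking idea only; not Sling's chunking feature.
# Assumption: bounds are inclusive, so each chunk starts the day after
# the previous one ends to avoid loading a boundary day twice.

def date_chunks(start, end, days):
    """Split [start, end] into 'start,end' strings spanning at most `days` days each."""
    if days < 1:
        raise ValueError("days must be >= 1")
    chunks = []
    current = start
    while current <= end:
        chunk_end = min(current + timedelta(days=days - 1), end)
        chunks.append(f"{current.isoformat()},{chunk_end.isoformat()}")
        current = chunk_end + timedelta(days=1)
    return chunks


for r in date_chunks(date(2023, 1, 1), date(2023, 3, 15), 30):
    print(r)
```

Each string has the same `'start,end'` form as the `--range` value in the CLI example, so in principle each one could be supplied to a separate backfill run; Sling's built-in chunking is the intended way to do this automatically.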