Backfill

Backfill mode loads historical data whose update key falls within a specified range. Use it when you need to reload data for a particular time period or range of values.

Basic Backfill

Basic backfill requires:

  • A primary key to uniquely identify records

  • An update key to determine the range

  • A range specification defining the start and end values

Using CLI Flags

$ sling run --src-conn MY_SOURCE_DB \
  --src-stream 'my_schema.orders' \
  --tgt-conn MY_TARGET_DB \
  --tgt-object 'target_schema.orders' \
  --mode backfill \
  --primary-key order_id \
  --update-key order_date \
  --range '2023-01-01,2023-12-31'
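
When the range bounds come from variables or a scheduler, it can help to build the `--range` value in one place and catch reversed bounds before the run starts. The following is a minimal sketch, not part of Sling: `make_range` is a hypothetical helper name, and it assumes both bounds are ISO dates (YYYY-MM-DD), which sort correctly as plain strings.

```shell
# make_range START END
# Hypothetical helper (not part of Sling): prints 'START,END' for use with
# --range, and fails if the two ISO dates (YYYY-MM-DD) are out of order.
make_range() {
  # ISO dates sort lexicographically, so the smaller of the two strings
  # must be START (or equal to it) for the range to be valid.
  first=$(printf '%s\n%s\n' "$1" "$2" | LC_ALL=C sort | head -n 1)
  if [ "$first" != "$1" ]; then
    echo "make_range: start $1 is after end $2" >&2
    return 1
  fi
  printf '%s,%s\n' "$1" "$2"
}

# Example use with the flags shown above (left commented out so the
# snippet can be sourced without a Sling installation):
#   sling run --src-conn MY_SOURCE_DB \
#     --src-stream 'my_schema.orders' \
#     --tgt-conn MY_TARGET_DB \
#     --tgt-object 'target_schema.orders' \
#     --mode backfill \
#     --primary-key order_id \
#     --update-key order_date \
#     --range "$(make_range 2023-01-01 2023-12-31)"
```

The helper only checks ordering; it does not validate that the strings are real calendar dates.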

Using Replication Config

replication.yaml
source: MY_SOURCE_DB
target: MY_TARGET_DB

defaults:
  mode: backfill
  primary_key: [id]
  object: target_schema.{stream_table}

streams:
  my_schema.orders:
    update_key: order_date
    source_options:
      range: '2023-01-01,2023-12-31'

  my_schema.transactions:
    update_key: id # same as primary key
    source_options:
      range: '100000,200000'

Backfill Chunking

With chunking, backfill mode loads historical data in smaller sub-ranges instead of one large range, which helps with large datasets. See the chunking documentation for details.
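
To illustrate the idea of loading a range in smaller pieces, here is a rough sketch that splits a numeric range by hand and could drive one basic backfill per sub-range. This is not Sling's built-in chunking feature, which is covered in the chunking documentation. `split_range` is a hypothetical helper, and the sketch assumes that both range bounds are inclusive, so the sub-ranges are made non-overlapping.

```shell
# split_range START END SIZE
# Hypothetical helper (not part of Sling): prints one 'start,end' pair per
# line, covering START..END in consecutive sub-ranges of at most SIZE values.
# Assumes inclusive integer bounds.
split_range() {
  start=$1
  end=$2
  size=$3
  while [ "$start" -le "$end" ]; do
    stop=$((start + size - 1))
    # Clamp the last sub-range to the overall end value.
    if [ "$stop" -gt "$end" ]; then
      stop=$end
    fi
    echo "$start,$stop"
    start=$((stop + 1))
  done
}

# Example use (left commented out so the snippet can be sourced without a
# Sling installation): one backfill run per sub-range.
#   split_range 100000 200000 25000 | while read -r r; do
#     sling run --src-conn MY_SOURCE_DB \
#       --src-stream 'my_schema.transactions' \
#       --tgt-conn MY_TARGET_DB \
#       --tgt-object 'target_schema.transactions' \
#       --mode backfill \
#       --primary-key id \
#       --update-key id \
#       --range "$r"
#   done
```

Because backfill uses a primary key to identify records, a small overlap between sub-ranges should be harmless if the inclusive-bounds assumption turns out not to hold, but the built-in chunking option is likely the better choice in practice.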
