Backfill

Backfill mode allows you to load historical data within a specific range based on an update key. This is useful when you need to reload data for a particular time period or range of values.

Basic Backfill

Basic backfill requires:

  • A primary key to uniquely identify records

  • An update key to determine the range

  • A range specification defining the start and end values

Using CLI Flags

$ sling run --src-conn MY_SOURCE_DB \
  --src-stream 'my_schema.orders' \
  --tgt-conn MY_TARGET_DB \
  --tgt-object 'target_schema.orders' \
  --mode backfill \
  --primary-key order_id \
  --update-key order_date \
  --range '2023-01-01,2023-12-31'

Using Replication Config

replication.yaml
source: MY_SOURCE_DB
target: MY_TARGET_DB

defaults:
  mode: backfill
  primary_key: [id]
  object: target_schema.{stream_table}

streams:
  my_schema.orders:
    update_key: order_date
    source_options:
      range: '2023-01-01,2023-12-31'

  my_schema.transactions:
    update_key: id # same as primary key
    source_options:
      range: '100000,200000'
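
To run this replication, pass the saved file to sling run. The second invocation is a sketch of limiting the run to a single stream via the optional --streams filter:

$ sling run -r replication.yaml

$ sling run -r replication.yaml --streams my_schema.orders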

Chunked Backfill

For large datasets, chunked backfill breaks the data down into smaller, manageable pieces. This helps optimize performance and resource usage. Chunking is available with Sling CLI Pro or on a Platform plan, and is especially useful when combined with the SLING_THREADS environment variable for parallel execution.

Supported Chunk Types

The chunk size is defined with the chunk_size source option, which accepts the following values:

  1. Time-based chunks:

    • Hours: e.g., 6h

    • Days: e.g., 7d

    • Months: e.g., 1m

  2. Numeric chunks:

    • Integer ranges: e.g., 1000 for chunks of 1000 records
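
For instance, an hour-based chunk on a hypothetical stream could look like the snippet below (the stream name public.events and update key event_time are placeholders, not part of the example configuration that follows):

streams:
  public.events:
    update_key: event_time
    source_options:
      range: '2024-01-01,2024-01-08'
      chunk_size: 6h  # process in 6-hour chunks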

Example Configuration

replication.yaml
source: postgres
target: oracle

defaults:
  mode: backfill
  object: oracle.{stream_table_lower}
  primary_key: [id]

streams:
  public.orders_mariadb_pg:
    update_key: update_dt
    source_options:
      range: '2018-11-01,2018-12-01'
      chunk_size: 10d  # Process in 10-day chunks

  public.orders_sqlserver_pg:
    update_key: date
    source_options:
      range: '2019-01-01,2019-06-01'
      chunk_size: 2m   # Process in 2-month chunks

  public.orders_snowflake_pg:
    update_key: id  # same as primary key
    source_options:
      range: '1,800'
      chunk_size: 200  # Process in chunks of 200 IDs

env:
  SLING_THREADS: 3  # Process 3 streams concurrently

How Chunking Works

Sling automatically splits each stream's specified range into smaller sub-ranges based on chunk_size, generating one stream per chunk.

Each chunk is processed independently, allowing for:

  • Easier recovery if errors occur

  • Progress tracking per chunk

  • Parallel processing capabilities (with SLING_THREADS, as shown below)
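
As a sketch of the parallel execution noted above, the SLING_THREADS variable can also be set in the shell when running the replication (the value 3 is illustrative):

$ export SLING_THREADS=3
$ sling run -r replication.yaml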

For example, consider the public.orders_snowflake_pg stream below:

public.orders_snowflake_pg:
  primary_key: [id]
  update_key: id     # Same as integer primary key
  source_options:
    range: '1,800'
    chunk_size: 200  # Process in chunks of 200 IDs

Once compiled, Sling will generate the following streams with a numeric suffix (e.g. _001):

public.orders_snowflake_pg_001:
  source_options:
    range: '1,200'

public.orders_snowflake_pg_002:
  source_options:
    range: '201,400'

public.orders_snowflake_pg_003:
  source_options:
    range: '401,600'

public.orders_snowflake_pg_004:
  source_options:
    range: '601,800'

For a time-based example:

public.orders_sqlserver_pg:
  primary_key: [id]
  update_key: date
  source_options:
    range: '2019-01-01,2019-06-01'
    chunk_size: 2m   # Process in 2-month chunks

Will yield:

public.orders_sqlserver_pg_001:
  source_options:
    range: '2019-01-01,2019-03-01'

public.orders_sqlserver_pg_002:
  source_options:
    range: '2019-03-01,2019-05-01'

public.orders_sqlserver_pg_003:
  source_options:
    range: '2019-05-01,2019-06-01'