Change Data Capture (CDC)

Continuously replicate row-level changes using Change Data Capture (CDC)

Change Data Capture (CDC) continuously replicates row-level changes (inserts, updates, deletes) from a source database to a target database by reading the database's transaction log. Unlike incremental mode, which polls for new or updated rows, CDC captures every change as it happens, including deletes.

Supported Sources

| Source | Transaction Log | Status |
| --- | --- | --- |
| MySQL | Binary log (binlog) | Available |
| MariaDB | Binary log (binlog) | Available |
| PostgreSQL | Write-Ahead Log (WAL) | Coming Soon |
| SQL Server | Change Tracking / CDC | Coming Soon |

How It Works

CDC operates in two phases: an initial load that copies existing data, followed by incremental change capture that streams ongoing changes.

Phase 1: Initial Load

On the first run for a given stream, Sling performs a full table copy from source to target:

  1. Position capture — Sling records the current transaction log position (e.g., binlog file + offset) before the snapshot begins. This ensures no changes are lost between the snapshot and the first incremental run.

  2. Chunked reading — Large tables are automatically split into primary-key-range chunks (configurable via `snapshot_chunk_size`). Each chunk is read, written, and checkpointed independently.

  3. Resumability — If the process is interrupted (crash, timeout, kill), the next run detects the in-progress snapshot and resumes from the last completed chunk. No data is re-read.

  4. Completion — Once all chunks are written, Sling marks the initial load as complete in the state store.
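As a sketch, the snapshot behavior described above can be tuned through the snapshot-related options in a replication file; the values below are illustrative, not recommendations:

```yaml
defaults:
  mode: change-capture
  change_capture_options:
    snapshot_chunk_size: 50000     # rows per primary-key-range chunk
    snapshot_run_duration: 30m     # exit cleanly after 30 minutes; the next run resumes
```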


Chunked mode requires an integer-like primary key for range splitting. If the table has no primary key or the PK is non-numeric, Sling falls back to a single-shot full table read automatically.

Phase 2: Incremental Changes

On subsequent runs, Sling reads the source database's transaction log from the last saved position:

  1. Read changes — Reads inserts, updates, and deletes from the transaction log starting at the saved position, up to run_max_events or run_max_duration.

  2. Merge to target — Applies changes to the target table using a merge strategy that handles inserts, updates, and deletes.

  3. Save position — Persists the new log position in the state store so the next run picks up where this one left off.

Each run is bounded and exits after processing its batch. This makes CDC safe to schedule on a recurring interval (e.g., every 30 seconds or every 5 minutes) via cron or the Sling Platform.
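The per-run bounds map to two options; an illustrative fragment (values are placeholders):

```yaml
defaults:
  change_capture_options:
    run_max_events: 5000    # save position and exit after 5,000 change events
    run_max_duration: 2m    # or stop reading after 2 minutes
```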

What Is a CDC Event?

A CDC event corresponds to a single statement in the transaction log, not a single row. A bulk insert like `INSERT INTO t VALUES (...), (...), (...)` produces one event containing multiple rows. Similarly, an `UPDATE ... WHERE status = 'old'` that modifies 1,000 rows is a single event with 1,000 row changes.

This means `run_max_events: 10000` does not necessarily mean 10,000 rows; it could represent significantly more, depending on how the source application writes data. Keep this in mind when tuning `run_max_events` for high-throughput workloads.

Lifecycle Diagram

Replication Structure

CDC is configured with `mode: change-capture` in a standard Sling replication file; CDC-specific options go under `change_capture_options`.

Options set in `defaults.change_capture_options` apply to all streams. Per-stream `change_capture_options` override the defaults.
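For illustration, a replication file might look like the following; the connection names (`MY_MYSQL`, `MY_SNOWFLAKE`), schema, and stream names are placeholders:

```yaml
source: MY_MYSQL
target: MY_SNOWFLAKE

defaults:
  mode: change-capture
  object: analytics.{stream_table}     # target naming via runtime variable
  change_capture_options:
    run_max_events: 10000
    run_max_duration: 10m

streams:
  mydb.orders: {}                      # inherits the defaults
  mydb.customers:
    change_capture_options:
      soft_delete: true                # per-stream override
```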

Options Reference

| Key | Description |
| --- | --- |
| `run_max_events` | Maximum number of change events to process per run. When this limit is reached, Sling saves the position and exits. Default: `10000`. |
| `run_max_duration` | Maximum duration per run (e.g., `30s`, `10m`, `1h`). If no events arrive within this window, the run completes with zero changes. Default: `10m`. |
| `soft_delete` | When `true`, DELETE events mark the row with `_sling_synced_op = 'D'` and update `_sling_synced_at` instead of removing the row. Default: `false`. |
| `snapshot_start` | Where to start reading the transaction log on the very first run. Default: `now`. Use `beginning` to read from the earliest available log position. |
| `snapshot_chunk_size` | Number of rows per chunk during the initial snapshot. Default: `100000`. |
| `snapshot_run_duration` | Maximum time to spend on the initial snapshot per run (e.g., `30m`, `1h`). When the budget is exhausted, Sling exits cleanly after the current chunk and resumes on the next run. Default: no limit. |
| `replay_from` | Rewind the CDC position to re-process changes from an earlier point. Accepts source-specific formats (e.g., RFC 3339 timestamp, binlog position, GTID set). Applied once per unique value. |
| `retry_attempts` | Number of retry attempts on transient failures. Default: `3`. |
| `retry_delay` | Delay between retries. Default: `5s`. |
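For example, rewinding a single stream with `replay_from` might look like this; the timestamp is illustrative, and accepted formats depend on the source:

```yaml
streams:
  mydb.orders:
    change_capture_options:
      replay_from: "2024-06-01T00:00:00Z"   # re-process changes from this point; applied once
```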

CDC Metadata Columns

Sling adds three metadata columns to every CDC-managed target table:

| Column | Type | Description |
| --- | --- | --- |
| `_sling_synced_at` | `timestamptz` | Timestamp when the row was last synced. |
| `_sling_synced_op` | `varchar` | The operation type: `S` (snapshot), `I` (insert), `U` (update), `D` (delete). When `soft_delete: true`, deleted rows are preserved with `_sling_synced_op = 'D'`. |
| `_sling_cdc_seq` | `bigint` | Monotonically increasing sequence number for ordering events within and across runs. |

State Management

CDC state is stored in the connection specified by the `SLING_STATE` environment variable. This tracks:

  • The current transaction log position

  • Whether the initial snapshot is complete

  • Checkpoint progress for in-progress snapshots

  • Total rows captured
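For example, pointing state at a dedicated connection might look like this; the connection name `MY_STATE_PG` and the replication file name are placeholders:

```shell
# Store CDC state in the connection named MY_STATE_PG
export SLING_STATE=MY_STATE_PG

# Subsequent runs read and write their log position there
sling run -r cdc_replication.yaml
```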


Scheduling

CDC is designed to be run repeatedly on a schedule. Each run processes a bounded batch of changes (controlled by `run_max_events` and `run_max_duration`) and exits.
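For instance, a crontab entry might run the replication every 5 minutes; the file path is a placeholder:

```shell
# m h dom mon dow  command
*/5 * * * * sling run -r /path/to/cdc_replication.yaml
```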

Configure the replication in the Sling Platform UI with a schedule interval. The platform handles orchestration, monitoring, and alerting automatically.

Sling Platform UI

Comparison with Other Modes

| Feature | `change-capture` | `incremental` | `full-refresh` |
| --- | --- | --- | --- |
| Captures inserts | Yes | Yes | Yes |
| Captures updates | Yes | Yes (with `update_key`) | Yes |
| Captures deletes | Yes | With `delete_missing` | Yes |
| Reads from | Transaction log | Table query | Table query |
| State tracking | Log position | Max `update_key` value | None |
| Source load | Minimal (reads log) | Queries table | Full table scan |
| Initial setup | Automatic snapshot | Manual first load | N/A |
