# MySQL / MariaDB

Sling supports Change Data Capture from MySQL and MariaDB by reading the binary log (binlog). Each run reads row-level inserts, updates, and deletes from the binlog and merges them into the target table.

For general CDC concepts, the two-phase process, and all available options, see the [Change Capture overview](https://docs.slingdata.io/concepts/change-capture).

## Prerequisites

Ensure your MySQL source has binary logging enabled with row-based format:

```sql
-- Verify binlog is enabled and using ROW format
SHOW VARIABLES LIKE 'log_bin';          -- Must be ON
SHOW VARIABLES LIKE 'binlog_format';    -- Must be ROW
SHOW VARIABLES LIKE 'binlog_row_image'; -- Should be FULL
```
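If any of these are not set, enable them in the server configuration and restart. The snippet below is a typical example; the config file location and exact option names can vary by distribution and version (on MySQL 8.0+, `log_bin` is already on by default):

```ini
# my.cnf (location varies, e.g. /etc/mysql/my.cnf) — restart required
[mysqld]
server-id        = 1          # any non-zero ID, unique among replicas
log_bin          = mysql-bin  # enable binary logging
binlog_format    = ROW        # row-level events, required for CDC
binlog_row_image = FULL       # include all columns in each row event
```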

The MySQL user must have the `REPLICATION SLAVE` and `REPLICATION CLIENT` privileges:

```sql
GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'sling_user'@'%';
GRANT SELECT ON my_database.* TO 'sling_user'@'%';
```
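You can confirm the grants took effect before running Sling:

```sql
-- Verify the privileges assigned to the CDC user
SHOW GRANTS FOR 'sling_user'@'%';
-- Expect REPLICATION SLAVE, REPLICATION CLIENT ON *.*
-- and SELECT ON `my_database`.*
```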

{% hint style="info" %}
MariaDB uses the same binlog mechanism as MySQL. The same prerequisites apply.
{% endhint %}

## Quick Start

```bash
# Source MySQL
sling conns set MY_MYSQL type=mysql host=mysql.example.com user=sling_user password=secret database=my_database

# Target PostgreSQL
sling conns set MY_POSTGRES type=postgres host=pg.example.com user=postgres password=secret database=analytics

# State store (required for CDC)
export SLING_STATE='MY_POSTGRES/sling_state'
```

```yaml
# replication.yaml
source: MY_MYSQL
target: MY_POSTGRES

defaults:
  mode: change-capture
  primary_key: [id]
  object: public.{stream_table}

streams:
  my_database.customers:
  my_database.orders:
```

```bash
# First run: performs the initial snapshot
sling run -d replication.yaml

# Subsequent runs: captures and applies changes
sling run -d replication.yaml
```

## Examples

### Large Tables with Custom Chunk Size

For very large tables, adjust the chunk size to control memory usage and checkpointing frequency during the initial snapshot.

```yaml
source: MY_MYSQL
target: MY_POSTGRES

defaults:
  mode: change-capture
  primary_key: [id]
  object: analytics.{stream_table}
  change_capture_options:
    snapshot_chunk_size: 50000  # 50k rows per chunk

streams:
  my_database.transactions:
    # This 10M-row table will be loaded in ~200 chunks
    # If interrupted, it resumes from the last completed chunk
```

### Time-Bounded Snapshots for Very Large Tables

For tables with hundreds of millions of rows, the initial snapshot can take hours. Use `snapshot_run_duration` to cap how long each run spends on the snapshot. The next run automatically resumes from the last completed chunk.

```yaml
source: MY_MYSQL
target: MY_POSTGRES

defaults:
  mode: change-capture
  primary_key: [id]
  object: analytics.{stream_table}
  change_capture_options:
    snapshot_chunk_size: 50000
    snapshot_run_duration: 30m  # spend at most 30 minutes per run on the snapshot

streams:
  my_database.huge_events:
    # 500M rows — will take multiple runs to complete the initial load
    # Each run processes ~30 minutes worth of chunks, then exits cleanly
```

### High-Throughput Workloads

For tables with heavy write activity, increase `run_max_events` and `run_max_duration` so each run captures more changes.

```yaml
source: MY_MYSQL
target: MY_POSTGRES

defaults:
  mode: change-capture
  primary_key: [id]
  object: warehouse.{stream_table}
  change_capture_options:
    run_max_events: 50000  # Process up to 50k events per run
    run_max_duration: 60s  # Wait up to 60 seconds for events

streams:
  my_database.click_events:
  my_database.page_views:
```
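Since each run exits after hitting `run_max_events` or `run_max_duration`, high-throughput pipelines are typically scheduled to run frequently. A crontab entry is one way to do this (the schedule, lock file path, and YAML path below are illustrative):

```bash
# Run the CDC replication every minute.
# flock -n skips the run if the previous one is still in progress.
* * * * * flock -n /tmp/sling-cdc.lock sling run -d /path/to/replication.yaml
```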

### Soft Deletes

Keep deleted rows in the target instead of physically removing them. Deleted rows are marked with `_sling_synced_op = 'D'`. This is useful for audit trails, or when downstream queries need to detect deletions.

```yaml
source: MY_MYSQL
target: MY_POSTGRES

defaults:
  mode: change-capture
  primary_key: [id]
  object: public.{stream_table}
  change_capture_options:
    soft_delete: true

streams:
  my_database.customers:
  my_database.subscriptions:
```

When a row is deleted in MySQL, the target row is preserved with `_sling_synced_op` set to `'D'` and `_sling_synced_at` updated to the current timestamp. A subsequent re-insert of the same primary key restores the row with the appropriate operation type.
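With soft deletes enabled, downstream queries typically filter on the operation column. For example, on the PostgreSQL target above, assuming non-deleted rows carry an operation code other than `'D'` (or none at all):

```sql
-- Active rows only: exclude soft-deleted records.
-- IS DISTINCT FROM also keeps rows where the column is NULL.
SELECT *
FROM public.customers
WHERE _sling_synced_op IS DISTINCT FROM 'D';
```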

### Mixed Streams with Per-Stream Overrides

Different tables can have different CDC options.

```yaml
source: MY_MYSQL
target: MY_POSTGRES

defaults:
  mode: change-capture
  primary_key: [id]
  object: public.{stream_table}

streams:
  # High-volume table: larger batches
  my_database.events:
    change_capture_options:
      run_max_events: 100000
      run_max_duration: 2m

  # Audit table: keep soft deletes
  my_database.user_accounts:
    change_capture_options:
      soft_delete: true

  # Standard table: uses defaults
  my_database.products:
```

### Replay / Backfill from a Point in Time

If target data becomes inconsistent, you can replay changes from an earlier position. Each unique `replay_from` value is applied exactly once.

```yaml
source: MY_MYSQL
target: MY_POSTGRES

defaults:
  mode: change-capture
  primary_key: [id]
  object: public.{stream_table}
  change_capture_options:
    replay_from: "2025-06-01T00:00:00Z"  # Re-process all changes since June 1

streams:
  my_database.orders:
```

After the replay run completes, remove or change the `replay_from` value. Leaving it unchanged has no effect (it is only applied once).

## Replay Formats

The `replay_from` option accepts these MySQL-specific position formats:

* **RFC 3339 timestamp**: `2025-06-01T00:00:00Z`
* **Binlog file:offset**: `mysql-bin.000003:12345`
* **GTID set**: `cc51b500-0cda-11f1-9e7c-0242ac110002:1-100`
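To find a valid position to replay from, you can inspect the server directly. The exact statements depend on the MySQL version (`SHOW MASTER STATUS` was renamed to `SHOW BINARY LOG STATUS` in MySQL 8.2):

```sql
-- Current binlog file and offset
SHOW MASTER STATUS;

-- Binlog files still on disk (the oldest replayable position)
SHOW BINARY LOGS;

-- Executed GTID set (when GTIDs are enabled)
SELECT @@GLOBAL.gtid_executed;
```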

## Binlog Retention

Ensure binlog retention is long enough to cover the maximum gap between CDC runs. If MySQL purges binlogs that Sling hasn't read yet, the run will fail.

```sql
-- Check current retention
SHOW VARIABLES LIKE 'binlog_expire_logs_seconds';
-- or for older MySQL versions:
SHOW VARIABLES LIKE 'expire_logs_days';

-- Set retention to 7 days (MySQL 8.0+)
SET GLOBAL binlog_expire_logs_seconds = 604800;
```
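Note that `SET GLOBAL` does not survive a server restart. On MySQL 8.0+ you can persist the setting directly; on older versions, set it in the server configuration file instead:

```sql
-- Persist across restarts (MySQL 8.0+)
SET PERSIST binlog_expire_logs_seconds = 604800;  -- 7 days
```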

## Troubleshooting

### "CDC not supported for \<type>"

Ensure your source connection is configured as `type=mysql` or `type=mariadb`.

### Initial snapshot keeps restarting

Ensure `SLING_STATE` is configured. Without state persistence, Sling cannot track that the snapshot completed and will restart it on every run.

### Missing changes after snapshot

This should not happen — Sling captures the binlog position before the snapshot begins. If you suspect missing data, check that the binlog retention period is long enough to cover the snapshot duration:

```sql
SHOW VARIABLES LIKE 'binlog_expire_logs_seconds';
-- or for older MySQL versions:
SHOW VARIABLES LIKE 'expire_logs_days';
```

### "could not resolve replay\_from position"

The `replay_from` value must be in one of the formats listed in the Replay Formats section above. Ensure the value is valid and the binlog position still exists on the server.

### Binlog purged before CDC could read

If MySQL purges binlogs that Sling hasn't read yet, the run will fail. Increase binlog retention to be longer than the maximum gap between CDC runs:

```sql
-- Set retention to 7 days (MySQL 8.0+)
SET GLOBAL binlog_expire_logs_seconds = 604800;
```
