# SQL Server

Sling supports Change Data Capture from SQL Server by reading the native CDC change tables. SQL Server CDC is query-based: the SQL Server Agent captures row-level changes from the transaction log into system-managed change tables, and Sling reads those tables on each run using `cdc.fn_cdc_get_all_changes_*()`.

For general CDC concepts, the two-phase process, and all available options, see the [Change Capture overview](/concepts/change-capture.md).

## Prerequisites

### 1. Supported Editions

SQL Server CDC is available on **Enterprise**, **Standard** (2016 SP1+), and **Developer** editions. **Express edition does not support CDC.**

For Azure:

* **Azure SQL Database**: CDC is supported on **S3 (Standard) tier or higher**. Basic/S0/S1/S2 tiers are not recommended.
* **Azure SQL Managed Instance**: Full CDC support (same as on-premises).

### 2. Enable CDC on the Database

{% hint style="info" %}
Sling automatically enables CDC on the database and on each captured table if the user has sufficient permissions. The steps below are only needed if you prefer to configure CDC manually or if the Sling user does not have `db_owner` privileges.
{% endhint %}

```sql
USE my_database;
GO
EXEC sys.sp_cdc_enable_db;
GO
```

Verify CDC is enabled:

```sql
SELECT name, is_cdc_enabled FROM sys.databases WHERE name = DB_NAME();
-- is_cdc_enabled should be 1
```

### 3. SQL Server Agent Must Be Running

On-premises SQL Server requires SQL Server Agent to be running. When CDC is enabled, SQL Server creates two Agent jobs:

* **`cdc.<dbname>_capture`** — reads the transaction log and populates change tables.
* **`cdc.<dbname>_cleanup`** — purges expired change data (default retention: 3 days).

If Agent is stopped, changes accumulate in the transaction log but are **not** captured into change tables.

```sql
-- Check if the CDC capture job exists
SELECT * FROM msdb.dbo.cdc_jobs WHERE job_type = 'capture';
```

{% hint style="info" %}
Azure SQL Database does not use SQL Server Agent — it has a built-in CDC scheduler that runs automatically. No action is needed.
{% endhint %}

### 4. User Permissions

The Sling user needs:

* **`db_owner`** role membership on the source database (required to enable CDC on tables and query change tables).

```sql
-- Create a CDC user
CREATE LOGIN sling_user WITH PASSWORD = 'YourStrongPassword!';
USE my_database;
CREATE USER sling_user FOR LOGIN sling_user;
ALTER ROLE db_owner ADD MEMBER sling_user;
```

{% hint style="info" %}
If you enable CDC manually on all tables before running Sling, the user only needs `SELECT` access on both the source tables and the `cdc` schema change tables. The `db_owner` role is only required for the `sp_cdc_enable_db` and `sp_cdc_enable_table` calls that Sling performs automatically.
{% endhint %}

### 5. Enable CDC on Tables (Optional)

Sling automatically enables CDC on each captured table if it is not already enabled. To enable manually:

```sql
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'my_table',
    @role_name     = NULL,
    @supports_net_changes = 1;
GO
```

Verify table-level CDC:

```sql
-- List all CDC-enabled tables
SELECT name, is_tracked_by_cdc FROM sys.tables WHERE is_tracked_by_cdc = 1;

-- List all capture instances
EXEC sys.sp_cdc_help_change_data_capture;
```

## Quick Start

```bash
# Source SQL Server
sling conns set MY_MSSQL type=sqlserver host=sqlserver.example.com user=sling_user password='YourStrongPassword!' database=my_database

# Target PostgreSQL
sling conns set MY_POSTGRES type=postgres host=pg.example.com user=postgres password=secret database=analytics

# State store (required for CDC)
export SLING_STATE='MY_POSTGRES/sling_state'
```

```yaml
# replication.yaml
source: MY_MSSQL
target: MY_POSTGRES

defaults:
  mode: change-capture
  primary_key: [id]
  object: public.{stream_table}

streams:
  dbo.customers:
  dbo.orders:
```

```bash
# First run: performs the initial snapshot
sling run -r replication.yaml

# Subsequent runs: captures and applies changes
sling run -r replication.yaml
```

## How It Works

Unlike MySQL (binlog streaming) and PostgreSQL (WAL logical replication), SQL Server CDC is **query-based**:

1. The SQL Server Agent continuously reads the transaction log and writes row-level changes into CDC change tables (`cdc.dbo_<table>_CT`).
2. On each run, Sling queries `cdc.fn_cdc_get_all_changes_<capture_instance>()` to read changes between the last saved LSN and the current max LSN.
3. Changes are merged into the target table using the standard CDC merge strategy.
4. The new LSN position is saved in the state store.

Because CDC is query-based, each run reads **all available changes** in the LSN range rather than streaming. This means:

* There is no persistent connection or replication slot (unlike PostgreSQL).
* The CDC Agent must have time to process the transaction log before Sling can see the changes. There is typically a few seconds of latency between a DML commit and the change appearing in the change table.

## Examples

### Large Tables with Custom Chunk Size

For very large tables, adjust the chunk size to control memory usage and checkpointing frequency during the initial snapshot.

```yaml
source: MY_MSSQL
target: MY_POSTGRES

defaults:
  mode: change-capture
  primary_key: [id]
  object: analytics.{stream_table}
  change_capture_options:
    snapshot_chunk_size: 50000  # 50k rows per chunk

streams:
  dbo.transactions:
    # This 10M-row table will be loaded in ~200 chunks
    # If interrupted, it resumes from the last completed chunk
```

### Time-Bounded Snapshots for Very Large Tables

For tables with hundreds of millions of rows, the initial snapshot can take hours. Use `snapshot_run_duration` to cap how long each run spends on the snapshot. The next run automatically resumes from the last completed chunk.

```yaml
source: MY_MSSQL
target: MY_POSTGRES

defaults:
  mode: change-capture
  primary_key: [id]
  object: analytics.{stream_table}
  change_capture_options:
    snapshot_chunk_size: 50000
    snapshot_run_duration: 30m  # spend at most 30 minutes per run on the snapshot

streams:
  dbo.huge_events:
    # 500M rows — will take multiple runs to complete the initial load
    # Each run processes ~30 minutes worth of chunks, then exits cleanly
```

### High-Throughput Workloads

For tables with heavy write activity, increase `run_max_events` and `run_max_duration` so each run captures more changes.

```yaml
source: MY_MSSQL
target: MY_POSTGRES

defaults:
  mode: change-capture
  primary_key: [id]
  object: warehouse.{stream_table}
  change_capture_options:
    run_max_events: 50000  # Process up to 50k events per run
    run_max_duration: 60s  # Wait up to 60 seconds for events

streams:
  dbo.click_events:
  dbo.page_views:
```

### Soft Deletes

Keep deleted rows in the target instead of physically removing them. Deleted rows are marked with `_sling_synced_op = 'D'`. Useful for audit trails or when downstream queries need to detect deletions.

```yaml
source: MY_MSSQL
target: MY_POSTGRES

defaults:
  mode: change-capture
  primary_key: [id]
  object: public.{stream_table}
  change_capture_options:
    soft_delete: true

streams:
  dbo.customers:
  dbo.subscriptions:
```

When a row is deleted in the source, the target row is preserved with `_sling_synced_op` set to `'D'` and `_sling_synced_at` updated to the current timestamp. A subsequent re-insert of the same primary key restores the row with the appropriate operation type.

### Mixed Streams with Per-Stream Overrides

Different tables can have different CDC options.

```yaml
source: MY_MSSQL
target: MY_POSTGRES

defaults:
  mode: change-capture
  primary_key: [id]
  object: public.{stream_table}

streams:
  # High-volume table: larger batches
  dbo.events:
    change_capture_options:
      run_max_events: 100000
      run_max_duration: 2m

  # Audit table: keep soft deletes
  dbo.user_accounts:
    change_capture_options:
      soft_delete: true

  # Standard table: uses defaults
  dbo.products:
```

### Replay / Backfill from a Point in Time

If target data becomes inconsistent, you can replay changes from an earlier position. The `replay_from` value is applied exactly once per unique value.

```yaml
source: MY_MSSQL
target: MY_POSTGRES

defaults:
  mode: change-capture
  primary_key: [id]
  object: public.{stream_table}
  change_capture_options:
    replay_from: "2025-06-01T00:00:00Z"  # Re-process all changes since June 1

streams:
  dbo.orders:
```

After the replay run completes, remove or change the `replay_from` value. Leaving it unchanged has no effect (it is only applied once).

## Replay Formats

The `replay_from` option accepts these SQL Server-specific position formats:

* **RFC 3339 timestamp**: `2025-06-01T00:00:00Z` — resolved to an LSN using `sys.fn_cdc_map_time_to_lsn()`

## CDC Retention

SQL Server's CDC cleanup job purges change data older than the retention period (default: **3 days / 4320 minutes**). Ensure retention is long enough to cover the maximum gap between CDC runs.

```sql
-- Check current retention (in minutes)
SELECT retention FROM msdb.dbo.cdc_jobs WHERE job_type = 'cleanup';

-- Increase retention to 7 days (10080 minutes)
EXEC sys.sp_cdc_change_job
    @job_type = N'cleanup',
    @retention = 10080;
GO

-- Restart the cleanup job for the change to take effect
EXEC sys.sp_cdc_stop_job @job_type = N'cleanup';
EXEC sys.sp_cdc_start_job @job_type = N'cleanup';
```

{% hint style="warning" %}
If CDC data is cleaned up before Sling reads it, the run will fail with a "CDC data cleaned up past checkpoint position" error. Increase retention or run CDC more frequently.
{% endhint %}

## Transaction Log Considerations

CDC prevents the transaction log from being truncated until the capture job has processed all changes. If the CDC Agent falls behind (e.g., Agent is stopped or under heavy load), the transaction log can grow significantly.

Monitor transaction log usage:

```sql
DBCC SQLPERF(LOGSPACE);
```

To tune the capture job throughput:

```sql
EXEC sys.sp_cdc_change_job
    @job_type = N'capture',
    @maxtrans = 1000,       -- max transactions per scan cycle (default: 500)
    @maxscans = 20,          -- max scan cycles per polling interval (default: 10)
    @pollinginterval = 5;    -- seconds between polling cycles (default: 5)
```

## Troubleshooting

### "CDC is not enabled on this database"

Enable CDC on the database. The Sling user must have `db_owner` role:

```sql
USE my_database;
EXEC sys.sp_cdc_enable_db;
```

### "could not enable CDC on \<table>"

The Sling user needs `db_owner` role membership. CDC also requires SQL Server Agent to be running (on-premises) and a supported SQL Server edition.

```sql
ALTER ROLE db_owner ADD MEMBER sling_user;
```

### "sys.fn\_cdc\_get\_max\_lsn() returned NULL"

The SQL Server Agent has not completed its first capture scan. Sling retries automatically for up to 30 seconds, but if the error persists:

1. Verify SQL Server Agent is running.
2. Manually trigger a scan: `EXEC sys.sp_cdc_scan;`
3. Check for errors in the capture job: `SELECT * FROM msdb.dbo.cdc_jobs WHERE job_type = 'capture';`

### "CDC data cleaned up past checkpoint position"

The CDC cleanup job purged change data that Sling hasn't read yet. Increase retention:

```sql
EXEC sys.sp_cdc_change_job @job_type = N'cleanup', @retention = 10080;
EXEC sys.sp_cdc_stop_job @job_type = N'cleanup';
EXEC sys.sp_cdc_start_job @job_type = N'cleanup';
```

### Initial snapshot keeps restarting

Ensure `SLING_STATE` is configured. Without state persistence, Sling cannot track that the snapshot completed and will restart it on every run.

### No CDC capture job found warning

SQL Server Agent may not be running. On Windows, start it from SQL Server Configuration Manager or Services. On Linux (Docker), ensure the `MSSQL_AGENT_ENABLED` environment variable is set:

```bash
docker run -e MSSQL_AGENT_ENABLED=true -e ACCEPT_EULA=Y ...
```

### Schema changes on CDC-enabled tables

When you `ALTER TABLE` on a CDC-enabled table (e.g., adding a column), the existing CDC capture instance continues tracking the original column set. New columns are **not** automatically captured.

To capture the new schema, create a second capture instance (SQL Server allows up to 2 per table) and then disable the old one:

```sql
-- Create new capture instance with updated schema
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name = N'my_table',
    @capture_instance = N'dbo_my_table_v2',
    @role_name = NULL,
    @supports_net_changes = 1;

-- Disable old capture instance
EXEC sys.sp_cdc_disable_table
    @source_schema = N'dbo',
    @source_name = N'my_table',
    @capture_instance = N'dbo_my_table';
```

{% hint style="info" %}
A fresh CDC run after re-enabling the capture instance will use the new schema. If Sling's saved checkpoint references the old capture instance, it will automatically detect the change and start from the new instance's minimum LSN.
{% endhint %}


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.slingdata.io/concepts/change-capture/sql-server.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
