# Merge Strategy

## Overview

When using **incremental** or **backfill** modes with a `primary_key`, Sling needs to merge incoming data with existing records in the target table. The **merge strategy** controls exactly how this merge operation is performed. This is available in *v1.5.7*.

By default, Sling uses the optimal merge strategy for each database. However, you can override this behavior using the `merge_strategy` target option to gain more control over how data is merged.

{% hint style="info" %}
**When Does Merge Strategy Apply?**

Merge strategy only applies when:

* Mode is `incremental` or `backfill`
* A `primary_key` is defined for the stream

For `full-refresh`, `truncate`, or `snapshot` modes, data is inserted directly without merging.
{% endhint %}

## Available Strategies

Sling supports four merge strategies:

| Strategy        | Description                           | Behavior                                                                                                         |
| --------------- | ------------------------------------- | ---------------------------------------------------------------------------------------------------------------- |
| `update_insert` | Update existing rows, insert new rows | Standard upsert behavior. Matches on primary key, updates existing records, inserts new ones.                    |
| `delete_insert` | Delete matching rows, then insert all | Removes rows with matching primary keys from target, then inserts all source rows. Safe and reliable.            |
| `insert`        | Insert only, skip existing            | Only inserts new rows. Does not update existing records (use for append-only scenarios).                         |
| `update`        | Update only, skip new                 | Only updates existing rows. Does not insert new records (use when target should only contain pre-existing keys). |

### Strategy Comparison

| Strategy        | Updates Existing        | Inserts New | Performance | Use Case                                           |
| --------------- | ----------------------- | ----------- | ----------- | -------------------------------------------------- |
| `update_insert` | Yes                     | Yes         | Best        | Default upsert behavior                            |
| `delete_insert` | Yes (via delete+insert) | Yes         | Good        | When native MERGE unavailable or for simpler logic |
| `insert`        | No                      | Yes         | Fast        | Append-only, idempotent loads                      |
| `update`        | Yes                     | No          | Fast        | Updating existing records only                     |

## Database Support

Not all databases support all merge strategies due to differences in SQL capabilities.

| Database      | Default         | Supported Strategies                       |
| ------------- | --------------- | ------------------------------------------ |
| PostgreSQL    | `update_insert` | **all**                                    |
| Snowflake     | `update_insert` | **all**                                    |
| BigQuery      | `delete_insert` | **all**                                    |
| SQL Server    | `update_insert` | **all**                                    |
| Oracle        | `update_insert` | **all**                                    |
| Databricks    | `update_insert` | **all**                                    |
| SQLite        | `update_insert` | **all**                                    |
| Exasol        | `update_insert` | **all**                                    |
| Cloudflare D1 | `update_insert` | **all**                                    |
| MotherDuck    | `update_insert` | **all**                                    |
| MySQL         | `delete_insert` | `insert`, `delete_insert`                  |
| MariaDB       | `delete_insert` | `insert`, `delete_insert`                  |
| Redshift      | `delete_insert` | `insert`, `delete_insert`                  |
| ClickHouse    | `delete_insert` | `insert`, `delete_insert`                  |
| DuckDB        | `delete_insert` | `insert`, `update`, `delete_insert`        |
| StarRocks     | `delete_insert` | `insert`, `update_insert`, `delete_insert` |

{% hint style="warning" %}
**Why Some Strategies Are Unavailable**

* **MySQL/MariaDB**: No native `MERGE` or `UPDATE...FROM` syntax
* **Redshift**: No native `MERGE` statement
* **ClickHouse**: Columnar architecture doesn't support row-level updates
* **DuckDB**: `update_insert` requires explicit PRIMARY KEY constraint, which Sling doesn't create by default
  {% endhint %}

## Usage

### In Replication YAML

Set the merge strategy in `target_options`:

```yaml
source: postgres
target: snowflake

defaults:
  mode: incremental
  primary_key: [id]
  target_options:
    merge_strategy: delete_insert  # Override the default

streams:
  public.customers:
    object: analytics.customers

  public.orders:
    object: analytics.orders
    target_options:
      merge_strategy: update  # Only update existing orders, don't insert new ones
```

### Per-Stream Override

Override the strategy for specific streams:

```yaml
source: postgres
target: bigquery

defaults:
  mode: incremental
  primary_key: [id]

streams:
  # Uses BigQuery default (delete_insert)
  public.users:
    object: analytics.users

  # Override to insert-only for audit log
  public.audit_log:
    object: analytics.audit_log
    target_options:
      merge_strategy: insert  # Append only, never update

  # Override to update-only for dimension table
  public.product_catalog:
    object: analytics.product_catalog
    target_options:
      merge_strategy: update  # Only update known products
```

### CLI Usage

Use JSON format for `--tgt-options`:

```bash
sling run \
  --src-conn POSTGRES \
  --src-stream "public.customers" \
  --tgt-conn SNOWFLAKE \
  --tgt-object "analytics.customers" \
  --mode incremental \
  --primary-key "id" \
  --tgt-options '{"merge_strategy": "delete_insert"}'
```

## Strategy Details

### `update_insert` (Upsert)

The most common merge pattern. Uses the database's native `MERGE` statement (or equivalent) to:

1. **Match** rows on primary key
2. **Update** matched rows with new values
3. **Insert** unmatched rows

**SQL Pattern** (varies by database):

```sql
MERGE INTO target_table tgt
USING source_table src
ON (src.id = tgt.id)
WHEN MATCHED THEN UPDATE SET col1 = src.col1, col2 = src.col2
WHEN NOT MATCHED THEN INSERT (id, col1, col2) VALUES (src.id, src.col1, src.col2)
```

**Use when:**

* You need standard upsert behavior
* Target database supports MERGE or equivalent
* Performance is important

### `delete_insert`

A two-step approach that's universally supported:

1. **Delete** all rows in target that have matching primary keys in source
2. **Insert** all rows from source

**SQL Pattern**:

```sql
DELETE FROM target_table
WHERE id IN (SELECT id FROM source_table);

INSERT INTO target_table (id, col1, col2)
SELECT id, col1, col2 FROM source_table;
```

**Use when:**

* Database doesn't support native MERGE
* You want simpler, more predictable behavior
* Debugging merge issues
* Working with databases like MySQL, Redshift, or ClickHouse

### `insert`

Only inserts new rows, ignoring any that already exist:

1. **Insert** rows from source that don't exist in target (based on primary key)

**SQL Pattern** (varies by database):

```sql
INSERT INTO target_table (id, col1, col2)
SELECT id, col1, col2 FROM source_table src
WHERE NOT EXISTS (
  SELECT 1 FROM target_table tgt WHERE tgt.id = src.id
)
```

**Use when:**

* Append-only tables (audit logs, event streams)
* Idempotent data loads where updates shouldn't occur
* First-wins conflict resolution

### `update`

Only updates existing rows, never inserts new ones:

1. **Update** rows in target that have matching primary keys in source
2. **Skip** source rows that don't exist in target

**SQL Pattern**:

```sql
UPDATE target_table tgt
SET col1 = src.col1, col2 = src.col2
FROM source_table src
WHERE src.id = tgt.id
```

**Use when:**

* Target table should be the source of truth for which records exist
* Updating dimension tables with enrichment data
* Preventing accidental record creation

## Customizing Merge Templates

For advanced use cases, you can customize the SQL templates used for merge operations. See [Template Overrides](https://docs.slingdata.io/concepts/replication/templates) for details.

Merge strategy templates are defined in the `core` section:

```yaml
# ~/.sling/templates/postgres.yaml
core:
  merge_update_insert: |
    -- Custom MERGE logic
    WITH updates AS (
      UPDATE {tgt_table} tgt
      SET {set_fields}
      FROM {src_table} src
      WHERE {src_tgt_pk_equal}
      RETURNING tgt.*
    )
    INSERT INTO {tgt_table} ({insert_fields})
    SELECT {src_fields} FROM {src_table} src
    WHERE NOT EXISTS (
      SELECT 1 FROM updates upd WHERE {src_upd_pk_equal}
    )
```

### Available Template Variables

| Variable              | Description                     |
| --------------------- | ------------------------------- |
| `{tgt_table}`         | Target table name               |
| `{src_table}`         | Source (temp) table name        |
| `{src_fields}`        | Source fields with type casting |
| `{insert_fields}`     | Fields for INSERT clause        |
| `{src_insert_fields}` | Source fields for VALUES clause |
| `{set_fields}`        | SET clause for UPDATE           |
| `{src_tgt_pk_equal}`  | Primary key equality condition  |
| `{src_pk_fields}`     | Source primary key fields       |
| `{tgt_pk_fields}`     | Target primary key fields       |

### Disabling a Strategy

To disable a strategy (force users to use a different one), set it to `null`:

```yaml
# ~/.sling/templates/custom_db.yaml
core:
  merge_update_insert: null  # Force use of delete_insert instead
```

{% hint style="info" %}
**Built-in Templates Reference**

View the default merge templates in the [sling-cli repository](https://github.com/slingdata-io/sling-cli/tree/main/core/dbio/templates). Each database file (e.g., `postgres.yaml`, `snowflake.yaml`) shows how merge strategies are implemented.
{% endhint %}

## Best Practices

### 1. Use the Default When Possible

Each database has a default merge strategy optimized for its capabilities. Only override when you have a specific need.

### 2. Consider Performance

* `update_insert`: Usually fastest when database has native MERGE
* `delete_insert`: May be slower for large tables with few updates
* `insert` / `update`: Fastest when you only need one operation

### 3. Handle Edge Cases

| Scenario                     | Recommended Strategy                  |
| ---------------------------- | ------------------------------------- |
| First-time load              | Any (all behave same for empty table) |
| Append-only data             | `insert`                              |
| Full record replacement      | `delete_insert`                       |
| Partial field updates        | `update_insert`                       |
| Cross-database compatibility | `delete_insert`                       |

### 4. Testing

When changing merge strategies:

1. Test with a small dataset first
2. Verify row counts match expectations
3. Check that updates/inserts behave correctly
4. Monitor performance differences

## Troubleshooting

### "merge strategy not supported"

**Error**: `merge strategy 'update_insert' not supported for mysql`

**Cause**: The requested strategy isn't available for the target database.

**Solution**: Use a supported strategy for that database. Check the [Database Support](#database-support) table above.

### Data not updating

**Cause**: Using `insert` strategy when you need `update_insert`.

**Solution**: Change to `update_insert` or `delete_insert`.

### Unexpected row deletions

**Cause**: Using `delete_insert` with partial data loads.

**Solution**: If source only contains some records, consider `update_insert` to avoid deleting non-present records.

### Performance issues

**Cause**: `delete_insert` on large tables with few changes.

**Solution**:

* Use `update_insert` if supported
* Consider partitioning/filtering to reduce scope
* Use `update` if new records shouldn't be inserted
