# Merge Strategy

## Overview

When using **incremental** or **backfill** modes with a `primary_key`, Sling needs to merge incoming data with existing records in the target table. The **merge strategy** controls exactly how this merge operation is performed. This is available in *v1.5.7*.

By default, Sling uses the optimal merge strategy for each database. However, you can override this behavior using the `merge_strategy` target option to gain more control over how data is merged.

{% hint style="info" %}
**When Does Merge Strategy Apply?**

Merge strategy only applies when:

* Mode is `incremental` or `backfill`
* A `primary_key` is defined for the stream

For `full-refresh`, `truncate`, or `snapshot` modes, data is inserted directly without merging.
{% endhint %}

## Available Strategies

Sling supports four merge strategies:

| Strategy        | Description                           | Behavior                                                                                                         |
| --------------- | ------------------------------------- | ---------------------------------------------------------------------------------------------------------------- |
| `update_insert` | Update existing rows, insert new rows | Standard upsert behavior. Matches on primary key, updates existing records, inserts new ones.                    |
| `delete_insert` | Delete matching rows, then insert all | Removes rows with matching primary keys from target, then inserts all source rows. Safe and reliable.            |
| `insert`        | Insert only, skip existing            | Only inserts new rows. Does not update existing records (use for append-only scenarios).                         |
| `update`        | Update only, skip new                 | Only updates existing rows. Does not insert new records (use when target should only contain pre-existing keys). |

### Strategy Comparison

| Strategy        | Updates Existing        | Inserts New | Performance | Use Case                                           |
| --------------- | ----------------------- | ----------- | ----------- | -------------------------------------------------- |
| `update_insert` | Yes                     | Yes         | Best        | Default upsert behavior                            |
| `delete_insert` | Yes (via delete+insert) | Yes         | Good        | When native MERGE unavailable or for simpler logic |
| `insert`        | No                      | Yes         | Fast        | Append-only, idempotent loads                      |
| `update`        | Yes                     | No          | Fast        | Updating existing records only                     |

## Database Support

Not all databases support all merge strategies due to differences in SQL capabilities.

| Database      | Default         | Supported Strategies                       |
| ------------- | --------------- | ------------------------------------------ |
| PostgreSQL    | `update_insert` | **all**                                    |
| Snowflake     | `update_insert` | **all**                                    |
| BigQuery      | `delete_insert` | **all**                                    |
| SQL Server    | `update_insert` | **all**                                    |
| Oracle        | `update_insert` | **all**                                    |
| Databricks    | `update_insert` | **all**                                    |
| SQLite        | `update_insert` | **all**                                    |
| Exasol        | `update_insert` | **all**                                    |
| Cloudflare D1 | `update_insert` | **all**                                    |
| MotherDuck    | `update_insert` | **all**                                    |
| MySQL         | `delete_insert` | `insert`, `delete_insert`                  |
| MariaDB       | `delete_insert` | `insert`, `delete_insert`                  |
| Redshift      | `delete_insert` | `insert`, `delete_insert`                  |
| ClickHouse    | `delete_insert` | `insert`, `delete_insert`                  |
| DuckDB        | `delete_insert` | `insert`, `update`, `delete_insert`        |
| StarRocks     | `delete_insert` | `insert`, `update_insert`, `delete_insert` |

{% hint style="warning" %}
**Why Some Strategies Are Unavailable**

* **MySQL/MariaDB**: No native `MERGE` or `UPDATE...FROM` syntax
* **Redshift**: No native `MERGE` statement
* **ClickHouse**: Columnar architecture doesn't support row-level updates
* **DuckDB**: `update_insert` requires explicit PRIMARY KEY constraint, which Sling doesn't create by default
  {% endhint %}

## Usage

### In Replication YAML

Set the merge strategy in `target_options`:

```yaml
source: postgres
target: snowflake

defaults:
  mode: incremental
  primary_key: [id]
  target_options:
    merge_strategy: delete_insert  # Override the default

streams:
  public.customers:
    object: analytics.customers

  public.orders:
    object: analytics.orders
    target_options:
      merge_strategy: update  # Only update existing orders, don't insert new ones
```

### Per-Stream Override

Override the strategy for specific streams:

```yaml
source: postgres
target: bigquery

defaults:
  mode: incremental
  primary_key: [id]

streams:
  # Uses BigQuery default (delete_insert)
  public.users:
    object: analytics.users

  # Override to insert-only for audit log
  public.audit_log:
    object: analytics.audit_log
    target_options:
      merge_strategy: insert  # Append only, never update

  # Override to update-only for dimension table
  public.product_catalog:
    object: analytics.product_catalog
    target_options:
      merge_strategy: update  # Only update known products
```

### CLI Usage

Use JSON format for `--tgt-options`:

```bash
sling run \
  --src-conn POSTGRES \
  --src-stream "public.customers" \
  --tgt-conn SNOWFLAKE \
  --tgt-object "analytics.customers" \
  --mode incremental \
  --primary-key "id" \
  --tgt-options '{"merge_strategy": "delete_insert"}'
```

## Strategy Details

### `update_insert` (Upsert)

The most common merge pattern. Uses the database's native `MERGE` statement (or equivalent) to:

1. **Match** rows on primary key
2. **Update** matched rows with new values
3. **Insert** unmatched rows

**SQL Pattern** (varies by database):

```sql
MERGE INTO target_table tgt
USING source_table src
ON (src.id = tgt.id)
WHEN MATCHED THEN UPDATE SET col1 = src.col1, col2 = src.col2
WHEN NOT MATCHED THEN INSERT (id, col1, col2) VALUES (src.id, src.col1, src.col2)
```

**Use when:**

* You need standard upsert behavior
* Target database supports MERGE or equivalent
* Performance is important

### `delete_insert`

A two-step approach that's universally supported:

1. **Delete** all rows in target that have matching primary keys in source
2. **Insert** all rows from source

**SQL Pattern**:

```sql
DELETE FROM target_table
WHERE id IN (SELECT id FROM source_table);

INSERT INTO target_table (id, col1, col2)
SELECT id, col1, col2 FROM source_table;
```

**Use when:**

* Database doesn't support native MERGE
* You want simpler, more predictable behavior
* Debugging merge issues
* Working with databases like MySQL, Redshift, or ClickHouse

### `insert`

Only inserts new rows, ignoring any that already exist:

1. **Insert** rows from source that don't exist in target (based on primary key)

**SQL Pattern** (varies by database):

```sql
INSERT INTO target_table (id, col1, col2)
SELECT id, col1, col2 FROM source_table src
WHERE NOT EXISTS (
  SELECT 1 FROM target_table tgt WHERE tgt.id = src.id
)
```

**Use when:**

* Append-only tables (audit logs, event streams)
* Idempotent data loads where updates shouldn't occur
* First-wins conflict resolution

### `update`

Only updates existing rows, never inserts new ones:

1. **Update** rows in target that have matching primary keys in source
2. **Skip** source rows that don't exist in target

**SQL Pattern**:

```sql
UPDATE target_table tgt
SET col1 = src.col1, col2 = src.col2
FROM source_table src
WHERE src.id = tgt.id
```

**Use when:**

* Target table should be the source of truth for which records exist
* Updating dimension tables with enrichment data
* Preventing accidental record creation

## Customizing Merge Templates

For advanced use cases, you can customize the SQL templates used for merge operations. See [Template Overrides](/concepts/replication/templates.md) for details.

Merge strategy templates are defined in the `core` section:

```yaml
# ~/.sling/templates/postgres.yaml
core:
  merge_update_insert: |
    -- Custom MERGE logic
    WITH updates AS (
      UPDATE {tgt_table} tgt
      SET {set_fields}
      FROM {src_table} src
      WHERE {src_tgt_pk_equal}
      RETURNING tgt.*
    )
    INSERT INTO {tgt_table} ({insert_fields})
    SELECT {src_fields} FROM {src_table} src
    WHERE NOT EXISTS (
      SELECT 1 FROM updates upd WHERE {src_upd_pk_equal}
    )
```

### Available Template Variables

| Variable              | Description                     |
| --------------------- | ------------------------------- |
| `{tgt_table}`         | Target table name               |
| `{src_table}`         | Source (temp) table name        |
| `{src_fields}`        | Source fields with type casting |
| `{insert_fields}`     | Fields for INSERT clause        |
| `{src_insert_fields}` | Source fields for VALUES clause |
| `{set_fields}`        | SET clause for UPDATE           |
| `{src_tgt_pk_equal}`  | Primary key equality condition  |
| `{src_pk_fields}`     | Source primary key fields       |
| `{tgt_pk_fields}`     | Target primary key fields       |

### Disabling a Strategy

To disable a strategy (force users to use a different one), set it to `null`:

```yaml
# ~/.sling/templates/custom_db.yaml
core:
  merge_update_insert: null  # Force use of delete_insert instead
```

{% hint style="info" %}
**Built-in Templates Reference**

View the default merge templates in the [sling-cli repository](https://github.com/slingdata-io/sling-cli/tree/main/core/dbio/templates). Each database file (e.g., `postgres.yaml`, `snowflake.yaml`) shows how merge strategies are implemented.
{% endhint %}

## Best Practices

### 1. Use the Default When Possible

Each database has a default merge strategy optimized for its capabilities. Only override when you have a specific need.

### 2. Consider Performance

* `update_insert`: Usually fastest when database has native MERGE
* `delete_insert`: May be slower for large tables with few updates
* `insert` / `update`: Fastest when you only need one operation

### 3. Handle Edge Cases

| Scenario                     | Recommended Strategy                  |
| ---------------------------- | ------------------------------------- |
| First-time load              | Any (all behave same for empty table) |
| Append-only data             | `insert`                              |
| Full record replacement      | `delete_insert`                       |
| Partial field updates        | `update_insert`                       |
| Cross-database compatibility | `delete_insert`                       |

### 4. Testing

When changing merge strategies:

1. Test with a small dataset first
2. Verify row counts match expectations
3. Check that updates/inserts behave correctly
4. Monitor performance differences

## Troubleshooting

### "merge strategy not supported"

**Error**: `merge strategy 'update_insert' not supported for mysql`

**Cause**: The requested strategy isn't available for the target database.

**Solution**: Use a supported strategy for that database. Check the [Database Support](#database-support) table above.

### Data not updating

**Cause**: Using `insert` strategy when you need `update_insert`.

**Solution**: Change to `update_insert` or `delete_insert`.

### Unexpected row deletions

**Cause**: Using `delete_insert` with partial data loads.

**Solution**: If source only contains some records, consider `update_insert` to avoid deleting non-present records.

### Performance issues

**Cause**: `delete_insert` on large tables with few changes.

**Solution**:

* Use `update_insert` if supported
* Consider partitioning/filtering to reduce scope
* Use `update` if new records shouldn't be inserted


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.slingdata.io/concepts/replication/merge-strategy.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
