Capture Deletes

Examples of using Sling to capture deleted records during incremental loads

When loading data incrementally, you may want to handle records that exist in the target but are missing from the source. The delete_missing option supports two modes:

  • hard: Physically deletes records from the target table that no longer exist in the source

  • soft: Marks records as deleted in the target table by setting a deletion timestamp

circle-exclamation

Hard Delete

Physically removes records from the target table that no longer exist in the source.

source: MY_POSTGRES
target: MY_SNOWFLAKE

defaults:
  mode: incremental
  update_key: updated_at
  primary_key: id  # primary key is required for delete_missing

streams:
  finance.accounts:
    object: finance.accounts_target
    target_options:
      delete_missing: hard  # will remove records that don't exist in source

Soft Delete

Marks records as deleted in the target table by setting a _sling_deleted_at timestamp column.

When using soft delete mode, Sling will add a _sling_deleted_at timestamp column to track when records were marked as deleted.

Scoped Delete with WHERE Clauses

For large tables with years of historical data, comparing all primary keys can be expensive. You can scope the delete detection to a subset of data using source_where and target_where clauses. This is useful when you only need to detect deletes in recent data (e.g., records from the last 30 days). This feature is added in v1.5.6.

The extended delete_missing configuration accepts:

  • type (required): Either soft or hard

  • where: WHERE clause applied to both source and target (use when both databases share the same SQL syntax)

  • source_where: WHERE clause for the source query (uses source database SQL syntax)

  • target_where: WHERE clause for the target query (uses target database SQL syntax)

circle-info

When using different database types (e.g., SQL Server to PostgreSQL), use source_where and target_where with the appropriate SQL syntax for each database. When source and target use the same SQL dialect, you can use the simpler where option.

Example with single where clause (same database types):

Important Notes

  • The delete_missing option requires that you specify a primary_key to uniquely identify records

  • For incremental loads, an update_key is also required to determine which records to process

  • The comparison is done using a temporary table to efficiently identify missing records

  • When using source_where/target_where, only records matching the WHERE clause are considered for delete detection

  • Schema configuration: Ensure your target database connection has the correct default schema configured. For databases that support schema search paths:

    • PostgreSQL: Set search_path in your connection string (e.g., ?search_path=my_schema) or configure it at the database/role level

    • SQL Server: The default schema is determined by the user's default schema setting

    • Oracle: Set the current_schema or ensure the user has appropriate schema permissions

    This is particularly important when running multiple streams concurrently, as the delete operation uses temporary tables that need to resolve correctly within the target schema.

Last updated

Was this helpful?