Modes

The following loading modes are available. Every mode first loads the data into a new temporary table before the final load into the target table.

| Mode | Description |
| --- | --- |
| full-refresh | This is the default mode. The target table will be dropped and recreated with the source data. |
| incremental | The source data will be merged or appended into the target table. If the table does not exist, it will be created. See below for more details. |
| truncate | Similar to full-refresh, except that the target table is truncated instead of dropped. This keeps any special DDL / GRANT applied. |
| snapshot | Appends the full dataset with an added timestamp column. If the target table exists, Sling will insert into / append data with a _sling_loaded_at column. If it does not, the table will be created. |
| backfill | Similar to incremental, but takes a range input to backfill a specific update_key range, such as dates or numbers. |

Incremental Mode Strategies

| Load Strategy | Primary Key | Update Key | Stream Strategy |
| --- | --- | --- | --- |
| New Data Upsert (update/insert) | yes | yes | Only new records after max(update_key) |
| Full Data Upsert (update/insert) | yes | no | Full data |
| Append Only (insert, no update) | no | yes | Only new records after max(update_key) |
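The strategy table above can be read as a small decision function. The sketch below is illustrative only and is not Sling's implementation: the function name incremental_strategy is hypothetical, and raising an error when neither key is provided is an assumption, since the table lists no strategy for that combination.

```python
from typing import Optional, Sequence


def incremental_strategy(
    primary_key: Optional[Sequence[str]],
    update_key: Optional[str],
) -> str:
    """Map the provided keys to the load strategy named in the table.

    Hypothetical helper for illustration; Sling's internal logic may differ.
    """
    has_primary_key = bool(primary_key)
    has_update_key = bool(update_key)

    if has_primary_key and has_update_key:
        # Upsert only the records newer than max(update_key).
        return "New Data Upsert"
    if has_primary_key:
        # No watermark column, so the full dataset is read and upserted.
        return "Full Data Upsert"
    if has_update_key:
        # No key to match on, so newer records are inserted without updates.
        return "Append Only"
    # Assumption: the table does not cover this case.
    raise ValueError("provide a primary key, an update key, or both")


print(incremental_strategy(["id"], "updated_at"))
print(incremental_strategy(["id"], None))
print(incremental_strategy(None, "updated_at"))
```

The practical takeaway is that the update_key decides how much data is read from the source, while the primary_key decides whether existing rows can be updated.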

Incremental or Backfill Mode With Custom SQL

When a custom SQL stream is used in incremental or backfill mode with an update_key provided, the SQL text must include the placeholder {incremental_where_cond} or {incremental_value}. Sling replaces the placeholder with the watermark condition or value, so that the query returns only newer records.

For example, assume the custom SQL stream is select * from my_schema.my_table where {incremental_where_cond}. On the first run, when the table does not exist, Sling renders {incremental_where_cond} as 1=1, and the query becomes select * from my_schema.my_table where 1=1. If the table exists, Sling pulls the max value of the update_key column. With a max value of 2001-01-01 01:01:01, {incremental_where_cond} renders as my_update_key > '2001-01-01 01:01:01', and the query becomes select * from my_schema.my_table where my_update_key > '2001-01-01 01:01:01'.
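The substitution described above can be mimicked with plain string replacement. This is a minimal sketch, not Sling's code: render_where_cond is a hypothetical helper, and wrapping the value in single quotes is an assumption that fits the timestamp example (actual quoting may depend on the column type and database).

```python
from typing import Optional


def render_where_cond(sql: str, update_key: str, max_value: Optional[str]) -> str:
    """Replace {incremental_where_cond} as described in the paragraph above.

    Hypothetical helper for illustration only.
    """
    if max_value is None:
        # First run: no existing table, so no watermark to apply.
        cond = "1=1"
    else:
        # Later runs: keep only rows newer than the current max value.
        cond = f"{update_key} > '{max_value}'"
    return sql.replace("{incremental_where_cond}", cond)


template = "select * from my_schema.my_table where {incremental_where_cond}"

first_run = render_where_cond(template, "my_update_key", None)
later_run = render_where_cond(template, "my_update_key", "2001-01-01 01:01:01")

print(first_run)
# select * from my_schema.my_table where 1=1
print(later_run)
# select * from my_schema.my_table where my_update_key > '2001-01-01 01:01:01'
```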

As a second example, assume the custom SQL stream is select * from my_schema.my_table where my_int_key > coalesce({incremental_value}, 0). For a timestamp column, the fallback would be something like coalesce({incremental_value}, '2001-01-01'). On the first run, when the table does not exist, Sling renders {incremental_value} as null, and the query becomes select * from my_schema.my_table where my_int_key > coalesce(null, 0). If the table exists, Sling pulls the max value of the update_key column. With a max value of 99, the query becomes select * from my_schema.my_table where my_int_key > coalesce(99, 0).
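The {incremental_value} case follows the same idea, except that only the value is injected and the fallback lives in the SQL itself through coalesce. The sketch below is an illustration under assumptions: render_incremental_value is a hypothetical helper, and the rule of leaving integers bare while quoting strings is a guess at reasonable literal formatting, not documented Sling behavior.

```python
from typing import Union


def render_incremental_value(sql: str, max_value: Union[int, str, None]) -> str:
    """Replace {incremental_value} as described in the paragraph above.

    Hypothetical helper for illustration only.
    """
    if max_value is None:
        # First run: no existing table, so the placeholder becomes null
        # and coalesce() in the SQL supplies the starting point.
        literal = "null"
    elif isinstance(max_value, int):
        # Assumption: numeric watermarks are injected without quotes.
        literal = str(max_value)
    else:
        # Assumption: other watermarks (e.g. timestamps) are single-quoted.
        literal = f"'{max_value}'"
    return sql.replace("{incremental_value}", literal)


int_template = (
    "select * from my_schema.my_table "
    "where my_int_key > coalesce({incremental_value}, 0)"
)

first_int_run = render_incremental_value(int_template, None)
later_int_run = render_incremental_value(int_template, 99)

print(first_int_run)
# select * from my_schema.my_table where my_int_key > coalesce(null, 0)
print(later_int_run)
# select * from my_schema.my_table where my_int_key > coalesce(99, 0)
```

Compared with {incremental_where_cond}, this placeholder leaves the comparison operator and the first-run default under the control of the SQL author.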
