Modes
Here are the various loading modes available. All modes load into a new temporary table prior to final load.
full-refresh
This is the default mode. The target table will be dropped and recreated with the source data.
incremental
The source data will be merged or appended into the target table. If the table does not exist, it will be created. See below for more details.
truncate
Similar to full-refresh, except that the target table is truncated instead of dropped. This preserves any special DDL or GRANTs applied to the table.
snapshot
Appends the full dataset with an added timestamp column. If the target table exists, Sling will append the data along with a _sling_loaded_at column. If it does not, the table will be created.
backfill
Similar to incremental, but takes a range input to backfill a specific update_key range, such as dates or numbers.
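The difference between full-refresh, truncate and snapshot can be sketched against an in-memory SQLite database. This is only an illustration of the behavior described above, not Sling's implementation: the load function, the target table name and its two columns are hypothetical, the temporary-table step is skipped, and the text format used for _sling_loaded_at is an assumption.

```python
import sqlite3
from datetime import datetime, timezone


def load(conn, mode, rows):
    """Illustrative loader for a hypothetical table target(id, name)."""
    cur = conn.cursor()
    exists = cur.execute(
        "select 1 from sqlite_master where type = 'table' and name = 'target'"
    ).fetchone() is not None

    if mode == "full-refresh":
        # Drop and recreate the table, then load the full source data.
        cur.execute("drop table if exists target")
        cur.execute("create table target (id integer, name text)")
        cur.executemany("insert into target values (?, ?)", rows)
    elif mode == "truncate":
        # Keep the existing table definition and only remove its rows.
        if not exists:
            cur.execute("create table target (id integer, name text)")
        cur.execute("delete from target")  # SQLite has no TRUNCATE statement
        cur.executemany("insert into target values (?, ?)", rows)
    elif mode == "snapshot":
        # Append the full dataset, stamped with the load time.
        if not exists:
            cur.execute(
                "create table target (id integer, name text, _sling_loaded_at text)"
            )
        loaded_at = datetime.now(timezone.utc).isoformat()  # assumed format
        cur.executemany(
            "insert into target values (?, ?, ?)",
            [(row_id, name, loaded_at) for row_id, name in rows],
        )
    else:
        raise ValueError(f"mode not covered by this sketch: {mode}")
    conn.commit()
```

Running a snapshot load twice with the same rows doubles the row count, whereas repeating a full-refresh or truncate load leaves only the latest dataset in the table.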
Incremental Mode Strategies
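| Strategy | Primary Key | Update Key | Stream Strategy |
| --- | --- | --- | --- |
| New Data Upsert (update/insert) | yes | yes | Only new records after max(update_key) |
| Full Data Upsert (update/insert) | yes | no | Full data |
| Append Only (insert, no update) | no | yes | Only new records after max(update_key) |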
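The three strategies can be read as a small decision rule. The sketch below assumes the two yes/no values for each strategy mean "a primary key is set" and "an update key is set", in that order; the function name is hypothetical and this is not how Sling itself is written.

```python
def incremental_strategy(primary_key=None, update_key=None):
    """Return the strategy name listed above for a given key combination."""
    has_primary_key = bool(primary_key)
    has_update_key = bool(update_key)

    if has_primary_key and has_update_key:
        # Upsert, reading only records after max(update_key).
        return "New Data Upsert"
    if has_primary_key:
        # Upsert, reading the full source data.
        return "Full Data Upsert"
    if has_update_key:
        # Insert only, reading only records after max(update_key).
        return "Append Only"
    # No strategy is listed for the case where neither key is set.
    raise ValueError("no incremental strategy is listed without a key")
```

For instance, incremental_strategy(primary_key=["id"]) returns "Full Data Upsert", since without an update key there is no watermark to filter on.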
Incremental or Backfill Mode With Custom SQL
When using incremental or backfill mode with a custom SQL stream and a provided update_key, the SQL text must include the placeholder {incremental_where_cond} or {incremental_value}. This allows Sling to inject the values needed to apply the watermark and fetch only newer records.
For example, assume the custom SQL stream is select * from my_schema.my_table where {incremental_where_cond}. When Sling executes and the target table does not exist (first run), {incremental_where_cond} is rendered as 1=1, and the query becomes select * from my_schema.my_table where 1=1. If the table exists, Sling pulls the max value of the update_key column (say 2001-01-01 01:01:01), renders {incremental_where_cond} as my_update_key > '2001-01-01 01:01:01', and runs select * from my_schema.my_table where my_update_key > '2001-01-01 01:01:01'.
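The {incremental_where_cond} substitution can be mimicked with plain string replacement. This is a minimal sketch of the rendering described above, not Sling's code: the function name and its parameters are hypothetical, and quoting only non-numeric values is an assumption.

```python
def render_where_cond(sql, update_key, max_value=None):
    """Fill {incremental_where_cond} the way the text above describes."""
    if max_value is None:
        # First run: no watermark yet, so the condition matches every row.
        condition = "1=1"
    elif isinstance(max_value, (int, float)):
        # Numeric watermarks are injected without quotes (assumption).
        condition = f"{update_key} > {max_value}"
    else:
        # Timestamps and other values are injected as quoted literals.
        condition = f"{update_key} > '{max_value}'"
    return sql.replace("{incremental_where_cond}", condition)


stream_sql = "select * from my_schema.my_table where {incremental_where_cond}"
first_run = render_where_cond(stream_sql, "my_update_key")
later_run = render_where_cond(stream_sql, "my_update_key", "2001-01-01 01:01:01")
```

Here first_run and later_run hold the two rendered queries from the example: one ending in where 1=1, the other filtering on my_update_key.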
As a second example, assume the custom SQL stream is select * from my_schema.my_table where my_int_key > coalesce({incremental_value}, 0) (for a timestamp column, use something like coalesce({incremental_value}, '2001-01-01')). When Sling executes and the target table does not exist (first run), {incremental_value} is rendered as null, and the query becomes select * from my_schema.my_table where my_int_key > coalesce(null, 0). If the table exists, Sling pulls the max value of the update_key column (say 99) and injects it: select * from my_schema.my_table where my_int_key > coalesce(99, 0).
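The {incremental_value} placeholder differs in that only the value is injected, so the query supplies its own comparison and fallback. A hedged sketch of that rendering follows; the function name is hypothetical and the quoting rule for non-numeric values is an assumption.

```python
def render_incremental_value(sql, max_value=None):
    """Fill {incremental_value} with null, a number, or a quoted literal."""
    if max_value is None:
        # First run: the placeholder becomes null, so coalesce() falls back.
        literal = "null"
    elif isinstance(max_value, (int, float)):
        literal = str(max_value)
    else:
        # Non-numeric values such as timestamps are quoted (assumption).
        literal = f"'{max_value}'"
    return sql.replace("{incremental_value}", literal)


int_sql = (
    "select * from my_schema.my_table "
    "where my_int_key > coalesce({incremental_value}, 0)"
)
first_run_sql = render_incremental_value(int_sql)
later_run_sql = render_incremental_value(int_sql, 99)
```

Wrapping the placeholder in coalesce() is what keeps the first-run query valid: with a null watermark the comparison falls back to the default (0 here), so all rows with a positive key are selected.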