Incremental
Examples of using Sling to incrementally load data from one database to another
New Data Upsert
This mode performs incremental loading by processing only new or updated records, based on an update key. It requires both a primary key and an update key.
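A minimal replication sketch for this mode, assuming Sling's YAML replication format. The connection names (MY_SOURCE_DB, MY_TARGET_DB), the table names, and the columns id and last_modified_dt are placeholders, not values taken from this page.

```yaml
# Hypothetical connection names; they must already be defined for Sling.
source: MY_SOURCE_DB
target: MY_TARGET_DB

defaults:
  mode: incremental
  primary_key: [id]              # assumed key column, used to match rows for the upsert
  update_key: last_modified_dt   # assumed column, used to pick up new/updated rows

streams:
  # Placeholder source table, loaded into a placeholder target table.
  source_schema.source_table:
    object: target_schema.target_table
```

If saved as replication.yaml (a hypothetical filename), a file like this would typically be run with `sling run -r replication.yaml`.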
Full Data Upsert
This mode performs incremental loading by processing the full source dataset and upserting records based on the primary key. No update key is required.
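A sketch of the same kind of replication without an update key, so the full source table is read and upserted on the primary key. All connection, table, and column names are placeholders.

```yaml
# Hypothetical connection names.
source: MY_SOURCE_DB
target: MY_TARGET_DB

defaults:
  mode: incremental
  primary_key: [id]   # assumed key column; no update_key is set in this mode

streams:
  source_schema.source_table:
    object: target_schema.target_table
```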
Append Only
This mode performs incremental loading by only appending new records based on an update key, without updating existing records. No primary key is required.
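A sketch of an append-only replication: an update key is set, and the primary key is left out so that existing rows are not updated. All connection, table, and column names are placeholders.

```yaml
# Hypothetical connection names.
source: MY_SOURCE_DB
target: MY_TARGET_DB

defaults:
  mode: incremental
  update_key: created_dt   # assumed column; no primary_key is set in this mode

streams:
  source_schema.source_table:
    object: target_schema.target_table
```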
Custom SQL
This mode allows custom SQL queries to be used with incremental loading through special variables that Sling replaces at runtime. See here for more details.
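A sketch of a custom SQL stream, assuming the runtime variable is written as {incremental_where_cond}. This page does not name the variables, so treat that placeholder as an assumption and verify the exact names in the reference mentioned above. Connection, table, and column names are placeholders as well.

```yaml
# Hypothetical connection names.
source: MY_SOURCE_DB
target: MY_TARGET_DB

defaults:
  mode: incremental
  primary_key: [id]              # assumed key column
  update_key: last_modified_dt   # assumed column that drives the incremental filter

streams:
  source_schema.source_table:
    # Assumption: Sling substitutes {incremental_where_cond} at runtime with a
    # filter on the update key, so only new/updated rows are selected.
    sql: |
      select *
      from source_schema.source_table
      where {incremental_where_cond}
    object: target_schema.target_table
```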
Using SLING_STATE
If we wish to store the incremental state externally (and avoid using the max value of the target table), we can use the state feature. We need to provide an environment variable called SLING_STATE, which is a location where Sling will store the respective incremental values. See Global Variables for more details.
Here is an example, where Sling will store the incremental values in the my/state path, in the AWS_S3 connection:
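A sketch of such a configuration, assuming the SLING_STATE value is written as the connection name followed by the path, and that it can be set in the env section of the replication file. Both are assumptions to check against Global Variables. The source and target connection names, tables, and columns are placeholders.

```yaml
# Hypothetical connection names.
source: MY_SOURCE_DB
target: MY_TARGET_DB

env:
  # Assumed format: <connection name>/<path inside that connection>.
  SLING_STATE: AWS_S3/my/state

defaults:
  mode: incremental
  primary_key: [id]              # assumed key column
  update_key: last_modified_dt   # assumed update column

streams:
  source_schema.source_table:
    object: target_schema.target_table
```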
Delete Missing Records (Soft / Hard)
When loading data incrementally, you may want to handle records that exist in the target but are missing from the source. The delete_missing option supports two modes:
hard: Physically deletes records from the target table that no longer exist in the source
soft: Marks records as deleted in the target table by setting a deletion timestamp
Be careful about enabling this feature on very large tables. The primary key column(s) are fully selected from the source stream on each run in order to determine which records no longer exist.
Hard Delete Example
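A sketch of a hard-delete configuration, assuming delete_missing is set under target_options. Connection, table, and column names are placeholders.

```yaml
# Hypothetical connection names.
source: MY_SOURCE_DB
target: MY_TARGET_DB

defaults:
  mode: incremental
  primary_key: [id]              # assumed key column; identifies records to compare
  update_key: last_modified_dt   # assumed update column
  target_options:
    delete_missing: hard         # rows missing from the source are deleted in the target

streams:
  source_schema.source_table:
    object: target_schema.target_table
```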
Soft Delete Example
When using soft delete mode, Sling will add a _sling_deleted_at timestamp column to track when records were marked as deleted.
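A sketch of a soft-delete configuration, assuming delete_missing is set under target_options. Connection, table, and column names are placeholders.

```yaml
# Hypothetical connection names.
source: MY_SOURCE_DB
target: MY_TARGET_DB

defaults:
  mode: incremental
  primary_key: [id]              # assumed key column; identifies records to compare
  update_key: last_modified_dt   # assumed update column
  target_options:
    delete_missing: soft         # rows missing from the source get _sling_deleted_at set

streams:
  source_schema.source_table:
    object: target_schema.target_table
```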
Important Notes:
The delete_missing option requires that you specify a primary_key to uniquely identify records
For incremental loads, an update_key is also required to determine which records to process
The comparison is done using a temporary table to efficiently identify missing records