Incremental
Examples of using Sling to load data from storage systems to databases
Using the File timestamp
The current approach for incrementally loading files into a database is to use the file timestamp.
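To make the idea concrete, here is a minimal conceptual sketch of timestamp-based selection. It is not Sling's implementation: the `FileInfo` type, the `select_new_files` helper, and the strict "newer than the marker" comparison are all assumptions made for illustration.

```python
from datetime import datetime, timezone
from typing import NamedTuple, Optional


class FileInfo(NamedTuple):
    """Hypothetical description of a file in a storage system."""
    path: str
    modified_at: datetime


def select_new_files(files, last_marker: Optional[datetime]):
    """Return files newer than the marker, plus the updated marker.

    Assumption: a file qualifies when its timestamp is strictly greater
    than the last stored marker. With no marker, every file qualifies.
    """
    new_files = [
        f for f in files if last_marker is None or f.modified_at > last_marker
    ]
    new_files.sort(key=lambda f: f.modified_at)
    # The marker only moves forward when something new was found.
    new_marker = new_files[-1].modified_at if new_files else last_marker
    return new_files, new_marker


files = [
    FileInfo("orders/a.parquet", datetime(2024, 1, 1, tzinfo=timezone.utc)),
    FileInfo("orders/b.parquet", datetime(2024, 1, 3, tzinfo=timezone.utc)),
]
to_load, marker = select_new_files(files, datetime(2024, 1, 2, tzinfo=timezone.utc))
print([f.path for f in to_load], marker)
```

In this sketch, only files with a timestamp after the stored marker are picked up, and the marker advances to the newest timestamp seen.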
Here is an example replication, incrementally loading files from an S3 bucket into a Postgres database:
Using SLING_STATE
We can also provide an environment variable called SLING_STATE, which is a location where Sling will store the respective incremental values. See Global Variables for more details. This allows you to modify the value directly if you want to change the incremental marker manually.
Let us assume we have parquet files in the following format, at a daily level:
We can specify the stream string as dumps/orders/{YYYY}/{MM}/{DD}/*.parquet
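As an illustration of the daily layout this stream string describes, the sketch below substitutes the date placeholders for a given day. The `expand_pattern` helper is hypothetical, and zero-padded month and day values are an assumption; how Sling itself resolves the placeholders is not shown here.

```python
from datetime import date

PATTERN = "dumps/orders/{YYYY}/{MM}/{DD}/*.parquet"


def expand_pattern(pattern: str, day: date) -> str:
    """Replace {YYYY}, {MM} and {DD} with values for `day`.

    Assumption: month and day are zero-padded to two digits.
    """
    return (
        pattern.replace("{YYYY}", f"{day.year:04d}")
        .replace("{MM}", f"{day.month:02d}")
        .replace("{DD}", f"{day.day:02d}")
    )


print(expand_pattern(PATTERN, date(2022, 1, 5)))
# prints: dumps/orders/2022/01/05/*.parquet
```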
Backfilling
Best practice for the first load is to backfill the date range of interest. Sling will ingest all data found according to the stream path provided. Upon completion, the incremental date value will be set to the last date in the backfill range. Therefore, it is necessary to run backfills in ascending order (e.g. 2021-01-01,2021-12-31, then 2022-01-01,2022-12-31, etc.).
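One way to keep backfills in ascending order is to generate the range strings programmatically and run them in sequence. The following is a minimal sketch; `yearly_ranges` is a hypothetical helper, and how each range is passed to Sling is left out.

```python
from datetime import date


def yearly_ranges(first_year: int, last_year: int) -> list[str]:
    """Build 'start,end' range strings, one per year, oldest first."""
    ranges = []
    for year in range(first_year, last_year + 1):
        start = date(year, 1, 1)
        end = date(year, 12, 31)
        ranges.append(f"{start.isoformat()},{end.isoformat()}")
    return ranges


for backfill_range in yearly_ranges(2021, 2022):
    # Each range would be backfilled before moving on to the next one.
    print(backfill_range)
```

Because the years are iterated from oldest to newest, the incremental marker left behind by each backfill never ends up later than the range that follows it.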
Incremental
Run normally. Sling will process the next incremental data value.