Run Replications (Many Tasks)
Replications are the best way to provide configuration for many streams in a single file.
Example configuration (located at /tmp/pg-to-snowflake.yaml):

# We first need to make sure our connections are available in our environment
# See https://docs.slingdata.io/sling-cli/environment for help
source: MY_POSTGRES
target: MY_SNOWFLAKE
# default config options which apply to all streams
defaults:
  mode: full-refresh # valid choices: incremental, truncate, full-refresh, snapshot
  # specify pattern to use for object naming in target connection, see below for options
  object: '{target_schema}.{stream_schema}_{stream_table}'
  # source_options: # optional, for more advanced options for source connection
  # target_options: # optional, for more advanced options for target connection

streams:
  finance.accounts:
  finance.users:
    disabled: true
  finance.departments:
    object: '{target_schema}.finance_departments_old' # overwrite default object
    source_options:
      empty_as_null: false
  finance."Transactions":
    mode: incremental # overwrite default mode
    primary_key: id
    update_key: last_updated_at
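Settings declared under a stream take precedence over the defaults block. A minimal sketch of how that merge resolves (a hypothetical illustration, not Sling's actual implementation):

```python
# Hypothetical illustration of defaults vs. per-stream overrides;
# not Sling's actual code.
defaults = {
    "mode": "full-refresh",
    "object": "{target_schema}.{stream_schema}_{stream_table}",
}

streams = {
    "finance.accounts": {},
    "finance.users": {"disabled": True},
    "finance.departments": {"object": "{target_schema}.finance_departments_old"},
    'finance."Transactions"': {
        "mode": "incremental",
        "primary_key": "id",
        "update_key": "last_updated_at",
    },
}

def effective_config(stream_name):
    # Stream-level keys win over the defaults block.
    return {**defaults, **streams[stream_name]}

print(effective_config("finance.departments")["object"])
# → {target_schema}.finance_departments_old
```

So finance.accounts inherits everything from defaults, while finance."Transactions" swaps in incremental mode with its own keys.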
Running the replication with Sling CLI:
$ sling run -r /tmp/pg-to-snowflake.yaml
11:04AM INF Sling Replication [3 streams] | MY_POSTGRES -> MY_SNOWFLAKE
11:04AM INF [1 / 3] running stream `finance.accounts`
11:04AM INF connecting to source database (postgres)
11:04AM INF connecting to target database (snowflake)
11:04AM INF reading from source database
11:04AM INF writing to target database [mode: full-refresh]
11:04AM INF streaming data
11:04AM INF dropped table public.finance_accounts
11:04AM INF created table public.finance_accounts
11:04AM INF inserted 16900 rows in 30 sec
11:04AM INF execution succeeded
11:05AM INF [2 / 3] running stream `finance.departments`
11:05AM INF connecting to source database (postgres)
11:05AM INF connecting to target database (snowflake)
11:05AM INF reading from source database
11:05AM INF writing to target database [mode: full-refresh]
11:05AM INF streaming data
11:05AM INF dropped table public.finance_departments_old
11:05AM INF created table public.finance_departments_old
11:05AM INF inserted 18 rows in 1 sec
11:05AM INF execution succeeded
11:05AM INF [3 / 3] running stream `finance."Transactions"`
11:05AM INF connecting to source database (postgres)
11:05AM INF connecting to target database (snowflake)
11:05AM INF getting checkpoint value
11:05AM INF reading from source database
11:05AM INF writing to target database [mode: incremental]
11:05AM INF streaming data
11:08AM INF inserted 1128000 rows in 99 sec
11:08AM INF execution succeeded
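For the incremental stream, the "getting checkpoint value" step reads the latest update_key value already present in the target, so only newer rows are pulled from the source. A simplified sketch of that logic (assumed behavior for illustration, not Sling's source code):

```python
# Simplified sketch of incremental-mode reads (assumed behavior, not Sling source).
def build_incremental_query(table, update_key, checkpoint):
    # With no checkpoint (first run), read everything; otherwise read only
    # rows whose update_key is newer than the last value seen in the target.
    if checkpoint is None:
        return f"SELECT * FROM {table}"
    return f"SELECT * FROM {table} WHERE {update_key} > '{checkpoint}'"

q = build_incremental_query('finance."Transactions"', "last_updated_at",
                            "2023-01-01 00:00:00")
print(q)
# → SELECT * FROM finance."Transactions" WHERE last_updated_at > '2023-01-01 00:00:00'
```

The primary_key then lets the target upsert matching rows instead of blindly appending them.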
We can also deploy replication YAML files to Sling Cloud using the sling cloud commands. This is especially useful when replication configurations need to be version controlled. See the Sling Cloud documentation for more details.

Mac / Linux:
export SLING_PROJECT=...
export SLING_API_KEY=...
sling cloud deploy /path/to/replication.yaml
Windows (PowerShell):
$env:SLING_PROJECT='...'
$env:SLING_API_KEY='...'
sling cloud deploy /path/to/replication.yaml
The object key supports dynamic naming of the target object: each variable used in the pattern is replaced at runtime with its corresponding value. For example, the pattern {target_schema}.{stream_schema}_{stream_table} resolves to names like public.finance_accounts, as seen in the run output above. The available variables are:

- run_timestamp: the run timestamp of the task
- source_account: the name of the account of the source connection (when the source connection is AZURE)
- source_bucket: the name of the bucket of the source connection (when the source connection is GCS or S3)
- source_container: the name of the container of the source connection (when the source connection is AZURE)
- source_name: the name of the source connection
- stream_file_folder: the parent folder name of the stream's file (when the source is a file system)
- stream_file_name: the file name of the stream (when the source is a file system)
- stream_file_path: the file path of the stream (when the source is a file system)
- stream_name: the name of the stream
- stream_schema: the schema name of the source stream (when the source connection is a database)
- stream_table: the table name of the source stream (when the source connection is a database)
- target_account: the name of the account of the target connection (when the target connection is AZURE)
- target_bucket: the name of the bucket of the target connection (when the target connection is GCS or S3)
- target_container: the name of the container of the target connection (when the target connection is AZURE)
- target_name: the name of the target connection
- target_schema: the default schema of the target connection, as specified in its credentials
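Conceptually, the object name is just string templating over these runtime values. A hypothetical sketch (for illustration only, not Sling's implementation):

```python
# Hypothetical illustration of object-name templating; not Sling's actual code.
def render_object(pattern, variables):
    # Replace each {variable} placeholder with its runtime value.
    for name, value in variables.items():
        pattern = pattern.replace("{" + name + "}", value)
    return pattern

# Values as they would be resolved for the finance.accounts stream above.
runtime = {
    "target_schema": "public",
    "stream_schema": "finance",
    "stream_table": "accounts",
}
print(render_object("{target_schema}.{stream_schema}_{stream_table}", runtime))
# → public.finance_accounts
```

This matches the public.finance_accounts table created in the run output earlier.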