Replications
Multiple streams in a YAML or JSON file. Best way to scale Sling.
Overview
Replications are the best way to use Sling in a reusable manner. The defaults key lets you define inputs once and override any of them in a particular stream. Both YAML and JSON files are accepted. When you run a replication, Sling internally generates one task per stream and runs them in order.
Here is a basic example, where all PostgreSQL tables in the schema my_schema will be loaded into Snowflake. The my_schema.* notation as the stream name is a feature possible only in Replications. Also notice how defaults.object uses runtime variables.
source: MY_POSTGRES
target: MY_SNOWFLAKE

# default config options which apply to all streams
defaults:
  mode: full-refresh
  object: new_schema.{stream_schema}_{stream_table}

streams:
  my_schema.*:

env:
  SLING_THREADS: 3
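To make the runtime variables in defaults.object more concrete, here is a minimal Python sketch of how the template could expand for each table matched by my_schema.*. It is only an illustration and not Sling's implementation: the function resolve_object and the table names orders and customers are hypothetical.

```python
# Illustrative only: mimics how the {stream_schema} and {stream_table}
# runtime variables in defaults.object could expand for each matched table.
# resolve_object and the table names below are hypothetical.

def resolve_object(template: str, stream_name: str) -> str:
    """Fill the runtime variables from a 'schema.table' stream name."""
    schema, table = stream_name.split(".", 1)
    return template.format(stream_schema=schema, stream_table=table)


template = "new_schema.{stream_schema}_{stream_table}"

# Hypothetical tables that the my_schema.* wildcard might match.
matched = ["my_schema.orders", "my_schema.customers"]

targets = {name: resolve_object(template, name) for name in matched}
for name, target in targets.items():
    print(f"{name} -> {target}")
```

Under these assumptions, each source table gets its own target object in new_schema, named after the source schema and table.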
Another example:
source: MY_MYSQL
target: MY_BIGQUERY

defaults:
  mode: incremental
  object: '{target_schema}.{stream_schema}_{stream_table}'
  primary_key: [id]
  source_options:
    empty_as_null: false
  target_options:
    column_casing: snake

streams:
  finance.accounts:
  finance.users:
    disabled: true
  finance.departments:
    object: '{target_schema}.finance_departments_old' # overwrite default object
    source_options:
      empty_as_null: false
  finance."Transactions":
    mode: incremental # overwrite default mode
    primary_key: [other_id]
    update_key: last_updated_at
  finance.all_users.custom:
    sql: |
      select col1, col2
      from finance."all_Users"
    object: finance.all_users # need to add 'object' key for custom SQL

env:
  # adds the _sling_loaded_at timestamp column
  SLING_LOADED_AT_COLUMN: true
  # if source is file, adds a _sling_stream_url column with file path / url
  SLING_STREAM_URL_COLUMN: true
  # parallel stream runs
  SLING_THREADS: 3
  # retry failing stream runs
  SLING_RETRIES: 1
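The way defaults and per-stream overrides combine in the example above can be sketched in Python as a simple merge. This is a simplified illustration, not Sling's actual code: effective_config is a hypothetical helper, only a subset of the keys is shown, and the merge is shallow, so it does not model how Sling treats nested keys such as source_options.

```python
from typing import Optional

# Illustrative only: a shallow merge of defaults with per-stream overrides.
# This is not Sling's implementation; nested keys such as source_options
# may be handled differently by Sling.

defaults = {
    "mode": "incremental",
    "object": "{target_schema}.{stream_schema}_{stream_table}",
    "primary_key": ["id"],
}

streams = {
    "finance.accounts": None,  # no overrides: inherits every default
    "finance.users": {"disabled": True},
    "finance.departments": {"object": "{target_schema}.finance_departments_old"},
    'finance."Transactions"': {
        "primary_key": ["other_id"],
        "update_key": "last_updated_at",
    },
}


def effective_config(defaults: dict, overrides: Optional[dict]) -> dict:
    """Stream-level keys win over defaults; unset keys fall back to defaults."""
    return {**defaults, **(overrides or {})}


# One task per enabled stream, kept in the order the streams are listed.
tasks = {
    name: effective_config(defaults, overrides)
    for name, overrides in streams.items()
    if not (overrides or {}).get("disabled", False)
}

for name, config in tasks.items():
    print(name, config)
```

In this sketch, finance.users produces no task because it is disabled, finance.accounts inherits every default, and finance."Transactions" keeps the default mode while replacing primary_key.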
We can use a replication config with: sling run -r /path/to/replication.yaml