
Replications

Multiple streams in a YAML or JSON file. Best way to scale Sling.


Last updated 3 months ago

Overview

Replications are the best way to use Sling in a reusable manner. The defaults key lets you define your inputs once, with the ability to override any of them in a particular stream. Both YAML and JSON files are accepted. When you run a replication, Sling internally auto-generates one task per stream and runs them in order.

See the pages listed at the end of this section for more details.

Here is a basic example, where all PostgreSQL tables in the schema my_schema will be loaded into Snowflake. The my_schema.* wildcard notation as a stream name is a feature possible only in replications. Also notice how defaults.object uses runtime variables ({stream_schema} and {stream_table}) to derive the target object name.

replication.yaml
source: MY_POSTGRES
target: MY_SNOWFLAKE

# default config options which apply to all streams
defaults:
  mode: full-refresh
  object: new_schema.{stream_schema}_{stream_table}

streams:
  my_schema.*:

env:
  SLING_THREADS: 3

Another example:

replication.yaml
source: MY_MYSQL
target: MY_BIGQUERY

defaults:
  mode: incremental
  object: '{target_schema}.{stream_schema}_{stream_table}'
  primary_key: [id]
  
  source_options:
    empty_as_null: false
    
  target_options:
    column_casing: snake

streams:
  finance.accounts:
  finance.users:
    disabled: true
  
  finance.departments:
    object: '{target_schema}.finance_departments_old' # override the default object
    source_options:
      empty_as_null: false

  finance."Transactions":
    mode: incremental # override the default mode
    primary_key: [other_id]
    update_key: last_updated_at
  
  finance.all_users.custom:
    sql: |
      select col1, col2
      from finance."all_Users"
    object: finance.all_users # the 'object' key is required for custom SQL

env:
  # adds the _sling_loaded_at timestamp column
  SLING_LOADED_AT_COLUMN: true 
  
  # if source is file, adds a _sling_stream_url column with file path / url
  SLING_STREAM_URL_COLUMN: true

  # parallel stream runs
  SLING_THREADS: 3

  # retry failing stream runs
  SLING_RETRIES: 1

We can run a replication config with: sling run -r /path/to/replication.yaml
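For instance, assuming the file above is saved as replication.yaml, typical invocations might look like the following (the --streams and -d flags shown here are CLI options for filtering streams and enabling debug output; verify the exact flags available in your version with sling run --help):

```shell
# Run all enabled streams defined in the replication file
sling run -r replication.yaml

# Run only a subset of the defined streams
sling run -r replication.yaml --streams finance.accounts,finance.departments

# Run with debug output for troubleshooting
sling run -r replication.yaml -d
```

Since disabled streams (such as finance.users above) are skipped automatically, filtering with --streams is mainly useful for testing a single stream before running the whole replication.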

Structure
Modes
Source Options
Target Options
Columns & Constraints
Transformations
Hooks
Runtime Variables