Running Sling

The sling run command is the primary mechanism for executing data movement operations in Sling CLI. It provides a flexible interface for transferring data between various sources and targets, with support for different replication modes and configuration options.

There are two primary ways to configure and run Sling:

  • CLI Flags: quick ad-hoc runs from your terminal shell or script.

  • Replication: streams defined in a YAML or JSON file.
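As a rough illustration of the second approach, the sketch below writes a small replication file and passes it to sling run. The file path is hypothetical and the connection names are placeholders. The YAML keys (source, target, defaults, streams) and the -r flag are assumptions that this page does not spell out, so check them against the replication documentation before relying on them.

```shell
# Hypothetical location for the replication file; adjust to your setup.
replication_file="${TMPDIR:-/tmp}/replication.yaml"

# Assumed replication layout: one source, one target, shared defaults,
# and a wildcard stream covering every table in source_schema.
cat > "$replication_file" <<'EOF'
source: MY_SOURCE_DB
target: MY_TARGET_DB

defaults:
  mode: full-refresh
  object: 'target_schema.{stream_table}'

streams:
  source_schema.*:
EOF

# Run the replication only when the sling binary is on the PATH.
if command -v sling >/dev/null 2>&1; then
  sling run -r "$replication_file"
else
  echo "sling not found; wrote $replication_file"
fi
```

Keeping the stream definitions in a file makes the run repeatable and reviewable, whereas CLI flags suit one-off transfers.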


You'll also find plenty of examples of how to use Sling below.

CLI Flags Overview

For quickly running ad-hoc operations from the terminal, using CLI flags is often best. Here are some examples:

# Load all tables in a schema with 3 threads
$ export SLING_THREADS=3
$ sling run \
    --src-conn MY_SOURCE_DB \
    --src-stream 'source_schema.*' \
    --tgt-conn MY_TARGET_DB \
    --tgt-object 'target_schema.{stream_table}' \
    --mode full-refresh
    
# Pipe in your json file and flatten the nested keys into their own columns
$ cat /tmp/my_file.json | sling run \
    --src-options '{"flatten": "true"}' \
    --tgt-conn MY_TARGET_DB \
    --tgt-object 'target_schema.target_table' \
    --mode full-refresh

# Read folder containing many CSV files
$ sling run \
    --src-stream 'file:///tmp/my_csv_folder/' \
    --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' \
    --mode full-refresh

# Load only latest data from one source DB to another.
$ sling run \
    --src-conn MY_SOURCE_DB \
    --src-stream 'source_schema.source_table' \
    --tgt-conn MY_TARGET_DB \
    --tgt-object 'target_schema.target_table' \
    --mode incremental \
    --primary-key 'id' --update-key 'last_modified_dt' 

# Export / Backup database tables to JSON files
$ sling run \
    --src-conn MY_SOURCE_DB \
    --src-stream 'source_schema.source_table' \
    --tgt-conn MY_S3_BUCKET \
    --tgt-object 's3://my-bucket/my_json_folder/' \
    --tgt-options '{"file_max_rows": 100000, "format": "jsonlines"}'
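When these commands move from the terminal into a script, it usually helps to check whether each run succeeded. The sketch below loops over two hypothetical table names and records failures. It assumes that sling run exits with a non-zero status on error, which this page does not state. The stub on the first line is only there so the script can be tried as a dry run on a machine without Sling installed.

```shell
# Dry-run stub (assumption for illustration): if sling is not installed,
# print the command that would have been run instead of failing.
command -v sling >/dev/null 2>&1 || sling() { echo "sling $*"; }

status=0

# Hypothetical table names; replace with tables from your source schema.
for table in customers orders; do
  if sling run \
      --src-conn MY_SOURCE_DB \
      --src-stream "source_schema.$table" \
      --tgt-conn MY_TARGET_DB \
      --tgt-object "target_schema.$table" \
      --mode full-refresh
  then
    echo "loaded $table"
  else
    # Assumes a non-zero exit status signals a failed run.
    echo "failed to load $table" >&2
    status=1
  fi
done

echo "finished with status $status"
```

For more than a handful of tables, a wildcard stream such as 'source_schema.*' is likely simpler than a loop.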

Interface Specifications

| CLI Flag | Description |
|---|---|
| --src-conn | The source database connection (name, conn string or URL). |
| --tgt-conn | The target database connection (name, conn string or URL). |
| --src-stream | The source table (schema.table) or local / cloud file path. Can also be the path of a SQL file or in-line text to use as a query. Use file:// for local paths. |
| --tgt-object | The target table (schema.table) or local / cloud file path. |
| --mode | The load mode to use, such as full-refresh, incremental, snapshot or truncate. |
| --primary-key | The column(s) to use as the primary key (for incremental mode). For a composite key, use a comma-delimited string. |
| --update-key | The column to use as the update key (for incremental mode). |
| --src-options | In-line options to further configure the source (JSON or YAML). |
| --tgt-options | In-line options to further configure the target (JSON or YAML). |
| --stdout | Output the stream to standard output (STDOUT). |
| --select | Select or exclude specific columns from the source stream (comma-separated). Use a - prefix to exclude. |
| --transforms | An object/map, or array/list, of built-in transforms to apply to records (JSON or YAML). |
| --columns | An object/map to specify the type that a column should be cast as (JSON or YAML). |
| --streams | |
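Two of the flags above are not covered by the earlier examples, so here is a minimal sketch of each. The output path and the region column are hypothetical, and the format written by --stdout is not specified on this page, so the file name makes no claim about it. The stub on the first line only prints the command when Sling is not installed, which allows a dry run.

```shell
# Dry-run stub (assumption for illustration): if sling is not installed,
# print the command that would have been run instead of failing.
command -v sling >/dev/null 2>&1 || sling() { echo "sling $*"; }

# Hypothetical output path for the exported rows.
out_file="${TMPDIR:-/tmp}/source_table_export.out"

# Send two selected columns to STDOUT and redirect them into a file.
sling run \
    --src-conn MY_SOURCE_DB \
    --src-stream 'source_schema.source_table' \
    --select 'id,last_modified_dt' \
    --stdout > "$out_file"

# Incremental load with a composite primary key, given as a
# comma-delimited string ('region' is a hypothetical second key column).
sling run \
    --src-conn MY_SOURCE_DB \
    --src-stream 'source_schema.source_table' \
    --tgt-conn MY_TARGET_DB \
    --tgt-object 'target_schema.target_table' \
    --mode incremental \
    --primary-key 'id,region' --update-key 'last_modified_dt'
```

Because --stdout writes to standard output, the result can be redirected to a file as shown or piped into another command.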

Features

  • Flexible Data Sources: Supports databases, files, cloud storage, and standard input

  • Multiple Load Modes: Includes full refresh, incremental, snapshot, and truncate modes

  • Data Transformations: Allows column selection, type casting, and custom transformations

  • Progress Tracking: Monitors row counts, bytes transferred, and constraint violations

  • Error Handling: Provides detailed error reporting and validation

The sling run command accommodates a range of data movement scenarios while keeping a consistent set of parameters across them.
