Running Sling

The sling run command is the primary mechanism for executing data movement operations in Sling CLI. It provides a flexible interface for transferring data between various sources and targets, with support for different replication modes and configuration options.

There are two primary ways to configure and run Sling:

CLI Flags: quick ad-hoc runs from your terminal shell or script.
Replication: streams defined in a YAML or JSON file.
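
As a rough sketch of the second option, the snippet below writes a small replication file and passes it to sling run with the -r flag. The connection names are the same placeholders used elsewhere on this page; the file path /tmp/replication.yaml and the other_schema.events stream are hypothetical, and the exact set of supported keys should be checked against the replication documentation. The run is guarded so the snippet is a no-op when sling is not installed.

```shell
# Write a minimal replication file. Connection, schema and table names are placeholders.
cat > /tmp/replication.yaml <<'EOF'
source: MY_SOURCE_DB
target: MY_TARGET_DB

# Settings applied to every stream unless a stream overrides them.
defaults:
  mode: full-refresh
  object: 'target_schema.{stream_table}'

streams:
  # Wildcard: every table in source_schema, loaded with the defaults above.
  'source_schema.*':

  # Hypothetical stream that overrides the default mode.
  'other_schema.events':
    mode: incremental
    primary_key: [id]
    update_key: last_modified_dt
EOF

# Run the streams in the file, but only if the sling binary is on the PATH.
if command -v sling >/dev/null 2>&1; then
  sling run -r /tmp/replication.yaml
fi
```

Keeping the streams in a file makes a run repeatable and reviewable, whereas CLI flags suit one-off transfers.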

CLI Flags Overview

For quickly running ad-hoc operations from the terminal, CLI flags are often the best choice. Here are some examples:

# Load all tables in a schema using 3 threads
$ export SLING_THREADS=3
$ sling run \
    --src-conn MY_SOURCE_DB \
    --src-stream 'source_schema.*' \
    --tgt-conn MY_TARGET_DB \
    --tgt-object 'target_schema.{stream_table}' \
    --mode full-refresh

# Pipe in your JSON file and flatten the nested keys into their own columns
$ cat /tmp/my_file.json | sling run \
    --src-options '{"flatten": "true"}' \
    --tgt-conn MY_TARGET_DB \
    --tgt-object 'target_schema.target_table' \
    --mode full-refresh

# Read folder containing many CSV files
$ sling run \
    --src-stream 'file:///tmp/my_csv_folder/' \
    --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' \
    --mode full-refresh

# Load only latest data from one source DB to another.
$ sling run \
    --src-conn MY_SOURCE_DB \
    --src-stream 'source_schema.source_table' \
    --tgt-conn MY_TARGET_DB \
    --tgt-object 'target_schema.target_table' \
    --mode incremental \
    --primary-key 'id' --update-key 'last_modified_dt' 

# Export / Backup database tables to JSON files
$ sling run \
    --src-conn MY_SOURCE_DB \
    --src-stream 'source_schema.source_table' \
    --tgt-conn MY_S3_BUCKET \
    --tgt-object 's3://my-bucket/my_json_folder/' \
    --tgt-options '{"file_max_rows": 100000, "format": "jsonlines"}'

Interface Specifications

CLI Flag | Description
--- | ---
--src-conn | The source database connection (name, connection string or URL).
--tgt-conn | The target database connection (name, connection string or URL).
--src-stream | The source table (schema.table) or a local / cloud file path. Can also be the path of a SQL file or in-line SQL text to use as the query. Use file:// for local paths.
--tgt-object | The target table (schema.table) or a local / cloud file path. Use file:// for local paths. See the runtime variables documentation for details on placeholders such as {stream_table}.
--mode | The target load mode to use: incremental, truncate, full-refresh, backfill or snapshot. Default is full-refresh.
--primary-key | The column(s) to use as primary key (for incremental mode). For a composite key, use a comma-delimited string.
--update-key | The column to use as update key (for incremental mode).
--src-options | In-line options to further configure the source (JSON or YAML). See the source options documentation for details.
--tgt-options | In-line options to further configure the target (JSON or YAML). See the target options documentation for details.
--stdout | Output the stream to standard output (STDOUT).
--select | Select or exclude specific columns from the source stream (comma-separated). Use a - prefix to exclude a column.
--transforms | An object/map, or array/list, of built-in transforms to apply to records (JSON or YAML).
--columns | An object/map specifying the type that each column should be cast as (JSON or YAML).
--streams | Only run specific streams from a replication (comma-separated). See the replication documentation for details.
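
Several of the flags above can be combined in a single run. The following is a minimal sketch, not a definitive recipe: it selects two columns, casts one of them, and redirects the stream from standard output to a local file. The column names, the string type name and the output path are assumptions made for illustration, and the output is assumed to be CSV. The run is guarded so nothing happens when sling is not installed.

```shell
# Placeholder values: column names and the output path are hypothetical.
SELECT_COLUMNS='id,last_modified_dt'   # keep only these two columns
COLUMN_TYPES='{"id": "string"}'        # cast the id column to a string type
OUTPUT_FILE=/tmp/source_table.csv      # where the redirected stream is written

# Run only when the sling binary is available on the PATH.
if command -v sling >/dev/null 2>&1; then
  sling run \
    --src-conn MY_SOURCE_DB \
    --src-stream 'source_schema.source_table' \
    --select "$SELECT_COLUMNS" \
    --columns "$COLUMN_TYPES" \
    --stdout > "$OUTPUT_FILE"
fi
```

Holding the JSON value in a single-quoted variable keeps the shell from interpreting the braces and double quotes before sling receives them.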

Features

Flexible Data Sources: Supports databases, files, cloud storage, and standard input
Multiple Load Modes: Includes full-refresh, incremental, truncate, backfill, and snapshot modes
Data Transformations: Allows column selection, type casting, and custom transformations
Progress Tracking: Monitors row counts, bytes transferred, and constraint violations
Error Handling: Provides detailed error reporting and validation

The sling run command is designed to be both powerful and flexible, accommodating various data movement scenarios while maintaining ease of use through consistent parameter patterns and comprehensive documentation.

