
Run Single Task

Let's go over how we can run a single task with Sling.
Sling makes it easy to Extract & Load data from one connection to another.
Some Example Commands
# Pipe in your json file and flatten the nested keys into their own columns
$ cat /tmp/my_file.json | sling run --src-options '{flatten: true}' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
# Read folder containing many CSV files
$ sling run --src-stream 'file:///tmp/my_csv_folder/' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
# Sync only the latest data from one source database to another
$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode incremental --primary-key 'id' --update-key 'last_modified_dt'
# Export / Backup database tables to JSON files
$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_S3_BUCKET --tgt-object 's3://my-bucket/my_json_folder/' --tgt-options '{file_max_rows: 100000, format: jsonlines}'
See the Examples section below for more.

Configuration

There are 2 ways to configure a task:
  • Command Line Flags
  • YAML Configuration Text

CLI Flags / YAML Keys

| CLI Flag | YAML Config Key | Description |
| --- | --- | --- |
| --src-conn | source.conn | The source database / API connection (name, connection string or URL). |
| --src-stream | source.stream | The source table (schema.table) or local / cloud file path. Can also be the path of a SQL file or in-line SQL text. |
| --tgt-object | target.object | The target table (schema.table) or the path/URL of the target file (local, s3, gs, azure). |
| --tgt-conn | target.conn | The target database connection (name, connection string or URL). |
| --mode | mode | The target load mode to use: incremental, truncate, full-refresh or snapshot. Default is full-refresh. |
| --primary-key | source.primary_key | The column(s) to use as the primary key. For a composite key, use an array. |
| --update-key | source.update_key | The column to use as the update key (for incremental mode). |
| --src-options | source.options | In-line options to further configure the source (JSON or YAML). See below for details. |
| --tgt-options | target.options | In-line options to further configure the target (JSON or YAML). See below for details. |
| --stdout | stdout | Output the stream to standard output (STDOUT). |
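
For instance, since --stdout streams records to standard output, a task can feed files or other command-line tools directly. A minimal sketch (connection and table names are placeholders):

$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --stdout > /tmp/export.csv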

Modes

Here are the various modes available. All modes first load into a new temporary table on tgt-conn prior to the final load.

| Mode | Description |
| --- | --- |
| full-refresh | This is the default mode. The target table will be dropped and recreated with the source data. |
| incremental | The source data will be merged or appended into the target table. If the table does not exist, it will be created. See below for more details. |
| truncate | Similar to full-refresh, except that the target table is truncated instead of dropped. This keeps any special DDL / GRANTs applied. |
| snapshot | If the target table exists, Sling will insert / append the data with a _sling_loaded_at column. If it does not exist, the table will be created. |
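
For example, since snapshot appends rows tagged with _sling_loaded_at, re-running the same task on a schedule accumulates point-in-time copies of the source table. A minimal sketch (connection and table names are placeholders):

$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table_history' --mode snapshot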

Incremental Mode Strategies

| Load Strategy | Primary Key Provided | Update Key Provided | Stream Strategy |
| --- | --- | --- | --- |
| New Data Upsert (update/insert) | yes | yes | Only new records after max(update_key) |
| Full Data Upsert (update/insert) | yes | no | Full data |
| Append Only (insert, no update) | no | yes | Only new records after max(update_key) |
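
In configuration terms, the strategy is selected purely by which keys are present in the source block. A sketch of the three variants (column names are placeholders):

# New Data Upsert: provide both keys
source:
  primary_key: [id]
  update_key: last_modified_dt

# Full Data Upsert: provide only the primary key
source:
  primary_key: [id]

# Append Only: provide only the update key
source:
  update_key: created_dt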

Advanced Options

Source Options (--src-options flag / source.options key)

| Child Key | Type | Description |
| --- | --- | --- |
| compression | string | (Only for file source) The type of compression to use when reading files. Valid inputs are none, auto, gzip, zstd and snappy. Default is auto. |
| format | string | (Only for file source) The format of the file(s). Options are: csv, json, jsonlines and xml. |
| delimiter | string | (Only for file source) The delimiter to use when parsing tabular files. Default is auto. |
| header | bool | (Only for file source) Whether to consider the first line as the header. Default is true. |
| flatten | bool | (Only for file source) Whether to flatten a semi-structured source file format (JSON, XML). |
| jmespath | string | (Only for file source) A JMESPath expression used to filter / extract nested JSON data. See https://jmespath.org/ for more details. |
| datetime_format | string | The ISO 8601 date format to use when reading date values. Default is auto. |
| empty_as_null | bool | Whether empty fields should be treated as NULL. Default is true. |
| null_if | string | The case-sensitive value that should be treated as NULL when encountered. Default is NULL. |
| transforms | array | An array/list of built-in transforms to apply to records. Available values: replace_accents, parse_uuid, trim_space. See here for source code. [BETA] |
| columns | object | When a source column type cannot be inferred (e.g. the source is a file), use this object/map to specify the type a column should be cast as. It is not necessary to include all columns; Sling will attempt to auto-detect all unspecified column types. The map key is the column name and the value is the type (string, json, integer, bigint, decimal, datetime or bool). |
| skip_blank_lines | bool | Whether blank lines should be skipped when encountered. Default is false. |
| trim_space | bool | Whether white space at the beginning or end of parsed values should be removed. Default is false. |
| time_format | string | The ISO 8601 time format to use when reading/writing time values. Default is auto. |
| sheet | string | (Only for Excel source files) The name of the sheet to use as the data source. Default is the first sheet. |
| range | string | (Only for Excel source files) The cell range to read on the selected sheet, e.g. B1:H70. Default is the full range containing data. |
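
Several of these keys can be combined in a single --src-options value. A sketch using hypothetical file and column names:

$ sling run --src-stream 'file:///tmp/pipe_delimited.csv' --src-options '{delimiter: "|", empty_as_null: false, skip_blank_lines: true, columns: {amount: decimal}}' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh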

Target Options (--tgt-options flag / target.options key)

| Child Key | Type | Description |
| --- | --- | --- |
| compression | string | (Only for file target) The type of compression to use when writing files. Valid inputs are none, auto, gzip, zstd and snappy. Default is auto. |
| delimiter | string | (Only for file target) The delimiter to use when writing tabular files. Default is ,. |
| header | bool | (Only for file target) Whether to write the first line as the header. Default is true. |
| datetime_format | string | (Only for file target) The ISO 8601 date format to use when writing date values. Default is auto. |
| file_max_rows | integer | (Only for file target) The maximum number of rows (usually lines) to write to a file. 0 means no limit. When a value greater than 0 is specified, the output location will be a folder containing multiple file parts. Default is 0. |
| file_max_bytes | integer | (Only for file target) The maximum number of bytes to write to a file. 0 means no limit. When a value greater than 0 is specified, the output location will be a folder containing multiple file parts. Default is 0. |
| format | string | (Only for file target) The format of the file(s). Options are: csv, json and jsonlines. |
| table_ddl | string | (Only for database target) The table DDL to use when creating the target table. Default is auto-generated by Sling. |
| table_tmp | string | (Only for database target) The temporary table name to use when loading into a database. Default is auto-generated by Sling. |
| pre_sql | string | (Only for database target) A SQL query to run before loading. |
| post_sql | string | (Only for database target) A SQL query to run after loading. |
| use_bulk | bool | (Only for database target) Whether to use the bulk loading method, if available. If false, traditional batched INSERT loading is used. Default is true. |
| add_new_columns | bool | (Only for database target) Whether to add new columns found in the stream but missing in the target table (when mode is not full-refresh). Default is true. |
| adjust_column_type | bool | (Only for database target) Whether to adjust the column type when needed. Default is false. [BETA] |
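
As an illustration, pre_sql and post_sql can wrap a database load with housekeeping statements. A sketch in YAML form (the option keys are documented above; the SQL statements and object names are placeholders):

target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
  options:
    pre_sql: delete from target_schema.load_log where table_name = 'target_table'
    post_sql: grant select on target_schema.target_table to reporting_role
mode: full-refresh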

Examples

Database to Database

Database -> Database (Full Refresh)
# We first need to make sure our connections are available in our environment
# See https://docs.slingdata.io/sling-cli/environment for help
export MY_SOURCE_DB='...'
export MY_TARGET_DB='...'
$ sling conns list
+---------------+------------------+-----------------+
| CONN NAME     | CONN TYPE        | SOURCE          |
+---------------+------------------+-----------------+
| MY_SOURCE_DB  | DB - PostgreSQL  | env variable    |
| MY_TARGET_DB  | DB - Snowflake   | env variable    |
+---------------+------------------+-----------------+
######################## Using CLI Flags ########################
$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
######################## Using YAML Config ########################
$ cat /path/to/task.yaml
source:
  conn: MY_SOURCE_DB
  stream: source_schema.source_table
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ sling run -c /path/to/task.yaml
Database -> Database (Custom SQL)
# We first need to make sure our connections are available in our environment
# See https://docs.slingdata.io/sling-cli/environment for help
export MY_SOURCE_DB='...'
export MY_TARGET_DB='...'
$ sling conns list
+---------------+------------------+-----------------+
| CONN NAME     | CONN TYPE        | SOURCE          |
+---------------+------------------+-----------------+
| MY_SOURCE_DB  | DB - PostgreSQL  | env variable    |
| MY_TARGET_DB  | DB - Snowflake   | env variable    |
+---------------+------------------+-----------------+
######################## Using CLI Flags ########################
$ sling run --src-conn MY_SOURCE_DB --src-stream 'select * from my_schema.my_table where col1 is not null' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
# we can also read from a SQL file (/path/to/query.sql)
$ sling run --src-conn MY_SOURCE_DB --src-stream /path/to/query.sql --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
######################## Using YAML Config ########################
$ cat /path/to/task.yaml
source:
  conn: MY_SOURCE_DB
  stream: |
    select *
    from my_schema.my_table
    where col1 is not null
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ cat /path/to/task.file.yaml
source:
  conn: MY_SOURCE_DB
  stream: /path/to/query.sql
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ sling run -c /path/to/task.yaml
$ sling run -c /path/to/task.file.yaml
Database -> Database (Incremental - New Data Upsert)
# We first need to make sure our connections are available in our environment
# See https://docs.slingdata.io/sling-cli/environment for help
export MY_SOURCE_DB='...'
export MY_TARGET_DB='...'
$ sling conns list
+---------------+------------------+-----------------+
| CONN NAME     | CONN TYPE        | SOURCE          |
+---------------+------------------+-----------------+
| MY_SOURCE_DB  | DB - PostgreSQL  | env variable    |
| MY_TARGET_DB  | DB - Snowflake   | env variable    |
+---------------+------------------+-----------------+
######################## Using CLI Flags ########################
$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode incremental --primary-key 'id' --update-key 'last_modified_dt'
######################## Using YAML Config ########################
$ cat /path/to/task.yaml
source:
  conn: MY_SOURCE_DB
  stream: source_schema.source_table
  primary_key: [id]
  update_key: last_modified_dt
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: incremental
$ sling run -c /path/to/task.yaml
Database -> Database (Incremental - Full Data Upsert)
# We first need to make sure our connections are available in our environment
# See https://docs.slingdata.io/sling-cli/environment for help
export MY_SOURCE_DB='...'
export MY_TARGET_DB='...'
$ sling conns list
+---------------+------------------+-----------------+
| CONN NAME     | CONN TYPE        | SOURCE          |
+---------------+------------------+-----------------+
| MY_SOURCE_DB  | DB - PostgreSQL  | env variable    |
| MY_TARGET_DB  | DB - Snowflake   | env variable    |
+---------------+------------------+-----------------+
######################## Using CLI Flags ########################
$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode incremental --primary-key 'id'
######################## Using YAML Config ########################
$ cat /path/to/task.yaml
source:
  conn: MY_SOURCE_DB
  stream: source_schema.source_table
  primary_key: [id]
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: incremental
$ sling run -c /path/to/task.yaml
Database -> Database (Incremental - Append Only)
# We first need to make sure our connections are available in our environment
# See https://docs.slingdata.io/sling-cli/environment for help
export MY_SOURCE_DB='...'
export MY_TARGET_DB='...'
$ sling conns list
+---------------+------------------+-----------------+
| CONN NAME     | CONN TYPE        | SOURCE          |
+---------------+------------------+-----------------+
| MY_SOURCE_DB  | DB - PostgreSQL  | env variable    |
| MY_TARGET_DB  | DB - Snowflake   | env variable    |
+---------------+------------------+-----------------+
######################## Using CLI Flags ########################
$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode incremental --update-key 'created_dt'
######################## Using YAML Config ########################
$ cat /path/to/task.yaml
source:
  conn: MY_SOURCE_DB
  stream: source_schema.source_table
  update_key: created_dt
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: incremental
$ sling run -c /path/to/task.yaml
Database -> Database (Truncate)
# We first need to make sure our connections are available in our environment
# See https://docs.slingdata.io/sling-cli/environment for help
export MY_SOURCE_DB='...'
export MY_TARGET_DB='...'
$ sling conns list
+---------------+------------------+-----------------+
| CONN NAME     | CONN TYPE        | SOURCE          |
+---------------+------------------+-----------------+
| MY_SOURCE_DB  | DB - PostgreSQL  | env variable    |
| MY_TARGET_DB  | DB - Snowflake   | env variable    |
+---------------+------------------+-----------------+
######################## Using CLI Flags ########################
$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode truncate
######################## Using YAML Config ########################
$ cat /path/to/task.yaml
source:
  conn: MY_SOURCE_DB
  stream: source_schema.source_table
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: truncate
$ sling run -c /path/to/task.yaml
Database -> Database (Snapshot)
# We first need to make sure our connections are available in our environment
# See https://docs.slingdata.io/sling-cli/environment for help
export MY_SOURCE_DB='...'
export MY_TARGET_DB='...'
$ sling conns list
+---------------+------------------+-----------------+
| CONN NAME     | CONN TYPE        | SOURCE          |
+---------------+------------------+-----------------+
| MY_SOURCE_DB  | DB - PostgreSQL  | env variable    |
| MY_TARGET_DB  | DB - Snowflake   | env variable    |
+---------------+------------------+-----------------+
######################## Using CLI Flags ########################
$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode snapshot
######################## Using YAML Config ########################
$ cat /path/to/task.yaml
source:
  conn: MY_SOURCE_DB
  stream: source_schema.source_table
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: snapshot
$ sling run -c /path/to/task.yaml

File To Database

Local Storage (CSV + Options) -> Database
# We first need to make sure our connections are available in our environment
# See https://docs.slingdata.io/sling-cli/environment for help
export MY_TARGET_DB='...'
$ sling conns list
+---------------+------------------+-----------------+
| CONN NAME     | CONN TYPE        | SOURCE          |
+---------------+------------------+-----------------+
| MY_TARGET_DB  | DB - PostgreSQL  | env variable    |
+---------------+------------------+-----------------+
######################## Using CLI Flags ########################
$ cat /tmp/my_file.csv | sling run --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-stream 'file:///tmp/my_file.csv' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-stream 'file:///tmp/my_csv_folder/' --src-options '{columns: {col2: string}}' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-stream 'file:///tmp/my_csv_folder/' --src-options '{transforms: [replace_accents]}' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
######################## Using YAML Config ########################
$ cat /path/to/task.folder.yaml
source:
  stream: file:///tmp/my_csv_folder/
  options:
    transforms: [replace_accents] # apply transforms; here we are replacing diacritics (accents) in string values
    columns:
      col2: string # cast `col2` as string
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ cat /path/to/task.file.yaml
source:
  stream: file:///tmp/my_file.csv
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ sling run -c /path/to/task.folder.yaml
$ sling run -c /path/to/task.file.yaml
Local Storage (JSON) -> Database
# We first need to make sure our connections are available in our environment
# See https://docs.slingdata.io/sling-cli/environment for help
export MY_TARGET_DB='...'
$ sling conns list
+---------------+------------------+-----------------+
| CONN NAME     | CONN TYPE        | SOURCE          |
+---------------+------------------+-----------------+
| MY_TARGET_DB  | DB - PostgreSQL  | env variable    |
+---------------+------------------+-----------------+
######################## Using CLI Flags ########################
$ cat /tmp/my_file.json | sling run --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-stream 'file:///tmp/my_file.json' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-stream 'file:///tmp/my_json_folder/' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
######################## Using YAML Config ########################
$ cat /path/to/task.folder.yaml
source:
  stream: file:///tmp/my_json_folder/
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ cat /path/to/task.file.yaml
source:
  stream: file:///tmp/my_file.json
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ sling run -c /path/to/task.folder.yaml
$ sling run -c /path/to/task.file.yaml
Local Storage (JSON Flattened) -> Database
# We first need to make sure our connections are available in our environment
# See https://docs.slingdata.io/sling-cli/environment for help
export MY_TARGET_DB='...'
$ sling conns list
+---------------+------------------+-----------------+
| CONN NAME     | CONN TYPE        | SOURCE          |
+---------------+------------------+-----------------+
| MY_TARGET_DB  | DB - PostgreSQL  | env variable    |
+---------------+------------------+-----------------+
######################## Using CLI Flags ########################
$ cat /tmp/my_file.json | sling run --src-options '{flatten: true}' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-stream 'file:///tmp/my_file.json' --src-options '{flatten: true}' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-stream 'file:///tmp/my_json_folder/' --src-options '{flatten: true}' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
######################## Using YAML Config ########################
$ cat /path/to/task.folder.yaml
source:
  stream: file:///tmp/my_json_folder/
  options:
    flatten: true
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ cat /path/to/task.file.yaml
source:
  stream: file:///tmp/my_file.json
  options:
    flatten: true
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ sling run -c /path/to/task.folder.yaml
$ sling run -c /path/to/task.file.yaml
Local Storage (Parquet) -> Database
# We first need to make sure our connections are available in our environment
# See https://docs.slingdata.io/sling-cli/environment for help
export MY_TARGET_DB='...'
$ sling conns list
+---------------+------------------+-----------------+
| CONN NAME     | CONN TYPE        | SOURCE          |
+---------------+------------------+-----------------+
| MY_TARGET_DB  | DB - PostgreSQL  | env variable    |
+---------------+------------------+-----------------+
######################## Using CLI Flags ########################
$ sling run --src-stream 'file:///tmp/my_file.parquet' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-stream 'file:///tmp/my_parquet_folder/' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
######################## Using YAML Config ########################
$ cat /path/to/task.folder.yaml
source:
  stream: file:///tmp/my_parquet_folder/
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ cat /path/to/task.file.yaml
source:
  stream: file:///tmp/my_file.parquet
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ sling run -c /path/to/task.folder.yaml
$ sling run -c /path/to/task.file.yaml
Cloud Storage (CSV) -> Database
# We first need to make sure our connections are available in our environment
# See https://docs.slingdata.io/sling-cli/environment for help
export MY_TARGET_DB='...'
$ sling conns list
+----------------+------------------+-----------------+
| CONN NAME      | CONN TYPE        | SOURCE          |
+----------------+------------------+-----------------+
| MY_S3_BUCKET   | FileSys - S3     | sling env yaml  |
| MY_GS_BUCKET   | FileSys - Google | sling env yaml  |
| MY_AZURE_CONT  | FileSys - Azure  | sling env yaml  |
| MY_TARGET_DB   | DB - PostgreSQL  | env variable    |
+----------------+------------------+-----------------+
######################## Using CLI Flags ########################
$ sling run --src-conn MY_S3_BUCKET --src-stream 's3://my-bucket/my_csv_folder/' --src-options '{columns: {col2: string}}' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-conn MY_GS_BUCKET --src-stream 'gs://my-bucket/my_csv_folder/' --src-options '{columns: {col2: string}}' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-conn MY_AZURE_CONT --src-stream 'https://my_account.blob.core.windows.net/my-container/my_csv_folder/' --src-options '{columns: {col2: string}}' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-conn MY_S3_BUCKET --src-stream 's3://my-bucket/my_file.csv' --src-options '{columns: {col2: string}}' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-conn MY_GS_BUCKET --src-stream 'gs://my-bucket/my_file.csv' --src-options '{columns: {col2: string}}' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-conn MY_AZURE_CONT --src-stream 'https://my_account.blob.core.windows.net/my-container/my_file.csv' --src-options '{columns: {col2: string}}' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
######################## Using YAML Config ########################
$ cat /path/to/task.s3.folder.yaml
source:
  conn: MY_S3_BUCKET
  stream: s3://my-bucket/my_csv_folder/
  options:
    columns:
      col2: string # cast `col2` as string
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ cat /path/to/task.gs.folder.yaml
source:
  conn: MY_GS_BUCKET
  stream: gs://my-bucket/my_csv_folder/
  options:
    columns:
      col2: string # cast `col2` as string
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ cat /path/to/task.azure.folder.yaml
source:
  conn: MY_AZURE_CONT
  stream: https://my_account.blob.core.windows.net/my-container/my_csv_folder/
  options:
    columns:
      col2: string # cast `col2` as string
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ cat /path/to/task.s3.file.yaml
source:
  conn: MY_S3_BUCKET
  stream: s3://my-bucket/my_file.csv
  options:
    columns:
      col2: string # cast `col2` as string
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ cat /path/to/task.gs.file.yaml
source:
  conn: MY_GS_BUCKET
  stream: gs://my-bucket/my_file.csv
  options:
    columns:
      col2: string # cast `col2` as string
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ cat /path/to/task.azure.file.yaml
source:
  conn: MY_AZURE_CONT
  stream: https://my_account.blob.core.windows.net/my-container/my_file.csv
  options:
    columns:
      col2: string # cast `col2` as string
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ sling run -c /path/to/task.s3.folder.yaml
$ sling run -c /path/to/task.gs.folder.yaml
$ sling run -c /path/to/task.azure.folder.yaml
$ sling run -c /path/to/task.s3.file.yaml
$ sling run -c /path/to/task.gs.file.yaml
$ sling run -c /path/to/task.azure.file.yaml
Cloud Storage (JSON) -> Database
# We first need to make sure our connections are available in our environment
# See https://docs.slingdata.io/sling-cli/environment for help
export MY_TARGET_DB='...'
$ sling conns list
+----------------+------------------+-----------------+
| CONN NAME      | CONN TYPE        | SOURCE          |
+----------------+------------------+-----------------+
| MY_S3_BUCKET   | FileSys - S3     | sling env yaml  |
| MY_GS_BUCKET   | FileSys - Google | sling env yaml  |
| MY_AZURE_CONT  | FileSys - Azure  | sling env yaml  |
| MY_TARGET_DB   | DB - PostgreSQL  | env variable    |
+----------------+------------------+-----------------+
######################## Using CLI Flags ########################
$ sling run --src-conn MY_S3_BUCKET --src-stream 's3://my-bucket/my_json_folder/' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-conn MY_GS_BUCKET --src-stream 'gs://my-bucket/my_json_folder/' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-conn MY_AZURE_CONT --src-stream 'https://my_account.blob.core.windows.net/my-container/my_json_folder/' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-conn MY_S3_BUCKET --src-stream 's3://my-bucket/my_file.json' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-conn MY_GS_BUCKET --src-stream 'gs://my-bucket/my_file.json' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-conn MY_AZURE_CONT --src-stream 'https://my_account.blob.core.windows.net/my-container/my_file.json' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
######################## Using YAML Config ########################
$ cat /path/to/task.s3.folder.yaml
source:
  conn: MY_S3_BUCKET
  stream: s3://my-bucket/my_json_folder/
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ cat /path/to/task.gs.folder.yaml
source:
  conn: MY_GS_BUCKET
  stream: gs://my-bucket/my_json_folder/
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ cat /path/to/task.azure.folder.yaml
source:
  conn: MY_AZURE_CONT
  stream: https://my_account.blob.core.windows.net/my-container/my_json_folder/
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ cat /path/to/task.s3.file.yaml
source:
  conn: MY_S3_BUCKET
  stream: s3://my-bucket/my_file.json
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ cat /path/to/task.gs.file.yaml
source:
  conn: MY_GS_BUCKET
  stream: gs://my-bucket/my_file.json
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ cat /path/to/task.azure.file.yaml
source:
  conn: MY_AZURE_CONT
  stream: https://my_account.blob.core.windows.net/my-container/my_file.json
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ sling run -c /path/to/task.s3.folder.yaml
$ sling run -c /path/to/task.gs.folder.yaml
$ sling run -c /path/to/task.azure.folder.yaml
$ sling run -c /path/to/task.s3.file.yaml
$ sling run -c /path/to/task.gs.file.yaml
$ sling run -c /path/to/task.azure.file.yaml
Cloud Storage (JSON Flattened) -> Database
# We first need to make sure our connections are available in our environment
# See https://docs.slingdata.io/sling-cli/environment for help
export MY_TARGET_DB='...'
$ sling conns list
+----------------+------------------+-----------------+
| CONN NAME      | CONN TYPE        | SOURCE          |
+----------------+------------------+-----------------+
| MY_S3_BUCKET   | FileSys - S3     | sling env yaml  |
| MY_GS_BUCKET   | FileSys - Google | sling env yaml  |
| MY_AZURE_CONT  | FileSys - Azure  | sling env yaml  |
| MY_TARGET_DB   | DB - PostgreSQL  | env variable    |
+----------------+------------------+-----------------+
######################## Using CLI Flags ########################
$ sling run --src-conn MY_S3_BUCKET --src-options '{flatten: true}' --src-stream 's3://my-bucket/my_json_folder/' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-conn MY_GS_BUCKET --src-options '{flatten: true}' --src-stream 'gs://my-bucket/my_json_folder/' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-conn MY_AZURE_CONT --src-options '{flatten: true}' --src-stream 'https://my_account.blob.core.windows.net/my-container/my_json_folder/' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-conn MY_S3_BUCKET --src-options '{flatten: true}' --src-stream 's3://my-bucket/my_file.json' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-conn MY_GS_BUCKET --src-options '{flatten: true}' --src-stream 'gs://my-bucket/my_file.json' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-conn MY_AZURE_CONT --src-options '{flatten: true}' --src-stream 'https://my_account.blob.core.windows.net/my-container/my_file.json' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
######################## Using YAML Config ########################
$ cat /path/to/task.s3.folder.yaml
source:
  conn: MY_S3_BUCKET
  stream: s3://my-bucket/my_json_folder/
  options:
    flatten: true
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ cat /path/to/task.gs.folder.yaml
source:
  conn: MY_GS_BUCKET
  stream: gs://my-bucket/my_json_folder/
  options:
    flatten: true
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ cat /path/to/task.azure.folder.yaml
source:
  conn: MY_AZURE_CONT
  stream: https://my_account.blob.core.windows.net/my-container/my_json_folder/
  options:
    flatten: true
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ cat /path/to/task.s3.file.yaml
source:
  conn: MY_S3_BUCKET
  stream: s3://my-bucket/my_file.json
  options:
    flatten: true
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ cat /path/to/task.gs.file.yaml
source:
  conn: MY_GS_BUCKET
  stream: gs://my-bucket/my_file.json
  options:
    flatten: true
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ cat /path/to/task.azure.file.yaml
source:
  conn: MY_AZURE_CONT
  stream: https://my_account.blob.core.windows.net/my-container/my_file.json
  options:
    flatten: true
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ sling run -c /path/to/task.s3.folder.yaml
$ sling run -c /path/to/task.gs.folder.yaml
$ sling run -c /path/to/task.azure.folder.yaml
$ sling run -c /path/to/task.s3.file.yaml
$ sling run -c /path/to/task.gs.file.yaml
$ sling run -c /path/to/task.azure.file.yaml
Cloud Storage (Parquet) -> Database
# We first need to make sure our connections are available in our environment
# See https://docs.slingdata.io/sling-cli/environment for help
export MY_TARGET_DB='...'
$ sling conns list
+----------------+------------------+-----------------+
| CONN NAME      | CONN TYPE        | SOURCE          |
+----------------+------------------+-----------------+
| MY_S3_BUCKET   | FileSys - S3     | sling env yaml  |
| MY_GS_BUCKET   | FileSys - Google | sling env yaml  |
| MY_AZURE_CONT  | FileSys - Azure  | sling env yaml  |
| MY_TARGET_DB   | DB - PostgreSQL  | env variable    |
+----------------+------------------+-----------------+
######################## Using CLI Flags ########################
$ sling run --src-conn MY_S3_BUCKET --src-stream 's3://my-bucket/my_parquet_folder/' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-conn MY_GS_BUCKET --src-stream 'gs://my-bucket/my_parquet_folder/' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-conn MY_AZURE_CONT --src-stream 'https://my_account.blob.core.windows.net/my-container/my_parquet_folder/' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-conn MY_S3_BUCKET --src-stream 's3://my-bucket/my_file.parquet' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-conn MY_GS_BUCKET --src-stream 'gs://my-bucket/my_file.parquet' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-conn MY_AZURE_CONT --src-stream 'https://my_account.blob.core.windows.net/my-container/my_file.parquet' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
######################## Using YAML Config ########################
$ cat /path/to/task.s3.folder.yaml
source:
  conn: MY_S3_BUCKET
  stream: s3://my-bucket/my_parquet_folder/
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ cat /path/to/task.gs.folder.yaml
source:
  conn: MY_GS_BUCKET
  stream: gs://my-bucket/my_parquet_folder/
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ cat /path/to/task.azure.folder.yaml
source:
  conn: MY_AZURE_CONT
  stream: https://my_account.blob.core.windows.net/my-container/my_parquet_folder/
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ cat /path/to/task.s3.file.yaml