Run Single Task
Let's go over how to run a single task with Sling.
Sling makes it easy to extract and load data from one connection to another.
# Pipe in your JSON file and flatten the nested keys into their own columns
$ cat /tmp/my_file.json | sling run --src-options '{flatten: true}' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
# Read a folder containing many CSV files
$ sling run --src-stream 'file:///tmp/my_csv_folder/' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
# Sync only the latest data from one source DB to another
$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode incremental --primary-key 'id' --update-key 'last_modified_dt'
# Export / Backup database tables to JSON files
$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_S3_BUCKET --tgt-object 's3://my-bucket/my_json_folder/' --tgt-options '{file_max_rows: 100000, format: jsonlines}'
There are two ways to configure a task:
- Command Line Flags
- YAML Configuration Text
CLI Flag | YAML Config Key | Description |
---|---|---|
--src-conn | source.conn | The source database / API connection (name, connection string or URL). |
--src-stream | source.stream | The source table (schema.table) or local / cloud file path. Can also be the path of a SQL file, or in-line SQL text. |
--tgt-conn | target.conn | The target database connection (name, connection string or URL). |
--tgt-object | target.object | The target table (schema.table), or the path/URL of the target file (local, s3, gs, azure). |
--mode | mode | The target load mode to use: incremental, truncate, full-refresh or snapshot. Default is full-refresh. |
--primary-key | source.primary_key | The column(s) to use as the primary key. For a composite key, use an array. |
--update-key | source.update_key | The column to use as the update key (for incremental mode). |
--src-options | source.options | In-line options to further configure the source (JSON or YAML). See below for details. |
--tgt-options | target.options | In-line options to further configure the target (JSON or YAML). See below for details. |
--stdout | stdout | Output the stream to standard output (STDOUT). |
Here are the various modes available. All modes first load into a new temporary table on the target connection prior to the final load.
Mode | Description |
---|---|
full-refresh | This is the default mode. The target table will be dropped and recreated with the source data. |
incremental | The source data will be merged or appended into the target table. If the table does not exist, it will be created. See below for more details. |
truncate | Similar to full-refresh, except that the target table is truncated instead of dropped. This keeps any special DDL / GRANTs applied. |
snapshot | If the target table exists, Sling will insert / append the data with a _sling_loaded_at column. If it does not, the table will be created. |
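As a rough illustration of how the two full-load modes differ on the target side, here is hypothetical SQL (the exact statements Sling issues depend on the target database; table names are made up):

```sql
-- full-refresh: the table is dropped and recreated,
-- so any custom DDL / GRANTs on it are lost.
drop table if exists target_schema.target_table;
create table target_schema.target_table as
select * from target_schema.target_table_tmp;

-- truncate: the table definition is kept; only the rows are replaced.
truncate table target_schema.target_table;
insert into target_schema.target_table
select * from target_schema.target_table_tmp;
```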
In incremental mode, the load strategy depends on which keys are provided:
Load Strategy | Primary Key Provided | Update Key Provided | Stream Strategy |
---|---|---|---|
New Data Upsert (update/insert) | yes | yes | Only new records after max(update_key) |
Full Data Upsert (update/insert) | yes | no | Full data |
Append Only (insert, no update) | no | yes | Only new records after max(update_key) |
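The incremental strategies above can be sketched in SQL. This is only an illustrative sketch (the column names id and last_modified_dt are hypothetical, and the actual upsert statements vary by database):

```sql
-- Only new records after max(update_key): the source read is filtered, e.g.
--   select * from source_schema.source_table
--   where last_modified_dt > (select max(last_modified_dt) from target_schema.target_table);

-- With a primary key, the temporary table is then upserted into the target
-- (shown here as delete + insert):
delete from target_schema.target_table t
using target_schema.target_table_tmp s
where t.id = s.id;

insert into target_schema.target_table
select * from target_schema.target_table_tmp;

-- Append Only (update key but no primary key): the new rows are simply inserted.
```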
These are the available source.options child keys:
Child Key | Type | Description |
---|---|---|
compression | string | (Only for file source) The type of compression to use when reading files. Valid inputs are none, auto, gzip, zstd and snappy. Default is auto. |
format | string | (Only for file source) The format of the file(s). Options are: csv, json, jsonlines and xml. |
delimiter | string | (Only for file source) The delimiter to use when parsing tabular files. Default is auto. |
header | bool | (Only for file source) Whether to treat the first line as a header. Default is true. |
flatten | bool | (Only for file source) Whether to flatten a semi-structured source format (JSON, XML) into individual columns. |
jmespath | string | (Only for file source) A JMESPath expression used to filter / extract nested JSON data. See https://jmespath.org/ for more. |
datetime_format | string | |
empty_as_null | bool | Whether empty fields should be treated as NULL. Default is true. |
null_if | string | The case-sensitive value to treat as NULL when encountered. Default is NULL. |
transforms | array | An array/list of built-in transforms to apply to records. Available values: replace_accents, parse_uuid, trim_space. See here for source code. [BETA] |
columns | object | When the source column type cannot be inferred (e.g. the source is a file), use this object/map to specify the type a column should be cast as. It is not necessary to include all columns; Sling will attempt to auto-detect all unspecified column types. The map key is the column name, and the value is the type (string, json, integer, bigint, decimal, datetime or bool). |
skip_blank_lines | bool | Whether blank lines should be skipped when encountered. Default is false. |
trim_space | bool | Whether white space at the beginning or end of parsed values should be removed. Default is false. |
time_format | string | |
sheet | string | (Only for Excel source files) The name of the sheet to use as a data source. Default is the first sheet. |
range | string | (Only for Excel source files) The range of the sheet to use as a data source, e.g. B1:H70. Default is the full range containing data. |
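Several of these source options can be combined in a single task config. A hypothetical sketch (the connection, path and column names are made up):

```yaml
source:
  conn: MY_S3_BUCKET
  stream: s3://my-bucket/my_csv_folder/
  options:
    format: csv
    delimiter: ','
    header: true
    empty_as_null: true
    columns:
      id: bigint           # cast `id` as bigint
      created_at: datetime # cast `created_at` as datetime
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
```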
These are the available target.options child keys:
Child Key | Type | Description |
---|---|---|
compression | string | (Only for file target) The type of compression to use when writing files. Valid inputs are none, auto, gzip, zstd and snappy. Default is auto. |
delimiter | string | (Only for file target) The delimiter to use when writing tabular files. Default is , (comma). |
header | bool | (Only for file target) Whether to write the first line as a header. Default is true. |
datetime_format | string | (Only for file target) |
file_max_rows | integer | (Only for file target) The maximum number of rows (usually lines) to write to a file. 0 means an unlimited number of rows. When a value greater than 0 is specified, the output location will be a folder containing many file parts. Default is 0. |
file_max_bytes | integer | (Only for file target) The maximum number of bytes to write to a file. 0 means an unlimited number of bytes. When a value greater than 0 is specified, the output location will be a folder containing many file parts. Default is 0. |
format | string | (Only for file target) The format of the file(s). Options are: csv, json and jsonlines. |
table_ddl | string | (Only for database target) The table DDL to use when writing to a database. Default is auto-generated by Sling. |
table_tmp | string | (Only for database target) The temporary table name to use when loading into a database. Default is auto-generated by Sling. |
pre_sql | string | (Only for database target) A SQL query to run before loading. |
post_sql | string | (Only for database target) A SQL query to run after loading. |
use_bulk | bool | (Only for database target) Whether to use a bulk loading methodology, if available. If false, traditional batch INSERT loading is used. Default is true. |
add_new_columns | bool | (Only for database target) Whether to add new columns found in the stream but not in the target table (when mode is not full-refresh). Default is true. |
adjust_column_type | bool | (Only for database target) Whether to adjust the column type when needed. Default is false. [BETA] |
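Likewise, target options slot under target.options. A hypothetical sketch of a database-to-file export (the bucket and paths are made up):

```yaml
source:
  conn: MY_SOURCE_DB
  stream: source_schema.source_table
target:
  conn: MY_S3_BUCKET
  object: s3://my-bucket/my_export_folder/
  options:
    format: jsonlines
    compression: gzip
    file_max_rows: 100000 # split output into parts of 100k rows each
mode: full-refresh
```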
# We first need to make sure our connections are available in our environment
# See https://docs.slingdata.io/sling-cli/environment for help
export MY_SOURCE_DB='...'
export MY_TARGET_DB='...'
$ sling conns list
+---------------+------------------+-----------------+
| CONN NAME | CONN TYPE | SOURCE |
+---------------+------------------+-----------------+
| MY_SOURCE_DB | DB - PostgreSQL | env variable |
| MY_TARGET_DB | DB - Snowflake | env variable |
+---------------+------------------+-----------------+
######################## Using CLI Flags ########################
$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
######################## Using YAML Config ########################
$ cat /path/to/task.yaml
source:
  conn: MY_SOURCE_DB
  stream: source_schema.source_table
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ sling run -c /path/to/task.yaml
# We first need to make sure our connections are available in our environment
# See https://docs.slingdata.io/sling-cli/environment for help
export MY_SOURCE_DB='...'
export MY_TARGET_DB='...'
$ sling conns list
+---------------+------------------+-----------------+
| CONN NAME | CONN TYPE | SOURCE |
+---------------+------------------+-----------------+
| MY_SOURCE_DB | DB - PostgreSQL | env variable |
| MY_TARGET_DB | DB - Snowflake | env variable |
+---------------+------------------+-----------------+
######################## Using CLI Flags ########################
$ sling run --src-conn MY_SOURCE_DB --src-stream 'select * from my_schema.my_table where col1 is not null' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
# we can also read from a SQL file (/path/to/query.sql)
$ sling run --src-conn MY_SOURCE_DB --src-stream /path/to/query.sql --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
######################## Using YAML Config ########################
$ cat /path/to/task.yaml
source:
  conn: MY_SOURCE_DB
  stream: |
    select *
    from my_schema.my_table
    where col1 is not null
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ cat /path/to/task.file.yaml
source:
  conn: MY_SOURCE_DB
  stream: /path/to/query.sql
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ sling run -c /path/to/task.yaml
$ sling run -c /path/to/task.file.yaml
# We first need to make sure our connections are available in our environment
# See https://docs.slingdata.io/sling-cli/environment for help
export MY_SOURCE_DB='...'
export MY_TARGET_DB='...'
$ sling conns list
+---------------+------------------+-----------------+
| CONN NAME | CONN TYPE | SOURCE |
+---------------+------------------+-----------------+
| MY_SOURCE_DB | DB - PostgreSQL | env variable |
| MY_TARGET_DB | DB - Snowflake | env variable |
+---------------+------------------+-----------------+
######################## Using CLI Flags ########################
$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode incremental --primary-key 'id' --update-key 'last_modified_dt'
######################## Using YAML Config ########################
$ cat /path/to/task.yaml
source:
  conn: MY_SOURCE_DB
  stream: source_schema.source_table
  primary_key: [id]
  update_key: last_modified_dt
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: incremental
$ sling run -c /path/to/task.yaml
# We first need to make sure our connections are available in our environment
# See https://docs.slingdata.io/sling-cli/environment for help
export MY_SOURCE_DB='...'
export MY_TARGET_DB='...'
$ sling conns list
+---------------+------------------+-----------------+
| CONN NAME | CONN TYPE | SOURCE |
+---------------+------------------+-----------------+
| MY_SOURCE_DB | DB - PostgreSQL | env variable |
| MY_TARGET_DB | DB - Snowflake | env variable |
+---------------+------------------+-----------------+
######################## Using CLI Flags ########################
$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode incremental --primary-key 'id'
######################## Using YAML Config ########################
$ cat /path/to/task.yaml
source:
  conn: MY_SOURCE_DB
  stream: source_schema.source_table
  primary_key: [id]
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: incremental
$ sling run -c /path/to/task.yaml
# We first need to make sure our connections are available in our environment
# See https://docs.slingdata.io/sling-cli/environment for help
export MY_SOURCE_DB='...'
export MY_TARGET_DB='...'
$ sling conns list
+---------------+------------------+-----------------+
| CONN NAME | CONN TYPE | SOURCE |
+---------------+------------------+-----------------+
| MY_SOURCE_DB | DB - PostgreSQL | env variable |
| MY_TARGET_DB | DB - Snowflake | env variable |
+---------------+------------------+-----------------+
######################## Using CLI Flags ########################
$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode incremental --update-key 'created_dt'
######################## Using YAML Config ########################
$ cat /path/to/task.yaml
source:
  conn: MY_SOURCE_DB
  stream: source_schema.source_table
  update_key: created_dt
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: incremental
$ sling run -c /path/to/task.yaml
# We first need to make sure our connections are available in our environment
# See https://docs.slingdata.io/sling-cli/environment for help
export MY_SOURCE_DB='...'
export MY_TARGET_DB='...'
$ sling conns list
+---------------+------------------+-----------------+
| CONN NAME | CONN TYPE | SOURCE |
+---------------+------------------+-----------------+
| MY_SOURCE_DB | DB - PostgreSQL | env variable |
| MY_TARGET_DB | DB - Snowflake | env variable |
+---------------+------------------+-----------------+
######################## Using CLI Flags ########################
$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode truncate
######################## Using YAML Config ########################
$ cat /path/to/task.yaml
source:
  conn: MY_SOURCE_DB
  stream: source_schema.source_table
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: truncate
$ sling run -c /path/to/task.yaml
# We first need to make sure our connections are available in our environment
# See https://docs.slingdata.io/sling-cli/environment for help
export MY_SOURCE_DB='...'
export MY_TARGET_DB='...'
$ sling conns list
+---------------+------------------+-----------------+
| CONN NAME | CONN TYPE | SOURCE |
+---------------+------------------+-----------------+
| MY_SOURCE_DB | DB - PostgreSQL | env variable |
| MY_TARGET_DB | DB - Snowflake | env variable |
+---------------+------------------+-----------------+
######################## Using CLI Flags ########################
$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode snapshot
######################## Using YAML Config ########################
$ cat /path/to/task.yaml
source:
  conn: MY_SOURCE_DB
  stream: source_schema.source_table
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: snapshot
$ sling run -c /path/to/task.yaml
# We first need to make sure our connections are available in our environment
# See https://docs.slingdata.io/sling-cli/environment for help
export MY_TARGET_DB='...'
$ sling conns list
+---------------+------------------+-----------------+
| CONN NAME | CONN TYPE | SOURCE |
+---------------+------------------+-----------------+
| MY_TARGET_DB | DB - PostgreSQL | env variable |
+---------------+------------------+-----------------+
######################## Using CLI Flags ########################
$ cat /tmp/my_file.csv | sling run --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-stream 'file:///tmp/my_file.csv' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-stream 'file:///tmp/my_csv_folder/' --src-options '{columns: {col2: string}}' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-stream 'file:///tmp/my_csv_folder/' --src-options '{transforms: [remove_accents]}' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
######################## Using YAML Config ########################
$ cat /path/to/task.folder.yaml
source:
  stream: file:///tmp/my_csv_folder/
  options:
    transforms: [remove_accents] # Apply transforms. Here we are removing diacritics (accents) from string values.
    columns:
      col2: string # cast `col2` as string
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ cat /path/to/task.file.yaml
source:
  stream: file:///tmp/my_file.csv
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ sling run -c /path/to/task.folder.yaml
$ sling run -c /path/to/task.file.yaml
# We first need to make sure our connections are available in our environment
# See https://docs.slingdata.io/sling-cli/environment for help
export MY_TARGET_DB='...'
$ sling conns list
+---------------+------------------+-----------------+
| CONN NAME | CONN TYPE | SOURCE |
+---------------+------------------+-----------------+
| MY_TARGET_DB | DB - PostgreSQL | env variable |
+---------------+------------------+-----------------+
######################## Using CLI Flags ########################
$ cat /tmp/my_file.json | sling run --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-stream 'file:///tmp/my_file.json' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-stream 'file:///tmp/my_json_folder/' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
######################## Using YAML Config ########################
$ cat /path/to/task.folder.yaml
source:
  stream: file:///tmp/my_json_folder/
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ cat /path/to/task.file.yaml
source:
  stream: file:///tmp/my_file.json
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ sling run -c /path/to/task.folder.yaml
$ sling run -c /path/to/task.file.yaml
# We first need to make sure our connections are available in our environment
# See https://docs.slingdata.io/sling-cli/environment for help
export MY_TARGET_DB='...'
$ sling conns list
+---------------+------------------+-----------------+
| CONN NAME | CONN TYPE | SOURCE |
+---------------+------------------+-----------------+
| MY_TARGET_DB | DB - PostgreSQL | env variable |
+---------------+------------------+-----------------+
######################## Using CLI Flags ########################
$ cat /tmp/my_file.json | sling run --src-options '{flatten: true}' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-stream 'file:///tmp/my_file.json' --src-options '{flatten: true}' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-stream 'file:///tmp/my_json_folder/' --src-options '{flatten: true}' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
######################## Using YAML Config ########################
$ cat /path/to/task.folder.yaml
source:
  stream: file:///tmp/my_json_folder/
  options:
    flatten: true
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ cat /path/to/task.file.yaml
source:
  stream: file:///tmp/my_file.json
  options:
    flatten: true
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ sling run -c /path/to/task.folder.yaml
$ sling run -c /path/to/task.file.yaml
# We first need to make sure our connections are available in our environment
# See https://docs.slingdata.io/sling-cli/environment for help
export MY_TARGET_DB='...'
$ sling conns list
+---------------+------------------+-----------------+
| CONN NAME | CONN TYPE | SOURCE |
+---------------+------------------+-----------------+
| MY_TARGET_DB | DB - PostgreSQL | env variable |
+---------------+------------------+-----------------+
######################## Using CLI Flags ########################
$ sling run --src-stream 'file:///tmp/my_file.parquet' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-stream 'file:///tmp/my_parquet_folder/' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
######################## Using YAML Config ########################
$ cat /path/to/task.folder.yaml
source:
  stream: file:///tmp/my_parquet_folder/
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ cat /path/to/task.file.yaml
source:
  stream: file:///tmp/my_file.parquet
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ sling run -c /path/to/task.folder.yaml
$ sling run -c /path/to/task.file.yaml
# We first need to make sure our connections are available in our environment
# See https://docs.slingdata.io/sling-cli/environment for help
export MY_TARGET_DB='...'
$ sling conns list
+----------------+------------------+-----------------+
| CONN NAME | CONN TYPE | SOURCE |
+----------------+------------------+-----------------+
| MY_S3_BUCKET | FileSys - S3 | sling env yaml |
| MY_GS_BUCKET | FileSys - Google | sling env yaml |
| MY_AZURE_CONT | FileSys - Azure | sling env yaml |
| MY_TARGET_DB | DB - PostgreSQL | env variable |
+----------------+------------------+-----------------+
######################## Using CLI Flags ########################
$ sling run --src-conn MY_S3_BUCKET --src-stream 's3://my-bucket/my_csv_folder/' --src-options '{columns: {col2: string}}' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-conn MY_GS_BUCKET --src-stream 'gs://my-bucket/my_csv_folder/' --src-options '{columns: {col2: string}}' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-conn MY_AZURE_CONT --src-stream 'https://my_account.blob.core.windows.net/my-container/my_csv_folder/' --src-options '{columns: {col2: string}}' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-conn MY_S3_BUCKET --src-stream 's3://my-bucket/my_file.csv' --src-options '{columns: {col2: string}}' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-conn MY_GS_BUCKET --src-stream 'gs://my-bucket/my_file.csv' --src-options '{columns: {col2: string}}' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-conn MY_AZURE_CONT --src-stream 'https://my_account.blob.core.windows.net/my-container/my_file.csv' --src-options '{columns: {col2: string}}' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
######################## Using YAML Config ########################
$ cat /path/to/task.s3.folder.yaml
source:
  conn: MY_S3_BUCKET
  stream: s3://my-bucket/my_csv_folder/
  options:
    columns:
      col2: string # cast `col2` as string
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ cat /path/to/task.gs.folder.yaml
source:
  conn: MY_GS_BUCKET
  stream: gs://my-bucket/my_csv_folder/
  options:
    columns:
      col2: string # cast `col2` as string
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ cat /path/to/task.azure.folder.yaml
source:
  conn: MY_AZURE_CONT
  stream: https://my_account.blob.core.windows.net/my-container/my_csv_folder/
  options:
    columns:
      col2: string # cast `col2` as string
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ cat /path/to/task.s3.file.yaml
source:
  conn: MY_S3_BUCKET
  stream: s3://my-bucket/my_file.csv
  options:
    columns:
      col2: string # cast `col2` as string
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ cat /path/to/task.gs.file.yaml
source:
  conn: MY_GS_BUCKET
  stream: gs://my-bucket/my_file.csv
  options:
    columns:
      col2: string # cast `col2` as string
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ cat /path/to/task.azure.file.yaml
source:
  conn: MY_AZURE_CONT
  stream: https://my_account.blob.core.windows.net/my-container/my_file.csv
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ sling run -c /path/to/task.s3.folder.yaml
$ sling run -c /path/to/task.gs.folder.yaml
$ sling run -c /path/to/task.azure.folder.yaml
$ sling run -c /path/to/task.s3.file.yaml
$ sling run -c /path/to/task.gs.file.yaml
$ sling run -c /path/to/task.azure.file.yaml
# We first need to make sure our connections are available in our environment
# See https://docs.slingdata.io/sling-cli/environment for help
export MY_TARGET_DB='...'
$ sling conns list
+----------------+------------------+-----------------+
| CONN NAME | CONN TYPE | SOURCE |
+----------------+------------------+-----------------+
| MY_S3_BUCKET | FileSys - S3 | sling env yaml |
| MY_GS_BUCKET | FileSys - Google | sling env yaml |
| MY_AZURE_CONT | FileSys - Azure | sling env yaml |
| MY_TARGET_DB | DB - PostgreSQL | env variable |
+----------------+------------------+-----------------+
######################## Using CLI Flags ########################
$ sling run --src-conn MY_S3_BUCKET --src-stream 's3://my-bucket/my_json_folder/' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-conn MY_GS_BUCKET --src-stream 'gs://my-bucket/my_json_folder/' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-conn MY_AZURE_CONT --src-stream 'https://my_account.blob.core.windows.net/my-container/my_json_folder/' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-conn MY_S3_BUCKET --src-stream 's3://my-bucket/my_file.json' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-conn MY_GS_BUCKET --src-stream 'gs://my-bucket/my_file.json' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-conn MY_AZURE_CONT --src-stream 'https://my_account.blob.core.windows.net/my-container/my_file.json' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
######################## Using YAML Config ########################
$ cat /path/to/task.s3.folder.yaml
source:
  conn: MY_S3_BUCKET
  stream: s3://my-bucket/my_json_folder/
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ cat /path/to/task.gs.folder.yaml
source:
  conn: MY_GS_BUCKET
  stream: gs://my-bucket/my_json_folder/
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ cat /path/to/task.azure.folder.yaml
source:
  conn: MY_AZURE_CONT
  stream: https://my_account.blob.core.windows.net/my-container/my_json_folder/
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ cat /path/to/task.s3.file.yaml
source:
  conn: MY_S3_BUCKET
  stream: s3://my-bucket/my_file.json
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ cat /path/to/task.gs.file.yaml
source:
  conn: MY_GS_BUCKET
  stream: gs://my-bucket/my_file.json
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ cat /path/to/task.azure.file.yaml
source:
  conn: MY_AZURE_CONT
  stream: https://my_account.blob.core.windows.net/my-container/my_file.json
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ sling run -c /path/to/task.s3.folder.yaml
$ sling run -c /path/to/task.gs.folder.yaml
$ sling run -c /path/to/task.azure.folder.yaml
$ sling run -c /path/to/task.s3.file.yaml
$ sling run -c /path/to/task.gs.file.yaml
$ sling run -c /path/to/task.azure.file.yaml
# We first need to make sure our connections are available in our environment
# See https://docs.slingdata.io/sling-cli/environment for help
export MY_TARGET_DB='...'
$ sling conns list
+----------------+------------------+-----------------+
| CONN NAME | CONN TYPE | SOURCE |
+----------------+------------------+-----------------+
| MY_S3_BUCKET | FileSys - S3 | sling env yaml |
| MY_GS_BUCKET | FileSys - Google | sling env yaml |
| MY_AZURE_CONT | FileSys - Azure | sling env yaml |
| MY_TARGET_DB | DB - PostgreSQL | env variable |
+----------------+------------------+-----------------+
######################## Using CLI Flags ########################
$ sling run --src-conn MY_S3_BUCKET --src-options '{flatten: true}' --src-stream 's3://my-bucket/my_json_folder/' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-conn MY_GS_BUCKET --src-options '{flatten: true}' --src-stream 'gs://my-bucket/my_json_folder/' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-conn MY_AZURE_CONT --src-options '{flatten: true}' --src-stream 'https://my_account.blob.core.windows.net/my-container/my_json_folder/' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-conn MY_S3_BUCKET --src-options '{flatten: true}' --src-stream 's3://my-bucket/my_file.json' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-conn MY_GS_BUCKET --src-options '{flatten: true}' --src-stream 'gs://my-bucket/my_file.json' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-conn MY_AZURE_CONT --src-options '{flatten: true}' --src-stream 'https://my_account.blob.core.windows.net/my-container/my_file.json' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
######################## Using YAML Config ########################
$ cat /path/to/task.s3.folder.yaml
source:
  conn: MY_S3_BUCKET
  stream: s3://my-bucket/my_json_folder/
  options:
    flatten: true
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ cat /path/to/task.gs.folder.yaml
source:
  conn: MY_GS_BUCKET
  stream: gs://my-bucket/my_json_folder/
  options:
    flatten: true
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ cat /path/to/task.azure.folder.yaml
source:
  conn: MY_AZURE_CONT
  stream: https://my_account.blob.core.windows.net/my-container/my_json_folder/
  options:
    flatten: true
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ cat /path/to/task.s3.file.yaml
source:
  conn: MY_S3_BUCKET
  stream: s3://my-bucket/my_file.json
  options:
    flatten: true
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ cat /path/to/task.gs.file.yaml
source:
  conn: MY_GS_BUCKET
  stream: gs://my-bucket/my_file.json
  options:
    flatten: true
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ cat /path/to/task.azure.file.yaml
source:
  conn: MY_AZURE_CONT
  stream: https://my_account.blob.core.windows.net/my-container/my_file.json
  options:
    flatten: true
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ sling run -c /path/to/task.s3.folder.yaml
$ sling run -c /path/to/task.gs.folder.yaml
$ sling run -c /path/to/task.azure.folder.yaml
$ sling run -c /path/to/task.s3.file.yaml
$ sling run -c /path/to/task.gs.file.yaml
$ sling run -c /path/to/task.azure.file.yaml
# We first need to make sure our connections are available in our environment
# See https://docs.slingdata.io/sling-cli/environment for help
export MY_TARGET_DB='...'
$ sling conns list
+----------------+------------------+-----------------+
| CONN NAME | CONN TYPE | SOURCE |
+----------------+------------------+-----------------+
| MY_S3_BUCKET | FileSys - S3 | sling env yaml |
| MY_GS_BUCKET | FileSys - Google | sling env yaml |
| MY_AZURE_CONT | FileSys - Azure | sling env yaml |
| MY_TARGET_DB | DB - PostgreSQL | env variable |
+----------------+------------------+-----------------+
######################## Using CLI Flags ########################
$ sling run --src-conn MY_S3_BUCKET --src-stream 's3://my-bucket/my_parquet_folder/' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-conn MY_GS_BUCKET --src-stream 'gs://my-bucket/my_parquet_folder/' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-conn MY_AZURE_CONT --src-stream 'https://my_account.blob.core.windows.net/my-container/my_parquet_folder/' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-conn MY_S3_BUCKET --src-stream 's3://my-bucket/my_file.parquet' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-conn MY_GS_BUCKET --src-stream 'gs://my-bucket/my_file.parquet' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
$ sling run --src-conn MY_AZURE_CONT --src-stream 'https://my_account.blob.core.windows.net/my-container/my_file.parquet' --tgt-conn MY_TARGET_DB --tgt-object 'target_schema.target_table' --mode full-refresh
######################## Using YAML Config ########################
$ cat /path/to/task.s3.folder.yaml
source:
  conn: MY_S3_BUCKET
  stream: s3://my-bucket/my_parquet_folder/
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ cat /path/to/task.gs.folder.yaml
source:
  conn: MY_GS_BUCKET
  stream: gs://my-bucket/my_parquet_folder/
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ cat /path/to/task.azure.folder.yaml
source:
  conn: MY_AZURE_CONT
  stream: https://my_account.blob.core.windows.net/my-container/my_parquet_folder/
target:
  conn: MY_TARGET_DB
  object: target_schema.target_table
mode: full-refresh
$ cat /path/to/task.s3.file.yaml