Configuration
Configuration Concepts for your Sling runs
Configuration Inputs
Sling CLI offers 2 ways to input your configuration:
CLI Flags: Quick ad-hoc runs from your terminal shell or script.
sling run --src-stream file://my_file.csv --tgt-conn postgres --tgt-object my_schema.my_table --mode full-refresh
Replication: Multiple streams in a YAML or JSON file. Best way to scale Sling.
sling run -r /path/to/replication.yaml
Modes
Here the various loading modes available. All modes load into a new temporary table prior to final load.
Mode | Description |
---|---|
| This is the default mode. The target table will be dropped and recreated with the source data. |
| The source data will be merged or appended into the target table. If the table does not exist, it will be created. See below for more details. |
| Similar to |
| Appends the full dataset with an added timestamp column. If the target table exists, Sling will insert into / append data with a |
| Similar to |
Incremental Mode Strategies
Load Strategy | Primary Key | Update Key | Stream Strategy |
---|---|---|---|
New Data Upsert (update/insert) |
|
| Only new records after |
Full Data Upsert (update/insert) |
| no | Full data |
Append Only (insert, no update) | no |
| Only new records after |
Incremental or Backfill Mode With Custom SQL
When using incremental
or backfill
mode, with a custom SQL stream, as well as with a provided update_key
, it is necessary to include the placeholder {incremental_where_cond}
or {incremental_value}
in the SQL text. This will allow Sling to inject the necessary values to apply the watermark in order to get only newer records.
For example, let us assume the custom SQL stream is select * from my_schema.my_table where {incremental_where_cond}
. When Sling executes, if the table does not exist (first-run), the {incremental_where_cond}
value will be 1=1
, and the rendered query will be select * from my_schema.my_table where 1=1
. If the table exists, Sling will pull the max value of the update_key
column (being 2001-01-01 01:01:01
, rendering {incremental_where_cond}
as my_update_key > '2001-01-01 01:01:01'
), and inject the value: select * from my_schema.my_table where my_update_key > '2001-01-01 01:01:01'
.
Furthermore, let us assume the custom SQL stream is select * from my_schema.my_table where my_int_key > coalesce({incremental_value}, 0)
(for a timestamp column, we'd put something like coalesce({incremental_value}, '2001-01-01')
). When Sling executes, if the table does not exist (first-run), the {incremental_value}
value will be null
, and the rendered query will be select * from my_schema.my_table where my_int_key > coalesce(null, 0)
. If the table exists, Sling will pull the max value of the update_key
column (being 99
), and inject the value: select * from my_schema.my_table where my_int_key > coalesce(99, 0)
.
Options
Source
Here is the Go struct for source options:
Key | Description |
---|---|
| When source column type cannot be inferred (e.g. source is File), you can use this object/map to specify the type that a column should be cast as. It is not necessary to include all columns. Sling with attempt to auto-detect all unspecified column types. The map key is the column name, and the value is the type ( |
| (Only for file source) The type of compression to use when reading files. Valid inputs are |
| The ISO 8601 date format to use when reading date values. Default is |
| (Only for file source) The delimiter to use when parsing tabular files. Default is |
| (Only for file source - since v1.2.4) The escape character to use when parsing tabular files. Default is |
| Whether empty fields should be treated as |
| (Only for file source) Whether to flatten a semi-structure file source format (JSON, XML) |
| (Only for file source) The format of the file(s). Options are: |
| (Only for file source) Whether to consider the first line as header. Default is |
| (Only for file source) Specify a JMESPath expression to use to filter / extract nested JSON data. See https://jmespath.org/ for more |
| The maximum number of rows to pull from the source |
| Whether this case-sensitive value should be treated as |
| (Only for Excel source files) The name of the sheet to use as a data source, for example |
| The range to use for |
| Whether blank lines should be skipped when encountered. Default is |
| An object/map, or array/list of built-in transforms to apply to records. When an array is provided, it will apply the transforms to all columns. Available transforms:
Array example: |
| Whether white spaces at beginning or end of parsed values should be removed. Default is |
Target
Here is the Go struct for target options:
Key | Description |
---|---|
| (Only for database target) Whether to add new columns from stream not found in target table (when mode is not |
| (Only for database target) Whether to adjust the column type when needed. Default is |
| (Only for database target — since v1.2.11) The maximum number of records per transaction batch. |
| Whether to convert the column name casing. This facilitates querying tables in target database without using quotes. Accepts |
| (Only for file target) The type of compression to use when writing files. Valid inputs are |
| (Only for file target) The ISO 8601 date format to use when writing date values. Default is |
| (Only for file target) The delimiter to use when writing tabular files. Default is |
| (Only for file target) The maximum number of bytes to write to a file. |
| (Only for file target) The maximum number of rows (usually lines) to write to a file. |
| (Only for file target) The format of the file(s). Options are: |
| Ignore existing target file/table if it exists (do not overwrite/modify). Default is |
| (Only for file target) Whether to write the first line as header. Default is |
| (Only for database target) The SQL query to run after loading. |
| (Only for database target) The SQL query to run before loading. |
| (Only for database target) The table DDL to use when writing to a database. Default is auto-generated by Sling. Accepts the |
| (Only for database target) The table keys to define when creating a table, such as |
| (Only for database target) The temporary table name that should be used when loading into a database. Default is auto-generated by Sling. |
|
Third Party Tools / Drivers
sling
can take advantage of several 3rd party free tools to speed up insert speeds depending on the kind of database being interacted with. Those need to be installed separately and available in the PATH
environment variable for use. If you use our docker image, those tools should be already available (including the Oracle client). The following tools are supported:
bcp
for Microsoft SQL Server databases (installation instructions).sqlldr
for Oracle databases (installation instructions).mysql
client tool for MySQL databases (installation instructions).
Last updated