Variables

Learn how to use Runtime & Environment Variables with Sling

Runtime Variables

Runtime variables allow dynamic configuration: any placeholder you use is replaced at runtime with the corresponding value. For example, you could name your target object {target_schema}.{stream_schema}_{stream_table}, and at runtime it will be formatted using the values listed below.

run_timestamp: The run timestamp of the task (2006_01_02_150405)
source_account: the name of the account of the source connection (when source connection is AZURE)
source_bucket: the name of the bucket of the source connection (when source connection is GCS or S3)
source_container: the name of the container of the source connection (when source connection is AZURE)
source_name: the name of the source connection
stream_file_folder: the file parent folder name of the stream (when source is a file system)
stream_file_name: the file name of the stream (when source is a file system)
stream_file_ext: the file extension of the stream (when source is a file system)
stream_file_path: the file path of the stream (when source is a file system)
stream_name: the name of the stream
stream_schema: the schema name of the source stream (when source is a database)
stream_table: the table name of the source stream (when source is a database)
target_account: the name of the account of the target connection (when target is AZURE)
target_bucket: the name of the bucket of the target connection (when target is GCS or S3)
target_container: the name of the container of the target connection (when target is AZURE)
target_name: the name of the target connection
target_schema: the default target schema defined in connection (when target is a database)
object_schema: the target object table schema (when target is a database)
object_table: the target object table name (when target is a database)
object_name: the target object name
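To make the substitution concrete, here is a minimal Python sketch of how a `{name}` placeholder template could be rendered. It is only an illustration of the idea, not Sling's implementation; the function `render_object_name` and the sample values are hypothetical.

```python
import re

def render_object_name(template: str, values: dict) -> str:
    """Replace each {name} placeholder with its value; unknown names are left untouched."""
    def substitute(match):
        name = match.group(1)
        return values.get(name, match.group(0))
    return re.sub(r"\{(\w+)\}", substitute, template)

# Hypothetical values for a single stream (assumed for illustration)
values = {
    "target_schema": "analytics",
    "stream_schema": "public",
    "stream_table": "orders",
}

object_name = render_object_name("{target_schema}.{stream_schema}_{stream_table}", values)
print(object_name)  # analytics.public_orders
```

With these sample values, the template from the paragraph above resolves to a single schema-qualified table name.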

Timestamp Patterns

YYYY: The 4 digit year of the run timestamp of the task
YY: The 2 digit year of the run timestamp of the task
MMM: The abbreviation of the month of the run timestamp of the task
MM: The 2 digit month of the run timestamp of the task
DD: The 2 digit day of the run timestamp of the task
HH: The 2 digit 24-hour of the run timestamp of the task
hh: The 2 digit 12-hour of the run timestamp of the task
mm: The 2 digit minute of the run timestamp of the task
ss: The 2 digit second of the run timestamp of the task
ISO8601: The ISO-8601 format of the run timestamp of the task (2006-01-02T15:04:05Z)
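The patterns above can be approximated with standard `strftime` directives. The following Python sketch is an assumption-laden illustration, not Sling's own code: the `PATTERNS` mapping and `render_timestamps` are hypothetical names, and the run timestamp is assumed to be in UTC.

```python
import re
from datetime import datetime, timezone

# Assumed mapping from the pattern names above to strftime directives
PATTERNS = {
    "YYYY": "%Y",
    "YY": "%y",
    "MMM": "%b",  # month abbreviation; the output depends on the locale
    "MM": "%m",
    "DD": "%d",
    "HH": "%H",
    "hh": "%I",
    "mm": "%M",
    "ss": "%S",
    "ISO8601": "%Y-%m-%dT%H:%M:%SZ",
    "run_timestamp": "%Y_%m_%d_%H%M%S",
}

def render_timestamps(template, run_time):
    """Replace {PATTERN} placeholders with the formatted run time; other placeholders are kept."""
    def substitute(match):
        fmt = PATTERNS.get(match.group(1))
        return run_time.strftime(fmt) if fmt else match.group(0)
    return re.sub(r"\{(\w+)\}", substitute, template)

run_time = datetime(2006, 1, 2, 15, 4, 5, tzinfo=timezone.utc)
path = render_timestamps("{YYYY}_{MM}_{DD}/{HH}{mm}{ss}", run_time)
print(path)  # 2006_01_02/150405
```

Matching whole `{...}` tokens, rather than replacing raw substrings, avoids `YY` accidentally matching inside `YYYY`, and keeps `MM` (month) distinct from `mm` (minute).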

Environment Variables

Sling also allows you to pass in environment variables to further customize configurations in a scalable manner. You can then reuse them in various places in your config files.

Definition

A convenient place to define global variables is the env.yaml file. You can also define them directly in the environment, the traditional way.

env.yaml

connections:
  MYSQL:
    type: mysql
  S3_ZONE_A:
    type: s3

# this sets environment variables in sling process
variables:
  path_prefix: /my/path/prefix
  schema_name: main

Replication

The example below uses both environment variables and runtime variables (such as stream_schema, stream_table, YYYY, MM and DD).

replication.yaml

source: MYSQL
target: S3_ZONE_A

defaults:
  # {path_prefix} here is filled in from env var
  # quoted so that YAML does not parse the leading { as a flow mapping
  object: '{path_prefix}/{stream_schema}/{stream_table}/{YYYY}_{MM}_{DD}.parquet'
  target_options:
    format: parquet

streams:

  # all tables in schema
  my_schema.*:
    # overwrites default object
    object: '{stream_schema}/{stream_table}/{YYYY}_{MM}_{DD}/'
    target_options:
      file_max_rows: 400000 # will split files into folder
  
  mysql.my_table:
    sql: |
      select * from mysql.my_table
      where date between '{start_date}' and '{end_date}'

env:
  # ${path_prefix} pulls from environment variables in sling process or env
  path_prefix: '${path_prefix}' # From env.yaml (not in Environment)
  start_date: '${START_DATE}'   # From Environment
  end_date: '${END_DATE}'       # From Environment
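The `${NAME}` references in the env section can be pictured as a simple lookup. The Python sketch below is illustrative only: `resolve_env_refs` is a hypothetical helper, the date values are made up, and the lookup order (process environment first, then env.yaml variables) is an assumption rather than documented Sling behavior.

```python
import os
import re

def resolve_env_refs(value, file_variables, environ=None):
    """Expand ${NAME} references; unresolved references are left as-is."""
    environ = os.environ if environ is None else environ

    def substitute(match):
        name = match.group(1)
        if name in environ:          # assumed: the process environment wins
            return environ[name]
        if name in file_variables:   # then the variables from env.yaml
            return str(file_variables[name])
        return match.group(0)

    return re.sub(r"\$\{(\w+)\}", substitute, value)

# Variables as defined in the env.yaml example
file_variables = {"path_prefix": "/my/path/prefix", "schema_name": "main"}
# Hypothetical process environment
environ = {"START_DATE": "2024-01-01", "END_DATE": "2024-01-31"}

env_section = {
    "path_prefix": "${path_prefix}",
    "start_date": "${START_DATE}",
    "end_date": "${END_DATE}",
}
resolved = {key: resolve_env_refs(raw, file_variables, environ)
            for key, raw in env_section.items()}
print(resolved)
```

The resolved values are what the `{path_prefix}`, `{start_date}` and `{end_date}` placeholders in the replication would then receive.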

Global Environment Variables

Sling utilizes the following global environment variables to further configure the load behavior. You can simply define them in your environment, the env.yaml file or the env section in a task or replication.

Variable Name	Description
`SLING_HOME_DIR`	The sling home directory, which contains `env.yaml`. Will use default if not provided.
`SLING_LOADED_AT_COLUMN`	Whether to add an audit timestamp column named `_sling_loaded_at` in target object. Accepts values `true`, `false`, `unix` (for epoch integer values) or `timestamp`. `true` defaults to `unix`.
`SLING_STREAM_URL_COLUMN`	If source is file, whether to add a column `_sling_stream_url` with the source file path / url in target object. To enable, set to `true`.
`SLING_RECURSIVE_LIMIT`	The maximum number of file names to pull when listing from cloud file systems such as S3, GCS and Azure Storage.
`SLING_ROW_ID_COLUMN`	Whether to add a column named `_sling_row_id` in the target object, which will have a random UUIDv7 value. This will be unique. To enable, set to `true`.
`SLING_ROW_NUM_COLUMN`	If source is file, whether to add a column named `_sling_row_num` in the target object, which will be the row number of the stream (incremented by record processed). To enable, set to `true`.
`SLING_ALLOW_EMPTY`	This is useful to create tables / files using the stream columns structure, even if there is no data. To enable, set to `true`.
`SLING_DISABLE_TELEMETRY`	Disables anonymous usage reporting, which is used to improve Sling. To disable reporting, set this to `true`.
`SLING_SHOW_PROGRESS`	Whether the progress of the stream should be displayed (`true` or `false`).
`SLING_LOGGING`	How sling formats the log lines. Accepts values `JSON`, `NO_COLOR` or `CONSOLE` (default).
`SAMPLE_SIZE`	The number of records to process in order to infer column types (especially for file sources). Default is `900`.
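One way to set these variables for a single run is to pass them through the process environment. The Python sketch below only builds the command and environment; `build_sling_command` is a hypothetical helper, and the `sling run -r` invocation is assumed to match your installed CLI.

```python
import os

def build_sling_command(replication_path, overrides):
    """Return (argv, env) for a replication run with extra environment variables."""
    env = dict(os.environ)
    # Environment values must be strings, so numbers are converted here
    env.update({name: str(value) for name, value in overrides.items()})
    argv = ["sling", "run", "-r", replication_path]
    return argv, env

argv, env = build_sling_command(
    "replication.yaml",
    {
        "SLING_LOADED_AT_COLUMN": "timestamp",
        "SLING_ROW_ID_COLUMN": "true",
        "SAMPLE_SIZE": 2000,
    },
)

# To actually launch the run (requires the sling binary on PATH):
# import subprocess
# subprocess.run(argv, env=env, check=True)
```

Boolean-style settings are passed as the literal strings `true` or `false`, as listed in the table, rather than as Python booleans.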

