Runtime Variables
Learn how to use Runtime & Environment Variables with Sling
Runtime Variables
Runtime variables allow dynamic configuration: placeholders in your configuration are replaced at runtime with the corresponding values. For example, you could name your target object {target.schema}.{stream.schema}_{stream.table}, and at runtime each placeholder is filled in with the actual value. The available variables are listed below.
source.account: the name of the account of the source connection (when source conn is AZURE)
source.bucket: the name of the bucket of the source connection (when source conn is GCS or S3)
source.container: the name of the container of the source connection (when source conn is AZURE)
source.name: the name of the source connection
stream.file_folder: the file parent folder name of the stream (when source is a file system)
stream.file_name: the file name of the stream (when source is a file system)
stream.file_ext: the file extension of the stream (when source is a file system)
stream.file_path: the file path of the stream (when source is a file system)
stream.name: the name of the stream
stream.schema / stream.schema_lower / stream.schema_upper: the schema name of the source stream (when source is a database)
stream.table / stream.table_lower / stream.table_upper: the table name of the source stream (when source is a database)
stream.full_name: the fully qualified table name of the source stream (when source is a database)
target.account: the name of the account of the target connection (when target is AZURE)
target.bucket: the name of the bucket of the target connection (when target is GCS or S3)
target.container: the name of the container of the target connection (when target is AZURE)
target.name: the name of the target connection
target.schema: the default target schema defined in connection (when target is a database)
object.schema: the target object table schema (when target is a database)
object.table: the target object table name (when target is a database)
object.full_name: the target object fully qualified table name (when target is a database)
object.name: the target object name
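To make the substitution concrete, here is a minimal Python sketch of how a name such as {target.schema}.{stream.schema}_{stream.table} could be rendered. It is an illustration only, not Sling's implementation: the render helper and the sample values (analytics, sales, orders) are hypothetical.

```python
import re

def render(template: str, values: dict[str, str]) -> str:
    """Replace each {name} placeholder with its value.

    Placeholders with no known value are left untouched.
    Hypothetical helper for illustration; not part of Sling.
    """
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        return values.get(key, match.group(0))

    return re.sub(r"\{([A-Za-z0-9_.]+)\}", substitute, template)

# Hypothetical values for a single stream
values = {
    "target.schema": "analytics",
    "stream.schema": "sales",
    "stream.table": "orders",
}

print(render("{target.schema}.{stream.schema}_{stream.table}", values))
# prints: analytics.sales_orders
```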
Timestamp Patterns
run_timestamp: The run timestamp of the task (2006_01_02_150405)
YYYY: The 4 digit year of the run timestamp of the task
YY: The 2 digit year of the run timestamp of the task
MMM: The abbreviation of the month of the run timestamp of the task
MM: The 2 digit month of the run timestamp of the task
DD: The 2 digit day of the run timestamp of the task
HH: The 2 digit 24-hour of the run timestamp of the task
hh: The 2 digit 12-hour of the run timestamp of the task
mm: The 2 digit minute of the run timestamp of the task
ss: The 2 digit second of the run timestamp of the task
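The sketch below approximates the timestamp patterns with Python's strftime, which may help when predicting file names. It assumes that 2006_01_02_150405 is a Go-style reference layout (year_month_day_HHMMSS) and that MMM is the English three-letter month abbreviation; the timestamp_values helper and the sample run time are hypothetical, and this is not Sling's own code.

```python
from datetime import datetime

def timestamp_values(run_ts: datetime) -> dict[str, str]:
    """Approximate each timestamp pattern for a given run timestamp.

    Hypothetical helper; %b depends on the process locale (English by default).
    """
    return {
        "run_timestamp": run_ts.strftime("%Y_%m_%d_%H%M%S"),
        "YYYY": run_ts.strftime("%Y"),  # 4 digit year
        "YY": run_ts.strftime("%y"),    # 2 digit year
        "MMM": run_ts.strftime("%b"),   # month abbreviation
        "MM": run_ts.strftime("%m"),    # 2 digit month
        "DD": run_ts.strftime("%d"),    # 2 digit day
        "HH": run_ts.strftime("%H"),    # 24-hour clock
        "hh": run_ts.strftime("%I"),    # 12-hour clock
        "mm": run_ts.strftime("%M"),    # minute
        "ss": run_ts.strftime("%S"),    # second
    }

# Hypothetical run time: 5 November 2024, 14:07:09
vals = timestamp_values(datetime(2024, 11, 5, 14, 7, 9))

name = "{YYYY}_{MM}_{DD}.parquet"
for key, value in vals.items():
    name = name.replace("{" + key + "}", value)

print(vals["run_timestamp"])  # prints: 2024_11_05_140709
print(name)                   # prints: 2024_11_05.parquet
```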
Partition Patterns
This only applies when writing parquet files. You must specify the update_key along with a part_ variable in the object name, for example: object: my/folder/{part_year_month}/{part_day}.
part_year: The 4 digit year partition value of the update_key.
part_month: The 2 digit month partition value of the update_key.
part_year_month: Combination of the 4 digit year and the 2 digit month partition values of the update_key (e.g. 2024-11 as one value).
part_day: The 2 digit day partition value of the update_key.
part_week: The ISO-8601 2 digit week partition value of the update_key.
part_hour: The 2 digit hour partition value of the update_key.
part_minute: The 2 digit minute partition value of the update_key.
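As a rough illustration of the partition values described above, the following Python sketch derives each one from a single update_key timestamp. The partition_values helper and the sample timestamp are hypothetical, and the sketch only computes the values; how Sling and DuckDB lay them out in the final folder path is not shown here.

```python
from datetime import datetime

def partition_values(update_key_value: datetime) -> dict[str, str]:
    """Derive the part_* values for one update_key timestamp.

    Hypothetical helper for illustration; not Sling's implementation.
    """
    iso_week = update_key_value.isocalendar()[1]  # ISO-8601 week number
    return {
        "part_year": f"{update_key_value:%Y}",
        "part_month": f"{update_key_value:%m}",
        "part_year_month": f"{update_key_value:%Y-%m}",
        "part_day": f"{update_key_value:%d}",
        "part_week": f"{iso_week:02d}",
        "part_hour": f"{update_key_value:%H}",
        "part_minute": f"{update_key_value:%M}",
    }

# Hypothetical update_key value: 5 November 2024, 14:07
parts = partition_values(datetime(2024, 11, 5, 14, 7))
print(parts["part_year_month"])  # prints: 2024-11
print(parts["part_week"])        # prints: 45
```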
When using partition patterns, Sling sets write_partition_columns to true by default so that DuckDB includes the partition columns in the dataset. With write_partition_columns set to true, the way DuckDB writes the parquet schema may cause issues for other tools that read the data at folder level (see here for more details). To disable this behavior, set the environment variable DUCKDB_WRITE_PARTITION_COLS=false (applies to version 1.4.20+).
Environment Variables
Sling also lets you pass in environment variables to further customize configurations in a scalable manner. You can then reuse them in various places in your config files.
Definition
A convenient way to embed global variables is in the env.yaml file. You can also simply define them in the environment, the traditional way.
connections:
  MYSQL:
    type: mysql
  S3_ZONE_A:
    type: s3

# this sets environment variables in sling process
variables:
  path_prefix: /my/path/prefix
  schema_name: main
  SLING_CLI_TOKEN: xxxxxxxxxxxxxxxx # picked up machine wide

Replication
The example below shows the combined use of environment variables and runtime variables (such as stream.schema, stream.table, YYYY, MM and DD).
source: MYSQL
target: S3_ZONE_A

defaults:
  # {path_prefix} here is filled in from env var
  object: '{path_prefix}/{stream.schema}/{stream.table}/{YYYY}_{MM}_{DD}.parquet'
  target_options:
    format: parquet

streams:
  # all tables in schema
  my_schema.*:
    # overwrites default object
    object: '{stream.schema}/{stream.table}/{YYYY}_{MM}_{DD}/'
    target_options:
      file_max_rows: 400000 # will split files into folder

  mysql.my_table:
    sql: |
      select * from mysql.my_table
      where date between '{start_date}' and '{end_date}'

env:
  # ${path_prefix} pulls from environment variables in sling process or env
  path_prefix: '${path_prefix}' # From env.yaml (not in Environment)
  start_date: '${START_DATE}' # From Environment
  end_date: '${END_DATE}' # From Environment

Global Environment Variables
Sling utilizes global environment variables to further configure the load behavior. You can simply define them in your environment, the env.yaml file or the env section in a task or replication. See Global Environment Variables for more details.
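To illustrate the two kinds of references used in the replication example on this page, the Python sketch below first expands ${NAME} references from a set of environment values and then fills {name} placeholders from the resulting env section. The two-step order, the helper names, and the sample dates are assumptions made for illustration; this is not Sling's actual resolution logic.

```python
import re

def expand_env(text: str, environ: dict[str, str]) -> str:
    """Replace ${NAME} with environ[NAME]; unresolved references are kept."""
    return re.sub(
        r"\$\{(\w+)\}",
        lambda m: environ.get(m.group(1), m.group(0)),
        text,
    )

def fill_placeholders(text: str, values: dict[str, str]) -> str:
    """Replace {name} (not preceded by $) with values[name]."""
    return re.sub(
        r"(?<!\$)\{(\w+)\}",
        lambda m: values.get(m.group(1), m.group(0)),
        text,
    )

# Hypothetical values: path_prefix from env.yaml, dates from the shell environment
environ = {
    "path_prefix": "/my/path/prefix",
    "START_DATE": "2024-01-01",
    "END_DATE": "2024-01-31",
}

# Mirrors the env section of the replication example
env_section = {
    "path_prefix": "${path_prefix}",
    "start_date": "${START_DATE}",
    "end_date": "${END_DATE}",
}

resolved = {key: expand_env(value, environ) for key, value in env_section.items()}

sql = "select * from mysql.my_table where date between '{start_date}' and '{end_date}'"
print(fill_placeholders(sql, resolved))
# prints: select * from mysql.my_table where date between '2024-01-01' and '2024-01-31'
```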