Transformations
Using Sling transformations
Can Sling do the T part of ELT?
Not in the traditional sense. However, Sling is capable of the below transformations.
Here is a list of built-in transforms that Sling can perform:
decode_latin1
: Decodes Latin-1 (ISO-8859-1) encoded text to UTF-8decode_latin5
: Decodes Latin-5 (ISO-8859-5) encoded text to UTF-8decode_latin9
: Decodes Latin-9 (ISO-8859-15) encoded text to UTF-8decode_utf8
: Decodes UTF-8 encoded textdecode_utf8_bom
: Decodes UTF-8 with BOM encoded textdecode_utf16
: Decodes UTF-16 encoded text to UTF-8decode_windows1250
: Decodes Windows-1250 encoded text to UTF-8decode_windows1252
: Decodes Windows-1252 encoded text to UTF-8duckdb_list_to_text
: Adds a space suffix to DuckDB lists to prevent JSON parsing errorsencode_latin1
: Encodes text to Latin-1 (ISO-8859-1)encode_latin5
: Encodes text to Latin-5 (ISO-8859-5)encode_latin9
: Encodes text to Latin-9 (ISO-8859-15)encode_utf8
: Encodes text to UTF-8encode_utf8_bom
: Encodes text to UTF-8 with BOMencode_utf16
: Encodes text to UTF-16encode_windows1250
: Encodes text to Windows-1250encode_windows1252
: Encodes text to Windows-1252hash_md5
: Generates MD5 hash of the inputhash_sha256
: Generates SHA-256 hash of the inputhash_sha512
: Generates SHA-512 hash of the inputparse_bit
: Parses binary data as bitsparse_fix
: Parses FIX (Financial Information eXchange) protocol messages into JSON formatparse_uuid
: Parses 16-byte UUID into string formatparse_ms_uuid
: Parses 16-byte Microsoft UUID into string formatreplace_0x00
: Replaces null characters (0x00) with an empty stringreplace_accents
: Replaces accented characters with their non-accented equivalentsreplace_non_printable
: Replaces or removes non-printable characterstrim_space
: Removes leading and trailing whitespaceset_timezone
(v1.2.16): Sets the timezone for datetime values. Accepts the IANA timezone string (e.g. "America/New_York")
Examples using the CLI
Examples using a replication.yaml
Other Transformations
Custom SQL
Custom SELECT
query as the source stream, using a custom SQL input as a source.stream
. Here is an example::
Column Type Casting
Column type coercion, the columns
key in source.options (see here). This is especially useful for file sources. Here is an example casting types of columns:
JSON Flattening
JSON Flattening: the flatten
key in source.options (see here). Example:
JMESPath Transforms
JMESPath transformations for JSON source files: the jmespath
key in source.options
. Here is a example, extracting the models information from the DBT manifest file using jmespath
+ flatten
options:
Schema Evolution
When using Sling to extract/load data in a incremental
manner, it will attempt to match whatever columns are present in both the source stream and target table. If an extra column is present in the source stream, it will add it in the target table. If no columns match from source stream at all, it will error. At least the primary_key
or update_key
must be present in the target table.
See below for a simple example, mimicking the addition and removal of columns.
We can see that sling handled the changes properly, in a non-destructive manner. If the source stream were from a database, the same rules would apply, whether a column disappeared or appeared.
Suggestions?
There are currently no plans to supports other custom transformations, in the traditional "T" sense. However, any suggestions are welcome.
Last updated