Transforms
Using Sling transforms
Sling provides powerful data transformation capabilities that allow you to clean, modify, and enhance your data during the replication process. Transform data inline without needing separate ETL tools or complex SQL queries.
Starting with v1.4.17, Sling introduced Functions and Staged Transforms - a major enhancement that provides access to 50+ built-in functions with flexible expression-based transformations. Before this version, only a limited set of legacy transforms were available.
Transform Input Structures
Sling supports three different input structures for transforms, each offering varying degrees of flexibility and simplicity:
Array of Strings - Simple Global Transforms
Apply transform functions in sequence to all column values. Works with functions that accept 1 parameter.
# Apply to all columns globally
transforms: ["trim_space", "remove_diacritics"]# CLI usage
sling run --transforms '["trim_space", "remove_diacritics"]' ...Use case: When you need the same transformation applied to every column in your dataset.
Map/Object - Column-Specific Transforms
Apply transform functions in sequence to specific columns. Works with functions that accept 1 parameter.
# Apply to specific columns
transforms:
name: ["trim_space", "upper"]
email: ["lower"]
customer_id: ["hash"]
target: ["bool_parse"]
text: ["replace_non_printable"]Use case: When you need different transformations for different columns, or want to combine global and column-specific transforms.
Array of Objects - Staged Transforms
Multi-stage transformations with expressions and functions, evaluated in order. Each stage can modify existing columns or create new ones.
Use case: When you need complex, multi-step transformations with conditional logic, cross-column references, or computed fields.
Staged Transforms
Staged transforms provide the most powerful and flexible transformation capabilities in Sling. They allow you to:
Multi-stage processing: Apply transformations in sequential stages
Cross-column references: Use
record.<column>to reference other column valuesConditional logic: Apply
if/then/elselogic based on multiple conditionsCreate new columns: Generate computed columns based on existing data
Access to 50+ functions: Use string, numeric, date, and utility functions
Syntax
Staged transforms use an array of transformation stages. Each stage can modify existing columns or create new ones using expressions parsed by Goval, a powerful Go expression evaluator:
Key Features
1. Value Transformations
Use value to reference the current column's value:
2. Record References
Use record.<column> to reference other columns in the same row:
3. Multi-stage Processing
Each stage can build upon the results of previous stages:
4. Conditional Transformations
Use ternary operators and conditional logic:
5. Available Functions
Access to 50+ built-in functions including:
String functions:
upper(),lower(),trim_space(),replace(),substring()Numeric functions:
int_parse(),float_parse(),greatest(),least()Date functions:
now(),date_parse(),date_format(),date_add()Utility functions:
hash(),coalesce(),cast(),uuid()Conditional functions:
if(),equals(),is_null(),is_empty()
See the complete Functions Reference for all available functions.
6. Creating New Columns On-The-Fly
One of the most powerful features of staged transforms is the ability to create new columns dynamically during data processing. Unlike traditional transformations that only modify existing columns, staged transforms let you add computed columns, derived fields, and calculated values without modifying your source data or schema.
Key Benefits:
No schema changes needed: Add columns without altering source tables
Dynamic calculations: Create computed fields based on existing data
Multi-stage logic: Build complex columns using results from previous stages
Flexible data enrichment: Add metadata, hashes, flags, or derived metrics
Common Use Cases:
Data enrichment: Add calculated fields, hash values, or lookup results
Business logic: Create status flags, categories, or computed metrics
Data quality: Add validation flags, completeness scores, or data lineage
Analytics preparation: Create derived dimensions or calculated measures
Examples
Example 1: Customer Data Processing
Example 2: E-commerce Order Processing
Example 3: Using CLI
Other Transformation Methods
Beyond the main transform structures, Sling provides additional methods for data transformation:
Custom SQL Transformations
Use custom SELECT queries as the source stream for complex transformations that go beyond what built-in transforms can handle:
JSON Flattening
Automatically flatten nested JSON structures using the flatten key in source.options (see configuration docs):
JMESPath Transforms
Extract and transform specific data from JSON responses using JMESPath expressions via the jmespath key in source.options. Here's an example extracting models information from a DBT manifest file:
Schema Evolution
When using Sling to extract/load data in a incremental manner, it will attempt to match whatever columns are present in both the source stream and target table. If an extra column is present in the source stream, it will add it in the target table. If no columns match from source stream at all, it will error. At least the primary_key or update_key must be present in the target table.
See below for a simple example, mimicking the addition and removal of columns.
We can see that sling handled the changes properly, in a non-destructive manner. If the source stream were from a database, the same rules would apply, whether a column disappeared or appeared.
Legacy Transform Syntax (Before v1.4.17)
Note: This is the legacy syntax for transforms used before v1.4.17. While still supported for backwards compatibility, we recommend using the new Transform Input Structures for new implementations as they provide much more powerful capabilities and access to functions.
These are legacy single-column transforms that can be applied to clean and modify data:
parse_bit: Parses binary data as bitsparse_fix: Parses FIX (Financial Information eXchange) protocol messages into JSON formatparse_uuid: Parses 16-byte UUID into string formatparse_ms_uuid: Parses 16-byte Microsoft UUID into string formatreplace_0x00: Replaces null characters (0x00) with an empty stringreplace_accents: Replaces accented characters with their non-accented equivalentsreplace_non_printable: Replaces or removes non-printable characterstrim_space: Removes leading and trailing whitespaceempty_as_null: If value is empty, set as null
Last updated
Was this helpful?