Constraints
Learn how to use constraints
Constraints are a powerful feature that allow you to evaluate each value of a certain column and handle any failures. They can be specified using SQL-like syntax, separated by a | symbol. The advantage of using constraints is that data quality can be ensured while ingesting the data (at runtime), not way later in the pipeline.
source: source_name
target: target_name
streams:
my_stream:
columns:
# id values cannot be null
id: bigint | value is not null
# status values can only be active or inactive
status: string | value in ('active', 'inactive')
# col_1 value length can only be 6, 7 or 8
col_1: string(8) | value_len > 5 and value_len <= 8Handling failures
Sling looks for the environment variable SLING_ON_CONSTRAINT_FAILURE to know what to do.
Here are the values allowed in SLING_ON_CONSTRAINT_FAILURE:
warn: This is the default. When using the Sling Platform, this will emit a warning status, which can notify you (Email, Slack, etc.)skip: Skip the record (do not ingest into target)abort: Will immediately abort the run, and fail/error. When using the Sling Platform, this can notify you (Email, Slack, etc.)
Supported Operators
is null- Check if value is nullis not null- Check if value is not null==- Equal to!=or<>- Not equal to>- Greater than>=- Greater than or equal to<- Less than<=- Less than or equal to~- Matches regex pattern!~- Does not match regex patternin- Value matches any in listnot in- Value does not match any in listand- Combine multiple conditions (all must be true)or- Combine multiple conditions (at least one must be true)
Special Variables
value- Record value for respective columnvalue_len- Length of record value for respective column
Using CLI Flags
# template. data type must be specified when using constraints
sling run --columns '{ "<column_name>": "<data_type> | <constraint expression>" }'
# values cannot null
sling run --columns '{ "my_column": "bigint | value is not null" }'
# cannot be zero
sling run --columns '{ "my_column": "bigint | value != 0" }'
Using YAML
Using the defaults and streams keys, you can specify different columns/constraints for each stream.
source: source_name
target: target_name
defaults:
# apply to all streams by default
columns:
id: bigint | value is not null # value cannot be null
streams:
# inherit defaults
my_stream:
my_other_stream:
columns:
# can evaluate value_len (number of digits)
id: bigint | value_len > 0 or value = -10
# enum values
my_column: string | value in ('first', 'second', 'third')
# pattern matching or is null
my_other_column: string | value ~ 'abc(d)' or value is nullLast updated
Was this helpful?