Constraints

Learn how to use constraints

Constraints are a powerful feature that allow you to evaluate each value of a certain column and handle any failures. They can be specified using SQL-like syntax, separated by a | symbol. The advantage of using constraints is that data quality can be ensured while ingesting the data (at runtime), not way later in the pipeline.

source: source_name
target: target_name

streams:
  my_stream:
    columns:
      # id values cannot be null
      id: bigint | value is not null

      # status values can only be active or inactive
      status: string | value in ('active', 'inactive')

      # col_1 value length can only be 6, 7 or 8
      col_1: string(8) | value_len > 5 and value_len <= 8

Handling failures

Sling looks for the environment variable SLING_ON_CONSTRAINT_FAILURE to know what to do.

Here are the values allowed in SLING_ON_CONSTRAINT_FAILURE:

  • warn: This is the default. When using the Sling Platform, this will emit a warning status, which can notify you (Email, Slack, etc.)

  • skip: Skip the record (do not ingest into target)

  • abort: Will immediately abort the run, and fail/error. When using the Sling Platform, this can notify you (Email, Slack, etc.)

Supported Operators

  • is null - Check if value is null

  • is not null - Check if value is not null

  • == - Equal to

  • != or <> - Not equal to

  • > - Greater than

  • >= - Greater than or equal to

  • < - Less than

  • <= - Less than or equal to

  • ~ - Matches regex pattern

  • !~ - Does not match regex pattern

  • in - Value matches any in list

  • not in - Value does not match any in list

  • and - Combine multiple conditions (all must be true)

  • or - Combine multiple conditions (at least one must be true)

Special Variables

  • value - Record value for respective column

  • value_len - Length of record value for respective column

Using CLI Flags

# template. data type must be specified when using constraints
sling run --columns '{ "<column_name>": "<data_type> | <constraint expression>" }'

# values cannot null
sling run --columns '{ "my_column": "bigint | value is not null" }'

# cannot be zero
sling run --columns '{ "my_column": "bigint | value != 0" }'

Using YAML

Using the defaults and streams keys, you can specify different columns/constraints for each stream.

source: source_name
target: target_name

defaults:
  # apply to all streams by default
  columns:
    id: bigint | value is not null  # value cannot be null

streams:
  # inherit defaults
  my_stream:

  my_other_stream:
    columns:
      # can evaluate value_len (number of digits)
      id: bigint | value_len > 0 or value = -10  

      # enum values
      my_column: string | value in ('first', 'second', 'third')

      # pattern matching or is null
      my_other_column: string | value ~ 'abc(d)' or value is null

Last updated