Constraints

Learn how to use constraints

Constraints are a powerful feature that lets you evaluate each value of a given column and handle any failures. They are specified using a SQL-like syntax, separated from the column's data type by a | symbol. The advantage of using constraints is that data quality is enforced while the data is being ingested (at runtime), rather than much later in the pipeline.

source: source_name
target: target_name

streams:
  my_stream:
    columns:
      # id values cannot be null
      id: bigint | value is not null

      # status values can only be active or inactive
      status: string | value in ('active', 'inactive')

      # col_1 value length can only be 6, 7 or 8
      col_1: string(8) | value_len > 5 and value_len <= 8
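The column specification above is a single string with two parts: the data type on the left of the | symbol and the constraint expression on the right. A rough Python sketch of that split (illustrative only, not Sling's internal code):

```python
# Split a column spec like "string(8) | value_len > 5 and value_len <= 8"
# into its data type and constraint expression (illustrative, not Sling internals).
def parse_column_spec(spec: str):
    parts = spec.split("|", 1)
    dtype = parts[0].strip()
    constraint = parts[1].strip() if len(parts) > 1 else None
    return dtype, constraint

print(parse_column_spec("bigint | value is not null"))
# → ('bigint', 'value is not null')
```

A spec without a | symbol is just a plain data type with no constraint attached.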

Handling failures

Sling reads the environment variable SLING_ON_CONSTRAINT_FAILURE to determine how to handle records that fail a constraint.

The values allowed in SLING_ON_CONSTRAINT_FAILURE are:

  • warn: The default. When using the Sling Platform, this emits a warning status, which can notify you (Email, Slack, etc.)

  • skip: Skips the record (it is not ingested into the target)

  • abort: Immediately aborts the run with an error. When using the Sling Platform, this can notify you (Email, Slack, etc.)
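The three modes differ in what happens to the failing record and to the run itself. A minimal Python sketch of that dispatch (hypothetical function names, not Sling's implementation):

```python
import os

# Hypothetical dispatch mirroring SLING_ON_CONSTRAINT_FAILURE semantics.
def on_constraint_failure(record, errors):
    mode = os.environ.get("SLING_ON_CONSTRAINT_FAILURE", "warn")  # warn is the default
    if mode == "warn":
        print(f"warning: constraint failed for {record}: {errors}")
        return record  # record is still ingested into the target
    if mode == "skip":
        return None    # record is dropped; the run continues
    if mode == "abort":
        raise RuntimeError(f"constraint failed: {errors}")  # run fails immediately
    raise ValueError(f"unknown SLING_ON_CONSTRAINT_FAILURE value: {mode}")
```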

Supported Operators

  • is null - Check if value is null

  • is not null - Check if value is not null

  • == - Equal to

  • != or <> - Not equal to

  • > - Greater than

  • >= - Greater than or equal to

  • < - Less than

  • <= - Less than or equal to

  • ~ - Matches regex pattern

  • !~ - Does not match regex pattern

  • in - Value matches any in list

  • not in - Value does not match any in list

  • and - Combine multiple conditions (all must be true)

  • or - Combine multiple conditions (at least one must be true)

Special Variables

  • value - Record value for respective column

  • value_len - Length of record value for respective column
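Putting the operators and special variables together, each constraint reduces to an ordinary boolean check over value (and its length). Illustrative Python equivalents of a few constraints from this page (not Sling's actual evaluator):

```python
import re

# Illustrative Python equivalents of constraint expressions (not Sling's evaluator).

def status_ok(value):
    # string | value in ('active', 'inactive')
    return value in ("active", "inactive")

def col_1_ok(value):
    # string(8) | value_len > 5 and value_len <= 8
    value_len = len(value)
    return value_len > 5 and value_len <= 8

def pattern_ok(value):
    # string | value ~ 'abc(d)' or value is null
    return value is None or re.search(r"abc(d)", value) is not None
```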

Using CLI Flags

# template. data type must be specified when using constraints
sling run --columns '{ "<column_name>": "<data_type> | <constraint expression>" }'

# values cannot be null
sling run --columns '{ "my_column": "bigint | value is not null" }'

# cannot be zero
sling run --columns '{ "my_column": "bigint | value != 0" }'

Using YAML

Using the defaults and streams keys, you can specify different columns/constraints for each stream.

source: source_name
target: target_name

defaults:
  # apply to all streams by default
  columns:
    id: bigint | value is not null  # value cannot be null

streams:
  # inherit defaults
  my_stream:

  my_other_stream:
    columns:
      # can evaluate value_len (number of digits)
      id: bigint | value_len > 0 or value == -10

      # enum values
      my_column: string | value in ('first', 'second', 'third')

      # pattern matching or is null
      my_other_column: string | value ~ 'abc(d)' or value is null
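The defaults/streams layering above can be pictured as a simple dictionary overlay, assuming stream-level column entries sit on top of the default ones (a conceptual sketch, not Sling's config loader):

```python
# Conceptual sketch of defaults/streams column layering (not Sling's config loader).
defaults = {"id": "bigint | value is not null"}

streams = {
    "my_stream": {},  # inherits defaults unchanged
    "my_other_stream": {
        "columns": {"my_column": "string | value in ('first', 'second', 'third')"}
    },
}

def effective_columns(stream_name):
    stream = streams.get(stream_name) or {}
    return {**defaults, **stream.get("columns", {})}
```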