Check

Check hooks allow you to validate conditions and control the flow of your replication process. They are useful for implementing data quality checks, validating prerequisites, and ensuring business rules are met.

Configuration

- type: check
  check: "run.total_rows > threshold"  # Required: The condition to evaluate
  vars:                       # Optional: Local variables for the check
    threshold: 1000
    min_date: "2023-01-01"
  on_failure: abort          # Optional: abort/warn/quiet/skip
  id: my_id                  # Optional: generated if omitted. Use a `log` hook with {runtime_state} to view state.

Properties

Property     Required   Description
check        Yes        The condition to evaluate
vars         No         Map of scoped variables that can be used in the check
on_failure   No         What to do if the check fails (abort/warn/quiet/skip)

Output

When the check hook executes successfully, it returns the following output that can be accessed in subsequent hooks:

status: success  # Status of the hook execution
result: true     # The result of the check evaluation (true/false)

You can access these values in subsequent hooks using the following syntax (JMESPath), where hook_id is the id of the check hook:

  • {state.hook_id.check} - The compiled expression that was checked

  • {state.hook_id.status} - Status of the hook execution

  • {state.hook_id.result} - Boolean result of the check
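
As a minimal sketch of how these values might be consumed, the fragment below gives a check hook an explicit id and reads its result in a later hook. The id `row_count_check`, the `min_rows` value, and the `message` property of the `log` hook are assumptions for illustration and should be confirmed against the log hook reference.

```yaml
hooks:
  post:
    - type: check
      id: row_count_check      # hypothetical id, referenced by the next hook
      check: "run.total_rows >= min_rows"
      vars:
        min_rows: 100          # illustrative threshold
      on_failure: warn         # assumed to let later hooks still run

    - type: log
      # `message` is assumed to be the log hook's text property
      message: "row count check result: {state.row_count_check.result}"
```

Setting the id explicitly, rather than relying on the generated one, keeps the `{state.row_count_check...}` reference stable if hooks are added or reordered.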

Examples

Basic Row Count Validation

Ensure that the replication processed a minimum number of rows:

hooks:
  post:
    - type: check
      check: "run.total_rows >= min_rows"
      vars:
        min_rows: 100
      on_failure: abort

Multiple Condition Check

Validate multiple conditions before starting replication:

hooks:
  pre:
    - type: check
      check: |
        run.stream.schema != '' && 
        run.object.schema != '' && 
        timestamp.hour >= 1 && 
        timestamp.hour <= 23
      on_failure: abort

Data Quality Threshold Check

Verify that the error rate in processed data is below a threshold:

hooks:
  post:
    - type: check
      check: |
        state.quality_check.result.error_rate <= max_error_rate
      vars:
        max_error_rate: 0.01  # 1% error rate threshold
      on_failure: warn

Time Window Validation

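Ensure replication runs only during an overnight window (8 PM to 6 AM) or on allowed days. Because the window wraps past midnight, the two hour comparisons are joined with || rather than &&; with && the condition could never be true when start_hour is greater than end_hour:

hooks:
  pre:
    - type: check
      check: |
        (timestamp.hour >= start_hour ||
         timestamp.hour <= end_hour) ||
        (timestamp.day_name in allowed_days)
      vars:
        start_hour: 20  # 8 PM
        end_hour: 6     # 6 AM (next day)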
        allowed_days: ["Saturday", "Sunday"]
      on_failure: skip

Complex Business Rule Validation

Implement complex business rules with multiple conditions:

hooks:
  post:
    - type: check
      check: |
        (run.total_rows >= min_rows && 
         run.total_rows <= max_rows) &&
        (run.duration <= max_duration) &&
        (state.data_check.result.null_percentage <= max_null_percent)
      vars:
        min_rows: 1000
        max_rows: 1000000
        max_duration: 3600  # 1 hour
        max_null_percent: 5
      on_failure: abort

Environment-Based Validation

Apply different validation rules based on the environment:

hooks:
  pre:
    - type: check
      check: |
        (env.ENVIRONMENT == 'production' &&
         run.stream.name in prod_allowed_streams) ||
        (env.ENVIRONMENT != 'production')
      vars:
        prod_allowed_streams: ["customers", "orders", "products"]
      on_failure: abort

Resource Usage Check

Validate system resource availability before proceeding:

hooks:
  pre:
    - type: check
      check: |
        state.resource_check.result.available_disk_space >= min_disk_space &&
        state.resource_check.result.available_memory >= min_memory
      vars:
        min_disk_space: 10737418240  # 10GB in bytes
        min_memory: 4294967296      # 4GB in bytes
      on_failure: warn

Data Freshness Check

Ensure source data is fresh enough before replication:

hooks:
  pre:
    - type: check
      check: |
        state.freshness_check.result.last_update_time >= 
        timestamp.unix - max_age_seconds
      vars:
        max_age_seconds: 3600  # 1 hour
      on_failure: skip
