Check
Check hooks allow you to validate conditions and control the flow of your replication process. They are useful for implementing data quality checks, validating prerequisites, and ensuring business rules are met.
Configuration
- type: check
  check: "run.total_rows > threshold" # Required: The condition to evaluate
  failure_message: '{run.total_rows} is below threshold' # Optional: the message to use as an error
  vars: # Optional: Local variables for the check
    threshold: 1000
    min_date: "2023-01-01"
  on_failure: abort # Optional: abort/warn/quiet/skip
  id: my_id # Optional. Will be generated. Use `log` hook with {runtime_state} to view state.
Properties
| Property | Required | Description |
| --- | --- | --- |
| check | Yes | The condition to evaluate |
| failure_message | No | A message to use as the error if the check fails |
| vars | No | Map of scoped variables that can be used in the check |
| on_failure | No | What to do if the check fails (abort/warn/quiet/skip) |
Output
When the check hook executes successfully, it returns the following output that can be accessed in subsequent hooks:
status: success # Status of the hook execution
failure_message: message # The rendered message
result: true # The result of the check evaluation (true/false)
You can access these values in subsequent hooks using the following syntax (jmespath):
- `{state.hook_id.check}` - The compiled expression to check
- `{state.hook_id.status}` - Status of the hook execution
- `{state.hook_id.result}` - Boolean result of the check
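As a rough sketch of the access syntax above, a check hook can be given an explicit `id` and then read by a later hook. The id `row_check` is a hypothetical name chosen for illustration. The `message` field on the `log` hook, and the idea that `on_failure: warn` lets later hooks still run, are assumptions rather than details confirmed on this page.

```yaml
hooks:
  post:
    - type: check
      id: row_check              # hypothetical id, set explicitly so later hooks can reference it
      check: "run.total_rows >= min_rows"
      vars:
        min_rows: 100
      on_failure: warn           # assumption: warn does not stop subsequent hooks

    - type: log                  # assumption: the log hook accepts a `message` field
      message: "row_check status={state.row_check.status} result={state.row_check.result}"
```

If the `id` is omitted, one is generated, so setting it explicitly is the simpler way to get a stable `state.<id>` path.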
Examples
Basic Row Count Validation
Ensure that the replication processed a minimum number of rows:
hooks:
  post:
    - type: check
      check: "run.total_rows >= min_rows"
      vars:
        min_rows: 100
      on_failure: abort
Multiple Condition Check
Validate multiple conditions before starting replication:
hooks:
  pre:
    - type: check
      check: |
        run.stream.schema != '' &&
        run.object.schema != '' &&
        timestamp.hour >= 1 &&
        timestamp.hour <= 23
      on_failure: abort
Data Quality Threshold Check
Verify that the error rate in processed data is below a threshold:
hooks:
  post:
    - type: check
      check: |
        state.quality_check.result.error_rate <= max_error_rate
      vars:
        max_error_rate: 0.01 # 1% error rate threshold
      on_failure: warn
Time Window Validation
Ensure replication runs within specific time windows:
hooks:
  pre:
    - type: check
      check: |
        (timestamp.hour >= start_hour ||
         timestamp.hour <= end_hour) ||
        (timestamp.day_name in allowed_days)
      vars:
        start_hour: 20 # 8 PM (the window wraps past midnight, so the hour test uses ||)
        end_hour: 6 # 6 AM
        allowed_days: ["Saturday", "Sunday"]
      on_failure: skip
Complex Business Rule Validation
Implement complex business rules with multiple conditions:
hooks:
  post:
    - type: check
      check: |
        (run.total_rows >= min_rows &&
         run.total_rows <= max_rows) &&
        (run.duration <= max_duration) &&
        (state.data_check.result.null_percentage <= max_null_percent)
      vars:
        min_rows: 1000
        max_rows: 1000000
        max_duration: 3600 # 1 hour
        max_null_percent: 5
      on_failure: abort
Environment-Based Validation
Apply different validation rules based on the environment:
hooks:
  pre:
    - type: check
      check: |
        (env.ENVIRONMENT == 'production' &&
         run.stream.name in prod_allowed_streams) ||
        (env.ENVIRONMENT != 'production')
      vars:
        prod_allowed_streams: ["customers", "orders", "products"]
      on_failure: abort
Resource Usage Check
Validate system resource availability before proceeding:
hooks:
  pre:
    - type: check
      check: |
        state.resource_check.result.available_disk_space >= min_disk_space &&
        state.resource_check.result.available_memory >= min_memory
      vars:
        min_disk_space: 10737418240 # 10GB in bytes
        min_memory: 4294967296 # 4GB in bytes
      on_failure: warn
Data Freshness Check
Ensure source data is fresh enough before replication:
hooks:
  pre:
    - type: check
      check: |
        state.freshness_check.result.last_update_time >=
        timestamp.unix - max_age_seconds
      vars:
        max_age_seconds: 3600 # 1 hour
      on_failure: skip
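Combining Pre and Post Checks
The examples above each show a single `pre` or `post` list. As a minimal sketch, assuming both lists can sit under the same `hooks` key, a prerequisite check and a result check might be combined as follows. Both conditions are taken from the earlier examples, and where the `hooks` block sits within the replication file is not specified here.

```yaml
hooks:
  pre:
    # Validate prerequisites before the stream runs
    - type: check
      check: |
        run.stream.schema != '' &&
        run.object.schema != ''
      on_failure: abort

  post:
    # Validate the outcome once rows have been processed
    - type: check
      check: "run.total_rows >= min_rows"
      vars:
        min_rows: 100
      on_failure: warn
```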