Hooks / Steps
Execute custom actions throughout your replication or pipeline
Hooks are powerful mechanisms in Sling that let you execute custom actions before (pre-hooks) or after (post-hooks) a replication stream, as well as at the start or end of the overall replication (before the first stream and/or after the last stream). They let you extend and customize your data pipeline with operations such as data validation, notifications, file management, and custom processing.
When using Sling in Pipeline mode, hooks are called Steps; they work the same way.
Some typical operations include:
Pre-Hooks: Execute before the replication stream run starts
Validate prerequisites
Download necessary files
Set up configurations
Perform cleanup operations
Post-Hooks: Execute after the replication stream run completes
Validate results
Send notifications
Upload processed files
Clean up temporary files
Log completion status
Available Hook Types
Check: Validate conditions and control flow
Command: Run any command/process
Copy: Transfer files between local or remote storage connections
Delete: Remove files from local or remote storage connections
Group: Run sequences of steps or loop over values
HTTP: Make HTTP requests to external services
Inspect: Inspect a file or folder
List: List files in a folder
Log: Output custom messages and create audit trails
Query: Execute SQL queries against any defined connection
Replication: Run a replication
Hook Configuration
Hooks can be configured in two locations:
At the defaults level (applies to all streams)
At the individual stream level (overrides defaults)
Stream-Level Hooks
Stream-level hooks are declared with the hooks key, either under the defaults branch or under any stream branch.
Replication-Level Hooks
We can also define hooks at the replication file level, meaning they run before any of the streams run and/or after all the streams have run. Replication-level hooks must be declared as start and end hooks at the root of the YAML configuration.
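A minimal sketch of replication-level hooks, assuming the start and end lists sit under a root-level hooks key (not under defaults or a stream). The connection name MY_TARGET_DB and the audit_log table are hypothetical.

```yaml
# Root-level hooks: each list runs once per replication, not once per stream.
hooks:
  start:
    # Runs once, before the first stream
    - type: log
      message: 'replication starting'
  end:
    # Runs once, after the last stream
    - type: query
      connection: MY_TARGET_DB                                         # hypothetical connection
      query: "INSERT INTO audit_log (event) VALUES ('replication done')"  # hypothetical table
```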
Common Hook Properties
All hook types share some common properties:
type (required): The type of hook (check / command / copy / delete / group / http / inspect / list / log / query / replication)
if (optional): A condition that determines whether the hook should execute
id (optional): An identifier used to refer to the hook's output data
on_failure (optional, defaults to abort): What to do if the hook fails (abort / warn / quiet / skip)
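A sketch of one hook using all four common properties. The check property holding the condition, the run.status variable, and the id rows_loaded are assumptions for illustration; with on_failure set to warn, a failed check is reported instead of aborting the run.

```yaml
defaults:
  hooks:
    post:
      - type: check
        id: rows_loaded               # hypothetical id; its output appears under state.*
        if: run.status == "success"   # only evaluate after a successful run (assumed variable)
        check: run.total_rows > 0     # assumed property holding the condition to validate
        on_failure: warn              # warn instead of the default abort
```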
Variables Available
runtime_state - Contains all available state variables
env.* - All variables defined in the env
timestamp.* - Various timestamp parts
source.* - Source connection information
execution.* - Replication run-level information
target.* - Target connection information
stream.* - Current source stream information
object.* - Current target object information
state.* - Output state information from all hooks
runs.* - Information on all runs
run.* - Current stream run information
The best way to view all available variables is to print the runtime_state variable. For example, a log hook whose message is runtime_state prints all available variables as a JSON payload.
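A minimal sketch of that log hook, assuming Sling's {...} placeholder syntax for interpolating variables into the message. It is placed in a post hook so that run-level variables are already populated when it fires.

```yaml
defaults:
  hooks:
    post:
      # Dump every variable visible to this hook (env, run, state, ...) as JSON
      - type: log
        message: '{runtime_state}'
```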
Furthermore, we can access any data point using a JMESPath expression (example return values in parentheses):
state["start-02"].status - Gets the status of a hook (returns success)
run.total_rows - Gets the number of rows processed in the current run (returns 34)
timestamp.unix - The epoch/unix timestamp (returns 1737286051)
source.bucket - Gets the source S3 bucket name (returns my-bucket-1)
run.config.primary_key[0] - Gets the first primary key column (returns id)
stream.file_path - Gets the current stream's file path (returns test/public_test1k_postgres_pg_parquet/update_dt_year=2018/update_dt_month=11)
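A sketch of these expressions used inside hooks, assuming they can be interpolated into strings with {...} and used directly in an if condition. The hook id start-02 is taken from the example above; the messages are illustrative only.

```yaml
defaults:
  hooks:
    post:
      # Each {...} placeholder is evaluated against runtime_state (assumed syntax)
      - type: log
        message: 'loaded {run.total_rows} rows from {stream.file_path} at {timestamp.unix}'
      # Runs only if the hook with id start-02 succeeded (assumed condition syntax)
      - type: log
        if: 'state["start-02"].status == "success"'
        message: 'start-02 succeeded; first primary key column is {run.config.primary_key[0]}'
```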
Complete Example
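A hedged sketch of a full replication file combining replication-level and stream-level hooks. All connection names, stream names, and the object pattern are hypothetical, and the check property name is an assumption; treat this as an illustration of the layout, not a tested configuration.

```yaml
# Hypothetical end-to-end replication file.
source: MY_SOURCE_DB
target: MY_TARGET_DB

# Replication-level hooks: run once around the whole replication
hooks:
  start:
    - type: log
      message: 'replication started at {timestamp.unix}'
  end:
    - type: log
      message: 'replication finished'
      on_failure: quiet

defaults:
  mode: full-refresh
  object: 'analytics.{stream_table}'   # assumed target naming pattern
  hooks:
    pre:
      # Confirm the target is reachable before each stream; aborts on failure (default)
      - type: query
        connection: MY_TARGET_DB
        query: 'SELECT 1'
    post:
      # Warn, rather than abort, if a stream loaded no rows
      - type: check
        id: rows_loaded
        check: run.total_rows > 0
        on_failure: warn
      - type: log
        message: '{run.total_rows} rows loaded'

streams:
  public.customers:                  # uses the default hooks
  public.orders:
    hooks:
      post:                          # overrides the default post hooks for this stream
        - type: log
          message: 'orders stream done: {run.total_rows} rows'
```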
Best Practices
Error Handling: Specify appropriate on_failure behavior. The default value is abort.
Validation: Use check hooks to validate prerequisites and results.
Logging: Implement log hooks for better observability.
Cleanup: Use delete hooks to manage temporary or old files.
Modularity: Break down complex operations into multiple hooks.
Conditions: Use if conditions to control hook execution.
Environment Awareness: Consider different environments in hook configurations.