Hooks / Steps
Execute custom actions throughout your replication or pipeline
Last updated
Hooks are powerful mechanisms in Sling that allow you to execute custom actions before (pre-hooks) or after (post-hooks) a replication stream, as well as at the start or end of the overall replication run (before the first stream and/or after the last stream). They enable you to extend and customize your data pipeline with operations such as data validation, notifications, file management, and custom processing.
Some typical operations include:
Pre-Hooks: Execute before the replication stream run starts
Validate prerequisites
Download necessary files
Set up configurations
Perform cleanup operations
Post-Hooks: Execute after the replication stream run completes
Validate results
Send notifications
Upload processed files
Clean up temporary files
Log completion status
Check: Validate conditions and control flow
Command: Run any command/process
Copy: Transfer files between local or remote storage connections
Delete: Remove files from local or remote storage connections
Group: Run sequences of steps or loop over values
HTTP: Make HTTP requests to external services
Inspect: Inspect a file or folder
List: List files in a folder
Log: Output custom messages and create audit trails
Query: Execute SQL queries against any defined connection
Replication: Run a replication
Store: Store values for later in-process access
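As a quick illustration, here is a minimal sketch of how a couple of these hook types might be declared; the exact property names (such as `check` and `message`) should be confirmed against each hook's reference page:

```yaml
hooks:
  post:
    # log hook: output a custom message
    - type: log
      message: "stream completed"

    # check hook: fail the run if no rows were processed
    - type: check
      check: run.total_rows > 0
```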
Hooks can be configured in two locations:
At the `defaults` level (applies to all streams)
At the individual stream level (overrides defaults)
Stream level Hooks
We can use the following structure to declare hooks with the `hooks` key, under the `defaults` branch or under any stream branch.
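For example, a minimal sketch (the connection names `postgres` and `snowflake` and the stream name are placeholders):

```yaml
source: postgres
target: snowflake

defaults:
  hooks:
    pre:
      - type: log
        message: "stream starting"
    post:
      - type: log
        message: "stream finished"

streams:
  public.users:
    # stream-level hooks override the defaults above
    hooks:
      post:
        - type: log
          message: "users stream done"
```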
Replication level Hooks
We can also define hooks at the replication file level, meaning before any of the streams run and/or after all the streams have run. For replication-level hooks, we must declare the `start` and `end` hooks at the root of the YAML configuration.
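A sketch of replication-level hooks (connection and stream names are placeholders):

```yaml
source: postgres
target: snowflake

# replication-level hooks, declared at the root of the YAML
hooks:
  start:
    - type: log
      message: "replication starting"
  end:
    - type: log
      message: "replication finished"

streams:
  public.users: {}
```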
All hook types share some common properties:

| Property | Description | Required |
| --- | --- | --- |
| `type` | The type of hook (`query` / `http` / `check` / `copy` / `delete` / `log` / `inspect`) | Yes |
| `if` | Optional condition to determine whether the hook should execute | No |
| `id` | An identifier used to refer to the hook's output data | No |
| `on_failure` | What to do if the hook fails (`abort` / `warn` / `quiet` / `skip`) | No (defaults to `abort`) |
`runtime_state` - Contains all state variables available
`env.*` - All variables defined in the env
`timestamp.*` - Various timestamp parts information
`source.*` - Source connection information
`execution.*` - Replication run level information
`target.*` - Target connection information
`stream.*` - Current source stream info
`object.*` - Current target object info
`state.*` - All hooks output state information
`runs.*` - All runs information
`run.*` - Current stream run information
The best way to view the available variables is to print the `runtime_state` variable. For example, a `log` hook that prints `runtime_state` will output all available variables as a JSON payload.
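For instance, a `log` hook along these lines (using Sling's `{variable}` interpolation) would print the full state:

```yaml
hooks:
  end:
    - type: log
      message: "{runtime_state}"
```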
`state["start-02"].status` - Gets the status of a hook (returns `success`)
`run.total_rows` - Gets the number of rows processed in the current run (returns `34`)
`timestamp.unix` - The epoch/unix timestamp (returns `1737286051`)
`source.bucket` - Gets the source S3 bucket name (returns `my-bucket-1`)
`run.config.primary_key[0]` - Gets the first primary key column (returns `id`)
`stream.file_path` - Gets the current stream's file path (returns `test/public_test1k_postgres_pg_parquet/update_dt_year=2018/update_dt_month=11`)
Error Handling: Specify appropriate `on_failure` behavior. The default value is `abort`.
Validation: Use `check` hooks to validate prerequisites and results
Logging: Implement `log` hooks for better observability
Cleanup: Use `delete` hooks to manage temporary / old files
Modularity: Break down complex operations into multiple hooks
Conditions: Use `if` conditions to control hook execution
Environment Awareness: Consider different environments in hook configurations
Furthermore, we can access any data-point using a :