Command

Command hooks allow you to execute system commands or scripts as part of your replication workflow. This is particularly useful for running data processing scripts, triggering external processes, or performing system-level operations.

Configuration

- type: command
  command: ["executable", "arg1", "arg2"]  # Required: Command and arguments as array or string
  print: true      # Optional: Print command output to console (default: true)
  capture: true    # Optional: Capture command output in hook result (default: false)
  timeout: 300     # Optional: Command timeout in seconds (default: no timeout)
  env:              # Optional: Environment variables for the command
    ENV_VAR1: "value1"
    ENV_VAR2: "value2"
  on_failure: abort # Optional: abort/warn/quiet/skip
  id: my_id         # Optional: Hook ID, generated if omitted. Use a `log` hook with {runtime_state} to view state.

Properties

| Property | Required | Description |
|---|---|---|
| command | Yes | String or array containing the command and its arguments |
| print | No | Whether to print command output to console (default: false) |
| capture | No | Whether to capture command output in hook result (default: false) |
| timeout | No | Command timeout in seconds. If 0 or not specified, no timeout is applied |
| env | No | Map of environment variables to set for the command |
| on_failure | No | What to do if the command fails (abort/warn/quiet/skip) |
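
The string and array forms of command can be sketched side by side. This is a minimal illustration, assuming the string form is split into an executable and its arguments in the usual way; `echo` and its argument are placeholders, not taken from the examples on this page.

```yaml
hooks:
  pre:
    # String form: the whole command line as one value
    - type: command
      command: echo "starting replication"

    # Array form: executable first, then one element per argument
    - type: command
      command: ["echo", "starting replication"]
```

The array form avoids any ambiguity about quoting, which is why it may be the safer choice when arguments contain spaces.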

Output

When the command hook executes successfully, it returns the following output that can be accessed in subsequent hooks:

status: success  # Status of the hook execution
binary: "/path/to/executable"  # The binary that was executed
arguments: ["arg1", "arg2"]  # The arguments passed to the command
start: "2024-01-01T00:00:00Z"  # Command start time
end: "2024-01-01T00:00:01Z"  # Command end time
timeout: 300  # Only present if timeout was specified
output:  # Only present if capture: true
  stdout: "Standard output text"
  stderr: "Standard error text"
  combined: "Combined output text"

You can access these values in subsequent hooks using the following JMESPath syntax, where hook_id is the id of the command hook:

  • {state.hook_id.status} - Status of the hook execution

  • {state.hook_id.binary} - The binary that was executed

  • {state.hook_id.arguments} - The arguments passed to the command

  • {state.hook_id.start} - Command start time

  • {state.hook_id.end} - Command end time

  • {state.hook_id.timeout} - Timeout value (if specified)

  • {state.hook_id.output.stdout} - Standard output (if capture: true)

  • {state.hook_id.output.stderr} - Standard error (if capture: true)

  • {state.hook_id.output.combined} - Combined output (if capture: true)
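
As a sketch of how these references fit together, the hooks below give a command hook an explicit id and read its captured output in a later hook. The id `list_files`, the `ls` command, and the `message` property of the `log` hook are assumptions made for illustration.

```yaml
hooks:
  post:
    - type: command
      id: list_files            # hypothetical id, referenced below
      command: ["ls", "-l", "/tmp"]
      capture: true             # needed for output.* to be populated
      print: false

    - type: log
      # `message` is assumed here; check the log hook's own reference
      message: "ls finished with status {state.list_files.status}: {state.list_files.output.stdout}"
```

Without `capture: true`, only the non-output fields (status, binary, arguments, start, end) would be available to the second hook.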

Examples

Run Data Processing Script

Execute a Python script to process data before replication:

hooks:
  pre:
    - type: command
      command: python scripts/process_data.py --stream "{run.stream.name}"
      timeout: 600  # 10 minute timeout
      env:
        PYTHONPATH: "/path/to/libs"
        DATA_DIR: "{env.data_directory}"
      print: true
      on_failure: abort

System Cleanup

Clean up temporary files after processing:

hooks:
  post:
    - type: command
      command: ["rm", "-rf", "/tmp/processed/{run.stream.name}/*"]  # Note: the * glob is expanded only if the command runs through a shell
      on_failure: warn

Run Data Quality Checks

Execute a data quality checking script and capture its output:

hooks:
  post:
    - type: command
      command: [
        "python",
        "scripts/quality_check.py",
        "--table", "{run.object.full_name}",
        "--date", "{timestamp.date}"
      ]
      timeout: 1800  # 30 minute timeout
      capture: true
      env:
        DB_CONNECTION: "{target.connection_string}"
      on_failure: warn

Conditional Command Execution

Run a command only when a condition holds, here when the PRODUCTION environment variable is "true":

hooks:
  post:
    - type: command
      if: env.PRODUCTION == "true"
      command: notify-admin --stream "{run.stream.name}" --status {run.status}
      print: true
      env:
        NOTIFY_TOKEN: "{env.notification_token}"

Run Shell Script

Execute a shell script with parameters:

hooks:
  pre:
    - type: command
      command: [
        "bash",
        "scripts/prepare_environment.sh",
        "{target.environment}",
        "{run.stream.name}"
      ]
      print: true
      capture: true
      on_failure: abort

Generate Reports

Run a report generation tool after successful replication:

hooks:
  post:
    - type: command
      if: run.status == "success"
      command: [
        "report-generator",
        "--input", "{run.object.full_name}",
        "--output", "reports/{run.stream.name}_{timestamp.date}.pdf"
      ]
      timeout: 900  # 15 minute timeout
      env:
        REPORT_TEMPLATE: "templates/standard.tpl"
        OUTPUT_DIR: "/var/reports"
