Pipelines

Use Pipelines to orchestrate multiple steps in sequence

A Pipeline in Sling allows you to execute multiple steps in sequence. Each step can be a different type of operation, enabling you to create complex workflows by chaining together various actions like running replications, executing queries, making HTTP requests, and more.

Pipeline Configuration

A pipeline is defined in YAML format with a steps key at the root level containing an array of steps. Each step supports the same types and configurations as Hooks.

steps:
  - type: log
    message: "Starting pipeline execution"

  - type: replication
    path: path/to/replication.yaml
    id: my_replication

  - type: query
    if: state.my_replication.status == "success"
    connection: my_database
    query: "UPDATE status SET completed = true"

env:
  MY_KEY: VALUE

Available Step Types

Pipelines support all the same types as Hooks:

  • Check - Validate conditions and control flow

  • Command - Run any command/process

  • Copy - Transfer files between local or remote storage connections

  • Delete - Remove files from local or remote storage connections

  • Group - Run sequences of steps or loop over values

  • HTTP - Make HTTP requests to external services

  • Inspect - Inspect a file or folder

  • List - List files in a folder

  • Log - Output custom messages and create audit trails

  • Query - Execute SQL queries against any defined connection

  • Replication - Run a Replication
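
For instance, several of these step types can be combined in a single pipeline. The sketch below is illustrative only: the command step's command key and the copy step's from/to keys are assumptions (check the Hooks documentation for the exact keys), while the remaining keys reuse those shown elsewhere on this page:

steps:
  # Run an arbitrary shell process ('command' key is an assumption)
  - type: command
    command: "echo preparing export"

  # Transfer a file between storage connections
  # ('from' and 'to' keys are assumptions)
  - type: copy
    from: local_storage/exports/data.csv
    to: s3_storage/exports/data.csv
    id: upload

  # Validate the copy before notifying (follows the status pattern
  # used by the replication example on this page)
  - type: check
    check: state.upload.status == "success"
    message: "Upload failed"

  - type: http
    url: https://api.example.com/webhook
    method: POST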

Common Step Properties

Each step shares the same common properties as hooks:

  • type (required) - The type of step (e.g. query, http, check, copy, delete, log, inspect)

  • if (optional) - A condition that determines whether the step executes

  • id (optional) - An identifier used to reference the step's output data

  • on_failure (optional, defaults to abort) - What to do if the step fails: abort, warn, quiet, or skip
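
Putting the common properties together: the hedged sketch below names a step with id, gates a later step with if, and overrides the default failure behavior with on_failure. Whether a query step exposes state.<id>.status is an assumption that follows the pattern of the replication example on this page:

steps:
  - type: query
    id: row_count                 # lets later steps reference this step's output
    connection: my_database
    query: "SELECT count(*) AS cnt FROM orders"
    on_failure: warn              # warn and continue instead of aborting

  - type: log
    if: state.row_count.status == "success"
    message: "Row count query finished"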

Variables Available

Pipeline steps have access to the runtime state which includes various variables that can be referenced using curly braces {variable}. The available variables include:

  • runtime_state - Contains all available state variables

  • env.* - All variables defined in the env map

  • timestamp.* - Parts of the current timestamp

  • steps.* - Output data from previous steps (referenced by their id)

You can view all available variables by using a log step:

steps:
  - type: log
    message: '{runtime_state}'
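
Individual variables can be interpolated the same way. In the sketch below, env.MY_KEY matches the env example earlier on this page, but the exact timestamp field names (e.g. timestamp.date) are assumptions; log {runtime_state} to see what is actually available:

steps:
  - type: log
    message: "Running with {env.MY_KEY} on {timestamp.date}"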

Example Pipeline

Here's a complete example that demonstrates various pipeline capabilities:

env:
  DATABASE: production
  NOTIFY_URL: https://api.example.com/webhook

steps:
  # Log the start of execution
  - type: log
    message: "Starting pipeline execution"

  # Run a replication
  - type: replication
    path: replications/daily_sync.yaml
    id: daily_sync
    on_failure: warn

  # Validate the results
  - type: check
    check: state.daily_sync.status == "success"
    message: "Daily sync failed"
    on_failure: abort

  # Update status in database
  - type: query
    connection: "{env.DATABASE}"
    query: |
      UPDATE pipeline_status 
      SET last_run = current_timestamp
      WHERE name = 'daily_sync'
    on_failure: warn

  # Send notification
  - type: http
    url: "{env.NOTIFY_URL}"
    method: POST
    payload: |
      {
        "pipeline": "daily_sync",
        "status": "success"
      }

  # Log completion
  - type: log
    message: "Pipeline completed successfully"

Best Practices

  1. Error Handling: Use appropriate on_failure behaviors for each step

  2. Validation: Include check steps to validate critical conditions

  3. Logging: Add log steps for better observability

  4. Modularity: Break down complex operations into multiple steps

  5. Conditions: Use if conditions to control step execution

  6. Variables: Leverage environment variables and runtime state for dynamic configuration

  7. Identifiers: Use meaningful ids for steps when you need to reference their output later
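
Several of these practices combine naturally. The hedged sketch below gates a notification on both an environment variable and a prior step's status; the && operator in the condition is an assumption about the expression syntax:

env:
  ENVIRONMENT: staging

steps:
  - type: replication
    path: replications/daily_sync.yaml
    id: daily_sync

  # Only notify from production, and only on success
  - type: http
    if: env.ENVIRONMENT == "production" && state.daily_sync.status == "success"
    url: https://api.example.com/webhook
    method: POST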

Running a Pipeline

You can run a pipeline using the Sling CLI:

sling run --pipeline path/to/pipeline.yaml
