Pipelines

Use Pipelines to orchestrate multiple steps in sequence

A Pipeline in Sling allows you to execute multiple steps in sequence. Each step can be a different type of operation, enabling you to create complex workflows by chaining together various actions like running replications, executing queries, making HTTP requests, and more.

Pipeline Configuration

A pipeline is defined in YAML format with a steps key at the root level containing an array of steps. Each step supports the same types and configurations as Hooks.

steps:
  - type: log
    message: "Starting pipeline execution"

  - type: replication
    path: path/to/replication.yaml
    id: my_replication

  - type: query
    if: state.my_replication.status == "success"
    connection: my_database
    query: "UPDATE status SET completed = true"

env:
  MY_KEY: VALUE

Available Step Types

Pipelines support all the same types as Hooks:

  • Check - Validate conditions and control flow

  • Command - Run any command/process

  • Copy - Transfer files between local or remote storage connections

  • Delete - Remove files from local or remote storage connections

  • Group - Run sequences of steps or loop over values

  • HTTP - Make HTTP requests to external services

  • Inspect - Inspect a file or folder

  • List - List files in a folder

  • Log - Output custom messages and create audit trails

  • Query - Execute SQL queries against any defined connection

  • Replication - Run a Replication
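
For instance, several of these step types can be combined in a single pipeline. The sketch below is illustrative only: the command step's command key and the copy step's from/to keys are assumptions (check the Hooks documentation for the exact keys), while the remaining keys reuse those shown elsewhere on this page:

steps:
  # Run an arbitrary shell process ('command' key is an assumption)
  - type: command
    command: "echo preparing export"

  # Transfer a file between storage connections
  # ('from' and 'to' keys are assumptions)
  - type: copy
    from: local_storage/exports/data.csv
    to: s3_storage/exports/data.csv
    id: upload

  # Validate the copy before notifying (follows the status pattern
  # used by the replication example on this page)
  - type: check
    check: state.upload.status == "success"
    message: "Upload failed"

  - type: http
    url: https://api.example.com/webhook
    method: POST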

Common Step Properties

Each step shares the same common properties as hooks:

  • type (required) - The type of step (e.g. query, http, check, copy, delete, log, inspect)

  • if (optional) - A condition that determines whether the step executes

  • id (optional) - An identifier used to reference the step's output data

  • on_failure (optional, defaults to abort) - What to do if the step fails: abort, warn, quiet, or skip
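
Putting the common properties together: the hedged sketch below names a step with id, gates a later step with if, and overrides the default failure behavior with on_failure. Whether a query step exposes state.<id>.status is an assumption that follows the pattern of the replication example on this page:

steps:
  - type: query
    id: row_count                 # lets later steps reference this step's output
    connection: my_database
    query: "SELECT count(*) AS cnt FROM orders"
    on_failure: warn              # warn and continue instead of aborting

  - type: log
    if: state.row_count.status == "success"
    message: "Row count query finished"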

Variables Available

Pipeline steps have access to the runtime state which includes various variables that can be referenced using curly braces {variable}. The available variables include:

  • runtime_state - Contains all available state variables

  • env.* - All variables defined in the env map

  • timestamp.* - Parts of the current timestamp

  • steps.* - Output data from previous steps (referenced by their id)

You can view all available variables by using a log step:

steps:
  - type: log
    message: '{runtime_state}'
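
Individual variables can be interpolated the same way. In the sketch below, env.MY_KEY matches the env example earlier on this page, but the exact timestamp field names (e.g. timestamp.date) are assumptions; log {runtime_state} to see what is actually available:

steps:
  - type: log
    message: "Running with {env.MY_KEY} on {timestamp.date}"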

Example Pipeline

Here's a complete example that demonstrates various pipeline capabilities:

env:
  DATABASE: production
  NOTIFY_URL: https://api.example.com/webhook

steps:
  # Log the start of execution
  - type: log
    message: "Starting pipeline execution"

  # Run a replication
  - type: replication
    path: replications/daily_sync.yaml
    id: daily_sync
    on_failure: warn

  # Validate the results
  - type: check
    check: state.daily_sync.status == "success"
    message: "Daily sync failed"
    on_failure: abort

  # Update status in database
  - type: query
    connection: "{env.DATABASE}"
    query: |
      UPDATE pipeline_status 
      SET last_run = current_timestamp
      WHERE name = 'daily_sync'
    on_failure: warn

  # Send notification
  - type: http
    url: "{env.NOTIFY_URL}"
    method: POST
    payload: |
      {
        "pipeline": "daily_sync",
        "status": "success"
      }

  # Log completion
  - type: log
    message: "Pipeline completed successfully"

Best Practices

  1. Error Handling: Use appropriate on_failure behaviors for each step

  2. Validation: Include check steps to validate critical conditions

  3. Logging: Add log steps for better observability

  4. Modularity: Break down complex operations into multiple steps

  5. Conditions: Use if conditions to control step execution

  6. Variables: Leverage environment variables and runtime state for dynamic configuration

  7. Identifiers: Use meaningful ids for steps when you need to reference their output later
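
Several of these practices combine naturally. The hedged sketch below gates a notification on both an environment variable and a prior step's status; the && operator in the condition is an assumption about the expression syntax:

env:
  ENVIRONMENT: staging

steps:
  - type: replication
    path: replications/daily_sync.yaml
    id: daily_sync

  # Only notify from production, and only on success
  - type: http
    if: env.ENVIRONMENT == "production" && state.daily_sync.status == "success"
    url: https://api.example.com/webhook
    method: POST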

Running a Pipeline

You can run a pipeline using the Sling CLI:

sling run --pipeline path/to/pipeline.yaml
