Inspect

The inspect hook allows you to retrieve metadata about a file or directory from any supported filesystem connection. This is particularly useful for validating file existence, checking file properties, or monitoring file changes.

Configuration

- type: inspect
  location: aws_s3/path/to/file   # Required: connection name and path
  recursive: true/false           # Optional: whether to get the total count/size of nested files
  on_failure: abort               # Optional: abort/warn/quiet/skip
  id: my_id                       # Optional: generated if omitted. Use a `log` hook with {runtime_state} to view state.

Properties

Property     Required   Description
location     Yes        The location string. Contains the connection name and path.
recursive    No         Whether to get the total count/size of nested files (true/false)
on_failure   No         What to do if the inspection fails (abort/warn/quiet/skip)

Output

When the inspect hook executes successfully, it returns the following output that can be accessed in subsequent hooks:

status: success  # Status of the hook execution
exists: true     # Whether the path exists
path: "path/to/file"  # The normalized path
name: "file"     # The name of the file/directory
uri: "s3://bucket/path/to/file"  # The full URI
is_dir: false    # Whether the path is a directory
size: 1024       # Total size in bytes (0 for directories if not recursive)
node_count: 0    # Total count of folders/files (if recursive)
folder_count: 0  # Total count of folders (if recursive)
file_count: 1    # Total count of files (if recursive)
created_at: "2023-01-01T00:00:00Z"  # Creation timestamp if available
created_at_unix: 1672531200  # Creation unix timestamp if available
updated_at: "2023-01-02T00:00:00Z"  # Last modified timestamp if available
updated_at_unix: 1672617600  # Last modified unix timestamp if available

You can access these values in subsequent hooks using the following syntax (JMESPath), where hook_id is the id of the inspect hook:

  • {state.hook_id.status} - Status of the hook execution

  • {state.hook_id.exists} - Whether the path exists

  • {state.hook_id.path} - The normalized path

  • {state.hook_id.name} - The name of the file/directory

  • {state.hook_id.uri} - The full URI

  • {state.hook_id.is_dir} - Whether the path is a directory

  • {state.hook_id.size} - Size in bytes

  • {state.hook_id.created_at} - Creation timestamp

  • {state.hook_id.created_at_unix} - Creation unix timestamp

  • {state.hook_id.updated_at} - Last modified timestamp

  • {state.hook_id.updated_at_unix} - Last modified unix timestamp
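
As a minimal sketch of this syntax, the fragment below inspects a directory recursively and logs a few of the returned values. The connection name aws_s3, the path, and the id landing_dir are placeholders. It also assumes that file_count from the output above can be referenced the same way as the fields in this list.

```yaml
hooks:
  pre:
    - type: inspect
      location: "aws_s3/data/landing/"   # placeholder connection name and path
      recursive: true                    # totals for nested files are only gathered when recursive
      id: landing_dir                    # hypothetical id, referenced below as state.landing_dir

    - type: log
      message: "Found {state.landing_dir.file_count} files ({state.landing_dir.size} bytes) under {state.landing_dir.uri}"
      level: info
```

Setting an explicit id keeps the reference stable; a generated id would first have to be looked up in the runtime state.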

Examples

Verify File Existence (Pre-Hook)

Check if a required file exists before processing:

hooks:
  pre:
    - type: inspect
      location: "aws_s3/data/config/{run.stream.name}.json"
      id: config_check
      on_failure: warn

    - type: check
      check: "state.config_check.exists == true"
      on_failure: abort

Check File Size (Pre-Hook)

Ensure a file is not empty before processing:

hooks:
  pre:
    - type: inspect
      location: "gcs/input/{run.stream.name}/data.csv"
      id: file_check

    - type: check
      check: "state.file_check.size > 0"
      on_failure: abort

Monitor File Changes (Pre-Hook)

Track when a file was last modified:

hooks:
  pre:
    - type: inspect
      location: "azure_blob/source/{env.data_dir}/latest.parquet"
      id: source_check

    - type: log
      message: "Source file last modified at {state.source_check.updated_at}"
      level: info

Directory Validation (Pre-Hook)

Verify a path is a directory before processing:

hooks:
  pre:
    - type: inspect
      location: "aws_s3/data/{run.stream.name}/"
      id: dir_check

    - type: check
      check: "state.dir_check.is_dir == true"
      on_failure: abort
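
A possible extension of this example, sketched under the assumption that file_count is populated when recursive is true (as described under Output): a second check skips the run when the directory contains no files. The choice of skip rather than abort is illustrative only.

```yaml
hooks:
  pre:
    - type: inspect
      location: "aws_s3/data/{run.stream.name}/"
      recursive: true          # needed so file_count reflects nested files
      id: dir_check

    # First confirm the path is a directory at all
    - type: check
      check: "state.dir_check.is_dir == true"
      on_failure: abort

    # Then confirm it is not empty (assumes file_count is set when recursive)
    - type: check
      check: "state.dir_check.file_count > 0"
      on_failure: skip
```

Two separate check hooks are used here so that each condition can have its own on_failure behavior.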

File Age Check (Pre-Hook)

Skip processing if the file was last modified more than 24 hours ago:

hooks:
  pre:
    - type: inspect
      location: "aws_s3/data/{run.stream.name}/input.csv"
      id: age_check

    - type: check
      check: 'state.age_check.updated_at_unix >= unix_marker'
      vars:
        unix_marker: 'timestamp.unix - 24*60*60'  # 24 hours ago
      on_failure: skip

Notes

  • Not all filesystems provide all metadata fields

  • Timestamps may be zero if not supported by the filesystem

  • Directory sizes are typically reported as 0 unless recursive is set to true

  • The hook will not fail if the path given in location is invalid. It simply returns exists: false
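
Taken together, these notes suggest guarding any timestamp-based logic. The sketch below is one way to do that, assuming a missing timestamp surfaces as updated_at_unix equal to 0. The connection name sftp, the path, and the id ts_check are placeholders.

```yaml
hooks:
  pre:
    - type: inspect
      location: "sftp/inbox/report.csv"   # placeholder connection name and path
      id: ts_check                        # hypothetical id

    # The inspect hook returns exists: false instead of failing, so test it explicitly
    - type: check
      check: "state.ts_check.exists == true"
      on_failure: abort

    # Assumption: an unsupported timestamp is reported as 0
    - type: check
      check: "state.ts_check.updated_at_unix > 0"
      on_failure: skip
```

With both checks in place, a later age comparison on updated_at_unix is only evaluated for files that exist and carry a usable timestamp.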

