Inspect

The inspect hook allows you to retrieve metadata about a file or directory from any supported filesystem connection. This is particularly useful for validating file existence, checking file properties, or monitoring file changes.

Configuration

- type: inspect
  location: aws_s3/path/to/file      # Required: Location name
  recursive: true/false   # Optional: whether to get the total count/size of nested files
  on_failure: abort       # Optional: abort/warn/quiet/skip
  id: my_id               # Optional: generated if omitted. Use a `log` hook with {runtime_state} to view state.

Properties

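| Property | Required | Description |
| --- | --- | --- |
| location | Yes | Location name: the connection name followed by the path (e.g. aws_s3/path/to/file) |
| recursive | No | Whether to get the total count/size of nested files (true/false) |
| on_failure | No | What to do if the inspection fails (abort/warn/quiet/skip) |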

Output

When the inspect hook executes successfully, it returns the following output that can be accessed in subsequent hooks:

status: success  # Status of the hook execution
exists: true     # Whether the path exists
path: "path/to/file"  # The normalized path
name: "file"     # The name of the file/directory
uri: "s3://bucket/path/to/file"  # The full URI
is_dir: false    # Whether the path is a directory
size: 1024       # Total size in bytes (0 for directories if not recursive)
node_count: 0    # Total count of folders/files (if recursive)
folder_count: 0  # Total count of folders (if recursive)
file_count: 1    # Total count of files (if recursive)
created_at: "2023-01-01T00:00:00Z"  # Creation timestamp if available
created_at_unix: 1672531200  # Creation unix timestamp if available
updated_at: "2023-01-02T00:00:00Z"  # Last modified timestamp if available
updated_at_unix: 1672617600  # Last modified unix timestamp if available

You can access these values in subsequent hooks using the following syntax (JMESPath), where hook_id is the id assigned to the inspect hook:

  • {state.hook_id.status} - Status of the hook execution

  • {state.hook_id.exists} - Whether the path exists

  • {state.hook_id.path} - The normalized path

  • {state.hook_id.name} - The name of the file/directory

  • {state.hook_id.uri} - The full URI

  • {state.hook_id.is_dir} - Whether the path is a directory

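  • {state.hook_id.size} - Total size in bytes

  • {state.hook_id.node_count} - Total count of folders/files (if recursive)

  • {state.hook_id.folder_count} - Total count of folders (if recursive)

  • {state.hook_id.file_count} - Total count of files (if recursive)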

  • {state.hook_id.created_at} - Creation timestamp

  • {state.hook_id.created_at_unix} - Creation unix timestamp

  • {state.hook_id.updated_at} - Last modified timestamp

  • {state.hook_id.updated_at_unix} - Last modified unix timestamp
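
The count fields in the output are only meaningful when recursive is enabled. As a minimal sketch, assuming they are exposed through the same state syntax as the other fields, the snippet below inspects a directory recursively and aborts when it contains no files. The id dir_stats and the path are hypothetical placeholders.

```yaml
hooks:
  pre:
    # Inspect the directory and total up its nested files
    - type: inspect
      location: "aws_s3/data/{run.stream.name}/"
      recursive: true
      id: dir_stats          # hypothetical id, referenced below

    # Assumes file_count is readable via state like the other output fields
    - type: check
      check: "state.dir_stats.file_count > 0"
      on_failure: abort

    # Log the totals for reference
    - type: log
      message: "Found {state.dir_stats.file_count} files ({state.dir_stats.size} bytes)"
      level: info
```

Without recursive: true, the size of a directory is reported as 0, so a check like this would likely not be reliable.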

Examples

Verify File Existence (Pre-Hook)

Check if a required file exists before processing:

hooks:
  pre:
    - type: inspect
      location: "aws_s3/data/config/{run.stream.name}.json"
      id: config_check
      on_failure: warn

    - type: check
      check: "state.config_check.exists == true"
      on_failure: abort

Check File Size (Pre-Hook)

Ensure a file is not empty before processing:

hooks:
  pre:
    - type: inspect
      location: "gcs/input/{run.stream.name}/data.csv"
      id: file_check

    - type: check
      check: "state.file_check.size > 0"
      on_failure: abort

Monitor File Changes (Pre-Hook)

Track when a file was last modified:

hooks:
  pre:
    - type: inspect
      location: "azure_blob/source/{env.data_dir}/latest.parquet"
      id: source_check

    - type: log
      message: "Source file last modified at {state.source_check.updated_at}"
      level: info

Directory Validation (Pre-Hook)

Verify a path is a directory before processing:

hooks:
  pre:
    - type: inspect
      location: "aws_s3/data/{run.stream.name}/"
      id: dir_check

    - type: check
      check: "state.dir_check.is_dir == true"
      on_failure: abort

File Age Check (Pre-Hook)

Skip processing if the file is too old:

hooks:
  pre:
    - type: inspect
      location: "aws_s3/data/{run.stream.name}/input.csv"
      id: age_check

    - type: check
      check: 'state.age_check.updated_at_unix >= unix_marker'
      vars:
        unix_marker: 'timestamp.unix - 24*60*60'  # 24 hours ago
      on_failure: skip

Notes

  • Not all filesystems provide all metadata fields

  • Timestamps may be zero if not supported by the filesystem

  • Directory sizes are typically reported as 0 unless recursive is true

  • The hook will not fail if the path given in location is invalid. It simply returns exists: false
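
Because a missing path does not fail the hook and timestamps may be zero, it can help to guard on those fields explicitly before relying on them. The following is a sketch only, using the check hook as shown in the examples above; the id meta_check is a hypothetical placeholder.

```yaml
hooks:
  pre:
    - type: inspect
      location: "aws_s3/data/{run.stream.name}/input.csv"
      id: meta_check         # hypothetical id, referenced below

    # The inspect hook itself succeeds on a missing path, so test exists
    - type: check
      check: "state.meta_check.exists == true"
      on_failure: abort

    # A zero timestamp suggests the filesystem did not report a modification time
    - type: check
      check: "state.meta_check.updated_at_unix > 0"
      on_failure: warn
```

Placing the existence check first keeps later checks from running against empty metadata.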
