Inspect

The inspect hook allows you to retrieve metadata about files, directories, or database objects from any supported connection. This is particularly useful for validating existence, checking properties, or monitoring changes.

Configuration

- type: inspect
  location: connection_name/path_or_object  # Required: Location string (database or storage)
  recursive: true/false                    # Optional: For files, get nested file stats
  on_failure: abort                        # Optional: abort/warn/quiet/skip
  id: my_id                               # Optional: Generated if not provided
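
A minimal concrete step might look like the following; the connection name and path are placeholders, so substitute your own (a sketch, not an exhaustive configuration):

- type: inspect
  id: input_check
  location: aws_s3/bucket/data/input.csv
  on_failure: warn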

Properties

  • location (Required): The location string. Can be a database location (e.g., postgres/public.table_name) or a storage location (e.g., aws_s3/bucket/path/file.txt)

  • recursive (Optional): For file systems, whether to return the total count and size of nested files

  • on_failure (Optional): What to do if the inspection fails (abort/warn/quiet/skip)

Location Examples

  • Database: postgres/public.users, mysql_db/analytics.events, snowflake/DATABASE.SCHEMA.TABLE

  • Storage: aws_s3/bucket/data/file.csv, local//tmp/data/file.json, gcs/bucket/folder/
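
The same step shape covers both kinds of locations; only the location string changes. A minimal sketch reusing two of the connection names above:

steps:
  - type: inspect
    id: users_table
    location: postgres/public.users

  - type: inspect
    id: raw_file
    location: aws_s3/bucket/data/file.csv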

Output

File System Objects

When inspecting files or directories, the hook returns:

status: success                          # Status of the hook execution
exists: true                            # Whether the path exists
path: "path/to/file"                    # The normalized path
name: "file"                           # The name of the file/directory
uri: "s3://bucket/path/to/file"        # The full URI
is_dir: false                          # Whether the path is a directory
size: 1024                             # File size in bytes
node_count: 5                          # Total nodes (if recursive)
folder_count: 2                        # Total folders (if recursive)
file_count: 3                          # Total files (if recursive)
created_at: "2023-01-01T00:00:00Z"     # Creation timestamp if available
created_at_unix: 1672531200            # Creation unix timestamp if available
updated_at: "2023-01-02T00:00:00Z"     # Last modified timestamp if available
updated_at_unix: 1672617600            # Last modified unix timestamp if available

Database Objects

When inspecting database tables, the hook returns:

status: success                         # Status of the hook execution
exists: true                           # Whether the table exists
database: "my_database"                # Database name
schema: "public"                       # Schema name
name: "my_table"                       # Table name
fdqn: "my_database.public.my_table"    # Fully qualified name
columns:                               # Array of column details
  - name: "id"
    type: "integer"
    db_type: "int4"
    position: 1
  - name: "name"
    type: "string"
    db_type: "varchar"
    position: 2
    precision: 255
column_names: ["id", "name"]           # Array of column names
column_map:                            # Map of columns by name
  id:
    name: "id"
    type: "integer"
    db_type: "int4"
    position: 1
  name:
    name: "name"
    type: "string"
    db_type: "varchar"
    position: 2
    precision: 255

Accessing Output Data

You can access these values in subsequent hooks using JMESPath syntax:

File System Data

  • {state.hook_id.exists} - Whether the path exists

  • {state.hook_id.size} - File size in bytes

  • {state.hook_id.is_dir} - Whether it's a directory

  • {state.hook_id.created_at} - Creation timestamp

  • {state.hook_id.updated_at} - Last modified timestamp

Database Data

  • {state.hook_id.exists} - Whether the table exists

  • {state.hook_id.column_names} - Array of column names

  • {state.hook_id.column_map.column_name.type} - Specific column type

  • {state.hook_id.fdqn} - Fully qualified table name
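
These references can be embedded in any templated field of a later step. A brief sketch of a later log step, assuming earlier inspect steps with ids file_check and table_check:

  - type: log
    message: |
      Input size: {state.file_check.size} bytes
      Table columns: {state.table_check.column_names}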

Examples

Database Examples

Verify Table Exists

Check if a required table exists:

steps:
  - type: inspect
    id: table_check
    location: postgres/public.customer_data

  - type: check
    check: state.table_check.exists == true
    failure_message: "Required table does not exist"
    on_failure: abort

Validate Table Schema

Ensure required columns exist and check column properties:

steps:
  - type: inspect
    id: schema_check
    location: postgres/public.test_table

  - type: log
    message: |
      Table inspection results:
      - Exists: {state.schema_check.exists}
      - Database: {state.schema_check.database}
      - Schema: {state.schema_check.schema}
      - Name: {state.schema_check.name}
      - FDQN: {state.schema_check.fdqn}
      - Column Count: {length(state.schema_check.columns)}

  - type: check
    check: contains(state.schema_check.column_names, 'user_id') && contains(state.schema_check.column_names, 'event_timestamp')
    failure_message: "Table missing required columns"
    on_failure: abort

  - type: check
    check: state.schema_check.columns[2].precision == 10
    failure_message: "Column precision mismatch"

Check Column Types

Validate specific column data types:

steps:
  - type: inspect
    id: column_check
    location: mysql_db/analytics.user_events

  - type: check
    check: state.column_check.column_map.created_at.type == 'datetime'
    failure_message: "created_at column must be datetime type"
    on_failure: warn

File System Examples

Verify File Existence

Check if a required file exists before processing:

steps:
  - type: inspect
    id: config_check
    location: aws_s3/data/config/{run.stream.name}.json

  - type: check
    check: state.config_check.exists == true
    failure_message: "Config file not found for stream {run.stream.name}"
    on_failure: abort

Check File Properties

Ensure a file meets requirements:

steps:
  - type: inspect
    id: file_check
    location: local//tmp/data/input.csv

  - type: log
    message: |
      File inspection results:
      - Exists: {state.file_check.exists}
      - Path: {state.file_check.path}
      - Name: {state.file_check.name}
      - Size: {state.file_check.size}
      - Is Directory: {state.file_check.is_dir}

  - type: check
    check: state.file_check.size > 0
    failure_message: "Input file is empty"
    on_failure: abort

  - type: check
    check: state.file_check.name == "input.csv"
    failure_message: "Wrong file name"

Directory Inspection with Recursive Stats

Get total file count and size in a directory:

steps:
  - type: inspect
    id: dir_stats
    location: aws_s3/bucket/data/
    recursive: true

  - type: log
    message: |
      Directory inspection results (recursive):
      - Total Size: {state.dir_stats.size}
      - Node Count: {state.dir_stats.node_count}
      - File Count: {state.dir_stats.file_count}
      - Folder Count: {state.dir_stats.folder_count}

  - type: check
    check: state.dir_stats.file_count >= 1
    failure_message: "Directory should contain at least one file"

File Age Check

Skip processing if the file is too old:

steps:
  - type: inspect
    id: age_check
    location: local//tmp/data/input.csv

  - type: check
    check: state.age_check.updated_at_unix >= (timestamp.unix - 86400)
    failure_message: "File is older than 24 hours"
    on_failure: skip

Notes

  • Database Locations: Use the format connection_name/database.schema.table or connection_name/schema.table depending on your database

  • Storage Locations: Use the format connection_name/path/to/file or connection_name/path/to/directory/

  • File Systems: Not all filesystems provide all metadata fields

  • File Systems: Timestamps may be zero if not supported by the filesystem

  • File Systems: Directory sizes are typically reported as 0 unless recursive: true

  • Databases: Column precision and scale are only populated for decimal/numeric types

  • General: The hook does not fail when the path or object is invalid or missing; it returns exists: false instead (see the sketch after this list)

  • General: Use appropriate connection types (file connections for files, database connections for tables)
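
Building on the note about exists: false, a missing object is handled by a follow-up check step rather than by the inspect step itself. A minimal sketch with an illustrative table name:

steps:
  - type: inspect
    id: presence_check
    location: postgres/public.optional_table

  - type: check
    check: state.presence_check.exists == true
    failure_message: "Table public.optional_table not found, skipping remaining steps"
    on_failure: skip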
