Inspect
The inspect hook allows you to retrieve metadata about files, directories, or database objects from any supported connection. This is particularly useful for validating existence, checking properties, or monitoring changes.
Configuration
- type: inspect
  location: connection_name/path_or_object  # Required: Location string (database or storage)
  recursive: true/false                     # Optional: For files, get nested file stats
  on_failure: abort                         # Optional: abort/warn/quiet/skip
  id: my_id                                 # Optional: Generated if not provided
Properties
Property | Required | Description
location | Yes | The location string. Can be a database location (e.g., postgres/public.table_name) or a storage location (e.g., aws_s3/bucket/path/file.txt)
recursive | No | For file systems: whether to get the total count/size of nested files
on_failure | No | What to do if the inspection fails (abort/warn/quiet/skip)
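For instance, the optional properties can be combined in a single step. This is a minimal sketch, assuming a connection named aws_s3 and a hypothetical data/exports/ prefix; on_failure: warn lets the pipeline continue if the inspection itself fails:
steps:
  - type: inspect
    id: exports_check          # referenced later as state.exports_check
    location: aws_s3/data/exports/
    recursive: true            # include nested file counts and total size
    on_failure: warn           # warn instead of stopping the pipeline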
Location Examples
Database:
postgres/public.users
mysql_db/analytics.events
snowflake/DATABASE.SCHEMA.TABLE
Storage:
aws_s3/bucket/data/file.csv
local//tmp/data/file.json
gcs/bucket/folder/
Output
File System Objects
When inspecting files or directories, the hook returns:
status: success # Status of the hook execution
exists: true # Whether the path exists
path: "path/to/file" # The normalized path
name: "file" # The name of the file/directory
uri: "s3://bucket/path/to/file" # The full URI
is_dir: false # Whether the path is a directory
size: 1024 # File size in bytes
node_count: 5 # Total nodes (if recursive)
folder_count: 2 # Total folders (if recursive)
file_count: 3 # Total files (if recursive)
created_at: "2023-01-01T00:00:00Z" # Creation timestamp if available
created_at_unix: 1672531200 # Creation unix timestamp if available
updated_at: "2023-01-02T00:00:00Z" # Last modified timestamp if available
updated_at_unix: 1672617600 # Last modified unix timestamp if available
Database Objects
When inspecting database tables, the hook returns:
status: success # Status of the hook execution
exists: true # Whether the table exists
database: "my_database" # Database name
schema: "public" # Schema name
name: "my_table" # Table name
fdqn: "my_database.public.my_table" # Fully qualified name
columns: # Array of column details
  - name: "id"
    type: "integer"
    db_type: "int4"
    position: 1
  - name: "name"
    type: "string"
    db_type: "varchar"
    position: 2
    precision: 255
column_names: ["id", "name"] # Array of column names
column_map: # Map of columns by name
  id:
    name: "id"
    type: "integer"
    db_type: "int4"
    position: 1
  name:
    name: "name"
    type: "string"
    db_type: "varchar"
    position: 2
    precision: 255
Accessing Output Data
You can access these values in subsequent hooks using JMESPath syntax:
File System Data
{state.hook_id.exists} - Whether the path exists
{state.hook_id.size} - File size in bytes
{state.hook_id.is_dir} - Whether it's a directory
{state.hook_id.created_at} - Creation timestamp
{state.hook_id.updated_at} - Last modified timestamp
Database Data
{state.hook_id.exists} - Whether the table exists
{state.hook_id.column_names} - Array of column names
{state.hook_id.column_map.column_name.type} - Specific column type
{state.hook_id.fdqn} - Fully qualified table name
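As an illustration, the snippet below logs a few of these values from an earlier inspect step. This is a minimal sketch, assuming an inspect hook with the hypothetical id users_check pointing at postgres/public.users:
steps:
  - type: inspect
    id: users_check
    location: postgres/public.users
  - type: log
    message: |
      users table exists: {state.users_check.exists}
      columns: {state.users_check.column_names}
      id column type: {state.users_check.column_map.id.type}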
Examples
Database Examples
Verify Table Exists
Check if a required table exists:
steps:
  - type: inspect
    id: table_check
    location: postgres/public.customer_data
  - type: check
    check: state.table_check.exists == true
    failure_message: "Required table does not exist"
    on_failure: abort
Validate Table Schema
Ensure required columns exist and check column properties:
steps:
  - type: inspect
    id: schema_check
    location: postgres/public.test_table
  - type: log
    message: |
      Table inspection results:
      - Exists: {state.schema_check.exists}
      - Database: {state.schema_check.database}
      - Schema: {state.schema_check.schema}
      - Name: {state.schema_check.name}
      - FDQN: {state.schema_check.fdqn}
      - Column Count: {length(state.schema_check.columns)}
  - type: check
    check: contains(state.schema_check.column_names, 'user_id') && contains(state.schema_check.column_names, 'event_timestamp')
    failure_message: "Table missing required columns"
    on_failure: abort
  - type: check
    check: state.schema_check.columns[2].precision == 10
    failure_message: "Column precision mismatch"
Check Column Types
Validate specific column data types:
steps:
  - type: inspect
    id: column_check
    location: mysql_db/analytics.user_events
  - type: check
    check: state.column_check.column_map.created_at.type == 'datetime'
    failure_message: "created_at column must be datetime type"
    on_failure: warn
File System Examples
Verify File Existence
Check if a required file exists before processing:
steps:
  - type: inspect
    id: config_check
    location: aws_s3/data/config/{run.stream.name}.json
  - type: check
    check: state.config_check.exists == true
    failure_message: "Config file not found for stream {run.stream.name}"
    on_failure: abort
Check File Properties
Ensure a file meets requirements:
steps:
  - type: inspect
    id: file_check
    location: local//tmp/data/input.csv
  - type: log
    message: |
      File inspection results:
      - Exists: {state.file_check.exists}
      - Path: {state.file_check.path}
      - Name: {state.file_check.name}
      - Size: {state.file_check.size}
      - Is Directory: {state.file_check.is_dir}
  - type: check
    check: state.file_check.size > 0
    failure_message: "Input file is empty"
    on_failure: abort
  - type: check
    check: state.file_check.name == "input.csv"
    failure_message: "Wrong file name"
Directory Inspection with Recursive Stats
Get total file count and size in a directory:
steps:
  - type: inspect
    id: dir_stats
    location: aws_s3/bucket/data/
    recursive: true
  - type: log
    message: |
      Directory inspection results (recursive):
      - Total Size: {state.dir_stats.size}
      - Node Count: {state.dir_stats.node_count}
      - File Count: {state.dir_stats.file_count}
      - Folder Count: {state.dir_stats.folder_count}
  - type: check
    check: state.dir_stats.file_count >= 1
    failure_message: "Directory should contain at least one file"
File Age Check
Skip processing if file is too old:
steps:
  - type: inspect
    id: age_check
    location: local//tmp/data/input.csv
  - type: check
    check: state.age_check.updated_at_unix >= (timestamp.unix - 86400)
    failure_message: "File is older than 24 hours"
    on_failure: skip
Notes
Database Locations: Use the format connection_name/database.schema.table or connection_name/schema.table, depending on your database
Storage Locations: Use the format connection_name/path/to/file or connection_name/path/to/directory/
File Systems: Not all filesystems provide all metadata fields
File Systems: Timestamps may be zero if not supported by the filesystem
File Systems: Directory sizes are typically reported as 0 unless recursive: true
Databases: Column precision and scale are only populated for decimal/numeric types
General: The hook will not fail if the path/object is invalid; it returns exists: false
General: Use appropriate connection types (file connections for files, database connections for tables)
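Because the hook itself returns exists: false rather than failing, a follow-up check hook is the usual way to react to a missing object. This is a minimal sketch, assuming a hypothetical aws_s3 connection and export path; on_failure: skip skips processing when the file is absent, as in the File Age Check example above:
steps:
  - type: inspect
    id: export_check
    location: aws_s3/bucket/exports/daily.csv   # hypothetical path
  - type: check
    check: state.export_check.exists == true
    failure_message: "Daily export not found, skipping processing"
    on_failure: skip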