# Inspect

The inspect hook allows you to retrieve metadata about files, directories, or database objects from any supported connection. This is particularly useful for validating existence, checking properties, or monitoring changes.

## Configuration

```yaml
- type: inspect
  location: connection_name/path_or_object  # Required: Location string (database or storage)
  recursive: true/false                    # Optional: For files, get nested file stats
  on_failure: abort                        # Optional: abort/warn/quiet/skip
  id: my_id                               # Optional: Generated if not provided
```

## Properties

| Property    | Required | Description                                                                                                                                                                                                               |
| ----------- | -------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| location    | Yes      | The [location](https://docs.slingdata.io/sling-cli/environment#location-string) string. Can be a **database location** (e.g., `postgres/public.table_name`) or **storage location** (e.g., `aws_s3/bucket/path/file.txt`) |
| recursive   | No       | For file systems: whether to get total count/size of nested files                                                                                                                                                         |
| on\_failure | No       | What to do if the inspection fails (abort/warn/quiet/skip)                                                                                                                                                                |

### Location Examples

* **Database**: `postgres/public.users`, `mysql_db/analytics.events`, `snowflake/DATABASE.SCHEMA.TABLE`
* **Storage**: `aws_s3/bucket/data/file.csv`, `local//tmp/data/file.json`, `gcs/bucket/folder/`

## Output

### File System Objects

When inspecting files or directories, the hook returns:

```yaml
status: success                          # Status of the hook execution
exists: true                            # Whether the path exists
path: "path/to/file"                    # The normalized path
name: "file"                           # The name of the file/directory
uri: "s3://bucket/path/to/file"        # The full URI
is_dir: false                          # Whether the path is a directory
size: 1024                             # File size in bytes
node_count: 5                          # Total nodes (if recursive)
folder_count: 2                        # Total folders (if recursive)
file_count: 3                          # Total files (if recursive)
created_at: "2023-01-01T00:00:00Z"     # Creation timestamp if available
created_at_unix: 1672531200            # Creation unix timestamp if available
updated_at: "2023-01-02T00:00:00Z"     # Last modified timestamp if available
updated_at_unix: 1672617600            # Last modified unix timestamp if available
```

### Database Objects

When inspecting database tables, the hook returns:

```yaml
status: success                         # Status of the hook execution
exists: true                           # Whether the table exists
database: "my_database"                # Database name
schema: "public"                       # Schema name
name: "my_table"                       # Table name
fdqn: "my_database.public.my_table"    # Fully qualified name
columns:                               # Array of column details
  - name: "id"
    type: "integer"
    db_type: "int4"
    position: 1
  - name: "name"
    type: "string"
    db_type: "varchar"
    position: 2
    precision: 255
column_names: ["id", "name"]           # Array of column names
column_map:                            # Map of columns by name
  id:
    name: "id"
    type: "integer"
    db_type: "int4"
    position: 1
  name:
    name: "name"
    type: "string"
    db_type: "varchar"
    position: 2
    precision: 255
```

## Accessing Output Data

You can access these values in subsequent hooks using JMESPath syntax:

### File System Data

* `{state.hook_id.exists}` - Whether the path exists
* `{state.hook_id.size}` - File size in bytes
* `{state.hook_id.is_dir}` - Whether it's a directory
* `{state.hook_id.created_at}` - Creation timestamp
* `{state.hook_id.updated_at}` - Last modified timestamp

### Database Data

* `{state.hook_id.exists}` - Whether the table exists
* `{state.hook_id.column_names}` - Array of column names
* `{state.hook_id.column_map.column_name.type}` - Specific column type
* `{state.hook_id.fdqn}` - Fully qualified table name

## Examples

### Database Examples

#### Verify Table Exists

Check if a required table exists:

```yaml
steps:
  - type: inspect
    id: table_check
    location: postgres/public.customer_data

  - type: check
    check: state.table_check.exists == true
    failure_message: "Required table does not exist"
    on_failure: abort
```

#### Validate Table Schema

Ensure required columns exist and check column properties:

```yaml
steps:
  - type: inspect
    id: schema_check
    location: postgres/public.test_table

  - type: log
    message: |
      Table inspection results:
      - Exists: {state.schema_check.exists}
      - Database: {state.schema_check.database}
      - Schema: {state.schema_check.schema}
      - Name: {state.schema_check.name}
      - FDQN: {state.schema_check.fdqn}
      - Column Count: {length(state.schema_check.columns)}

  - type: check
    check: contains(state.schema_check.column_names, 'user_id') && contains(state.schema_check.column_names, 'event_timestamp')
    failure_message: "Table missing required columns"
    on_failure: abort

  - type: check
    check: state.schema_check.columns[2].precision == 10
    failure_message: "Column precision mismatch"
```

#### Check Column Types

Validate specific column data types:

```yaml
steps:
  - type: inspect
    id: column_check
    location: mysql_db/analytics.user_events

  - type: check
    check: state.column_check.column_map.created_at.type == 'datetime'
    failure_message: "created_at column must be datetime type"
    on_failure: warn
```

### File System Examples

#### Verify File Existence

Check if a required file exists before processing:

```yaml
steps:
  - type: inspect
    id: config_check
    location: aws_s3/data/config/{run.stream.name}.json

  - type: check
    check: state.config_check.exists == true
    failure_message: "Config file not found for stream {run.stream.name}"
    on_failure: abort
```

#### Check File Properties

Ensure a file meets requirements:

```yaml
steps:
  - type: inspect
    id: file_check
    location: local//tmp/data/input.csv

  - type: log
    message: |
      File inspection results:
      - Exists: {state.file_check.exists}
      - Path: {state.file_check.path}
      - Name: {state.file_check.name}
      - Size: {state.file_check.size}
      - Is Directory: {state.file_check.is_dir}

  - type: check
    check: state.file_check.size > 0
    failure_message: "Input file is empty"
    on_failure: abort

  - type: check
    check: state.file_check.name == "input.csv"
    failure_message: "Wrong file name"
```

#### Directory Inspection with Recursive Stats

Get total file count and size in a directory:

```yaml
steps:
  - type: inspect
    id: dir_stats
    location: aws_s3/bucket/data/
    recursive: true

  - type: log
    message: |
      Directory inspection results (recursive):
      - Total Size: {state.dir_stats.size}
      - Node Count: {state.dir_stats.node_count}
      - File Count: {state.dir_stats.file_count}
      - Folder Count: {state.dir_stats.folder_count}

  - type: check
    check: state.dir_stats.file_count >= 1
    failure_message: "Directory should contain at least one file"
```

#### File Age Check

Skip processing if file is too old:

```yaml
steps:
  - type: inspect
    id: age_check
    location: local//tmp/data/input.csv

  - type: check
    check: state.age_check.updated_at_unix >= (timestamp.unix - 86400)
    failure_message: "File is older than 24 hours"
    on_failure: skip
```

## Notes

* **Database Locations**: Use the format `connection_name/database.schema.table` or `connection_name/schema.table` depending on your database
* **Storage Locations**: Use the format `connection_name/path/to/file` or `connection_name/path/to/directory/`
* **File Systems**: Not all filesystems provide all metadata fields
* **File Systems**: Timestamps may be zero if not supported by the filesystem
* **File Systems**: Directory sizes are typically reported as 0 unless `recursive: true`
* **Databases**: Column precision and scale are only populated for decimal/numeric types
* **General**: The hook will not fail if the path/object is invalid; it returns `exists: false`
* **General**: Use appropriate connection types (file connections for files, database connections for tables)
