# Structure

Below is the structure of the monitor configuration file.

## Root Level

At the root level, a monitor configuration accepts the following keys:

```yaml
# 'connection' and 'objects' keys are required
connection: <connection name>

defaults: <monitor object config>

objects:
  <object name or pattern>: <monitor object config>

schemata:
  enabled: true | false
  exclude: [<glob patterns>]
```
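
As a minimal sketch, a file that sets only the two required keys and turns on schema tracking might look like the following. The connection name `MY_POSTGRES` and the table `public.users` are placeholders, not values the tool requires.

```yaml
# Minimal sketch: placeholder connection and table names
connection: MY_POSTGRES   # required

objects:                  # required
  public.users:
    row_count: true

schemata:                 # optional
  enabled: true
```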

## Object Configuration

The `<object name or pattern>` key identifies the table or view to monitor. It can be a fully qualified name (e.g., `public.users`) or a wildcard pattern using `*` and `?` (e.g., `public.*`, `analytics.fact_*`).

The `<monitor object config>` accepts the following keys:

```yaml
# Object-level metrics
disabled: true | false
metadata: true | false
row_count: true | false
body_md5: true | false

# Freshness
freshness_threshold: <duration string>
freshness_column: <column name>

# Anomaly detection tuning
anomaly_detection:
  z_score_threshold: <float>
  min_history_points: <int>
  min_history_days: <int>
  history_days: <int>

# Alert triggers
alert_on_change:
  - name | type | timestamp | size | body | count

# Column-level monitoring
columns:
  <column name or "*">: <monitor column config>
```
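
For illustration, one object entry combining the freshness and alert keys could be written as below. The table `analytics.fact_orders` and the column `updated_at` are hypothetical, and the choice of `type` and `count` as alert triggers is arbitrary.

```yaml
objects:
  analytics.fact_orders:            # hypothetical table
    metadata: true
    row_count: true

    # Freshness: data age is taken from MAX(updated_at)
    freshness_threshold: "6h"
    freshness_column: updated_at    # hypothetical column

    # Alert when either of these change types is detected
    alert_on_change:
      - type
      - count
```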

## Column Configuration

The `<monitor column config>` accepts the following keys:

```yaml
# Statistics
count: true | false
null_count: true | false
count_distinct: true | false
unique_count: true | false
size: true | false
min_max_mean: true | false
min_max_len: true | false
percentile: true | false

# Validation
regex_match:
  - <regex pattern>
regex_not_match:
  - <regex pattern>
accepted_values:
  - <value>
rejected_values:
  - <value>

# Alert triggers
alert_on_change:
  - name | type | timestamp | size | body | count
```
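
A sketch of column-level statistics and validation rules follows. The column names `email` and `status`, the allowed values, and the regular expression are assumptions made for the example; the supported regex dialect is not stated here, so the pattern sticks to widely supported syntax.

```yaml
objects:
  public.users:
    columns:
      email:                        # hypothetical column
        null_count: true
        count_distinct: true
        regex_match:
          - '^[^@\s]+@[^@\s]+$'     # single quotes keep the backslashes literal
      status:                       # hypothetical column
        accepted_values:
          - active
          - inactive
```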

## Schemata Configuration

The `schemata` block controls schema change detection across the monitored connection:

```yaml
schemata:
  enabled: true
  exclude:
    - "temp_schema.*"
    - "*.staging_*"
```

| Key       | Type      | Default       | Description                                               |
| --------- | --------- | ------------- | --------------------------------------------------------- |
| `enabled` | bool      | `false`       | Enable schema change detection                            |
| `exclude` | string\[] | `["*.*_tmp"]` | Glob patterns for objects to exclude from schema tracking |

{% hint style="info" %}
When `exclude` is not set, the default pattern `["*.*_tmp"]` is applied automatically. Set `exclude: []` (empty array) to disable all exclusions.
{% endhint %}
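
For example, to track schema changes for every object, including those the default pattern would skip, the exclusion list can be emptied explicitly:

```yaml
schemata:
  enabled: true
  exclude: []   # replaces the default ["*.*_tmp"], so nothing is excluded
```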

## Configuration Reference

### Object-Level Keys

| Key                   | Type      | Default | Description                                                                |
| --------------------- | --------- | ------- | -------------------------------------------------------------------------- |
| `disabled`            | bool      | `false` | Exclude this object from monitoring                                        |
| `metadata`            | bool      | `false` | Collect schema info (columns, types). Required for schema change detection |
| `row_count`           | bool      | `false` | Count total rows                                                           |
| `body_md5`            | bool      | `false` | Track MD5 hash of view/procedure definitions                               |
| `freshness_threshold` | string    | —       | Maximum data age before a staleness alert fires (e.g., `"24h"`, `"7d"`)    |
| `freshness_column`    | string    | —       | Column whose `MAX()` value is used to compute data age                     |
| `anomaly_detection`   | object    | —       | Override anomaly detection parameters                                      |
| `alert_on_change`     | string\[] | —       | Change types that trigger alerts                                           |
| `columns`             | map       | —       | Column-level monitoring configuration                                      |

### Column-Level Keys

| Key               | Type      | Default | Description                                                       |
| ----------------- | --------- | ------- | ----------------------------------------------------------------- |
| `count`           | bool      | `false` | Non-null and null value counts                                    |
| `null_count`      | bool      | `false` | Null value count                                                  |
| `count_distinct`  | bool      | `false` | Unique value count (cardinality)                                  |
| `unique_count`    | bool      | `false` | Unique value count                                                |
| `size`            | bool      | `false` | Total size in bytes                                               |
| `min_max_mean`    | bool      | `false` | Minimum, maximum, and mean for numeric columns                    |
| `min_max_len`     | bool      | `false` | Minimum and maximum string length for text columns                |
| `percentile`      | bool      | `false` | Percentile statistics (p50, p90, p95, p99) and standard deviation |
| `regex_match`     | string\[] | —       | Patterns that values should match                                 |
| `regex_not_match` | string\[] | —       | Patterns that values should NOT match                             |
| `accepted_values` | string\[] | —       | Valid values (anything else is a violation)                       |
| `rejected_values` | string\[] | —       | Values that should not appear                                     |
| `alert_on_change` | string\[] | —       | Change types that trigger alerts                                  |

### Anomaly Detection Keys

| Key                  | Type  | Default | Description                                          |
| -------------------- | ----- | ------- | ---------------------------------------------------- |
| `z_score_threshold`  | float | `3.0`   | Z-score threshold for anomaly detection              |
| `min_history_points` | int   | `7`     | Minimum data points required before detection begins |
| `min_history_days`   | int   | `7`     | Minimum days of history required                     |
| `history_days`       | int   | `30`    | Lookback window in days for baseline calculation     |
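
As a sketch, the parameters can be tuned for all objects by placing an `anomaly_detection` block under `defaults`. The values are illustrative only. The comments assume that a point is flagged when its z-score exceeds the threshold, and that an omitted key such as `min_history_days` keeps its default.

```yaml
defaults:
  row_count: true
  anomaly_detection:
    z_score_threshold: 2.5    # lower than the default 3.0, so smaller deviations are flagged
    min_history_points: 14    # wait for more data points than the default 7
    history_days: 60          # longer baseline window than the default 30
```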

## Wildcards & Patterns

Use `*` and `?` wildcards in object names to monitor multiple objects with the same configuration:

* `*` matches any sequence of characters
* `?` matches a single character

```yaml
objects:
  # All tables in public schema
  public.*:
    metadata: true
    row_count: true

  # Fact tables in analytics schema
  analytics.fact_*:
    row_count: true
    freshness_threshold: "6h"

  # Single character wildcard
  staging.tmp_?:
    metadata: true
```

Use `"*"` as a column name to apply column metrics to all columns, for example under `defaults`:

```yaml
defaults:
  columns:
    "*":
      null_count: true
      count_distinct: true
```

### Definition Order

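Objects are processed in **definition order**. When a wildcard expands to include a table and a later entry targets that same table, the later entry's configuration replaces the wildcard's configuration for that table entirely. The two are not merged, which is why the re-enabled tables in the second example below are unaffected by the wildcard's `disabled: true`.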

**Exclude specific tables from a wildcard:**

```yaml
objects:
  # Monitor all public tables...
  public.*:
    metadata: true
    row_count: true

  # ...except these
  public.sensitive_table:
    disabled: true
  public.audit_logs:
    disabled: true
```

**Disable all, then re-enable specific tables:**

```yaml
objects:
  public.*:
    disabled: true

  public.users:
    metadata: true
    row_count: true
  public.orders:
    row_count: true
    freshness_threshold: "12h"
```
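
Because a later entry wins entirely rather than extending the wildcard match, any wildcard settings that should still apply to that table presumably need to be restated. A sketch, reusing the table names from the examples above:

```yaml
objects:
  public.*:
    metadata: true
    row_count: true

  # This entry takes over for public.orders, so metadata and row_count
  # are repeated here rather than assumed to carry over from public.*
  public.orders:
    metadata: true
    row_count: true
    freshness_threshold: "12h"
```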

## Defaults Inheritance

Settings defined under `defaults` are applied to all objects. Individual objects can override any default value.

```yaml
connection: MY_POSTGRES

defaults:
  metadata: true
  row_count: true
  freshness_threshold: "24h"
  columns:
    "*":
      null_count: true

objects:
  public.users: {}              # inherits all defaults
  public.orders:
    freshness_threshold: "6h"   # overrides default threshold
    columns:
      total:
        min_max_mean: true      # adds to inherited column config
  public.staging:
    disabled: true              # excluded entirely
```
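
Read together with the inline comments above, the settings that end up applying to `public.orders` would be roughly as follows. This is an illustrative expansion, not output produced by the tool.

```yaml
# Effective settings for public.orders after inheritance (illustrative)
public.orders:
  metadata: true               # from defaults
  row_count: true              # from defaults
  freshness_threshold: "6h"    # object value replaces the default "24h"
  columns:
    "*":
      null_count: true         # from defaults
    total:
      min_max_mean: true       # added by the object
```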
