# Column Metrics & Validation

Column-level monitoring lets you collect detailed statistics and apply validation rules to individual columns. These metrics feed into [anomaly detection](https://docs.slingdata.io/sling-platform/platform/monitors/anomaly-detection) over time, alerting you when column values deviate from historical patterns.

## Column Statistics

| Key              | Type | Description                                                       |
| ---------------- | ---- | ----------------------------------------------------------------- |
| `count`          | bool | Total non-null and null value counts                              |
| `null_count`     | bool | Number of null values                                             |
| `count_distinct` | bool | Number of unique values (cardinality)                             |
| `unique_count`   | bool | Unique value count                                                |
| `size`           | bool | Total size in bytes of column values                              |
| `min_max_mean`   | bool | Minimum, maximum, and mean values for numeric columns             |
| `min_max_len`    | bool | Minimum and maximum string length for text columns                |
| `percentile`     | bool | Percentile statistics (p50, p90, p95, p99) and standard deviation |

```yaml
objects:
  public.orders:
    columns:
      revenue:
        count: true
        min_max_mean: true
        percentile: true

      customer_id:
        count_distinct: true
        null_count: true

      description:
        min_max_len: true
```

## Column Validation

Validation rules check column values against defined patterns or value lists. Violations are reported as anomaly events.

### Regex Patterns

Use `regex_match` to define patterns that values **should** match, and `regex_not_match` for patterns that values should **not** match:

```yaml
objects:
  public.users:
    columns:
      email:
        regex_match:
          - "^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9-.]+$"

      phone:
        regex_not_match:
          - "^000"        # flag placeholder numbers
          - "^555"        # flag test numbers
```

### Value Lists

Use `accepted_values` to define valid values (anything else is a violation), and `rejected_values` to define values that should not appear:

```yaml
objects:
  public.orders:
    columns:
      status:
        accepted_values:
          - pending
          - processing
          - shipped
          - delivered
          - cancelled

      source:
        rejected_values:
          - test
          - unknown
          - "null"
          - ""
```

{% hint style="info" %}
Validation results include match counts, violation counts, and a `valid` boolean. These are tracked over time and can trigger anomaly alerts when violation rates change.
{% endhint %}

## Column Wildcard

Use `"*"` as a column name in `defaults` to apply metrics to all columns across all monitored objects:

```yaml
defaults:
  columns:
    "*":
      null_count: true
      count_distinct: true

objects:
  public.users: {}          # all columns get null_count + count_distinct
  public.orders:
    columns:
      revenue:
        min_max_mean: true   # adds to the inherited wildcard metrics
```

## Complete Example

```yaml
connection: MY_POSTGRES

defaults:
  metadata: true
  row_count: true
  columns:
    "*":
      null_count: true

objects:
  public.users:
    columns:
      email:
        count_distinct: true
        regex_match:
          - "^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9-.]+$"
      status:
        accepted_values:
          - active
          - inactive
          - pending
      name:
        min_max_len: true

  public.orders:
    freshness_threshold: "6h"
    freshness_column: created_at
    columns:
      total:
        min_max_mean: true
        percentile: true
      quantity:
        min_max_mean: true
      discount_code:
        count_distinct: true
        rejected_values:
          - test
          - "EXPIRED_2023"

  public.products:
    columns:
      price:
        min_max_mean: true
        percentile: true
      category:
        count_distinct: true
        accepted_values:
          - electronics
          - clothing
          - food
          - home
```
