> For the complete documentation index, see [llms.txt](https://docs.slingdata.io/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://docs.slingdata.io/concepts/replication/columns.md).

# Columns

## Selecting Columns

Use the `select` option to narrow a stream down to the columns you actually need. Replication is read-once, write-once, so dropping unused columns shrinks the bytes transferred, the storage footprint at the target, and the time spent inferring/casting types.

From *v1.5.19+*,`select` works on every source: databases (where it becomes `SELECT col1, col2, ...`), file sources (CSV, JSON, JSONL, Parquet), and API sources. Three things make up the grammar:

* **Exact names** to include (`id`, `email`)
* **Glob patterns** to include or exclude (`user_*`, `*_at`, `-internal_*`)
* **Renames** with the `as` keyword (`id as user_id`)

### Using CLI Flags

```bash
# include specific columns
sling run --select 'id,email,created_at'

# include everything except a few (minus prefix is the exclude marker)
sling run --select '-password,-ssn'

# include with a glob; here, only columns named like `event_*`
sling run --select 'event_*'

# rename inline
sling run --select 'id as user_id,name as full_name'
```

### Using YAML

```yaml
source: source_name
target: target_name

defaults:
  # narrow every stream to these columns
  select: [id, email, created_at]

streams:
  my_stream:
    # this stream: drop a few columns, keep the rest
    select: [-password, -ssn]

  my_other_stream:
    # this stream: include + rename
    select: [id, name as full_name, email]
```

### Picking a Subset

The most common use of `select` is to whittle a wide source table down to the columns the target actually needs. Two shapes show up most often:

**1. Include exactly what you want.** Everything not listed is dropped.

```yaml
streams:
  public.users:
    object: public.users_slim
    select: [id, email, created_at]
```

**2. Include everything except a few.** Use `-` prefixes. When every item is an exclusion, Sling treats it as "select all *except* these."

```yaml
streams:
  public.users:
    object: public.users_no_pii
    select: [-password, -ssn, -date_of_birth]
```

You cannot mix exact include names with `-` exclusions in the same list — pick one shape per stream.

### Globs

Globs let one pattern match many columns. They work in both include lists and exclude lists.

```yaml
streams:
  # include: keep only the audit timestamps
  public.events:
    object: public.events_audit
    select: ['*_at']            # created_at, updated_at, deleted_at, ...

  # include: keep all the user_* columns
  public.profile_join:
    select: ['user_*']           # user_id, user_name, user_role, ...

  # exclude: drop everything starting with `internal_`
  public.audit_log:
    object: public.audit_clean
    select: ['-internal_*']

  # exclude: drop trailing-`_internal` columns plus one specific field
  public.audit_log_v2:
    object: public.audit_v2_clean
    select: ['-*_internal', '-debug_info']
```

Supported patterns: prefix (`prefix_*`), suffix (`*_suffix`), contains (`*middle*`), and exact names. Matching is case-insensitive.

### Renaming

The `as` keyword renames columns on the way out. For database sources this generates `SELECT column AS alias` directly; for file and API sources Sling rewrites the column header after read. Renames sit alongside includes — list anything you don't rename as-is. Requires *v1.5.5+*.

```yaml
streams:
  public.users:
    object: public.customers
    select:
      - 'id as user_id'
      - 'first_name as name'
      - 'email as contact_email'
```

When using custom SQL with the `{fields}` placeholder, the renamed columns are substituted automatically:

```yaml
streams:
  my_stream:
    sql: SELECT {fields} FROM public.users WHERE active = true
    object: public.active_customers
    select:
      - 'id'
      - 'first_name as customer_name'
      - 'score as customer_score'
```

### Controlling Column Order

The order of items in the `select` list is the order columns appear in the target. This matters when you're writing to a file (CSV, Parquet, JSON) where readers care about column order, or when you want a stable layout regardless of how the source happens to list its columns.

You can pin specific columns to the front, let the rest follow in source order via `*`, and pin others to the back:

```yaml
streams:
  public.users:
    object: public.users_reordered
    select:
      - id              # front pin
      - email           # front pin
      - '*'             # everything else, in source order
      - created_at      # back pin
      - updated_at      # back pin
```

`*` expands to whatever's left after the pins are accounted for, in the order the source declared the columns. Globs can also be used as pins (`'user_*'` between two exact pins drops every matching column there in source order).

If you don't need reordering, just leave `*` out — Sling preserves source order by default.

### Reusing `columns` for Order

When you've already listed your columns under [`columns`](#casting-columns) to pin their types, you don't have to repeat every name under `select` just to control order. From *v1.5.20+*, the `@columns` token expands to the names declared in `columns`, **in that declared order**.

`@columns` must be the **first** item in the `select` list. Use it alone to emit exactly the declared columns, or follow it with `*` to pin the declared columns first and let the rest follow.

```yaml
streams:
  res_partner:
    columns:
      id: integer
      name: string
      email: string
      # ... more type-pinned columns, in your preferred order ...

    # emit exactly the columns above, in the order they're declared
    select: ['@columns']
```

```yaml
streams:
  res_partner:
    columns:
      id: integer
      name: string
      email: string

    # pin the declared columns first (in order), then everything else
    select: ['@columns', '*']
```

Notes:

* `@columns` is only honored as the first item — using it anywhere else is an error.
* It requires a `columns` block on the stream (after [merging with defaults](#merging-with-defaults)); using it with no columns defined is an error.
* Names that would be duplicated by the expansion are kept once (first occurrence wins), so `['@columns', 'id']` won't list `id` twice.

## Casting Columns

When running a replication, you can specify the column types to tell Sling to cast the data to the correct type. It is not necessary to include all columns. Sling will automatically detect the types for any unspecified columns. See [here](https://github.com/slingdata-io/sling-cli/blob/main/core/dbio/templates/types_general_to_native.tsv) and [here](https://github.com/slingdata-io/sling-cli/blob/main/core/dbio/templates/types_native_to_general.tsv) for various type mappings between native and generic types for all types of databases.

Acceptable data types are:

* `bigint`
* `bool`
* `datetime`
* `decimal` or `decimal(precision, scale)` (such as `decimal(10, 2)`)
* `integer`
* `json`
* `string` or `string(length)` (such as `string(100)`)
* `text` or `text(length)` (such as `text(3000)`)
* `geometry`

Sling allows you the ability to apply constraints, such as `value > 0`. See [Constraints](/concepts/data-quality/constraints.md) for details.

### Column Modifiers

From *v1.5.20+*, the type slot accepts space-separated **modifiers** after the data type. These shape the DDL Sling generates when it creates the target table — adding `NOT NULL`, primary keys, unique constraints, column descriptions, and indexes — without having to hand-write `table_ddl`. They apply on a fresh `CREATE TABLE` regardless of any schema-migration setting.

```yaml
streams:
  public.users:
    object: public.users_dim
    columns:
      id:      bigint not_null primary_key
      email:   text not_null unique
      region:  text index
      slug:    text unique_index
      notes:   text description('free-form user notes')
```

The first token is always the type; everything after it is a modifier. The supported modifiers are:

| Modifier                             | Effect                                                                                                                                                                                           |
| ------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `not_null`                           | Column is `NOT NULL`.                                                                                                                                                                            |
| `nullable`                           | Column is explicitly nullable (the default; useful to override a source-derived `not_null`).                                                                                                     |
| `primary_key`                        | Column is part of the table's `PRIMARY KEY` (DDL only — does not set the incremental-mode primary key; use [`primary_key`](/concepts/replication/structure.md) / `table_keys.primary` for that). |
| `unique`                             | Adds a `UNIQUE` constraint on the column.                                                                                                                                                        |
| `index`                              | Creates a plain index on the column.                                                                                                                                                             |
| `index(...)`                         | Creates a plain index with options (see kwargs below).                                                                                                                                           |
| `unique_index` / `unique_index(...)` | Creates a unique index on the column.                                                                                                                                                            |
| `description('<text>')`              | Sets the column comment/description. A user-supplied description wins over one inferred from the source.                                                                                         |

The `index(...)` / `unique_index(...)` forms accept keyword arguments to compose multi-column or partial indexes across several columns. Give the same `name` to two columns to build a composite index, and use `priority` to order the members:

```yaml
streams:
  public.events:
    object: public.events_fact
    columns:
      org_id:     bigint not_null index(name=idx_org_created, priority=1)
      created_at: datetime index(name=idx_org_created, priority=2, sort=desc)
      status:     text index(name=idx_active, where='deleted_at IS NULL')
```

Supported kwargs: `name`, `priority`, `sort` (`asc`/`desc`), `where`, `type`, `include`. See also [`table_keys.index`](/concepts/replication/target-options.md#table-keys) options for defining composite indexes.

{% hint style="info" %}
Whether each modifier renders depends on the target engine. For example, ClickHouse honors `not_null` (columns are *not* wrapped in `Nullable(...)`) and renders indexes inline, while engines without secondary indexes (Snowflake, Redshift) silently ignore `index`. Index and description statements are only applied to the final table, never the transient staging table.
{% endhint %}

The runtime [constraint](/concepts/data-quality/constraints.md) slot (after `|`) is unaffected and can be combined with modifiers:

```yaml
columns:
  amount: integer not_null | value >= 0
```

### Using CLI Flags

```bash
# template
sling run --columns '{ "<column_name>": "<data_type>" }'

# template with constraint
sling run --columns '{ "<column_name>": "<data_type> | <constraint>" }'

# cast column to bigint
sling run --columns '{ "my_column": "bigint" }'

# cast column to decimal(10, 2)
sling run --columns '{ "my_column": "decimal(10, 2)" }'

# cast multiple columns
sling run --columns '{ "my_column": "string(150)", "my_other_column": "decimal(10, 2)" }'

# cast all columns to string
sling run --columns '{ "*": "string" }'

# apply constraint, value should be greater than 0
sling run --columns '{ "my_column": "integert | value > 0" }'

```

### Using YAML

Using the `defaults` and `streams` keys, you can specify different columns for each stream.

```yaml
source: source_name
target: target_name

defaults:
  # apply to all streams by default
  columns:
    id: bigint

streams:
  # inherit defaults
  my_stream:

  my_other_stream:
    # apply to this stream only (replaces defaults)
    columns:
      id: bigint
      my_column: string(150)
      my_other_column: decimal(10, 2)
      my_int: int | value > 0   # apply constraint, value should be greater than 0

  # cast all columns to string
  another_stream:
    columns:
      "*": string
```

#### Merging with Defaults

By default, stream-level `columns` **replace** `defaults.columns` entirely. To **merge** stream columns with defaults instead, prefix column names with `+`. This inherits all defaults and lets you add or override specific columns. Requires *v1.5.12+*.

```yaml
source: oracle
target: postgres

defaults:
  columns:
    note_id: bigint
    source_id: bigint
    trip_header_id: bigint
    note: text

streams:
  # inherits all defaults
  table_a:

  # "+" prefix: merges with defaults
  # inherits note_id, source_id, trip_header_id, note from defaults
  # adds trip_id: bigint
  table_b:
    columns:
      +trip_id: bigint

  # without "+": replaces defaults entirely (legacy behavior)
  # only trip_id: bigint is applied, no defaults inherited
  table_c:
    columns:
      trip_id: bigint
```

{% hint style="warning" %}
You cannot mix `+` prefixed and non-prefixed column names in the same `columns` block. Either all columns use the `+` prefix (merge mode) or none do (replace mode).
{% endhint %}

#### Unsetting a Default

When using merge mode (`+` prefix), you can remove a default column type for a specific stream by setting it to `null` using YAML's `~`. This reverts that column to the auto-detected type from the source.

```yaml
defaults:
  columns:
    id: bigint
    name: text

streams:
  my_stream:
    columns:
      +id: ~      # unset: uses the auto-inferred type from the source
                   # name: text is still inherited from defaults
```

## Column Casing

The `column_casing` [target option](/concepts/replication/target-options.md) allows you to control how column names are formatted when creating tables in the target database. This is useful for ensuring consistent naming conventions and avoiding the need to use quotes when querying tables in databases with case-sensitive identifiers.

Starting in `v1.4.5`, the default is `normalize`. Before this version, the default was `source`. See note [here](https://github.com/slingdata-io/sling-cli/issues/538#issuecomment-2799022785) for details.

### Available Casing Options

* `normalize` - Normalize column names to target database's default casing (upper or lower case), but preserve mixed-case column names. This helps with querying tables without needing quotes for standard column names.
* `source` - Keep the original casing from the source data.
* `target` - Convert all column names according to the target database's default casing (upper case for Oracle, lower case for PostgreSQL, etc.).
* `snake` - Convert `camelCase` and other formats to `snake_case`, then apply the target database's default casing.
* `upper` - Convert all column names to UPPER CASE.
* `lower` - Convert all column names to lower case.

### Examples

Assuming source column names: `customerId`, `first_name`, `LAST_NAME`, `email-address`

#### Using CLI Flag

```bash
# Keep source casing
sling run --tgt-options '{"column_casing": "source"}'

# Use target database casing
sling run --tgt-options '{"column_casing": "target"}'

# Convert to snake case with target database casing
sling run --tgt-options '{"column_casing": "snake"}'

# Convert to upper case
sling run --tgt-options '{"column_casing": "upper"}'

# Convert to lower case
sling run --tgt-options '{"column_casing": "lower"}'
```

#### Using YAML

```yaml
source: mysql.customer_data
target: postgres.public.customers

defaults:
  mode: full-refresh
  target_options:
    column_casing: snake
```

#### Example Results

Let's see how each option transforms our sample column names for a PostgreSQL, DuckDB or MySQL target (which defaults to lowercase):

| Original Column | source        | normalize     | target         | snake          | upper          | lower          |
| --------------- | ------------- | ------------- | -------------- | -------------- | -------------- | -------------- |
| customerId      | customerId    | customerId    | customerid     | customer\_id   | CUSTOMERID     | customerid     |
| first\_name     | first\_name   | first\_name   | first\_name    | first\_name    | FIRST\_NAME    | first\_name    |
| LAST\_NAME      | LAST\_NAME    | last\_name    | last\_name     | last\_name     | LAST\_NAME     | last\_name     |
| email-address   | email-address | email-address | email\_address | email\_address | EMAIL\_ADDRESS | email\_address |

For an Oracle or Snowflake target (which defaults to uppercase):

| Original Column | source        | normalize     | target         | snake          | upper          | lower          |
| --------------- | ------------- | ------------- | -------------- | -------------- | -------------- | -------------- |
| customerId      | customerId    | customerId    | CUSTOMERID     | CUSTOMER\_ID   | CUSTOMERID     | customerid     |
| first\_name     | first\_name   | FIRST\_NAME   | FIRST\_NAME    | FIRST\_NAME    | FIRST\_NAME    | first\_name    |
| LAST\_NAME      | LAST\_NAME    | LAST\_NAME    | LAST\_NAME     | LAST\_NAME     | LAST\_NAME     | last\_name     |
| email-address   | email-address | email-address | EMAIL\_ADDRESS | EMAIL\_ADDRESS | EMAIL\_ADDRESS | email\_address |

This functionality makes it easier to work with column names when moving data between systems with different naming conventions or case sensitivity requirements.

## Column Typing

Starting in `v1.4.5`, the `column_typing` [target option](/concepts/replication/target-options.md) allows you to configure how Sling generates column types when creating tables in the target database. This is particularly useful when you need to ensure string columns have sufficient length to accommodate all possible values, especially when dealing with different database systems or character encodings.

### Structure

The `column_typing` configuration has the following structure:

```yaml
target_options:
  column_typing:
    string:
      length_factor: <int>
      min_length: <int>
      max_length: <int>
      use_max: <bool>

    decimal:
      min_precision: <int>
      max_precision: <int>
      min_scale: <int>
      max_scale: <int>
      cast_as: <string>  # "float" or "string"

    json:
      as_text: <bool>

    boolean:
      cast_as: <string>  # "integer" or "string"
```

Where:

* `string`: Settings for string type columns
  * `length_factor`: A multiplier applied to the detected length of string columns (default: 1)
  * `min_length`: The minimum length to use for string columns (if specified)
  * `max_length`: The maximum length to use for string columns (if specified)
  * `use_max`: Whether to always use the max\_length value instead of calculated lengths (default: false)
* `decimal`: Settings for decimal type columns
  * `min_precision`: The minimum total number of digits (precision) for decimal columns.
  * `max_precision`: The maximum total number of digits (precision) for decimal columns.
  * `min_scale`: The minimum number of digits after the decimal point (scale).
  * `max_scale`: The maximum number of digits after the decimal point (scale).
  * `cast_as`: Force decimal columns to be cast as a specific type (available in `v1.4.27`+):
    * `"float"`: Convert decimal columns to floating-point type (e.g., `DOUBLE PRECISION` in PostgreSQL)
    * `"string"`: Convert decimal columns to string/text type (e.g., `VARCHAR` in PostgreSQL)
* `json`: Settings for JSON type columns
  * `as_text`: When set to `true`, JSON columns are stored as text/string type instead of native JSON type (default: false). This is useful when the target database doesn't support native JSON types, or when you need to store JSON data as plain text for compatibility.
* `boolean`: Settings for boolean type columns
  * `cast_as`: Force boolean columns to be cast as a specific type:
    * `"integer"`: Convert boolean columns to integer type (1 for true, 0 for false)
    * `"string"`: Convert boolean columns to string type ("true" or "false")

### How It Works

When Sling creates tables in the target database, it analyzes the source data to determine appropriate column types.

For string columns:

1. Sling determines the maximum string length from the source data
2. If `length_factor` is specified, this value is multiplied by the factor
3. If `max_length` is specified and `use_max` is false, the length is capped at this value
4. If `use_max` is true, `max_length` is used regardless of the calculated length
5. If `min_length` is specified, the length will be at least that number

For decimal columns:

1. Sling determines the required precision and scale based on the source data.
2. If `cast_as` is specified:
   * `"float"`: Decimal columns are converted to floating-point type, ignoring precision/scale settings
   * `"string"`: Decimal columns are converted to string/text type, preserving exact decimal representation
3. Otherwise, if `column_typing.decimal` settings are provided, Sling adjusts the calculated precision and scale based on the `min_precision`, `max_precision`, `min_scale`, and `max_scale` values.
4. The final precision and scale are used to generate the `decimal(precision, scale)` type in the target database DDL.

This helps prevent truncation issues when moving data between systems with different character encoding requirements or different decimal precision/scale needs.

### Examples

#### Double String Column Lengths

```yaml
source: mssql
target: redshift

defaults:
  mode: truncate
  object: public.{stream_table}
  
streams:
  dbo.test_sling_unicode:
    target_options:
      column_typing:
        string:
          length_factor: 2
```

In this example, all string columns in the `dbo.test_sling_unicode` table will have their length doubled when created in the PostgreSQL target. This is useful when moving from a database that uses single-byte encoding to one that uses multi-byte encoding (like UTF-8).

#### Set Maximum String Length

```yaml
source: mysql
target: oracle

defaults:
  target_options:
    column_typing:
      string:
        max_length: 8000
```

This example sets a maximum length of 8000 characters for all string columns across all streams, which is useful for databases with column size limitations.

#### Use Fixed String Length

```yaml
streams:
  sales.customers:
    target_options:
      column_typing:
        string:
          max_length: 1000
          use_max: true
```

This configuration forces all string columns in the `sales.customers` table to use a fixed length of 1000, regardless of the actual data length.

#### Cast Decimals as Float

```yaml
source: mssql
target: postgres

streams:
  dbo.financial_data:
    target_options:
      column_typing:
        decimal:
          cast_as: float
```

This example converts all decimal columns to floating-point type (`DOUBLE PRECISION` in PostgreSQL). This is useful when you need better performance for numeric operations and can accept the loss of exact decimal precision.

#### Cast Decimals as String

```yaml
source: mssql
target: postgres

streams:
  dbo.accounting_records:
    target_options:
      column_typing:
        decimal:
          cast_as: string
```

This example converts all decimal columns to string type (`VARCHAR` in PostgreSQL). This is useful when you need to preserve exact decimal representation, including trailing zeros, or when the target database doesn't support the required decimal precision.

#### Store JSON as Text

```yaml
source: postgres
target: mysql

streams:
  public.events:
    target_options:
      column_typing:
        json:
          as_text: true
```

This example stores JSON columns as text instead of native JSON type. This is useful when replicating to databases that don't have native JSON support, or when you want to store JSON data as plain text for compatibility with older systems or specific application requirements.

#### Cast Booleans as Integer

```yaml
source: postgres
target: oracle

streams:
  public.feature_flags:
    target_options:
      column_typing:
        boolean:
          cast_as: integer
```

This example converts boolean columns to integer type (1 for true, 0 for false). This is useful when replicating to databases that don't have native boolean support, such as Oracle, or when integrating with legacy systems that expect numeric flags. Available in v1.5.3+.

#### Cast Booleans as String

```yaml
source: mysql
target: bigquery

streams:
  app.settings:
    target_options:
      column_typing:
        boolean:
          cast_as: string
```

This example converts boolean columns to string type ("true" or "false"). This can be useful for data warehousing scenarios where you want boolean values stored as human-readable text, or when the target system expects string representations of boolean values.


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.slingdata.io/concepts/replication/columns.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
