# Columns

## Selecting Columns

You can specify which columns to include or exclude during replication using the `select` option.

### Using CLI Flags

```bash
# select specific columns
sling run --select 'col1,col2'

# exclude specific columns (use minus prefix)
sling run --select "-col3,-col4"

# rename columns using 'as' syntax
sling run --select 'name as full_name,id as user_id'
```

### Using YAML

```yaml
source: source_name
target: target_name

defaults:
  # select specific columns for all streams
  select: [col1, col2]

streams:
  my_stream:
    # exclude specific columns for this stream
    select: [-col3, -col4]

  my_other_stream:
    # select only these columns for this stream
    select: [id, name as full_name, email]
```

### Renaming Columns

You can rename columns during replication using the `as` syntax. This generates the appropriate `SELECT column AS alias` SQL when reading from database sources. Requires *v1.5.5+*.

```yaml
streams:
  public.users:
    object: public.customers
    select:
      - 'id as user_id'
      - 'first_name as name'
      - 'email as contact_email'
```

When using custom SQL with the `{fields}` placeholder, the renamed columns are automatically substituted:

```yaml
streams:
  my_stream:
    sql: SELECT {fields} FROM public.users WHERE active = true
    object: public.active_customers
    select:
      - 'id'
      - 'first_name as customer_name'
      - 'score as customer_score'
```

### Excluding Columns with Wildcards

You can exclude columns using glob patterns with the `-` prefix. This supports `*` wildcards for prefix and suffix matching. Requires *v1.5.5+*.

```yaml
streams:
  public.users:
    object: public.users_clean
    select:
      # Exclude all columns starting with "address_"
      - '-address_*'

  public.audit_log:
    object: public.audit_clean
    select:
      # Exclude all columns ending with "_internal"
      - '-*_internal'
      # Exclude specific columns
      - '-debug_info'
```

**Note:** When using exclusions, all items in the `select` list must be exclusions (prefixed with `-`). You cannot mix inclusions and exclusions in the same `select` list.

## Casting Columns

When running a replication, you can specify the column types to tell Sling to cast the data to the correct type. It is not necessary to include all columns. Sling will automatically detect the types for any unspecified columns. See [here](https://github.com/slingdata-io/sling-cli/blob/main/core/dbio/templates/types_general_to_native.tsv) and [here](https://github.com/slingdata-io/sling-cli/blob/main/core/dbio/templates/types_native_to_general.tsv) for various type mappings between native and generic types for all types of databases.

Acceptable data types are:

* `bigint`
* `bool`
* `datetime`
* `decimal` or `decimal(precision, scale)` (such as `decimal(10, 2)`)
* `integer`
* `json`
* `string` or `string(length)` (such as `string(100)`)
* `text` or `text(length)` (such as `text(3000)`)
* `geometry`

Sling allows you the ability to apply constraints, such as `value > 0`. See [Constraints](https://docs.slingdata.io/concepts/data-quality/constraints) for details.

### Using CLI Flags

```bash
# template
sling run --columns '{ "<column_name>": "<data_type>" }'

# template with constraint
sling run --columns '{ "<column_name>": "<data_type> | <constraint>" }'

# cast column to bigint
sling run --columns '{ "my_column": "bigint" }'

# cast column to decimal(10, 2)
sling run --columns '{ "my_column": "decimal(10, 2)" }'

# cast multiple columns
sling run --columns '{ "my_column": "string(150)", "my_other_column": "decimal(10, 2)" }'

# cast all columns to string
sling run --columns '{ "*": "string" }'

# apply constraint, value should be greater than 0
sling run --columns '{ "my_column": "integert | value > 0" }'

```

### Using YAML

Using the `defaults` and `streams` keys, you can specify different columns for each stream.

```yaml
source: source_name
target: target_name

defaults:
  # apply to all streams by default
  columns:
    id: bigint

streams:
  # inherit defaults
  my_stream:

  my_other_stream:
    # apply to this stream only (replaces defaults)
    columns:
      id: bigint
      my_column: string(150)
      my_other_column: decimal(10, 2)
      my_int: int | value > 0   # apply constraint, value should be greater than 0

  # cast all columns to string
  another_stream:
    columns:
      "*": string
```

#### Merging with Defaults

By default, stream-level `columns` **replace** `defaults.columns` entirely. To **merge** stream columns with defaults instead, prefix column names with `+`. This inherits all defaults and lets you add or override specific columns. Requires *v1.5.12+*.

```yaml
source: oracle
target: postgres

defaults:
  columns:
    note_id: bigint
    source_id: bigint
    trip_header_id: bigint
    note: text

streams:
  # inherits all defaults
  table_a:

  # "+" prefix: merges with defaults
  # inherits note_id, source_id, trip_header_id, note from defaults
  # adds trip_id: bigint
  table_b:
    columns:
      +trip_id: bigint

  # without "+": replaces defaults entirely (legacy behavior)
  # only trip_id: bigint is applied, no defaults inherited
  table_c:
    columns:
      trip_id: bigint
```

{% hint style="warning" %}
You cannot mix `+` prefixed and non-prefixed column names in the same `columns` block. Either all columns use the `+` prefix (merge mode) or none do (replace mode).
{% endhint %}

#### Unsetting a Default

When using merge mode (`+` prefix), you can remove a default column type for a specific stream by setting it to `null` using YAML's `~`. This reverts that column to the auto-detected type from the source.

```yaml
defaults:
  columns:
    id: bigint
    name: text

streams:
  my_stream:
    columns:
      +id: ~      # unset: uses the auto-inferred type from the source
                   # name: text is still inherited from defaults
```

## Column Casing

The `column_casing` [target option](https://docs.slingdata.io/concepts/replication/target-options) allows you to control how column names are formatted when creating tables in the target database. This is useful for ensuring consistent naming conventions and avoiding the need to use quotes when querying tables in databases with case-sensitive identifiers.

Starting in `v1.4.5`, the default is `normalize`. Before this version, the default was `source`. See note [here](https://github.com/slingdata-io/sling-cli/issues/538#issuecomment-2799022785) for details.

### Available Casing Options

* `normalize` - Normalize column names to target database's default casing (upper or lower case), but preserve mixed-case column names. This helps with querying tables without needing quotes for standard column names.
* `source` - Keep the original casing from the source data.
* `target` - Convert all column names according to the target database's default casing (upper case for Oracle, lower case for PostgreSQL, etc.).
* `snake` - Convert `camelCase` and other formats to `snake_case`, then apply the target database's default casing.
* `upper` - Convert all column names to UPPER CASE.
* `lower` - Convert all column names to lower case.

### Examples

Assuming source column names: `customerId`, `first_name`, `LAST_NAME`, `email-address`

#### Using CLI Flag

```bash
# Keep source casing
sling run --tgt-options '{"column_casing": "source"}'

# Use target database casing
sling run --tgt-options '{"column_casing": "target"}'

# Convert to snake case with target database casing
sling run --tgt-options '{"column_casing": "snake"}'

# Convert to upper case
sling run --tgt-options '{"column_casing": "upper"}'

# Convert to lower case
sling run --tgt-options '{"column_casing": "lower"}'
```

#### Using YAML

```yaml
source: mysql.customer_data
target: postgres.public.customers

defaults:
  mode: full-refresh
  target_options:
    column_casing: snake
```

#### Example Results

Let's see how each option transforms our sample column names for a PostgreSQL, DuckDB or MySQL target (which defaults to lowercase):

| Original Column | source        | normalize     | target         | snake          | upper          | lower          |
| --------------- | ------------- | ------------- | -------------- | -------------- | -------------- | -------------- |
| customerId      | customerId    | customerId    | customerid     | customer\_id   | CUSTOMERID     | customerid     |
| first\_name     | first\_name   | first\_name   | first\_name    | first\_name    | FIRST\_NAME    | first\_name    |
| LAST\_NAME      | LAST\_NAME    | last\_name    | last\_name     | last\_name     | LAST\_NAME     | last\_name     |
| email-address   | email-address | email-address | email\_address | email\_address | EMAIL\_ADDRESS | email\_address |

For an Oracle or Snowflake target (which defaults to uppercase):

| Original Column | source        | normalize     | target         | snake          | upper          | lower          |
| --------------- | ------------- | ------------- | -------------- | -------------- | -------------- | -------------- |
| customerId      | customerId    | customerId    | CUSTOMERID     | CUSTOMER\_ID   | CUSTOMERID     | customerid     |
| first\_name     | first\_name   | FIRST\_NAME   | FIRST\_NAME    | FIRST\_NAME    | FIRST\_NAME    | first\_name    |
| LAST\_NAME      | LAST\_NAME    | LAST\_NAME    | LAST\_NAME     | LAST\_NAME     | LAST\_NAME     | last\_name     |
| email-address   | email-address | email-address | EMAIL\_ADDRESS | EMAIL\_ADDRESS | EMAIL\_ADDRESS | email\_address |

This functionality makes it easier to work with column names when moving data between systems with different naming conventions or case sensitivity requirements.

## Column Typing

Starting in `v1.4.5`, the `column_typing` [target option](https://docs.slingdata.io/concepts/replication/target-options) allows you to configure how Sling generates column types when creating tables in the target database. This is particularly useful when you need to ensure string columns have sufficient length to accommodate all possible values, especially when dealing with different database systems or character encodings.

### Structure

The `column_typing` configuration has the following structure:

```yaml
target_options:
  column_typing:
    string:
      length_factor: <int>
      min_length: <int>
      max_length: <int>
      use_max: <bool>

    decimal:
      min_precision: <int>
      max_precision: <int>
      min_scale: <int>
      max_scale: <int>
      cast_as: <string>  # "float" or "string"

    json:
      as_text: <bool>

    boolean:
      cast_as: <string>  # "integer" or "string"
```

Where:

* `string`: Settings for string type columns
  * `length_factor`: A multiplier applied to the detected length of string columns (default: 1)
  * `min_length`: The minimum length to use for string columns (if specified)
  * `max_length`: The maximum length to use for string columns (if specified)
  * `use_max`: Whether to always use the max\_length value instead of calculated lengths (default: false)
* `decimal`: Settings for decimal type columns
  * `min_precision`: The minimum total number of digits (precision) for decimal columns.
  * `max_precision`: The maximum total number of digits (precision) for decimal columns.
  * `min_scale`: The minimum number of digits after the decimal point (scale).
  * `max_scale`: The maximum number of digits after the decimal point (scale).
  * `cast_as`: Force decimal columns to be cast as a specific type (available in `v1.4.27`+):
    * `"float"`: Convert decimal columns to floating-point type (e.g., `DOUBLE PRECISION` in PostgreSQL)
    * `"string"`: Convert decimal columns to string/text type (e.g., `VARCHAR` in PostgreSQL)
* `json`: Settings for JSON type columns
  * `as_text`: When set to `true`, JSON columns are stored as text/string type instead of native JSON type (default: false). This is useful when the target database doesn't support native JSON types, or when you need to store JSON data as plain text for compatibility.
* `boolean`: Settings for boolean type columns
  * `cast_as`: Force boolean columns to be cast as a specific type:
    * `"integer"`: Convert boolean columns to integer type (1 for true, 0 for false)
    * `"string"`: Convert boolean columns to string type ("true" or "false")

### How It Works

When Sling creates tables in the target database, it analyzes the source data to determine appropriate column types.

For string columns:

1. Sling determines the maximum string length from the source data
2. If `length_factor` is specified, this value is multiplied by the factor
3. If `max_length` is specified and `use_max` is false, the length is capped at this value
4. If `use_max` is true, `max_length` is used regardless of the calculated length
5. If `min_length` is specified, the length will be at least that number

For decimal columns:

1. Sling determines the required precision and scale based on the source data.
2. If `cast_as` is specified:
   * `"float"`: Decimal columns are converted to floating-point type, ignoring precision/scale settings
   * `"string"`: Decimal columns are converted to string/text type, preserving exact decimal representation
3. Otherwise, if `column_typing.decimal` settings are provided, Sling adjusts the calculated precision and scale based on the `min_precision`, `max_precision`, `min_scale`, and `max_scale` values.
4. The final precision and scale are used to generate the `decimal(precision, scale)` type in the target database DDL.

This helps prevent truncation issues when moving data between systems with different character encoding requirements or different decimal precision/scale needs.

### Examples

#### Double String Column Lengths

```yaml
source: mssql
target: redshift

defaults:
  mode: truncate
  object: public.{stream_table}
  
streams:
  dbo.test_sling_unicode:
    target_options:
      column_typing:
        string:
          length_factor: 2
```

In this example, all string columns in the `dbo.test_sling_unicode` table will have their length doubled when created in the PostgreSQL target. This is useful when moving from a database that uses single-byte encoding to one that uses multi-byte encoding (like UTF-8).

#### Set Maximum String Length

```yaml
source: mysql
target: oracle

defaults:
  target_options:
    column_typing:
      string:
        max_length: 8000
```

This example sets a maximum length of 8000 characters for all string columns across all streams, which is useful for databases with column size limitations.

#### Use Fixed String Length

```yaml
streams:
  sales.customers:
    target_options:
      column_typing:
        string:
          max_length: 1000
          use_max: true
```

This configuration forces all string columns in the `sales.customers` table to use a fixed length of 1000, regardless of the actual data length.

#### Cast Decimals as Float

```yaml
source: mssql
target: postgres

streams:
  dbo.financial_data:
    target_options:
      column_typing:
        decimal:
          cast_as: float
```

This example converts all decimal columns to floating-point type (`DOUBLE PRECISION` in PostgreSQL). This is useful when you need better performance for numeric operations and can accept the loss of exact decimal precision.

#### Cast Decimals as String

```yaml
source: mssql
target: postgres

streams:
  dbo.accounting_records:
    target_options:
      column_typing:
        decimal:
          cast_as: string
```

This example converts all decimal columns to string type (`VARCHAR` in PostgreSQL). This is useful when you need to preserve exact decimal representation, including trailing zeros, or when the target database doesn't support the required decimal precision.

#### Store JSON as Text

```yaml
source: postgres
target: mysql

streams:
  public.events:
    target_options:
      column_typing:
        json:
          as_text: true
```

This example stores JSON columns as text instead of native JSON type. This is useful when replicating to databases that don't have native JSON support, or when you want to store JSON data as plain text for compatibility with older systems or specific application requirements.

#### Cast Booleans as Integer

```yaml
source: postgres
target: oracle

streams:
  public.feature_flags:
    target_options:
      column_typing:
        boolean:
          cast_as: integer
```

This example converts boolean columns to integer type (1 for true, 0 for false). This is useful when replicating to databases that don't have native boolean support, such as Oracle, or when integrating with legacy systems that expect numeric flags. Available in v1.5.3+.

#### Cast Booleans as String

```yaml
source: mysql
target: bigquery

streams:
  app.settings:
    target_options:
      column_typing:
        boolean:
          cast_as: string
```

This example converts boolean columns to string type ("true" or "false"). This can be useful for data warehousing scenarios where you want boolean values stored as human-readable text, or when the target system expects string representations of boolean values.
