Columns
Selecting Columns
You can specify which columns to include or exclude during replication using the select option.
Using CLI Flags
# select specific columns
sling run --select 'col1,col2'
# exclude specific columns (use minus prefix)
sling run --select "-col3,-col4"

Using YAML
source: source_name
target: target_name

defaults:
  # select specific columns for all streams
  select: [col1, col2]

streams:
  my_stream:
    # exclude specific columns for this stream
    select: [-col3, -col4]

  my_other_stream:
    # select only these columns for this stream
    select: [id, name, email]

Casting Columns
When running a replication, you can specify the column types to tell Sling to cast the data to the correct type. It is not necessary to include all columns. Sling will automatically detect the types for any unspecified columns. See here and here for various type mappings between native and generic types for all types of databases.
Acceptable data types are:
- bigint
- bool
- datetime
- decimal or decimal(precision, scale) (such as decimal(10, 2))
- integer
- json
- string or string(length) (such as string(100))
- text or text(length) (such as text(3000))
Sling also allows you to apply constraints, such as value > 0. See Constraints for details.
Using CLI Flags
# template
sling run --columns '{ "<column_name>": "<data_type>" }'
# template with constraint
sling run --columns '{ "<column_name>": "<data_type> | <constraint>" }'
# cast column to bigint
sling run --columns '{ "my_column": "bigint" }'
# cast column to decimal(10, 2)
sling run --columns '{ "my_column": "decimal(10, 2)" }'
# cast multiple columns
sling run --columns '{ "my_column": "string(150)", "my_other_column": "decimal(10, 2)" }'
# cast all columns to string
sling run --columns '{ "*": "string" }'
# apply constraint, value should be greater than 0
sling run --columns '{ "my_column": "integer | value > 0" }'
Using YAML
Using the defaults and streams keys, you can specify different column types for each stream.
source: source_name
target: target_name

defaults:
  # apply to all streams by default
  columns:
    id: bigint

streams:
  # inherit defaults
  my_stream:

  my_other_stream:
    # apply to this stream only (overrides defaults)
    columns:
      id: bigint
      my_column: string(150)
      my_other_column: decimal(10, 2)
      my_int: int | value > 0 # apply constraint, value should be greater than 0

  # cast all columns to string
  another_stream:
    columns:
      "*": string

Column Casing
The column_casing target option allows you to control how column names are formatted when creating tables in the target database. This is useful for ensuring consistent naming conventions and avoiding the need to use quotes when querying tables in databases with case-sensitive identifiers.
Starting in v1.4.5, the default is normalize. Before this version, the default was source. See note here for details.
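If you were relying on the pre-v1.4.5 behavior, you can pin the option explicitly. A minimal sketch (the connection names are placeholders):

```yaml
source: source_name
target: target_name

defaults:
  target_options:
    column_casing: source  # keep the original source casing, as before v1.4.5
```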
Available Casing Options
- normalize - Normalize column names to the target database's default casing (upper or lower case), but preserve mixed-case column names. This helps with querying tables without needing quotes for standard column names.
- source - Keep the original casing from the source data.
- target - Convert all column names according to the target database's default casing (upper case for Oracle, lower case for PostgreSQL, etc.).
- snake - Convert camelCase and other formats to snake_case, then apply the target database's default casing.
- upper - Convert all column names to UPPER CASE.
- lower - Convert all column names to lower case.
Examples
Assuming source column names: customerId, first_name, LAST_NAME, email-address
Using CLI Flag
# Keep source casing
sling run --tgt-options '{"column_casing": "source"}'
# Use target database casing
sling run --tgt-options '{"column_casing": "target"}'
# Convert to snake case with target database casing
sling run --tgt-options '{"column_casing": "snake"}'
# Convert to upper case
sling run --tgt-options '{"column_casing": "upper"}'
# Convert to lower case
sling run --tgt-options '{"column_casing": "lower"}'

Using YAML
source: mysql.customer_data
target: postgres.public.customers

defaults:
  mode: full-refresh
  target_options:
    column_casing: snake

Example Results
Let's see how each option transforms our sample column names for a PostgreSQL, DuckDB or MySQL target (which defaults to lowercase):
| Source Column | source | normalize | target | snake | upper | lower |
| --- | --- | --- | --- | --- | --- | --- |
| customerId | customerId | customerId | customerid | customer_id | CUSTOMERID | customerid |
| first_name | first_name | first_name | first_name | first_name | FIRST_NAME | first_name |
| LAST_NAME | LAST_NAME | last_name | last_name | last_name | LAST_NAME | last_name |
| email-address | email-address | email-address | email_address | email_address | EMAIL_ADDRESS | email_address |
For an Oracle or Snowflake target (which defaults to uppercase):
| Source Column | source | normalize | target | snake | upper | lower |
| --- | --- | --- | --- | --- | --- | --- |
| customerId | customerId | customerId | CUSTOMERID | CUSTOMER_ID | CUSTOMERID | customerid |
| first_name | first_name | FIRST_NAME | FIRST_NAME | FIRST_NAME | FIRST_NAME | first_name |
| LAST_NAME | LAST_NAME | LAST_NAME | LAST_NAME | LAST_NAME | LAST_NAME | last_name |
| email-address | email-address | email-address | EMAIL_ADDRESS | EMAIL_ADDRESS | EMAIL_ADDRESS | email_address |
This functionality makes it easier to work with column names when moving data between systems with different naming conventions or case sensitivity requirements.
Column Typing
Starting in v1.4.5, the column_typing target option allows you to configure how Sling generates column types when creating tables in the target database. This is particularly useful when you need to ensure string columns have sufficient length to accommodate all possible values, especially when dealing with different database systems or character encodings.
Structure
The column_typing configuration has the following structure:
target_options:
  column_typing:
    string:
      length_factor: <int>
      min_length: <int>
      max_length: <int>
      use_max: <bool>
    decimal:
      min_precision: <int>
      max_precision: <int>
      min_scale: <int>
      max_scale: <int>
      cast_as: <string> # "float" or "string"

Where:
string: Settings for string type columns
- length_factor: A multiplier applied to the detected length of string columns (default: 1)
- min_length: The minimum length to use for string columns (if specified)
- max_length: The maximum length to use for string columns (if specified)
- use_max: Whether to always use the max_length value instead of calculated lengths (default: false)
decimal: Settings for decimal type columns
- min_precision: The minimum total number of digits (precision) for decimal columns.
- max_precision: The maximum total number of digits (precision) for decimal columns.
- min_scale: The minimum number of digits after the decimal point (scale).
- max_scale: The maximum number of digits after the decimal point (scale).
- cast_as: Force decimal columns to be cast as a specific type (available in v1.4.27+):
  - "float": Convert decimal columns to floating-point type (e.g., DOUBLE PRECISION in PostgreSQL)
  - "string": Convert decimal columns to string/text type (e.g., VARCHAR in PostgreSQL)
How It Works
When Sling creates tables in the target database, it analyzes the source data to determine appropriate column types.
For string columns:
1. Sling determines the maximum string length from the source data.
2. If length_factor is specified, this value is multiplied by the factor.
3. If max_length is specified and use_max is false, the length is capped at this value.
4. If use_max is true, max_length is used regardless of the calculated length.
5. If min_length is specified, the length will be at least that number.
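As a worked example of these sizing rules (the numbers below are assumptions for illustration, not defaults): with the settings in this sketch, a string column whose longest source value is 40 characters would be sized as 40 × 2 = 80, then raised to the min_length floor, producing string(100); a column with a 3000-character maximum would compute 3000 × 2 = 6000 and be capped at max_length, producing string(4000).

```yaml
target_options:
  column_typing:
    string:
      length_factor: 2   # multiply the detected max length by 2
      min_length: 100    # never size a string column below 100
      max_length: 4000   # cap the length at 4000 (use_max left as false)
```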
For decimal columns:
1. Sling determines the required precision and scale based on the source data.
2. If cast_as is specified:
   - "float": Decimal columns are converted to floating-point type, ignoring precision/scale settings.
   - "string": Decimal columns are converted to string/text type, preserving exact decimal representation.
3. Otherwise, if column_typing.decimal settings are provided, Sling adjusts the calculated precision and scale based on the min_precision, max_precision, min_scale, and max_scale values.
4. The final precision and scale are used to generate the decimal(precision, scale) type in the target database DDL.
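A hypothetical walkthrough of the decimal rules, with assumed bounds chosen for illustration: given the sketch below, a column detected as decimal(6, 1) would be widened to decimal(12, 2) (precision raised to the min_precision floor, scale raised to the min_scale floor), while a column detected as decimal(50, 12) would be clamped down to decimal(38, 9).

```yaml
target_options:
  column_typing:
    decimal:
      min_precision: 12  # floor for total digits
      max_precision: 38  # cap for total digits
      min_scale: 2       # floor for digits after the decimal point
      max_scale: 9       # cap for digits after the decimal point
```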
This helps prevent truncation issues when moving data between systems with different character encoding requirements or different decimal precision/scale needs.
Examples
Double String Column Lengths
source: mssql
target: redshift

defaults:
  mode: truncate
  object: public.{stream_table}

streams:
  dbo.test_sling_unicode:
    target_options:
      column_typing:
        string:
          length_factor: 2

In this example, all string columns in the dbo.test_sling_unicode table will have their length doubled when created in the Redshift target. This is useful when moving from a database that uses single-byte encoding to one that uses multi-byte encoding (like UTF-8).
Set Maximum String Length
source: mysql
target: oracle

defaults:
  target_options:
    column_typing:
      string:
        max_length: 8000

This example sets a maximum length of 8000 characters for all string columns across all streams, which is useful for databases with column size limitations.
Use Fixed String Length
streams:
  sales.customers:
    target_options:
      column_typing:
        string:
          max_length: 1000
          use_max: true

This configuration forces all string columns in the sales.customers table to use a fixed length of 1000, regardless of the actual data length.
Cast Decimals as Float
source: mssql
target: postgres

streams:
  dbo.financial_data:
    target_options:
      column_typing:
        decimal:
          cast_as: float

This example converts all decimal columns to floating-point type (DOUBLE PRECISION in PostgreSQL). This is useful when you need better performance for numeric operations and can accept the loss of exact decimal precision.
Cast Decimals as String
source: mssql
target: postgres

streams:
  dbo.accounting_records:
    target_options:
      column_typing:
        decimal:
          cast_as: string

This example converts all decimal columns to string type (VARCHAR in PostgreSQL). This is useful when you need to preserve exact decimal representation, including trailing zeros, or when the target database doesn't support the required decimal precision.