Structure
Below is the structure of the replication configuration file.
Root Level
At the root level, we have the following keys:
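The listing that belongs here does not appear in this copy of the page. As a rough sketch, the root-level keys can be reconstructed from the Replication Specification further down. The angle-bracket placeholders follow the names used elsewhere on this page, except <connection> and <environment variables map>, which are assumed labels for illustration.

```yaml
# Sketch of the root level, reconstructed from the keys documented
# under "Replication Specification". Placeholders are not literal values.
source: <connection>                  # name, conn string or URL
target: <connection>                  # name, conn string or URL

defaults: <replication stream map>    # values shared by all streams

hooks: <replication level hooks map>  # applied at start & end of replication

streams:
  <stream name>: <replication stream map>

env: <environment variables map>
```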
Stream Level
The <stream name> identifies the stream to replicate. This can be a source table name, a file path, or a wildcard pattern using *. Wildcards allow matching multiple tables within a schema or multiple files within a directory. For example, my_schema.* matches all tables in my_schema, while data/*.csv matches all CSV files in the data directory. See Tags & Wildcards for more details.
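As an illustration of the two wildcard forms described above, a streams block might look like the sketch below. The target names are hypothetical. The {stream_table} and {stream_file_name} runtime variables are assumptions that do not come from this section; check them against Tags & Wildcards before relying on them.

```yaml
streams:
  # Matches every table in my_schema; each table becomes its own stream.
  "my_schema.*":
    # {stream_table} is an assumed runtime variable (see Tags & Wildcards).
    object: "target_schema.{stream_table}"

  # Matches every CSV file in the data directory.
  "data/*.csv":
    # {stream_file_name} is likewise an assumption, not taken from this page.
    object: "target_schema.{stream_file_name}"
```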
The <replication stream map> is a map object which accepts the following keys:
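The key listing is missing from this copy of the page. The sketch below is assembled from the streams.<key>.* entries in the Replication Specification. The placeholder wording is an assumption. Only the key names and the mode values come from this page.

```yaml
streams:
  <stream name>:
    object: <target table or file path>
    mode: <incremental | truncate | full-refresh | backfill | snapshot>
    columns: <column types map>
    transforms: <transforms to apply>
    select: <columns to include or exclude>
    single: <true | false>
    sql: <custom SQL query or file:// path>
    primary_key: <column or array of columns>
    update_key: <column>
    hooks: <stream level hooks map>
    source_options: <source options map>
    target_options: <target options map>
```

According to the specification entries, each of these keys can also be set under defaults, which suggests that a stream only needs to list the values it overrides.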
Hooks
The <replication level hooks map> and <stream level hooks map> accept the keys below. See Hooks for more details.
Source Options
The <source options map> accepts the keys below. See Source Options for more details.
Target Options
The <target options map> accepts the keys below. See Target Options for more details.
Replication Specification
Here we have the definitions for the accepted keys.
source
The source database connection (name, conn string or URL).
target
The target database connection (name, conn string or URL).
hooks
The replication level hooks to apply (at start & end of replication). See here for details.
streams.<key>
The source table (schema.table) or local / cloud file path. Use file:// for local paths.
streams.<key>.object or defaults.object
The target table (schema.table) or local / cloud file path. Use file:// for local paths.
streams.<key>.columns or defaults.columns
The column types map. See here for details.
streams.<key>.transforms or defaults.transforms
The transforms to apply. See here for details.
streams.<key>.hooks or defaults.hooks
The stream level hooks to apply (pre- & post-stream run). See here for details.
streams.<key>.mode or defaults.mode
The target load mode to use: incremental, truncate, full-refresh, backfill or snapshot. Default is full-refresh.
streams.<key>.select or defaults.select
Select or exclude specific columns from the source stream. Use the - prefix to exclude a column.
streams.<key>.single or defaults.single
When using a wildcard (*) in the stream name, treat it as a single stream (don't expand into many streams).
streams.<key>.sql or defaults.sql
The custom SQL query to use. Accepts file://path/to.query.sql as well.
streams.<key>.primary_key or defaults.primary_key
The column(s) to use as primary key. For a composite key, use an array.
streams.<key>.update_key or defaults.update_key
The column to use as update key (for incremental mode).
streams.<key>.source_options or defaults.source_options
Options to further configure the source. See here for details.
streams.<key>.target_options or defaults.target_options
Options to further configure the target. See here for details.
env
Environment variables to use for replication. See here for details.
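To tie the entries above together, here is a minimal sketch of a replication file that uses only keys documented on this page. The connection names, table names and column names are hypothetical.

```yaml
source: MY_POSTGRES          # hypothetical connection name
target: MY_SNOWFLAKE         # hypothetical connection name

defaults:
  mode: full-refresh         # the documented default, stated explicitly

streams:
  public.orders:             # hypothetical source table (schema.table)
    object: analytics.orders # hypothetical target table
    mode: incremental        # overrides defaults.mode for this stream
    primary_key: [order_id, line_no]   # composite key given as an array
    update_key: updated_at             # used by incremental mode
    select: ["-internal_notes"]        # "-" prefix excludes this column

  public.customers:          # no mode given, so defaults.mode presumably applies
    object: analytics.customers
```

Shared settings sit under defaults so that each stream carries only what differs. That matches the "streams.<key>.x or defaults.x" pairing used throughout the specification.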