Structure

Below is the structure of the replication configuration file.

Root Level

At the root level, we have the following keys:

# 'source', 'target' and 'streams' keys are required
source: <connection name>
target: <connection name>

defaults: <replication stream map>

hooks: <replication level hooks map>

streams:
  <stream name>: <replication stream map>

env:
  <variable name>: <variable value>
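As an illustration, a filled-in root level might look like the sketch below. Only the key names come from the structure above; the connection names, table names, and the variable are hypothetical placeholders.

```yaml
# Minimal sketch of a replication file; all names and values are hypothetical.
source: MY_POSTGRES          # assumed name of an existing source connection
target: MY_WAREHOUSE         # assumed name of an existing target connection

# Settings shared by all streams; a stream can set the same key itself
defaults:
  mode: full-refresh

streams:
  public.customers:
    object: analytics.customers   # target table (schema.table)
  public.orders:
    object: analytics.orders
    mode: truncate                # stream-level value for this stream only

env:
  MY_VARIABLE: some_value         # hypothetical variable name and value
```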

Stream Level

The <stream name> identifies the stream to replicate. This can be either a source table name, a file path, or a wildcard pattern using *. Wildcards allow matching multiple tables within a schema or multiple files within a directory. For example, my_schema.* matches all tables in my_schema, while data/*.csv matches all CSV files in the data directory. See Tags & Wildcards for more details.
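A wildcard stream could be declared roughly as follows. This is a sketch under assumptions: the connection names are hypothetical, and the {stream_table} placeholder in the target object is assumed to resolve to each matched table name (check Tags & Wildcards for the placeholders actually supported).

```yaml
# Sketch only: connection names and the target schema are hypothetical.
source: MY_SOURCE_DB
target: MY_TARGET_DB

defaults:
  # Assumption: {stream_table} is replaced with the name of each matched table,
  # so every source table gets its own target table.
  object: "target_schema.{stream_table}"

streams:
  # Matches all tables in my_schema; no stream-level keys, so defaults apply
  my_schema.*:
```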

The <replication stream map> is a map object that accepts the stream-level keys defined under Replication Specification below (those listed as streams.<key>.* or defaults.*).

Hooks

The <replication level hooks map> and <stream level hooks map> accept the keys documented in Hooks.

Source Options

The <source options map> accepts the keys documented in Source Options.

Target Options

The <target options map> accepts the keys documented in Target Options.

Replication Specification

Here we have the definitions for the accepted keys.

Replication Config Key
Description

source

The source database connection (name, connection string or URL).

target

The target database connection (name, connection string or URL).

hooks

The replication-level hooks to apply (at the start and end of the replication). See Hooks for details.

streams.<key>

The source table (schema.table) or local / cloud file path. Use file:// for local paths.

streams.<key>.object or defaults.object

The target table (schema.table) or local / cloud file path. Use file:// for local paths.

streams.<key>.columns or defaults.columns

The column types map. See here for details.

streams.<key>.transforms or defaults.transforms

The transforms to apply. See here for details.

streams.<key>.hooks or defaults.hooks

The stream-level hooks to apply (before and after the stream runs). See Hooks for details.

streams.<key>.mode or defaults.mode

The target load mode to use: incremental, truncate, full-refresh, backfill or snapshot. Default is full-refresh.

streams.<key>.select or defaults.select

Select or exclude specific columns from the source stream. Use - prefix to exclude.

streams.<key>.single or defaults.single

When the stream name contains a wildcard (*), treat it as a single stream instead of expanding it into many streams.

streams.<key>.sql or defaults.sql

The custom SQL query to use. Accepts file://path/to.query.sql as well.

streams.<key>.primary_key or defaults.primary_key

The column(s) to use as the primary key. For a composite key, use an array.

streams.<key>.update_key or defaults.update_key

The column to use as update key (for incremental mode).

streams.<key>.source_options or defaults.source_options

Options to further configure the source. See Source Options for details.

streams.<key>.target_options or defaults.target_options

Options to further configure the target. See Target Options for details.

env

Environment variables to use for replication. See here for details.
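
To show how several of the keys above might combine, here is a hedged sketch of one replication file. The key names and mode values come from the table; every connection, table, column, and variable name is hypothetical, and the free-form stream key used with sql is an assumption rather than a documented rule.

```yaml
# Sketch only: all connection, table, column, and variable names are hypothetical.
source: MY_SOURCE_DB
target: MY_TARGET_DB

defaults:
  mode: incremental
  primary_key: id                 # single-column primary key

streams:
  public.orders:
    object: analytics.orders
    update_key: updated_at        # update key used by incremental mode
    select:
      - "-internal_notes"         # the - prefix excludes this column

  public.order_items:
    object: analytics.order_items
    primary_key: [order_id, line_number]   # composite key given as an array
    update_key: updated_at

  # Assumption: when sql is set, the stream key serves only as a label
  recent_customers:
    object: analytics.recent_customers
    mode: full-refresh            # stream-level mode instead of defaults.mode
    sql: |
      select id, name, created_at
      from public.customers
      where created_at >= '2024-01-01'

env:
  MY_VARIABLE: some_value         # hypothetical variable name and value
```

Putting mode and primary_key under defaults keeps the per-stream maps short; a stream states a key itself only where it needs a different value.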
