Database to File

Examples of using Sling to load data from databases to storage systems

We first need to make sure our connections are available in our environment. See Environment, Storage Connections and Database Connections for more details.

export MY_SOURCE_DB='...'

$ sling conns list
+---------------+------------------+-----------------+
| CONN NAME     | CONN TYPE        | SOURCE          |
+---------------+------------------+-----------------+
| MY_S3_BUCKET  | FileSys - S3     | sling env yaml  |
| MY_SOURCE_DB  | DB - PostgreSQL  | env variable    |
| MY_GS_BUCKET  | FileSys - Google | sling env yaml  |
| MY_AZURE_CONT | FileSys - Azure  | sling env yaml  |
+---------------+------------------+-----------------+
Database ⇨ Local Storage (CSV)

Using CLI Flags

$ sling run --src-conn MY_SOURCE_DB \
  --src-stream 'source_schema.source_table' \
  --tgt-object 'file:///tmp/my_file.csv'

$ sling run --src-conn MY_SOURCE_DB \
  --src-stream 'source_schema.source_table' \
  --tgt-object 'file:///tmp/my_csv_folder/*.csv'

$ sling run --src-conn MY_SOURCE_DB \
  --src-stream 'source_schema.source_table' \
  --tgt-object 'file:///tmp/my_csv_folder/' \
  --tgt-options '{file_max_rows: 100000, format: csv}'

# Windows path format
$ sling run --src-conn MY_SOURCE_DB \
  --src-stream 'source_schema.source_table' \
  --tgt-object 'file://C:/Temp/my_csv_folder/' \
  --tgt-options '{file_max_rows: 100000, format: csv}'

Using Replication

Running with Sling: sling run -r /path/to/replication.yaml

replication.yaml
source: MY_SOURCE_DB
target: LOCAL

defaults:
  target_options:
    format: csv

streams:
  source_schema.source_table:
    object: file:///tmp/my_file.csv

  source_schema.source_table1:
    object: file:///tmp/my_csv_folder/*.csv

  source_schema.source_table2:
    object: file:///tmp/my_csv_folder/
    target_options:
      file_max_rows: 100000

  source_schema.source_table3:
    object: file://C:/Temp/my_csv_folder/ # Windows Path format
    target_options:
      file_max_rows: 100000

  # all tables in schema, except "forbidden_table"
  my_schema.*:
    object: file:///tmp/{stream_schema}/{stream_table}/{YYYY}_{MM}_{DD}/
    target_options:
      file_max_rows: 400000 # will split files into folder
  my_schema.forbidden_table:
    disabled: true

env:
  SLING_THREADS: 3 # run streams concurrently

Using Python

replication.py
from sling import Replication, ReplicationStream, TargetOptions, Format
import os

# Set environment variables
os.environ['MY_SOURCE_DB'] = '...'

# Single file export
replication = Replication(
    source='MY_SOURCE_DB',
    target='LOCAL',
    streams={
        'source_schema.source_table': ReplicationStream(
            object='file:///tmp/my_file.csv'
        )
    }
)

# Run the replication
replication.run()

# Multiple streams with target options
replication = Replication(
    source='MY_SOURCE_DB',
    target='LOCAL',
    defaults=TargetOptions(format=Format.CSV),
    streams={
        'source_schema.source_table': ReplicationStream(
            object='file:///tmp/my_file.csv'
        ),
        'source_schema.source_table1': ReplicationStream(
            object='file:///tmp/my_csv_folder/*.csv'
        ),
        'source_schema.source_table2': ReplicationStream(
            object='file:///tmp/my_csv_folder/',
            target_options=TargetOptions(file_max_rows=100000)
        ),
        'source_schema.source_table3': ReplicationStream(
            object='file://C:/Temp/my_csv_folder/',  # Windows Path format
            target_options=TargetOptions(file_max_rows=100000)
        )
    },
    env={'SLING_THREADS': '3'}  # run streams concurrently
)

replication.run()

# Schema wildcard with disabled stream
replication = Replication(
    source='MY_SOURCE_DB',
    target='LOCAL',
    defaults=TargetOptions(format=Format.CSV),
    streams={
        'my_schema.*': ReplicationStream(
            object='file:///tmp/{stream_schema}/{stream_table}/{YYYY}_{MM}_{DD}/',
            target_options=TargetOptions(file_max_rows=400000)
        ),
        'my_schema.forbidden_table': ReplicationStream(
            disabled=True
        )
    },
    env={'SLING_THREADS': '3'}
)

replication.run()
Database ⇨ STDOUT

Using CLI Flags

sling.sh
$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --stdout
Database ⇨ Local Storage (JSON)

Using CLI Flags


Using Replication

Running with Sling: sling run -r /path/to/replication.yaml


Using Python

Database ⇨ Local Storage (JSON Lines)

Using CLI Flags


Using Replication

Running with Sling: sling run -r /path/to/replication.yaml


Using Python

Database ⇨ Local Storage (Parquet)

See also Incremental Examples.

Using CLI Flags


Using Replication

Running with Sling: sling run -r /path/to/replication.yaml


Using Python

Database ⇨ Cloud Storage (CSV)

Using CLI Flags


Using Replication

Running with Sling: sling run -r /path/to/replication.yaml


Using Python

Database ⇨ Cloud Storage (JSON)

Using CLI Flags


Using Replication

Running with Sling: sling run -r /path/to/replication.yaml


Using Python

Database ⇨ Cloud Storage (JSON Lines)

Using CLI Flags


Using Replication

Running with Sling: sling run -r /path/to/replication.yaml


Using Python

Database ⇨ Cloud Storage (Parquet)

See also Incremental Examples.

Using CLI Flags


Using Replication

Running with Sling: sling run -r /path/to/replication.yaml


Using Python

Last updated

Was this helpful?