Database to File

Examples of using Sling to load data from databases to storage systems

We first need to make sure our connections are available in our environment. See Environmentarrow-up-right, Storage Connections and Database Connections for more details.

export MY_SOURCE_DB='...'

$ sling conns list
+---------------+------------------+-----------------+
| CONN NAME     | CONN TYPE        | SOURCE          |
+---------------+------------------+-----------------+
| MY_S3_BUCKET  | FileSys - S3     | sling env yaml  |
| MY_SOURCE_DB  | DB - PostgreSQL  | env variable    |
| MY_GS_BUCKET  | FileSys - Google | sling env yaml  |
| MY_AZURE_CONT | FileSys - Azure  | sling env yaml  |
+---------------+------------------+-----------------+
chevron-rightDatabase โ‡จ Local Storage (CSV)hashtag

Using CLI Flags

$ sling run --src-conn MY_SOURCE_DB \
  --src-stream 'source_schema.source_table' \
  --tgt-object 'file:///tmp/my_file.csv'

$ sling run --src-conn MY_SOURCE_DB \
  --src-stream 'source_schema.source_table' \
  --tgt-object 'file:///tmp/my_csv_folder/*.csv'

$ sling run --src-conn MY_SOURCE_DB \
  --src-stream 'source_schema.source_table' \
  --tgt-object 'file:///tmp/my_csv_folder/' \
  --tgt-options '{file_max_rows: 100000, format: csv}'

# Windows path format
$ sling run --src-conn MY_SOURCE_DB \
  --src-stream 'source_schema.source_table' \
  --tgt-object 'file://C:/Temp/my_csv_folder/' \
  --tgt-options '{file_max_rows: 100000, format: csv}'

Using Replication

Running with Sling: sling run -r /path/to/replication.yaml

replication.yaml
source: MY_SOURCE_DB
target: LOCAL

defaults:
  target_options:
    format: csv

streams:
  source_schema.source_table:
    object: file:///tmp/my_file.csv

  source_schema.source_table1:
    object: file:///tmp/my_csv_folder/*.csv

  source_schema.source_table2:
    object: file:///tmp/my_csv_folder/
    target_options:
      file_max_rows: 100000

  source_schema.source_table3:
    object: file://C:/Temp/my_csv_folder/ # Windows Path format
    target_options:
      file_max_rows: 100000

  # all tables in schema, except "forbidden_table"
  my_schema.*:
    object: file:///tmp/{stream_schema}/{stream_table}/{YYYY}_{MM}_{DD}/
    target_options:
      file_max_rows: 400000 # will split files into folder
  my_schema.forbidden_table:
    disabled: true

env:
  SLING_THREADS: 3 # run streams concurrently

Using Python

replication.py
from sling import Replication, ReplicationStream, TargetOptions, Format
import os

# Set environment variables
os.environ['MY_SOURCE_DB'] = '...'

# Single file export
replication = Replication(
    source='MY_SOURCE_DB',
    target='LOCAL',
    streams={
        'source_schema.source_table': ReplicationStream(
            object='file:///tmp/my_file.csv'
        )
    }
)

# Run the replication
replication.run()

# Multiple streams with target options
replication = Replication(
    source='MY_SOURCE_DB',
    target='LOCAL',
    defaults=TargetOptions(format=Format.CSV),
    streams={
        'source_schema.source_table': ReplicationStream(
            object='file:///tmp/my_file.csv'
        ),
        'source_schema.source_table1': ReplicationStream(
            object='file:///tmp/my_csv_folder/*.csv'
        ),
        'source_schema.source_table2': ReplicationStream(
            object='file:///tmp/my_csv_folder/',
            target_options=TargetOptions(file_max_rows=100000)
        ),
        'source_schema.source_table3': ReplicationStream(
            object='file://C:/Temp/my_csv_folder/',  # Windows Path format
            target_options=TargetOptions(file_max_rows=100000)
        )
    },
    env={'SLING_THREADS': '3'}  # run streams concurrently
)

replication.run()

# Schema wildcard with disabled stream
replication = Replication(
    source='MY_SOURCE_DB',
    target='LOCAL',
    defaults=TargetOptions(format=Format.CSV),
    streams={
        'my_schema.*': ReplicationStream(
            object='file:///tmp/{stream_schema}/{stream_table}/{YYYY}_{MM}_{DD}/',
            target_options=TargetOptions(file_max_rows=400000)
        ),
        'my_schema.forbidden_table': ReplicationStream(
            disabled=True
        )
    },
    env={'SLING_THREADS': '3'}
)

replication.run()
chevron-rightDatabase โ‡จ STDOUThashtag

Using CLI Flags

sling.sh
$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --stdout
chevron-rightDatabase โ‡จ Local Storage (JSON)hashtag

Using CLI Flags


Using Replication

Running with Sling: sling run -r /path/to/replication.yaml


Using Python

chevron-rightDatabase โ‡จ Local Storage (JSON Lines)hashtag

Using CLI Flags


Using Replication

Running with Sling: sling run -r /path/to/replication.yaml


Using Python

chevron-rightDatabase โ‡จ Local Storage (Parquet)hashtag

See also Incremental Examples.

Using CLI Flags


Using Replication

Running with Sling: sling run -r /path/to/replication.yaml


Using Python

chevron-rightDatabase โ‡จ Local Storage (GeoJSON)hashtag

Export spatial data from PostgreSQL/PostGIS to GeoJSON format. Sling will convert geometry columns to RFC 7946 compliant GeoJSON. Available from v1.5.2.

Using CLI Flags


Using Replication

Running with Sling: sling run -r /path/to/replication.yaml

circle-info

Note: GeoJSON format supports only one geometry column per stream, per RFC 7946. If your table has multiple geometry columns, specify which one to use via the columns configuration.

chevron-rightDatabase โ‡จ Cloud Storage (CSV)hashtag

Using CLI Flags


Using Replication

Running with Sling: sling run -r /path/to/replication.yaml


Using Python

chevron-rightDatabase โ‡จ Cloud Storage (JSON)hashtag

Using CLI Flags


Using Replication

Running with Sling: sling run -r /path/to/replication.yaml


Using Python

chevron-rightDatabase โ‡จ Cloud Storage (JSON Lines)hashtag

Using CLI Flags


Using Replication

Running with Sling: sling run -r /path/to/replication.yaml


Using Python

chevron-rightDatabase โ‡จ Cloud Storage (Parquet)hashtag

See also Incremental Examples.

Using CLI Flags


Using Replication

Running with Sling: sling run -r /path/to/replication.yaml


Using Python

Last updated

Was this helpful?