# Database to File

We first need to make sure our connections are available in our environment. See [Environment](https://github.com/slingdata-io/sling-docs/blob/master/environment.md), [Storage Connections](/connections/file-connections.md) and [Database Connections](/connections/database-connections.md) for more details.

{% tabs %}
{% tab title="Linux / Mac" %}

```bash
$ export MY_SOURCE_DB='...'

$ sling conns list
+---------------+------------------+-----------------+
| CONN NAME     | CONN TYPE        | SOURCE          |
+---------------+------------------+-----------------+
| MY_S3_BUCKET  | FileSys - S3     | sling env yaml  |
| MY_SOURCE_DB  | DB - PostgreSQL  | env variable    |
| MY_GS_BUCKET  | FileSys - Google | sling env yaml  |
| MY_AZURE_CONT | FileSys - Azure  | sling env yaml  |
+---------------+------------------+-----------------+
```

{% endtab %}

{% tab title="Windows" %}

```powershell
# using Windows PowerShell
$env:MY_SOURCE_DB = '...'

sling conns list
+---------------+------------------+-----------------+
| CONN NAME     | CONN TYPE        | SOURCE          |
+---------------+------------------+-----------------+
| MY_S3_BUCKET  | FileSys - S3     | sling env yaml  |
| MY_SOURCE_DB  | DB - PostgreSQL  | env variable    |
| MY_GS_BUCKET  | FileSys - Google | sling env yaml  |
| MY_AZURE_CONT | FileSys - Azure  | sling env yaml  |
+---------------+------------------+-----------------+
```

{% endtab %}
{% endtabs %}

<details>

<summary>Database ⇨ Local Storage (CSV)</summary>

**Using** [**CLI Flags**](/sling-cli/run.md#cli-flags-overview)

{% code overflow="wrap" %}

```bash
$ sling run --src-conn MY_SOURCE_DB \
  --src-stream 'source_schema.source_table' \
  --tgt-object 'file:///tmp/my_file.csv'

$ sling run --src-conn MY_SOURCE_DB \
  --src-stream 'source_schema.source_table' \
  --tgt-object 'file:///tmp/my_csv_folder/*.csv'

$ sling run --src-conn MY_SOURCE_DB \
  --src-stream 'source_schema.source_table' \
  --tgt-object 'file:///tmp/my_csv_folder/' \
  --tgt-options '{file_max_rows: 100000, format: csv}'

# Windows Path format
$ sling run --src-conn MY_SOURCE_DB \
  --src-stream 'source_schema.source_table' \
  --tgt-object 'file://C:/Temp/my_csv_folder/' \
  --tgt-options '{file_max_rows: 100000, format: csv}'
```

{% endcode %}
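When `file_max_rows` is set and the target object is a folder, Sling splits the output across multiple part files. A rough sketch of that splitting behavior in plain Python (illustrative only, not Sling's actual implementation or file naming):

```python
import csv
import tempfile
from pathlib import Path

def split_rows_to_csv(rows, header, folder, max_rows):
    """Write at most max_rows data rows per CSV part file."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    parts = []
    for start in range(0, len(rows), max_rows):
        part = folder / f"part.{start // max_rows + 1:04d}.csv"
        with part.open("w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)  # each part file gets its own header
            writer.writerows(rows[start:start + max_rows])
        parts.append(part.name)
    return parts

rows = [(n, f"name_{n}") for n in range(250)]
with tempfile.TemporaryDirectory() as tmp:
    parts = split_rows_to_csv(rows, ["id", "name"], tmp, max_rows=100)

print(parts)  # 250 rows at 100 rows per file -> 3 part files
```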

***

**Using** [**Replication**](/concepts/replication.md)

Running with Sling: `sling run -r /path/to/replication.yaml`

{% code title="replication.yaml" overflow="wrap" fullWidth="false" %}

```yaml
source: MY_SOURCE_DB
target: LOCAL

defaults:
  target_options:
    format: csv

streams:
  source_schema.source_table:
    object: file:///tmp/my_file.csv

  source_schema.source_table1:
    object: file:///tmp/my_csv_folder/*.csv

  source_schema.source_table2:
    object: file:///tmp/my_csv_folder/
    target_options:
      file_max_rows: 100000

  source_schema.source_table3:
    object: file://C:/Temp/my_csv_folder/ # Windows Path format
    target_options:
      file_max_rows: 100000

  # all tables in schema, except "forbidden_table"
  my_schema.*:
    object: file:///tmp/{stream_schema}/{stream_table}/{YYYY}_{MM}_{DD}/
    target_options:
      file_max_rows: 400000 # split output into multiple files in the folder
  my_schema.forbidden_table:
    disabled: true

env:
  SLING_THREADS: 3 # run streams concurrently
```

{% endcode %}

***

**Using** [**Python**](/examples/sling-python.md)

{% code title="replication.py" overflow="wrap" %}

```python
from sling import Replication, ReplicationStream, TargetOptions, Format
import os

# Set environment variables
os.environ['MY_SOURCE_DB'] = '...'

# Single file export
replication = Replication(
    source='MY_SOURCE_DB',
    target='LOCAL',
    streams={
        'source_schema.source_table': ReplicationStream(
            object='file:///tmp/my_file.csv'
        )
    }
)

# Run the replication
replication.run()

# Multiple streams with target options
replication = Replication(
    source='MY_SOURCE_DB',
    target='LOCAL',
    defaults={'target_options': TargetOptions(format=Format.CSV)},
    streams={
        'source_schema.source_table': ReplicationStream(
            object='file:///tmp/my_file.csv'
        ),
        'source_schema.source_table1': ReplicationStream(
            object='file:///tmp/my_csv_folder/*.csv'
        ),
        'source_schema.source_table2': ReplicationStream(
            object='file:///tmp/my_csv_folder/',
            target_options=TargetOptions(file_max_rows=100000)
        ),
        'source_schema.source_table3': ReplicationStream(
            object='file://C:/Temp/my_csv_folder/',  # Windows Path format
            target_options=TargetOptions(file_max_rows=100000)
        )
    },
    env={'SLING_THREADS': '3'}  # run streams concurrently
)

replication.run()

# Schema wildcard with disabled stream
replication = Replication(
    source='MY_SOURCE_DB',
    target='LOCAL',
    defaults={'target_options': TargetOptions(format=Format.CSV)},
    streams={
        'my_schema.*': ReplicationStream(
            object='file:///tmp/{stream_schema}/{stream_table}/{YYYY}_{MM}_{DD}/',
            target_options=TargetOptions(file_max_rows=400000)
        ),
        'my_schema.forbidden_table': ReplicationStream(
            disabled=True
        )
    },
    env={'SLING_THREADS': '3'}
)

replication.run()
```

{% endcode %}

</details>

<details>

<summary>Database ⇨ STDOUT</summary>

**Using** [**CLI Flags**](/sling-cli/run.md#cli-flags-overview)

{% code title="sling.sh" overflow="wrap" %}

```bash
$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --stdout
```

{% endcode %}

</details>

<details>

<summary>Database ⇨ Local Storage (JSON)</summary>

**Using** [**CLI Flags**](/sling-cli/run.md#cli-flags-overview)

{% code title="sling.sh" overflow="wrap" %}

```bash
$ sling run --src-conn MY_SOURCE_DB \
  --src-stream 'source_schema.source_table' \
  --tgt-object 'file:///tmp/my_file.json'

$ sling run --src-conn MY_SOURCE_DB \
  --src-stream 'source_schema.source_table' \
  --tgt-object 'file:///tmp/my_json_folder/*.json'

$ sling run --src-conn MY_SOURCE_DB \
  --src-stream 'source_schema.source_table' \
  --tgt-object 'file:///tmp/my_json_folder/' \
  --tgt-options '{file_max_bytes: 4000000, format: json}'

# Windows Path format
$ sling run --src-conn MY_SOURCE_DB \
  --src-stream 'source_schema.source_table' \
  --tgt-object 'file://C:/Temp/my_json_folder/' \
  --tgt-options '{file_max_bytes: 4000000, format: json}'
```

{% endcode %}

***

**Using** [**Replication**](/concepts/replication.md)

Running with Sling: `sling run -r /path/to/replication.yaml`

{% code title="replication.yaml" overflow="wrap" fullWidth="false" %}

```yaml
source: MY_SOURCE_DB
target: LOCAL

defaults:
  target_options:
    format: json

streams:
  source_schema.source_table:
    object: file:///tmp/my_file.json

  source_schema.source_table1:
    object: file:///tmp/my_json_folder/*.json

  source_schema.source_table2:
    object: file:///tmp/my_json_folder/
    target_options:
      file_max_bytes: 4000000

  source_schema.source_table3:
    object: file://C:/Temp/my_json_folder/
    target_options:
      file_max_bytes: 4000000

  # all tables in schema, except "forbidden_table"
  my_schema.*:
    object: file:///tmp/{stream_schema}/{stream_table}/{YYYY}_{MM}_{DD}/
    target_options:
      file_max_rows: 400000 # split output into multiple files in the folder
  my_schema.forbidden_table:
    disabled: true

env:
  SLING_THREADS: 3 # run streams concurrently
```

{% endcode %}

***

**Using** [**Python**](/examples/sling-python.md)

{% code title="replication.py" overflow="wrap" %}

```python
from sling import Replication, ReplicationStream, TargetOptions, Format
import os

# Set environment variables
os.environ['MY_SOURCE_DB'] = '...'

# Single JSON file export
replication = Replication(
    source='MY_SOURCE_DB',
    target='LOCAL',
    streams={
        'source_schema.source_table': ReplicationStream(
            object='file:///tmp/my_file.json'
        )
    }
)

# Run the replication
replication.run()

# Multiple streams with target options
replication = Replication(
    source='MY_SOURCE_DB',
    target='LOCAL',
    defaults={'target_options': TargetOptions(format=Format.JSON)},
    streams={
        'source_schema.source_table': ReplicationStream(
            object='file:///tmp/my_file.json'
        ),
        'source_schema.source_table1': ReplicationStream(
            object='file:///tmp/my_json_folder/*.json'
        ),
        'source_schema.source_table2': ReplicationStream(
            object='file:///tmp/my_json_folder/',
            target_options=TargetOptions(file_max_bytes=4000000)
        ),
        'source_schema.source_table3': ReplicationStream(
            object='file://C:/Temp/my_json_folder/',  # Windows Path format
            target_options=TargetOptions(file_max_bytes=4000000)
        )
    },
    env={'SLING_THREADS': '3'}  # run streams concurrently
)

replication.run()

# Schema wildcard with disabled stream
replication = Replication(
    source='MY_SOURCE_DB',
    target='LOCAL',
    defaults={'target_options': TargetOptions(format=Format.JSON)},
    streams={
        'my_schema.*': ReplicationStream(
            object='file:///tmp/{stream_schema}/{stream_table}/{YYYY}_{MM}_{DD}/',
            target_options=TargetOptions(file_max_rows=400000)
        ),
        'my_schema.forbidden_table': ReplicationStream(
            disabled=True
        )
    },
    env={'SLING_THREADS': '3'}
)

replication.run()
```

{% endcode %}

</details>

<details>

<summary>Database ⇨ Local Storage (JSON Lines)</summary>

**Using** [**CLI Flags**](/sling-cli/run.md#cli-flags-overview)

{% code title="sling.sh" overflow="wrap" %}

```bash
$ sling run --src-conn MY_SOURCE_DB \
  --src-stream 'source_schema.source_table' \
  --tgt-object 'file:///tmp/my_file.jsonl' \
  --tgt-options '{format: jsonlines}'

$ sling run --src-conn MY_SOURCE_DB \
  --src-stream 'source_schema.source_table' \
  --tgt-object 'file:///tmp/my_json_folder/*.jsonl'

$ sling run --src-conn MY_SOURCE_DB \
  --src-stream 'source_schema.source_table' \
  --tgt-object 'file:///tmp/my_json_folder/' \
  --tgt-options '{file_max_bytes: 4000000, format: jsonlines}'

# Windows Path format
$ sling run --src-conn MY_SOURCE_DB \
  --src-stream 'source_schema.source_table' \
  --tgt-object 'file://C:/Temp/my_json_folder/' \
  --tgt-options '{file_max_bytes: 4000000, format: jsonlines}'
```

{% endcode %}
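The `json` and `jsonlines` formats differ only in layout: `json` writes a single array of records, while `jsonlines` writes one JSON object per line, which is easier to stream and append. A quick sketch of the two shapes (sample records are hypothetical):

```python
import json

records = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]

# format: json -> one JSON array containing all records
as_json = json.dumps(records)

# format: jsonlines -> one JSON object per line
as_jsonlines = "\n".join(json.dumps(r) for r in records)

print(as_json)       # [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
print(as_jsonlines)  # two lines, one object each
```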

***

**Using** [**Replication**](/concepts/replication.md)

Running with Sling: `sling run -r /path/to/replication.yaml`

{% code title="replication.yaml" overflow="wrap" fullWidth="false" %}

```yaml
source: MY_SOURCE_DB
target: LOCAL

defaults:
  target_options:
    format: jsonlines

streams:
  source_schema.source_table:
    object: file:///tmp/my_file.jsonl

  source_schema.source_table1:
    object: file:///tmp/my_jsonlines_folder/*.jsonl

  source_schema.source_table2:
    object: file:///tmp/my_jsonlines_folder/
    target_options:
      file_max_bytes: 4000000

  source_schema.source_table3:
    object: file://C:/Temp/my_jsonlines_folder/ # Windows Path format
    target_options:
      file_max_bytes: 4000000

  # all tables in schema, except "forbidden_table"
  my_schema.*:
    object: file:///tmp/{stream_schema}/{stream_table}/{YYYY}_{MM}_{DD}/
    target_options:
      file_max_rows: 400000 # split output into multiple files in the folder
  my_schema.forbidden_table:
    disabled: true

env:
  SLING_THREADS: 3 # run streams concurrently
```

{% endcode %}

***

**Using** [**Python**](/examples/sling-python.md)

{% code title="replication.py" overflow="wrap" %}

```python
from sling import Replication, ReplicationStream, TargetOptions, Format
import os

# Set environment variables
os.environ['MY_SOURCE_DB'] = '...'

# Single JSON Lines file export
replication = Replication(
    source='MY_SOURCE_DB',
    target='LOCAL',
    streams={
        'source_schema.source_table': ReplicationStream(
            object='file:///tmp/my_file.jsonl',
            target_options=TargetOptions(format=Format.JSONLINES)
        )
    }
)

# Run the replication
replication.run()

# Multiple streams with target options
replication = Replication(
    source='MY_SOURCE_DB',
    target='LOCAL',
    defaults={'target_options': TargetOptions(format=Format.JSONLINES)},
    streams={
        'source_schema.source_table': ReplicationStream(
            object='file:///tmp/my_file.jsonl'
        ),
        'source_schema.source_table1': ReplicationStream(
            object='file:///tmp/my_jsonlines_folder/*.jsonl'
        ),
        'source_schema.source_table2': ReplicationStream(
            object='file:///tmp/my_jsonlines_folder/',
            target_options=TargetOptions(file_max_bytes=4000000)
        ),
        'source_schema.source_table3': ReplicationStream(
            object='file://C:/Temp/my_jsonlines_folder/',  # Windows Path format
            target_options=TargetOptions(file_max_bytes=4000000)
        )
    },
    env={'SLING_THREADS': '3'}  # run streams concurrently
)

replication.run()

# Schema wildcard with disabled stream
replication = Replication(
    source='MY_SOURCE_DB',
    target='LOCAL',
    defaults={'target_options': TargetOptions(format=Format.JSONLINES)},
    streams={
        'my_schema.*': ReplicationStream(
            object='file:///tmp/{stream_schema}/{stream_table}/{YYYY}_{MM}_{DD}/',
            target_options=TargetOptions(file_max_rows=400000)
        ),
        'my_schema.forbidden_table': ReplicationStream(
            disabled=True
        )
    },
    env={'SLING_THREADS': '3'}
)

replication.run()
```

{% endcode %}

</details>

<details>

<summary>Database ⇨ Local Storage (Parquet)</summary>

See also [Incremental Examples](/examples/database-to-file/incremental.md).

**Using** [**CLI Flags**](/sling-cli/run.md#cli-flags-overview)

{% code title="sling.sh" overflow="wrap" %}

```bash
$ sling run --src-conn MY_SOURCE_DB \
  --src-stream 'source_schema.source_table' \
  --tgt-object 'file:///tmp/my_file.parquet'

$ sling run --src-conn MY_SOURCE_DB \
  --src-stream 'source_schema.source_table' \
  --tgt-object 'file:///tmp/my_parquet_folder/*.parquet'

$ sling run --src-conn MY_SOURCE_DB \
  --src-stream 'source_schema.source_table' \
  --tgt-object 'file:///tmp/my_parquet_folder/' \
  --tgt-options '{file_max_rows: 4000000, format: parquet}'

# Windows Path format
$ sling run --src-conn MY_SOURCE_DB \
  --src-stream 'source_schema.source_table' \
  --tgt-object 'file://C:/Temp/my_parquet_folder/' \
  --tgt-options '{file_max_rows: 4000000, format: parquet}'
```

{% endcode %}

***

**Using** [**Replication**](/concepts/replication.md)

Running with Sling: `sling run -r /path/to/replication.yaml`

{% code title="replication.yaml" overflow="wrap" fullWidth="false" %}

```yaml
source: MY_SOURCE_DB
target: LOCAL

defaults:
  target_options:
    format: parquet

streams:
  source_schema.source_table:
    object: file://C:/Temp/my_file.parquet # Windows Path format

  source_schema.source_table1:
    object: file://C:/Temp/my_parquet_folder/*.parquet # Windows Path format

  source_schema.source_table2:
    object: file:///tmp/my_parquet_folder/
    target_options:
      file_max_rows: 1000000

  # all tables in schema, except "forbidden_table"
  my_schema.*:
    object: file:///tmp/{stream_schema}/{stream_table}/{YYYY}_{MM}_{DD}/
    target_options:
      file_max_rows: 400000 # split output into multiple files in the folder
  my_schema.forbidden_table:
    disabled: true

env:
  SLING_THREADS: 3 # run streams concurrently
```

{% endcode %}

***

**Using** [**Python**](/examples/sling-python.md)

{% code title="replication.py" overflow="wrap" %}

```python
from sling import Replication, ReplicationStream, TargetOptions, Format
import os

# Set environment variables
os.environ['MY_SOURCE_DB'] = '...'

# Single Parquet file export
replication = Replication(
    source='MY_SOURCE_DB',
    target='LOCAL',
    streams={
        'source_schema.source_table': ReplicationStream(
            object='file:///tmp/my_file.parquet'
        )
    }
)

# Run the replication
replication.run()

# Multiple streams with target options
replication = Replication(
    source='MY_SOURCE_DB',
    target='LOCAL',
    defaults={'target_options': TargetOptions(format=Format.PARQUET)},
    streams={
        'source_schema.source_table': ReplicationStream(
            object='file://C:/Temp/my_file.parquet'  # Windows Path format
        ),
        'source_schema.source_table1': ReplicationStream(
            object='file://C:/Temp/my_parquet_folder/*.parquet'  # Windows Path format
        ),
        'source_schema.source_table2': ReplicationStream(
            object='file:///tmp/my_parquet_folder/',
            target_options=TargetOptions(file_max_rows=1000000)
        )
    },
    env={'SLING_THREADS': '3'}  # run streams concurrently
)

replication.run()

# Schema wildcard with disabled stream
replication = Replication(
    source='MY_SOURCE_DB',
    target='LOCAL',
    defaults={'target_options': TargetOptions(format=Format.PARQUET)},
    streams={
        'my_schema.*': ReplicationStream(
            object='file:///tmp/{stream_schema}/{stream_table}/{YYYY}_{MM}_{DD}/',
            target_options=TargetOptions(file_max_rows=400000)
        ),
        'my_schema.forbidden_table': ReplicationStream(
            disabled=True
        )
    },
    env={'SLING_THREADS': '3'}
)

replication.run()
```

{% endcode %}

</details>

<details>

<summary>Database ⇨ Local Storage (GeoJSON)</summary>

Export spatial data from PostgreSQL/PostGIS to GeoJSON. Sling converts geometry columns to RFC 7946-compliant GeoJSON. Available from *v1.5.2*.

**Using** [**CLI Flags**](/sling-cli/run.md#cli-flags-overview)

{% code title="sling.sh" overflow="wrap" %}

```bash
# Export with auto-detected geometry column (looks for column named "geometry")
$ sling run --src-conn MY_POSTGIS \
  --src-stream 'public.locations' \
  --tgt-object 'file:///tmp/locations.geojson' \
  --tgt-options '{format: geojson}'

# Specify which column contains geometry data
$ sling run --src-conn MY_POSTGIS \
  --src-stream 'public.parcels' \
  --tgt-object 'file:///tmp/parcels.geojson' \
  --tgt-options '{format: geojson, columns: {geom: geometry}}'
```

{% endcode %}

***

**Using** [**Replication**](/concepts/replication.md)

Running with Sling: `sling run -r /path/to/replication.yaml`

{% code title="replication.yaml" overflow="wrap" fullWidth="false" %}

```yaml
source: MY_POSTGIS
target: LOCAL

defaults:
  target_options:
    format: geojson

streams:
  # Auto-detect geometry column (must be named "geometry")
  public.locations:
    object: file:///tmp/locations.geojson

  # Specify geometry column explicitly
  public.parcels:
    sql: |
      -- Create test data inline using CTE
      WITH test_data AS (
        SELECT
          1 as id,
          'Point 1' as name,
          ST_GeomFromText('POINT(9.09425263416477 53.4920035631827)', 4326)::geometry as geom
        UNION ALL
        SELECT
          2 as id,
          'Point 2' as name,
          ST_GeomFromText('POINT(13.0532270916455 49.199065154883)', 4326)::geometry as geom
        UNION ALL
        SELECT
          3 as id,
          'Point 3' as name,
          ST_GeomFromText('POINT(7.81573202029895 52.6718611999912)', 4326)::geometry as geom
      )
      SELECT
        id,
        name,
        geom
      FROM test_data
    object: file:///tmp/parcels.geojson
    columns:
      geom: geometry  # designate 'geom' as the geometry column

  # Rename geometry column in output
  public.boundaries:
    object: file:///tmp/boundaries.geojson
    columns:
      shape: geometry  # 'shape' column will be used as geometry
```

{% endcode %}

{% hint style="info" %}
**Note:** GeoJSON format supports only one geometry column per stream, per RFC 7946. If your table has multiple geometry columns, specify which one to use via the `columns` configuration.
{% endhint %}
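For reference, a GeoJSON export follows the RFC 7946 `FeatureCollection` shape: non-geometry columns become each feature's `properties`, and the geometry column becomes its `geometry` member. A minimal sketch of what a downstream reader can expect (field names per the spec; sample values hypothetical):

```python
import json

# Minimal RFC 7946 FeatureCollection, shaped like a GeoJSON export.
feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"id": 1, "name": "Point 1"},  # non-geometry columns
            "geometry": {"type": "Point", "coordinates": [9.0942, 53.4920]},
        }
    ],
}

# Round-trip through JSON, as a consumer reading the exported file would.
doc = json.loads(json.dumps(feature_collection))
print(doc["features"][0]["geometry"]["type"])  # Point
```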

</details>

<details>

<summary>Database ⇨ Cloud Storage (CSV)</summary>

**Using** [**CLI Flags**](/sling-cli/run.md#cli-flags-overview)

{% code title="sling.sh" overflow="wrap" %}

```bash
$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_S3_BUCKET --tgt-object 's3://my-bucket/my_file.csv'

$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_S3_BUCKET --tgt-object 's3://my-bucket/my_csv_folder/*.csv'

$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_S3_BUCKET --tgt-object 's3://my-bucket/my_csv_folder/' --tgt-options '{file_max_rows: 100000, format: csv}'

$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_GS_BUCKET --tgt-object 'gs://my-bucket/my_file.csv'

$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_GS_BUCKET --tgt-object 'gs://my-bucket/my_csv_folder/*.csv'

$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_GS_BUCKET --tgt-object 'gs://my-bucket/my_csv_folder/' --tgt-options '{file_max_rows: 100000, format: csv}'

$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_AZURE_CONT --tgt-object 'https://my_account.blob.core.windows.net/my-container/my_file.csv'

$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_AZURE_CONT --tgt-object 'https://my_account.blob.core.windows.net/my-container/my_csv_folder/*.csv'

$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_AZURE_CONT --tgt-object 'https://my_account.blob.core.windows.net/my-container/my_csv_folder/' --tgt-options '{file_max_rows: 100000, format: csv}'
```

{% endcode %}

***

**Using** [**Replication**](/concepts/replication.md)

Running with Sling: `sling run -r /path/to/replication.yaml`

{% code title="replication.yaml" overflow="wrap" fullWidth="false" %}

```yaml
source: MY_SOURCE_DB
target: MY_CLOUD_STORAGE

defaults:
  object: '{stream_schema}/{stream_table}/{YYYY}_{MM}_{DD}.csv.gz'
  target_options:
    format: csv
    compression: gzip

streams:

  # all tables in schema
  my_schema.*:
    object: '{stream_schema}/{stream_table}/{YYYY}_{MM}_{DD}/*.csv'
    target_options:
      file_max_rows: 400000 # split output into multiple files in the folder

  other_schema.source_table: # will use defaults

env:
  SLING_THREADS: 3 # run streams concurrently
```

{% endcode %}
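The `{stream_schema}`, `{stream_table}`, and date tokens in the `object` paths above are filled in per stream at runtime. An illustrative sketch of how such a template expands (simple substitution for demonstration, not Sling's exact rendering logic):

```python
from datetime import date

def render_object_path(template, stream_schema, stream_table, run_date):
    """Expand {stream_schema}/{stream_table}/{YYYY}_{MM}_{DD}-style tokens."""
    return (template
            .replace("{stream_schema}", stream_schema)
            .replace("{stream_table}", stream_table)
            .replace("{YYYY}", f"{run_date:%Y}")
            .replace("{MM}", f"{run_date:%m}")
            .replace("{DD}", f"{run_date:%d}"))

path = render_object_path(
    "{stream_schema}/{stream_table}/{YYYY}_{MM}_{DD}.csv.gz",
    "my_schema", "my_table", date(2024, 7, 1),
)
print(path)  # my_schema/my_table/2024_07_01.csv.gz
```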

***

**Using** [**Python**](/examples/sling-python.md)

{% code title="replication.py" overflow="wrap" %}

```python
from sling import Replication, ReplicationStream, TargetOptions, Format, Compression
import os

# Set environment variables
os.environ['MY_SOURCE_DB'] = '...'
os.environ['MY_CLOUD_STORAGE'] = '...'

# Cloud storage export with defaults
replication = Replication(
    source='MY_SOURCE_DB',
    target='MY_CLOUD_STORAGE',
    defaults={
        'object': '{stream_schema}/{stream_table}/{YYYY}_{MM}_{DD}.csv.gz',
        'target_options': TargetOptions(
            format=Format.CSV,
            compression=Compression.GZIP
        )
    },
    streams={
        # all tables in schema
        'my_schema.*': ReplicationStream(
            object='{stream_schema}/{stream_table}/{YYYY}_{MM}_{DD}/*.csv',
            target_options=TargetOptions(file_max_rows=400000)
        ),
        'other_schema.source_table': {}  # will use defaults
    },
    env={'SLING_THREADS': '3'}  # run streams concurrently
)

replication.run()
```

{% endcode %}

</details>

<details>

<summary>Database ⇨ Cloud Storage (JSON)</summary>

**Using** [**CLI Flags**](/sling-cli/run.md#cli-flags-overview)

{% code title="sling.sh" overflow="wrap" %}

```bash
$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_S3_BUCKET --tgt-object 's3://my-bucket/my_file.json'

$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_S3_BUCKET --tgt-object 's3://my-bucket/my_json_folder/*.json'

$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_S3_BUCKET --tgt-object 's3://my-bucket/my_json_folder/' --tgt-options '{file_max_rows: 100000, format: json}'

$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_GS_BUCKET --tgt-object 'gs://my-bucket/my_file.json'

$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_GS_BUCKET --tgt-object 'gs://my-bucket/my_json_folder/*.json'

$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_GS_BUCKET --tgt-object 'gs://my-bucket/my_json_folder/' --tgt-options '{file_max_rows: 100000, format: json}'

$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_AZURE_CONT --tgt-object 'https://my_account.blob.core.windows.net/my-container/my_file.json'

$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_AZURE_CONT --tgt-object 'https://my_account.blob.core.windows.net/my-container/my_json_folder/*.json'

$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_AZURE_CONT --tgt-object 'https://my_account.blob.core.windows.net/my-container/my_json_folder/' --tgt-options '{file_max_rows: 100000, format: json}'
```

{% endcode %}

***

**Using** [**Replication**](/concepts/replication.md)

Running with Sling: `sling run -r /path/to/replication.yaml`

{% code title="replication.yaml" overflow="wrap" fullWidth="false" %}

```yaml
source: MY_SOURCE_DB
target: MY_CLOUD_STORAGE

defaults:
  object: '{stream_schema}/{stream_table}/{YYYY}_{MM}_{DD}.json.gz'
  target_options:
    format: json
    compression: gzip

streams:

  # all tables in schema
  my_schema.*:
    object: '{stream_schema}/{stream_table}/{YYYY}_{MM}_{DD}/*.json'
    target_options:
      file_max_rows: 400000 # split output into multiple files in the folder

  other_schema.source_table: # will use defaults
```

{% endcode %}

***

**Using** [**Python**](/examples/sling-python.md)

{% code title="replication.py" overflow="wrap" %}

```python
from sling import Replication, ReplicationStream, TargetOptions, Format, Compression
import os

# Set environment variables
os.environ['MY_SOURCE_DB'] = '...'
os.environ['MY_CLOUD_STORAGE'] = '...'

# Cloud storage JSON export with defaults
replication = Replication(
    source='MY_SOURCE_DB',
    target='MY_CLOUD_STORAGE',
    defaults={
        'object': '{stream_schema}/{stream_table}/{YYYY}_{MM}_{DD}.json.gz',
        'target_options': TargetOptions(
            format=Format.JSON,
            compression=Compression.GZIP
        )
    },
    streams={
        # all tables in schema
        'my_schema.*': ReplicationStream(
            object='{stream_schema}/{stream_table}/{YYYY}_{MM}_{DD}/*.json',
            target_options=TargetOptions(file_max_rows=400000)
        ),
        'other_schema.source_table': {}  # will use defaults
    }
)

replication.run()
```

{% endcode %}

</details>

<details>

<summary>Database ⇨ Cloud Storage (JSON Lines)</summary>

**Using** [**CLI Flags**](/sling-cli/run.md#cli-flags-overview)

{% code title="sling.sh" overflow="wrap" %}

```bash
$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_S3_BUCKET --tgt-object 's3://my-bucket/my_file.jsonl'

$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_S3_BUCKET --tgt-object 's3://my-bucket/my_json_folder/*.jsonl'

$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_S3_BUCKET --tgt-object 's3://my-bucket/my_json_folder/' --tgt-options '{file_max_rows: 100000, format: jsonlines}'

$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_GS_BUCKET --tgt-object 'gs://my-bucket/my_file.jsonl'

$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_GS_BUCKET --tgt-object 'gs://my-bucket/my_json_folder/*.jsonl'

$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_GS_BUCKET --tgt-object 'gs://my-bucket/my_json_folder/' --tgt-options '{file_max_rows: 100000, format: jsonlines}'

$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_AZURE_CONT --tgt-object 'https://my_account.blob.core.windows.net/my-container/my_file.jsonl'

$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_AZURE_CONT --tgt-object 'https://my_account.blob.core.windows.net/my-container/my_json_folder/*.jsonl'

$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_AZURE_CONT --tgt-object 'https://my_account.blob.core.windows.net/my-container/my_json_folder/' --tgt-options '{file_max_rows: 100000, format: jsonlines}'
```

{% endcode %}

***

**Using** [**Replication**](/concepts/replication.md)

Running with Sling: `sling run -r /path/to/replication.yaml`

{% code title="replication.yaml" overflow="wrap" fullWidth="false" %}

```yaml
source: MY_SOURCE_DB
target: MY_CLOUD_STORAGE

defaults:
  object: '{stream_schema}/{stream_table}/{YYYY}_{MM}_{DD}.jsonl.gz'
  target_options:
    format: jsonlines
    compression: gzip

streams:

  # all tables in schema, except "forbidden_table"
  my_schema.*:
    object: '{stream_schema}/{stream_table}/{YYYY}_{MM}_{DD}/*.jsonl'
    target_options:
      file_max_rows: 400000 # split output into multiple files in the folder

  my_schema.forbidden_table:
    disabled: true

  other_schema.source_table: # will use defaults

env:
  SLING_THREADS: 3 # run streams concurrently
```

{% endcode %}

***

**Using** [**Python**](/examples/sling-python.md)

{% code title="replication.py" overflow="wrap" %}

```python
from sling import Replication, ReplicationStream, TargetOptions, Format, Compression
import os

# Set environment variables
os.environ['MY_SOURCE_DB'] = '...'
os.environ['MY_CLOUD_STORAGE'] = '...'

# Cloud storage JSON Lines export with defaults
replication = Replication(
    source='MY_SOURCE_DB',
    target='MY_CLOUD_STORAGE',
    defaults={
        'object': '{stream_schema}/{stream_table}/{YYYY}_{MM}_{DD}.jsonl.gz',
        'target_options': TargetOptions(
            format=Format.JSONLINES,
            compression=Compression.GZIP
        )
    },
    streams={
        # all tables in schema, except "forbidden_table"
        'my_schema.*': ReplicationStream(
            object='{stream_schema}/{stream_table}/{YYYY}_{MM}_{DD}/*.jsonl',
            target_options=TargetOptions(file_max_rows=400000)
        ),
        'my_schema.forbidden_table': ReplicationStream(
            disabled=True
        ),
        'other_schema.source_table': {}  # will use defaults
    },
    env={'SLING_THREADS': '3'}  # run streams concurrently
)

replication.run()
```

{% endcode %}

</details>

<details>

<summary>Database ⇨ Cloud Storage (Parquet)</summary>

See also [Incremental Examples](/examples/database-to-file/incremental.md).

**Using** [**CLI Flags**](/sling-cli/run.md#cli-flags-overview)

{% code title="sling.sh" overflow="wrap" %}

```bash
$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_S3_BUCKET --tgt-object 's3://my-bucket/my_file.parquet'

$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_S3_BUCKET --tgt-object 's3://my-bucket/my_parquet_folder/*.parquet'

$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_S3_BUCKET --tgt-object 's3://my-bucket/my_parquet_folder/' --tgt-options '{file_max_rows: 100000, format: parquet}'

$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_GS_BUCKET --tgt-object 'gs://my-bucket/my_file.parquet'

$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_GS_BUCKET --tgt-object 'gs://my-bucket/my_parquet_folder/*.parquet'

$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_GS_BUCKET --tgt-object 'gs://my-bucket/my_parquet_folder/' --tgt-options '{file_max_rows: 100000, format: parquet}'

$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_AZURE_CONT --tgt-object 'https://my_account.blob.core.windows.net/my-container/my_file.parquet'

$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_AZURE_CONT --tgt-object 'https://my_account.blob.core.windows.net/my-container/my_parquet_folder/*.parquet'

$ sling run --src-conn MY_SOURCE_DB --src-stream 'source_schema.source_table' --tgt-conn MY_AZURE_CONT --tgt-object 'https://my_account.blob.core.windows.net/my-container/my_parquet_folder/' --tgt-options '{file_max_rows: 100000, format: parquet}'
```

{% endcode %}

***

**Using** [**Replication**](/concepts/replication.md)

Running with Sling: `sling run -r /path/to/replication.yaml`

{% code title="replication.yaml" overflow="wrap" fullWidth="false" %}

```yaml
source: MY_SOURCE_DB
target: MY_CLOUD_STORAGE

defaults:
  object: '{stream_schema}/{stream_table}/{YYYY}_{MM}_{DD}.parquet'
  target_options:
    format: parquet

streams:

  # all tables in schema, except "forbidden_table"
  my_schema.*:
    object: '{stream_schema}/{stream_table}/{YYYY}_{MM}_{DD}/*.parquet'
    target_options:
      file_max_rows: 400000 # split output into multiple files in the folder
  my_schema.forbidden_table:
    disabled: true

  other_schema.source_table: # will use defaults

env:
  SLING_THREADS: 3 # run streams concurrently
```

{% endcode %}

***

**Using** [**Python**](/examples/sling-python.md)

{% code title="replication.py" overflow="wrap" %}

```python
from sling import Replication, ReplicationStream, TargetOptions, Format
import os

# Set environment variables
os.environ['MY_SOURCE_DB'] = '...'
os.environ['MY_CLOUD_STORAGE'] = '...'

# Cloud storage Parquet export with defaults
replication = Replication(
    source='MY_SOURCE_DB',
    target='MY_CLOUD_STORAGE',
    defaults={
        'object': '{stream_schema}/{stream_table}/{YYYY}_{MM}_{DD}.parquet',
        'target_options': TargetOptions(format=Format.PARQUET)
    },
    streams={
        # all tables in schema, except "forbidden_table"
        'my_schema.*': ReplicationStream(
            object='{stream_schema}/{stream_table}/{YYYY}_{MM}_{DD}/*.parquet',
            target_options=TargetOptions(file_max_rows=400000)
        ),
        'my_schema.forbidden_table': ReplicationStream(
            disabled=True
        ),
        'other_schema.source_table': {}  # will use defaults
    },
    env={'SLING_THREADS': '3'}  # run streams concurrently
)

replication.run()
```

{% endcode %}

</details>

