DuckLake

Connect & Ingest data from / to a DuckLake database


DuckLake is a data lake format specification that combines the power of DuckDB with flexible catalog backends and scalable data storage. It provides versioned, ACID-compliant tables with support for multiple catalog databases and various storage backends. See https://ducklake.select/ for more details.

Setup

The following credential keys are accepted (a minimal example follows the list):

  • catalog_type (required) -> The catalog database type: duckdb, sqlite, postgres, or mysql. Default is duckdb.

  • catalog_conn_string (required) -> The connection string for the catalog database. Examples:

    • postgres: dbname=ducklake_catalog host=localhost

    • sqlite: metadata.sqlite

    • mysql: db=ducklake_catalog host=localhost

  • catalog_schema (optional) -> The schema to use to store catalog tables (e.g. public).

  • data_path (optional) -> Path to data files (local, S3, Azure, GCS). e.g. /local/path, s3://bucket/data, r2://bucket/data, az://container/data, gs://bucket/data.

  • schema (optional) -> The default schema to use to read/write data. Default is main.
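For instance, a minimal local setup can combine these keys in a single command (a sketch; the catalog file and data path are placeholders):

# DuckDB catalog file with a local data folder
$ sling conns set DUCKLAKE type=ducklake catalog_type=duckdb catalog_conn_string=metadata.ducklake data_path=/local/path schema=main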

Storage Configuration

For S3/S3-compatible storage:

  • s3_access_key_id (optional) -> AWS access key ID

  • s3_secret_access_key (optional) -> AWS secret access key

  • s3_session_token (optional) -> AWS session token

  • s3_region (optional) -> AWS region

  • s3_profile (optional) -> AWS profile to use

  • s3_endpoint (optional) -> S3-compatible endpoint URL (e.g. http://localhost:9000 for MinIO)
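As a sketch, an S3-compatible backend such as MinIO only needs the endpoint plus a key pair (all values below are placeholders):

# DuckDB catalog with MinIO-backed data storage
$ sling conns set DUCKLAKE type=ducklake catalog_conn_string=metadata.ducklake data_path=s3://bucket/data s3_access_key_id=minioadmin s3_secret_access_key=minioadmin s3_endpoint=http://localhost:9000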

For Azure Blob Storage:

  • azure_account_name (optional) -> Azure storage account name

  • azure_account_key (optional) -> Azure storage account key

  • azure_sas_token (optional) -> Azure SAS token

  • azure_tenant_id (optional) -> Azure tenant ID

  • azure_client_id (optional) -> Azure client ID

  • azure_client_secret (optional) -> Azure client secret

  • azure_connection_string (optional) -> Azure connection string

For Google Cloud Storage:

  • gcs_key_file (optional) -> Path to GCS service account key file

  • gcs_project_id (optional) -> GCS project ID

  • gcs_access_key_id (optional) -> GCS HMAC access key ID

  • gcs_secret_access_key (optional) -> GCS HMAC secret access key
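Similarly, a GCS-backed setup might look like this (a sketch; the key file path and project ID are placeholders):

# DuckDB catalog with GCS data storage
$ sling conns set DUCKLAKE type=ducklake catalog_conn_string=metadata.ducklake data_path=gs://bucket/data gcs_key_file=/path/to/service-account.json gcs_project_id=my-project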

Using sling conns

Here are examples of setting a connection named DUCKLAKE. We must provide the type=ducklake property:

# Default DuckDB catalog with local data
$ sling conns set DUCKLAKE type=ducklake catalog_conn_string=metadata.ducklake

# DuckDB catalog with S3 data storage
$ sling conns set DUCKLAKE type=ducklake catalog_conn_string=metadata.ducklake data_path=s3://bucket/data

# SQLite catalog for multi-client access
$ sling conns set DUCKLAKE type=ducklake catalog_type=sqlite catalog_conn_string=metadata.sqlite

# PostgreSQL catalog for multi-user access
$ sling conns set DUCKLAKE type=ducklake catalog_type=postgres catalog_conn_string="host=localhost port=5432 user=myuser password=mypass dbname=ducklake_catalog"

# MySQL catalog with Azure storage
$ sling conns set DUCKLAKE type=ducklake catalog_type=mysql catalog_conn_string="host=localhost user=root password=pass db=ducklake_catalog" data_path=az://container/data
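
# once set, verify connectivity to the catalog and storage
$ sling conns test DUCKLAKE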

Environment Variable


# configuration with s3 properties
export DUCKLAKE='{ 
  type: ducklake, 
  catalog_type: postgres,
  catalog_conn_string: "host=db.example.com port=5432 user=ducklake password=secret dbname=ducklake_catalog",
  data_path: "s3://bucket/path/to/data/folder/",
  s3_access_key_id: "AKIAIOSFODNN7EXAMPLE",
  s3_secret_access_key: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
  s3_region: "us-east-1"
}'

# configuration with r2 properties
export DUCKLAKE='{ 
  type: ducklake, 
  catalog_type: postgres,
  catalog_conn_string: "host=db.example.com port=5432 user=ducklake password=secret dbname=ducklake_catalog",
  data_path: "r2://bucket/path/to/data/folder/",
  s3_access_key_id: "AKIAIOSFODNN7EXAMPLE",
  s3_secret_access_key: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
  s3_endpoint: "<account-id>.r2.cloudflarestorage.com"
}'

# Windows PowerShell (configuration with azure properties)
$env:DUCKLAKE='{
  type: ducklake,
  catalog_type: postgres,
  catalog_conn_string: "host=db.example.com port=5432 user=ducklake password=secret dbname=ducklake_catalog",
  data_path: "az://my_container/path/to/data/folder/",
  azure_account_name: "<azure_account_name>",
  azure_account_key: "<azure_account_key>"
}'
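
A GCS-backed configuration follows the same pattern (a sketch; the key file path and project ID are placeholders):

# configuration with gcs properties
export DUCKLAKE='{
  type: ducklake,
  catalog_type: postgres,
  catalog_conn_string: "host=db.example.com port=5432 user=ducklake password=secret dbname=ducklake_catalog",
  data_path: "gs://bucket/path/to/data/folder/",
  gcs_key_file: "/path/to/service-account.json",
  gcs_project_id: "my-project"
}'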

Sling Env File YAML

connections:
  DUCKLAKE:
    type: ducklake
    catalog_type: <catalog_type>
    catalog_conn_string: <catalog_conn_string>
    data_path: <data_path>
    database: <database>
    schema: <schema>
    
    # S3 Configuration (if using S3 storage)
    s3_access_key_id: <s3_access_key_id>
    s3_secret_access_key: <s3_secret_access_key>
    s3_region: <s3_region>
    s3_endpoint: http://localhost:9000
    
    # Azure Configuration (if using Azure storage)
    azure_account_name: <azure_account_name>
    azure_account_key: <azure_account_key>
    
    # GCS Configuration (if using GCS storage)
    gcs_key_file: <gcs_key_file>
    gcs_project_id: <gcs_project_id>
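
For example, a filled-in entry for a PostgreSQL catalog with S3 data storage could look like this (all values are placeholders):

connections:
  DUCKLAKE:
    type: ducklake
    catalog_type: postgres
    catalog_conn_string: "host=db.example.com port=5432 user=ducklake password=secret dbname=ducklake_catalog"
    data_path: s3://bucket/path/to/data/folder/
    s3_access_key_id: AKIAIOSFODNN7EXAMPLE
    s3_secret_access_key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
    s3_region: us-east-1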

Common Usage Examples

Basic Operations

# List tables
sling conns discover DUCKLAKE

# Query data
sling run --src-conn DUCKLAKE --src-stream "SELECT * FROM sales.orders LIMIT 10" --stdout

# Export to CSV
sling run --src-conn DUCKLAKE --src-stream sales.orders --tgt-object file://./orders.csv

Data Import/Export

# Import CSV to DuckLake
sling run --src-stream file://./data.csv --tgt-conn DUCKLAKE --tgt-object sales.new_data

# Import from PostgreSQL
sling run --src-conn POSTGRES_DB --src-stream public.customers --tgt-conn DUCKLAKE --tgt-object sales.customers

# Export to Parquet files
sling run --src-conn DUCKLAKE --src-stream sales.orders --tgt-conn AWS_S3 --tgt-object s3://bucket/exports/orders.parquet
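
For repeatable loads, the same import can also be declared as a replication file (a minimal sketch; the connection, stream, and object names are placeholders):

# ducklake_replication.yaml
source: POSTGRES_DB
target: DUCKLAKE

defaults:
  mode: full-refresh

streams:
  public.customers:
    object: sales.customers

# run the replication
sling run -r ducklake_replication.yaml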

See the Environment section to learn more about the sling env.yaml file.

If you are facing issues connecting, please reach out to us at support@slingdata.io, on Discord, or open a GitHub Issue.
