DuckLake
Connect & ingest data to and from a DuckLake database
DuckLake is a data lake format specification that combines the power of DuckDB with flexible catalog backends and scalable data storage. It provides versioned, ACID-compliant tables with support for multiple catalog databases and various storage backends. See https://ducklake.select/ for more details.
Setup
The following credential keys are accepted:
- `catalog_type` (required) -> The catalog database type: `duckdb`, `sqlite`, `postgres`, or `mysql`. Default is `duckdb`.
- `catalog_conn_string` (required) -> The connection string for the catalog database. See here for details. Examples:
  - `postgres`: `dbname=ducklake_catalog host=localhost`
  - `sqlite`: `metadata.sqlite`
  - `duckdb`: `metadata.ducklake`
  - `mysql`: `db=ducklake_catalog host=localhost`
- `catalog_schema` (optional) -> The schema to use to store catalog tables (e.g. `public`).
- `data_path` (optional) -> Path to data files (local, S3, Azure, GCS), e.g. `/local/path`, `s3://bucket/data`, `r2://bucket/data`, `az://container/data`, `gs://bucket/data`.
- `data_inlining_limit` (optional) -> The row limit to use for `DATA_INLINING_ROW_LIMIT`.
- `copy_method` (optional, v1.4.20+) -> The method used to copy data from Sling into DuckDB. Acceptable values: `csv_http`, `arrow_http`. Default is `arrow_http`. If facing issues with Arrow, try setting `csv_http`.
- `schema` (optional) -> The default schema to use to read/write data. Default is `main`.
- `url_style` (optional) -> Specify `path` to use path-style URLs (e.g. when using MinIO).
- `use_ssl` (optional) -> Specify `false` to not use HTTPS (e.g. when using MinIO).
- `duckdb_version` (optional) -> The CLI version of DuckDB to use. You can also specify the environment variable `DUCKDB_VERSION`.
- `max_buffer_size` (optional, v1.5+) -> The max buffer size to use when piping data from the DuckDB CLI. Specify this if you have extremely large one-line text values in your dataset. Default is `10485760` (10MB).
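For illustration, here is a minimal sketch of a DuckLake connection combining these keys, using a local DuckDB catalog (the catalog file name and data path are placeholders; the env.yaml layout is covered further below):

```yaml
connections:
  DUCKLAKE:
    type: ducklake
    catalog_type: duckdb                   # catalog stored in a local DuckDB file
    catalog_conn_string: metadata.ducklake # placeholder catalog file
    data_path: /local/path/ducklake-data   # placeholder path for table data files
```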
Storage Configuration
For S3/S3-compatible storage:
- `s3_access_key_id` (optional) -> AWS access key ID
- `s3_secret_access_key` (optional) -> AWS secret access key
- `s3_session_token` (optional) -> AWS session token
- `s3_region` (optional) -> AWS region
- `s3_profile` (optional) -> AWS profile to use
- `s3_endpoint` (optional) -> S3-compatible endpoint URL (e.g. `localhost:9000` for MinIO)
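As a sketch, an S3-compatible (MinIO) setup could combine these keys with the `url_style` and `use_ssl` options described earlier; the endpoint, bucket, and credentials below are placeholders:

```yaml
connections:
  DUCKLAKE_MINIO:
    type: ducklake
    catalog_type: sqlite
    catalog_conn_string: metadata.sqlite
    data_path: "s3://my-bucket/data"   # placeholder bucket
    s3_endpoint: localhost:9000        # MinIO endpoint
    s3_access_key_id: minioadmin       # placeholder credentials
    s3_secret_access_key: minioadmin
    url_style: path                    # path-style URLs for MinIO
    use_ssl: false                     # plain HTTP for a local MinIO
```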
For Azure Blob Storage:
- `azure_account_name` (optional) -> Azure storage account name
- `azure_account_key` (optional) -> Azure storage account key
- `azure_sas_token` (optional) -> Azure SAS token
- `azure_tenant_id` (optional) -> Azure tenant ID
- `azure_client_id` (optional) -> Azure client ID
- `azure_client_secret` (optional) -> Azure client secret
- `azure_connection_string` (optional) -> Azure connection string
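For example, an Azure-backed connection could use the account name/key pair (all values below are placeholders):

```yaml
connections:
  DUCKLAKE_AZURE:
    type: ducklake
    catalog_type: sqlite
    catalog_conn_string: metadata.sqlite
    data_path: "az://my-container/data"    # placeholder container
    azure_account_name: mystorageaccount   # placeholder account
    azure_account_key: "<account-key>"     # placeholder key
```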
For Google Cloud Storage:
- `gcs_key_file` (optional) -> Path to GCS service account key file
- `gcs_project_id` (optional) -> GCS project ID
- `gcs_access_key_id` (optional) -> GCS HMAC access key ID
- `gcs_secret_access_key` (optional) -> GCS HMAC secret access key
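Similarly, a GCS-backed sketch using a service account key file (bucket and key path are placeholders):

```yaml
connections:
  DUCKLAKE_GCS:
    type: ducklake
    catalog_type: sqlite
    catalog_conn_string: metadata.sqlite
    data_path: "gs://my-bucket/data"              # placeholder bucket
    gcs_key_file: /path/to/service-account.json   # placeholder key file
```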
Using sling conns
Here are examples of setting a connection named `DUCKLAKE`. We must provide the `type=ducklake` property:
Environment Variable
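A minimal sketch of defining the connection as an environment variable; the catalog, bucket, and credential values are placeholders:

```bash
export DUCKLAKE='{
  type: ducklake,
  catalog_type: postgres,
  catalog_conn_string: "dbname=ducklake_catalog host=localhost",
  data_path: "s3://my-bucket/ducklake-data",
  s3_access_key_id: "...",
  s3_secret_access_key: "..."
}'
```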
Sling Env File YAML
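Equivalently, a sketch of an entry in the `~/.sling/env.yaml` file (placeholder values):

```yaml
connections:
  DUCKLAKE:
    type: ducklake
    catalog_type: postgres
    catalog_conn_string: "dbname=ducklake_catalog host=localhost"
    catalog_schema: public
    data_path: "s3://my-bucket/ducklake-data"   # placeholder bucket
    s3_access_key_id: "..."
    s3_secret_access_key: "..."
```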
See here to learn more about the sling env.yaml file.
Common Usage Examples
Basic Operations
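Once the connection is defined, the standard `sling conns` subcommands apply; a sketch, where the table name is a placeholder:

```bash
# verify the connection works
sling conns test DUCKLAKE

# list available streams (tables)
sling conns discover DUCKLAKE

# inspect the columns of a specific table (placeholder name)
sling conns discover DUCKLAKE --pattern main.my_table
```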
Data Import/Export
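For example, moving data in and out of DuckLake with `sling run`; the file paths, table names, and the `MY_PG` source connection are placeholders:

```bash
# load a local CSV file into a DuckLake table
sling run --src-stream file:///tmp/data.csv --tgt-conn DUCKLAKE --tgt-object main.my_table

# export a DuckLake table to a CSV file
sling run --src-conn DUCKLAKE --src-stream main.my_table --tgt-object file:///tmp/my_table.csv

# replicate a table from another database (e.g. a Postgres connection named MY_PG)
sling run --src-conn MY_PG --src-stream public.orders --tgt-conn DUCKLAKE --tgt-object main.orders
```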
If you are facing issues connecting, please reach out to us at [email protected], on Discord, or open a GitHub Issue here.