Databricks

Connect & ingest data to/from a Databricks warehouse

Setup

The following credentials keys are accepted:

  • host (required) -> The hostname of the Databricks workspace (e.g., dbc-a1b2c3d4-e5f6.cloud.databricks.com)

  • token (required) -> The personal access token or password to access the instance

  • warehouse_id (required) -> The SQL warehouse ID to connect to

  • http_path (optional) -> The HTTP path for the connection (if not using warehouse_id)

  • catalog (optional) -> The initial catalog name to use in the session (default: hive_metastore)

  • schema (optional) -> The initial schema name to use in the session (default: default)

  • port (optional) -> The port number (default: 443)

  • max_rows (optional) -> Maximum number of rows fetched per request (default: 10000)

  • internal_volume (optional) -> Specifies a custom internal volume to use for bulk operations. If not provided, Sling attempts to create a staging volume at SLING_SCHEMA.SLING_STAGING in the default schema.

  • timeout (optional) -> Timeout in seconds for server query execution (no timeout by default)

  • user_agent_entry (optional) -> Used to identify partners

  • ansi_mode (optional) -> Boolean for ANSI SQL specification adherence (default: false)

  • timezone (optional) -> Timezone setting (default: UTC)

Using sling conns

Here are examples of setting up a connection named DATABRICKS. The type=databricks property must be provided:
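One way is the `sling conns set` command, which stores the connection for you. A minimal sketch using the required keys from the list above (the host, token, and warehouse ID values are placeholders):

```shell
# Store the connection via the Sling CLI
sling conns set DATABRICKS type=databricks \
  host=dbc-a1b2c3d4-e5f6.cloud.databricks.com \
  token=dapi1234567890abcdef \
  warehouse_id=abcdef1234567890

# Verify connectivity
sling conns test DATABRICKS
```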

Environment Variable
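A sketch of declaring the connection as an environment variable, assuming a JSON payload with the credential keys listed above (values are placeholders):

```shell
# Declare the connection as an environment variable (JSON payload)
export DATABRICKS='{"type": "databricks", "host": "dbc-a1b2c3d4-e5f6.cloud.databricks.com", "token": "dapi1234567890abcdef", "warehouse_id": "abcdef1234567890"}'
```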

Sling Env File YAML
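A sketch of the corresponding entry in the env.yaml file, using the credential keys listed above (all values are placeholders):

```yaml
connections:
  DATABRICKS:
    type: databricks
    host: dbc-a1b2c3d4-e5f6.cloud.databricks.com
    token: dapi1234567890abcdef
    warehouse_id: abcdef1234567890
    catalog: hive_metastore   # optional, default shown
    schema: default           # optional, default shown
```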

See here to learn more about the sling env.yaml file.

If you are facing issues connecting, please reach out to us at [email protected], on Discord, or open a GitHub issue here.
