Databricks

Connect & Ingest data from / to a Databricks database

Setup

The following credentials keys are accepted:

  • host (required) -> The hostname of the Databricks workspace (e.g., dbc-a1b2c3d4-e5f6.cloud.databricks.com)

  • token (required) -> The personal access token or password to access the instance

  • warehouse_id (required) -> The SQL warehouse ID to connect to (may be omitted when http_path is provided instead)

  • http_path (optional) -> The HTTP path for the connection (if not using warehouse_id)

  • catalog (optional) -> The initial catalog name to use in the session (default: hive_metastore)

  • schema (optional) -> The initial schema name to use in the session (default: default)

  • port (optional) -> The port number (default: 443)

  • max_rows (optional) -> Maximum number of rows fetched per request (default: 10000)

  • timeout (optional) -> Timeout in seconds for server query execution (no timeout by default)

  • user_agent_entry (optional) -> A user agent entry sent with requests, used to identify partner integrations

  • ansi_mode (optional) -> Boolean for ANSI SQL specification adherence (default: false)

  • timezone (optional) -> Timezone setting (default: UTC)

Using sling conns

Here are examples of setting a connection named DATABRICKS. We must provide the type=databricks property:

# Basic connection with warehouse
$ sling conns set DATABRICKS type=databricks host=<workspace-hostname> token=<access-token> warehouse_id=<warehouse-id>

# Connection with custom HTTP path
$ sling conns set DATABRICKS type=databricks host=<workspace-hostname> token=<access-token> http_path=<http-path>

# With catalog and schema
$ sling conns set DATABRICKS type=databricks host=<workspace-hostname> token=<access-token> warehouse_id=<warehouse-id> catalog=<catalog> schema=<schema>

# Or use url
$ sling conns set DATABRICKS url="databricks://token:<access-token>@<workspace-hostname>:443/sql/1.0/warehouses/<warehouse-id>?schema=<schema>"
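
The url form above follows a fixed pattern, so it can be assembled from its parts. The following is a minimal sketch only: databricks_url is a hypothetical helper, not part of sling, and it assumes port 443 and a token without characters that would need URL-encoding.

```shell
# Hypothetical helper (not part of sling): build a Databricks connection URL
# following the pattern shown above. Assumes port 443 and a token that
# contains no characters requiring URL-encoding.
databricks_url() {
  host="$1"; token="$2"; warehouse_id="$3"; schema="${4:-}"
  url="databricks://token:${token}@${host}:443/sql/1.0/warehouses/${warehouse_id}"
  # Append the schema query parameter only when a schema was given
  if [ -n "$schema" ]; then
    url="${url}?schema=${schema}"
  fi
  printf '%s\n' "$url"
}
```

The result can then be passed on, for example: sling conns set DATABRICKS url="$(databricks_url <workspace-hostname> <access-token> <warehouse-id> <schema>)"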

Environment Variable

export DATABRICKS='databricks://token:<access-token>@<workspace-hostname>:443/sql/1.0/warehouses/<warehouse-id>?schema=<schema>'

# use JSON format
export DATABRICKS_CONN='{ "type": "databricks", "host": "<workspace-hostname>", "token": "<access-token>", "warehouse_id": "<warehouse-id>", "schema": "<schema>" }'

# use YAML format (with new lines)
export DATABRICKS='
type: databricks
host: <workspace-hostname>
token: <access-token>
warehouse_id: <warehouse-id>
schema: <schema>
'
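
To avoid hardcoding the token in a shell profile, the JSON form can be generated from values stored elsewhere. This is a sketch under assumptions: databricks_conn_json is a hypothetical helper, DBX_HOST, DBX_TOKEN and DBX_WAREHOUSE_ID are placeholder variable names, and the values are assumed to contain no double quotes or backslashes.

```shell
# Hypothetical helper (not part of sling): print the JSON connection payload.
# Assumes the values contain no double quotes or backslashes, which would
# need JSON escaping.
databricks_conn_json() {
  printf '{ "type": "databricks", "host": "%s", "token": "%s", "warehouse_id": "%s" }\n' "$1" "$2" "$3"
}

# Example usage (DBX_* are placeholder variable names):
# export DATABRICKS="$(databricks_conn_json "$DBX_HOST" "$DBX_TOKEN" "$DBX_WAREHOUSE_ID")"
```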

Sling Env File YAML

See here to learn more about the sling env.yaml file.

connections:
  DATABRICKS:
    type: databricks
    host: <workspace-hostname>
    token: <access-token>
    warehouse_id: <warehouse-id>
    schema: <schema>

  DATABRICKS_URL:
    url: "databricks://token:<access-token>@<workspace-hostname>:443/sql/1.0/warehouses/<warehouse-id>?catalog=<catalog>&schema=<schema>"
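
Whichever method is used to define the connection, it can be checked with the sling conns test command. The wrapper below is a sketch: check_conn and the SLING_BIN override are assumptions made for illustration, not sling features. It only guards against the CLI being missing before running the test.

```shell
# Hypothetical wrapper: check_conn and SLING_BIN are assumptions of this
# sketch, not sling settings. SLING_BIN lets you point at a specific binary.
check_conn() {
  name="$1"
  bin="${SLING_BIN:-sling}"
  # Fail early with a clear message if the CLI cannot be found
  if ! command -v "$bin" >/dev/null 2>&1; then
    echo "sling CLI not found on PATH" >&2
    return 127
  fi
  # Run the connectivity test for the named connection
  "$bin" conns test "$name"
}
```

For example, check_conn DATABRICKS runs sling conns test DATABRICKS and returns its exit status.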

If you are facing issues connecting, please reach out to us by email, on Discord, or open a GitHub issue.
