Databricks

Connect to a Databricks database and ingest data from or to it

Setup

The following credential keys are accepted:

host (required) -> The hostname of the Databricks workspace (e.g., dbc-a1b2c3d4-e5f6.cloud.databricks.com)
token (required) -> The personal access token or password to access the instance
warehouse_id (required unless http_path is set) -> The SQL warehouse ID to connect to
http_path (optional) -> The HTTP path for the connection (use instead of warehouse_id)
catalog (optional) -> The initial catalog name to use in the session (default: hive_metastore)
schema (optional) -> The initial schema name to use in the session (default: default)
port (optional) -> The port number (default: 443)
max_rows (optional) -> Maximum number of rows fetched per request (default: 10000)
internal_volume (optional) -> Specifies a custom internal volume to use for bulk operations. If not provided, Sling will attempt to create a volume in the default schema named SLING_SCHEMA.SLING_STAGING.
timeout (optional) -> Timeout in seconds for server query execution (no timeout by default)
user_agent_entry (optional) -> Used to identify partners
ansi_mode (optional) -> Boolean for ANSI SQL specification adherence (default: false)
timezone (optional) -> Timezone setting (default: UTC)
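The keys above map onto the URL form of the connection used in the examples on this page. The following is a minimal sketch of assembling that URL in a shell, assuming the `/sql/1.0/warehouses/<warehouse-id>` path layout shown in those examples. The `DBX_*` variable names and their values are hypothetical placeholders, not Sling settings.

```shell
# Sketch: assemble a Databricks connection URL from individual values.
# All DBX_* names and values are hypothetical placeholders.
DBX_HOST="dbc-a1b2c3d4-e5f6.cloud.databricks.com"  # host
DBX_TOKEN="my-access-token"                        # token
DBX_WAREHOUSE_ID="abc123"                          # warehouse_id
DBX_PORT="443"                                     # port (443 is the documented default)
DBX_SCHEMA="default"                               # schema

# URL layout copied from the url examples on this page.
DATABRICKS="databricks://token:${DBX_TOKEN}@${DBX_HOST}:${DBX_PORT}/sql/1.0/warehouses/${DBX_WAREHOUSE_ID}?schema=${DBX_SCHEMA}"
export DATABRICKS
```

Keeping the token in its own variable makes it easier to source it from a secret store rather than hard-coding it in the URL string.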

Using `sling conns`

Here are examples of setting a connection named DATABRICKS. We must provide the type=databricks property:

# Basic connection with warehouse
$ sling conns set DATABRICKS type=databricks host=<workspace-hostname> token=<access-token> warehouse_id=<warehouse-id>

# Connection with custom HTTP path
$ sling conns set DATABRICKS type=databricks host=<workspace-hostname> token=<access-token> http_path=<http-path>

# With catalog and schema
$ sling conns set DATABRICKS type=databricks host=<workspace-hostname> token=<access-token> warehouse_id=<warehouse-id> catalog=<catalog> schema=<schema>

# Or use url
$ sling conns set DATABRICKS url="databricks://token:<access-token>@<workspace-hostname>:443/sql/1.0/warehouses/<warehouse-id>?schema=<schema>"
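Once a connection is set, it can be worth checking it before running a replication. The sketch below wraps `sling conns test` in a small shell function that fails early when the `sling` binary is not on `PATH`. The function name `check_conn` is a hypothetical helper, not part of Sling.

```shell
# check_conn is a hypothetical helper name, not part of Sling.
check_conn() {
  # Fail early with a clear message if the sling binary is not on PATH.
  if ! command -v sling >/dev/null 2>&1; then
    echo "sling not found on PATH" >&2
    return 127
  fi
  # Test connectivity for the named connection.
  sling conns test "$1"
}
```

After defining the function, run `check_conn DATABRICKS` to test the connection set above.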

Environment Variable

export DATABRICKS='databricks://token:<access-token>@<workspace-hostname>:443/sql/1.0/warehouses/<warehouse-id>?schema=<schema>'

# use JSON format
export DATABRICKS_CONN='{ "type": "databricks", "host": "<workspace-hostname>", "token": "<access-token>", "warehouse_id": "<warehouse-id>", "schema": "<schema>" }'

# use YAML format (with new lines)
export DATABRICKS='
type: databricks
host: <workspace-hostname>
token: <access-token>
warehouse_id: <warehouse-id>
schema: <schema>
'

Sling Env File YAML

See the Sling documentation to learn more about the env.yaml file.

connections:
  DATABRICKS:
    type: databricks
    host: <workspace-hostname>
    token: <access-token>
    warehouse_id: <warehouse-id>
    schema: <schema>

  DATABRICKS_URL:
    url: "databricks://token:<access-token>@<workspace-hostname>:443/sql/1.0/warehouses/<warehouse-id>?catalog=<catalog>&schema=<schema>"

If you are facing issues connecting, please reach out to us by email, on Discord, or open a GitHub issue.
