Databricks

Connect & Ingest data from / to a Databricks database

Setup

The following credentials keys are accepted:

  • host (required) -> The hostname of the Databricks workspace (e.g., dbc-a1b2c3d4-e5f6.cloud.databricks.com)

  • token (required) -> The personal access token or password to access the instance

  • warehouse_id (required) -> The SQL warehouse ID to connect to

  • http_path (optional) -> The HTTP path for the connection (if not using warehouse_id)

  • catalog (optional) -> The initial catalog name to use in the session (default: hive_metastore)

  • schema (optional) -> The initial schema name to use in the session (default: default)

  • port (optional) -> The port number (default: 443)

  • max_rows (optional) -> Maximum number of rows fetched per request (default: 10000)

  • internal_volume (optional) -> Specifies a custom internal volume to use for bulk operations. If not provided, Sling will attempt to create a volume in the default schema named SLING_SCHEMA.SLING_STAGING.

  • timeout (optional) -> Timeout in seconds for server query execution (no timeout by default)

  • user_agent_entry (optional) -> A user-agent string entry, used to identify partner applications to Databricks

  • ansi_mode (optional) -> Boolean for ANSI SQL specification adherence (default: false)

  • timezone (optional) -> Timezone setting (default: UTC)

Using sling conns

Here are examples of setting a connection named DATABRICKS. We must provide the type=databricks property:
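For instance, a connection can be created directly from the command line with `sling conns set`, passing the credential keys listed above (the host, token, and warehouse ID values below are placeholders for illustration):

```shell
# Set the connection (replace the placeholder values with your own)
sling conns set DATABRICKS type=databricks \
  host=dbc-a1b2c3d4-e5f6.cloud.databricks.com \
  token=dapi1234567890abcdef \
  warehouse_id=1234567890abcdef

# Test connectivity
sling conns test DATABRICKS
```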

Environment Variable

See here to learn more about the .env.sling file.
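As a sketch, the connection can also be exported as an environment variable whose value holds the credential keys (the host, token, and warehouse ID values are placeholders):

```shell
# Hypothetical values -- replace with your workspace's host, token and warehouse ID
export DATABRICKS='{
  "type": "databricks",
  "host": "dbc-a1b2c3d4-e5f6.cloud.databricks.com",
  "token": "dapi1234567890abcdef",
  "warehouse_id": "1234567890abcdef",
  "catalog": "hive_metastore",
  "schema": "default"
}'

# The connection should now appear in the list
sling conns list
```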

Sling Env File YAML

See here to learn more about the sling env.yaml file.
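A minimal `env.yaml` entry might look like the following sketch, using the credential keys from the list above (all values shown are placeholders):

```yaml
# ~/.sling/env.yaml -- hypothetical values shown for illustration
connections:
  DATABRICKS:
    type: databricks
    host: dbc-a1b2c3d4-e5f6.cloud.databricks.com
    token: dapi1234567890abcdef
    warehouse_id: 1234567890abcdef
    catalog: hive_metastore
    schema: default
```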

If you are facing issues connecting, please reach out to us at [email protected], on Discord, or open a GitHub Issue.
