BigQuery

Connect & ingest data from or to a BigQuery database

Setup

The following credentials keys are accepted:

  • project (required) -> The GCP project ID

  • dataset (required) -> The default dataset (like a schema)

  • gc_bucket (optional) -> The Google Cloud Storage bucket to use for loading (recommended)

  • key_file (optional) -> The path to the Service Account JSON key file. If not provided, Google Application Default Credentials will be used.

  • key_body (optional) -> The Service Account JSON key content as a string. You can also provide the JSON content via the env var GC_KEY_BODY.

  • location (optional) -> The BigQuery location, such as US or EU. Default is US.

  • extra_scopes (optional) -> An array of strings representing scopes to use in addition to https://www.googleapis.com/auth/bigquery, e.g. ["https://www.googleapis.com/auth/drive", "https://www.googleapis.com/auth/spreadsheets"]. See the example below.
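
For example, a connection that also needs to read Google Sheets-backed external tables could add the Drive scope. A minimal sketch in the env.yaml format described further below (the project, dataset, and key path are illustrative):

connections:
  BIGQUERY:
    type: bigquery
    project: my-google-project
    dataset: public
    key_file: /path/to/service.account.json
    extra_scopes: ["https://www.googleapis.com/auth/drive"]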

Using sling conns

Here are examples of setting a connection named BIGQUERY. We must provide the type=bigquery property:

$ sling conns set BIGQUERY type=bigquery project=<project> dataset=<dataset> gc_bucket=<gc_bucket> key_file=/path/to/service.account.json location=<location>
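
Once set, you can list your connections and test the credentials:

$ sling conns list

$ sling conns test BIGQUERY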

Environment Variable

export BIGQUERY='{type: bigquery, project: my-google-project, gc_bucket: my_gc_bucket, dataset: public, location: US, key_file: /path/to/service.account.json}'

Instead of a key_file, you can also provide the Service Account JSON to Sling as a string in key_body, or via the environment variable GC_KEY_BODY.

export GC_KEY_BODY='{"type": "service_account","project_id": ...........}'
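
If the key is already stored in a file, one convenient way to populate the variable is to read it in from the shell (the path is illustrative):

export GC_KEY_BODY="$(cat /path/to/service.account.json)"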

Sling Env File YAML

See here to learn more about the sling env.yaml file.

connections:
  BIGQUERY:
    type: bigquery
    project: <project>
    dataset: <dataset>
    gc_bucket: <gc_bucket>
    key_file: '<key_file>'

  # alternative: using `key_body` instead of `key_file`
  BIGQUERY_KEY_BODY:
    type: bigquery
    project: <project>
    dataset: <dataset>
    gc_bucket: <gc_bucket>
    key_body: |
      { "type": "service_account", ... } 

BigQuery Table Partitioning

streams:
  my_schema.another_table:
    object: my_dataset.{stream_table}
    target_options:
      table_keys:
        partition: [ DATE_TRUNC(transaction_date, MONTH) ]

# OR
streams:
  my_schema.another_table:
    object: my_dataset.{stream_table}
    target_options:
      table_ddl: |
        CREATE TABLE my_dataset.{stream_table} ({col_types})
        PARTITION BY
          DATE_TRUNC(transaction_date, MONTH)
        OPTIONS (
          partition_expiration_days = 3,
          require_partition_filter = TRUE)
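
For context, either stream definition above lives inside a regular replication file. A minimal sketch (the source connection name and mode are assumptions):

source: POSTGRES
target: BIGQUERY

defaults:
  mode: full-refresh

streams:
  my_schema.another_table:
    object: my_dataset.{stream_table}
    target_options:
      table_keys:
        partition: [ DATE_TRUNC(transaction_date, MONTH) ]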

If you are facing issues connecting, please reach out to us at [email protected], on Discord, or open a GitHub issue here.
