BigQuery
Connect & Ingest data from / to a BigQuery database
Setup
The following credential keys are accepted:
project (required) -> The GCP project ID for the project
dataset (required) -> The default dataset (like a schema)
gc_bucket (optional) -> The Google Cloud Storage Bucket to use for loading (Recommended)
key_file (optional) -> The path of the Service Account JSON. If not provided, the Google Application Default Credentials will be used.
key_body (optional) -> The Service Account JSON key content as a string. You can also provide the JSON content in env var GC_KEY_BODY.
location (optional) -> The location of the account, such as US or EU. Default is US.
extra_scopes (optional) -> An array of strings, which represent scopes to use in addition to https://www.googleapis.com/auth/bigquery, e.g. ["https://www.googleapis.com/auth/drive", "https://www.googleapis.com/auth/spreadsheets"]. See the example below.
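For instance, here is a sketch of a connection entry (in the Sling Env File format shown further below) that adds the Drive and Sheets scopes via extra_scopes, which can be needed when querying Sheets-backed external tables. The project, dataset and key_file values are placeholders:

connections:
  BIGQUERY:
    type: bigquery
    project: my-google-project
    dataset: public
    key_file: /path/to/service.account.json
    # these are appended to the default https://www.googleapis.com/auth/bigquery scope
    extra_scopes:
      - https://www.googleapis.com/auth/drive
      - https://www.googleapis.com/auth/spreadsheets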
If you'd like sling to use the machine's Google Cloud Application Default Credentials (usually set up with gcloud auth application-default login), don't specify a key_file (or the env var GC_KEY_BODY).
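If you go that route, a minimal setup could look like this (assuming the gcloud CLI is installed; the project, dataset and bucket values are placeholders):

# authenticate once so Application Default Credentials exist on the machine
gcloud auth application-default login

# no key_file or GC_KEY_BODY, so sling falls back to Application Default Credentials
export BIGQUERY='{type: bigquery, project: my-google-project, dataset: public, gc_bucket: my_gc_bucket}'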
Using sling conns
Here are examples of setting a connection named BIGQUERY. We must provide the type=bigquery property:
$ sling conns set BIGQUERY type=bigquery project=<project> dataset=<dataset> gc_bucket=<gc_bucket> key_file=/path/to/service.account.json location=<location>
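You can then verify the credentials with the conns test sub-command (assuming a sling version that includes it):

$ sling conns test BIGQUERY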
Environment Variable

export BIGQUERY='{type: bigquery, project: my-google-project, gc_bucket: my_gc_bucket, dataset: public, location: US, key_file: /path/to/service.account.json}'

You can also provide Sling the Service Account JSON in key_body as a string, or via environment variable GC_KEY_BODY, instead of a key_file.
export GC_KEY_BODY='{"type": "service_account","project_id": ...........}'

Sling Env File YAML
See here to learn more about the sling env.yaml file.
connections:
  BIGQUERY:
    type: bigquery
    project: <project>
    dataset: <dataset>
    gc_bucket: <gc_bucket>
    key_file: '<key_file>'

  # using `key_body` instead of `key_file`
  BIGQUERY:
    type: bigquery
    project: <project>
    dataset: <dataset>
    gc_bucket: <gc_bucket>
    key_body: |
{ "type": "service_account", ... } BigQuery Table Partitioning
streams:
  my_schema.another_table:
    object: my_dataset.{stream_table}
    target_options:
      table_keys:
        partition: [ DATE_TRUNC(transaction_date, MONTH) ]

# OR

streams:
  my_schema.another_table:
    object: my_dataset.{stream_table}
    target_options:
      table_ddl: |
        CREATE TABLE my_dataset.{stream_table} ({col_types})
        PARTITION BY
          DATE_TRUNC(transaction_date, MONTH)
        OPTIONS (
          partition_expiration_days = 3,
          require_partition_filter = TRUE)

If you are facing issues connecting, please reach out to us at [email protected], on Discord, or open a GitHub Issue here.