Multiple Files & Cross-Bucket

Examples of loading multiple files and cross-bucket paths in Sling

This guide shows how to load files from multiple paths in one replication, including files that span several cloud storage buckets or containers, all through a single connection.

Loading Multiple Paths in a Single Stream

Sling allows you to specify multiple file paths for a single stream using the files key. This is useful when you want to combine data from several files into one target table.

Basic Multiple Files

replication.yaml

source: aws_s3
target: postgres

defaults:
  mode: full-refresh

streams:
  combined_data:
    files:
      - data/customers_2024.csv       # Single file
      - data/customers_2023.csv       # Single file
      - data/archive/                 # All files in folder
      - data/legacy/*.csv             # Wildcard pattern
    object: public.all_customers
    source_options:
      format: csv
      header: true

Cross-Bucket File Loading

Sling can read files from multiple buckets through a single S3 connection. Specify full s3:// URIs, either as stream names or as entries under the files key, to pull data from different buckets. This works as long as the connection's credentials have access to every bucket referenced.

replication.yaml

source: aws_s3
target: postgres

defaults:
  mode: full-refresh
  target_options:
    adjust_column_type: true

streams:
  # Method 1: Using full S3 URIs as stream names
  's3://bucket-west/data/sales.csv':
    object: public.sales_west

  's3://bucket-east/data/sales.csv':
    object: public.sales_east

  # Method 2: Combining files from multiple buckets into one table
  combined_sales:
    files:
      - s3://bucket-west/data/sales.csv           # Single file
      - s3://bucket-east/data/                    # All files in folder
      - s3://bucket-archive/historical/           # All files in folder
      - s3://bucket-legacy/sales/*.csv            # Wildcard pattern
    object: public.all_sales

env:
  SLING_STREAM_URL_COLUMN: true  # Track which file each row came from

Cross-Bucket Access: When using full URIs like s3://bucket-name/path, make sure your AWS credentials (configured in the connection) have read access to all the buckets referenced. The same principle applies to GCS (gs://) and Azure (https://) storage.
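
The SLING_STREAM_URL_COLUMN setting in the example above is described as tracking which file each row came from. The Python sketch below is only a conceptual illustration of that idea, not Sling's implementation: it appends rows from two in-memory CSV files into one list and tags each row with its source URI. The file contents and the column name `source_url` are hypothetical; the actual column name Sling writes is not stated here.

```python
import csv
import io

# Hypothetical in-memory stand-ins for two files that live in different buckets.
FILES = {
    "s3://bucket-west/data/sales.csv": "id,amount\n1,10\n2,20\n",
    "s3://bucket-east/data/sales.csv": "id,amount\n3,30\n",
}


def combine_with_source(files):
    """Append rows from every file into one list, tagging each row with its URI."""
    combined = []
    for uri, text in files.items():
        for row in csv.DictReader(io.StringIO(text)):
            row["source_url"] = uri  # hypothetical column name, for illustration only
            combined.append(row)
    return combined


rows = combine_with_source(FILES)
for row in rows:
    print(row["id"], row["amount"], row["source_url"])
```

With a column like this in the target table, rows from public.all_sales could be traced back to the bucket and file they were loaded from.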

Cross-Container Loading for Other Cloud Providers

The same technique works with Google Cloud Storage and Azure Blob Storage:

# Google Cloud Storage
source: gcs_conn
target: bigquery

streams:
  # From multiple GCS buckets
  combined_logs:
    files:
      - gs://prod-bucket/logs/2024/           # All files in folder
      - gs://staging-bucket/logs/             # All files in folder
      - gs://dev-bucket/logs/app.json         # Single file
      - gs://archive-bucket/logs/**/*.json    # Recursive wildcard
    object: dataset.all_logs
    source_options:
      format: json
      flatten: true

# Azure Blob Storage (a separate replication file)
source: azure_conn
target: snowflake

streams:
  # From multiple Azure containers
  combined_data:
    files:
      - https://account.blob.core.windows.net/container1/data/        # All files in folder
      - https://account.blob.core.windows.net/container2/exports/     # All files in folder
      - https://account.blob.core.windows.net/archive/2024/*.parquet  # Wildcard pattern
    object: schema.all_data
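
Because the credentials must be able to read every bucket or container referenced, it can help to list the distinct storage roots a files list touches before running a replication. The following is a minimal Python sketch under stated assumptions: the helper names `storage_root` and `required_roots` are hypothetical, the sample list is illustrative, and the parsing assumes the URI shapes used in this guide (bucket as the host for s3:// and gs://, container as the first path segment for Azure Blob https:// URLs).

```python
from urllib.parse import urlsplit


def storage_root(uri):
    """Return the bucket or container root that a file URI points into."""
    parts = urlsplit(uri)
    if parts.scheme in ("s3", "gs"):
        # s3://bucket/key and gs://bucket/key: the bucket is the host part.
        return f"{parts.scheme}://{parts.netloc}"
    if parts.scheme == "https" and parts.netloc.endswith(".blob.core.windows.net"):
        # Azure Blob Storage: the container is the first path segment.
        container = parts.path.lstrip("/").split("/", 1)[0]
        return f"https://{parts.netloc}/{container}"
    raise ValueError(f"unrecognized storage URI: {uri}")


def required_roots(uris):
    """Distinct roots, in first-seen order, that the credentials must be able to read."""
    return list(dict.fromkeys(storage_root(u) for u in uris))


# Illustrative entries in the style of the examples above.
files = [
    "gs://prod-bucket/logs/2024/",
    "gs://staging-bucket/logs/",
    "gs://prod-bucket/logs/app.json",
    "https://account.blob.core.windows.net/container1/data/",
    "https://account.blob.core.windows.net/archive/2024/*.parquet",
]
for root in required_roots(files):
    print(root)
```

The resulting list is what to compare against the permissions granted to the connection's credentials.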

Combining Folders, Wildcards, and Files

You can mix and match folder paths, wildcard patterns, and individual files for maximum flexibility:

replication.yaml

source: aws_s3
target: snowflake

defaults:
  mode: full-refresh

streams:
  # Load from multiple sources into one table
  all_regions:
    files:
      - s3://bucket/us-west/                  # All files in folder
      - s3://bucket/us-east/daily/            # All files in subfolder
      - s3://bucket/eu-central/*.csv          # Wildcard pattern
      - s3://bucket/apac/**/*.csv             # Recursive wildcard
      - s3://bucket/legacy/important.csv      # Single specific file
    object: warehouse.all_regions
    single: true  # Treat all matching files as one stream
    source_options:
      format: csv

env:
  SLING_STREAM_URL_COLUMN: true
  SLING_THREADS: 5
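
The inline comments in the examples follow a simple convention: a trailing slash marks a folder, `*` marks a wildcard pattern, `**` marks a recursive wildcard, and anything else is a single file. The Python sketch below only mirrors that commenting convention so a files list can be reviewed at a glance. The function name `classify` is hypothetical, and Sling's own path resolution may differ in its details.

```python
def classify(path):
    """Label a files entry using the convention from the YAML comments."""
    if "**" in path:
        return "recursive wildcard"
    if "*" in path:
        return "wildcard"
    if path.endswith("/"):
        return "folder"
    return "single file"


entries = [
    "s3://bucket/us-west/",
    "s3://bucket/us-east/daily/",
    "s3://bucket/eu-central/*.csv",
    "s3://bucket/apac/**/*.csv",
    "s3://bucket/legacy/important.csv",
]
for entry in entries:
    # Left-align the label so the paths line up in a column.
    print(f"{classify(entry):<18} {entry}")
```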

