Multiple Files & Cross-Bucket

Examples of loading multiple files and cross-bucket paths in Sling

This guide demonstrates advanced techniques for loading files from multiple paths, including files spanning multiple cloud storage buckets or containers using a single connection.

Loading Multiple Paths in a Single Stream

Sling allows you to specify multiple file paths for a single stream using the files key. This is useful when you want to combine data from several files into one target table.

Basic Multiple Files

replication.yaml
source: aws_s3
target: postgres

defaults:
  mode: full-refresh

streams:
  combined_data:
    files:
      - data/customers_2024.csv       # Single file
      - data/customers_2023.csv       # Single file
      - data/archive/                 # All files in folder
      - data/legacy/*.csv             # Wildcard pattern
    object: public.all_customers
    source_options:
      format: csv
      header: true

Cross-Bucket File Loading

Sling can read files from multiple buckets through a single S3 connection. When you list full s3:// URIs under the files key, one stream can pull data from different buckets, provided the connection's credentials have access to all of them.
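A minimal sketch of a cross-bucket stream, reusing the aws_s3 and postgres connections from the basic example. The bucket names, stream name, and target table are hypothetical placeholders, not values from the Sling documentation.

```yaml
source: aws_s3
target: postgres

defaults:
  mode: full-refresh

streams:
  cross_bucket_orders:                        # hypothetical stream name
    files:
      # Full URIs name the bucket explicitly (bucket names are placeholders)
      - s3://sales-bucket-a/orders/2024.csv
      - s3://sales-bucket-b/orders/2024.csv
    object: public.all_orders                 # hypothetical target table
    source_options:
      format: csv
      header: true
```

Both files land in the same target table, so they should share a compatible column layout.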

Cross-Bucket Access: When using full URIs like s3://bucket-name/path, make sure your AWS credentials (configured in the connection) have read access to all the buckets referenced. The same principle applies to GCS (gs://) and Azure (https://) storage.

Cross-Container Loading for Other Cloud Providers

The same technique works with Google Cloud Storage and Azure Blob Storage:
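The sketches below mirror the S3 case for the other two providers. The connection names (gcs_conn, azure_conn), bucket, account, and container names, and target tables are all hypothetical. The Azure URLs assume the standard Blob Storage endpoint shape, https://ACCOUNT.blob.core.windows.net/CONTAINER/path; check it against your own storage account.

```yaml
# Google Cloud Storage: full gs:// URIs across two buckets
source: gcs_conn                              # hypothetical connection name
target: postgres

streams:
  gcs_events:                                 # hypothetical stream name
    files:
      - gs://analytics-bucket-a/events/2024.csv
      - gs://analytics-bucket-b/events/2024.csv
    object: public.all_events                 # hypothetical target table
    source_options:
      format: csv
      header: true
```

```yaml
# Azure Blob Storage: full https:// URLs across two containers
source: azure_conn                            # hypothetical connection name
target: postgres

streams:
  azure_events:                               # hypothetical stream name
    files:
      - https://myaccount.blob.core.windows.net/container-a/events/2024.csv
      - https://myaccount.blob.core.windows.net/container-b/events/2024.csv
    object: public.all_events                 # hypothetical target table
    source_options:
      format: csv
      header: true
```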

Combining Folders, Wildcards, and Files

You can mix and match folder paths, wildcard patterns, and individual files for maximum flexibility:
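As an illustration, the sketch below combines the path styles from the basic example with full cross-bucket URIs in one stream. Bucket names, the stream name, and the target table are hypothetical. It also assumes that relative paths resolve against the connection's own bucket, as they appear to in the basic example.

```yaml
source: aws_s3
target: postgres

defaults:
  mode: full-refresh

streams:
  mixed_sources:                              # hypothetical stream name
    files:
      - s3://exports-bucket/daily/            # All files in a folder (other bucket)
      - s3://reports-bucket/monthly/*.csv     # Wildcard pattern (other bucket)
      - s3://archive-bucket/customers_2022.csv  # Single file (other bucket)
      - data/customers_2024.csv               # Relative path, assumed to use the connection's bucket
    object: public.all_customers
    source_options:
      format: csv
      header: true
```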
