# Multiple Files & Cross-Bucket

This guide shows how to load files from multiple paths, including paths that span several cloud storage buckets or containers, through a single connection.

## Loading Multiple Paths in a Single Stream

Sling allows you to specify multiple file paths for a single stream using the `files` key. This is useful when you want to combine data from several files into one target table.

### Basic Multiple Files

{% code title="replication.yaml" overflow="wrap" %}

```yaml
source: aws_s3
target: postgres

defaults:
  mode: full-refresh

streams:
  combined_data:
    files:
      - data/customers_2024.csv       # Single file
      - data/customers_2023.csv       # Single file
      - data/archive/                 # All files in folder
      - data/legacy/*.csv             # Wildcard pattern
    object: public.all_customers
    source_options:
      format: csv
      header: true
```

{% endcode %}
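
With several inputs feeding one table, it helps to record which file produced each row. The following is a minimal sketch, assuming the `SLING_STREAM_URL_COLUMN` setting used in the cross-bucket example of this guide behaves the same way for relative paths. The file names are the ones from the example above.

{% code title="replication.yaml" overflow="wrap" %}

```yaml
source: aws_s3
target: postgres

defaults:
  mode: full-refresh

streams:
  combined_data:
    files:
      - data/customers_2024.csv       # Single file
      - data/customers_2023.csv       # Single file
    object: public.all_customers
    source_options:
      format: csv
      header: true

env:
  # Assumption: adds a column holding the source file URL of each row
  SLING_STREAM_URL_COLUMN: true
```

{% endcode %}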

### Cross-Bucket File Loading

Sling can read files from multiple buckets through a single S3 connection. Specify full `s3://` URIs, either as stream names or as entries in the `files` list, and Sling pulls data from each bucket, provided the connection's credentials can read all of them.

{% code title="replication.yaml" overflow="wrap" %}

```yaml
source: aws_s3
target: postgres

defaults:
  mode: full-refresh
  target_options:
    adjust_column_type: true

streams:
  # Method 1: Using full S3 URIs as stream names
  's3://bucket-west/data/sales.csv':
    object: public.sales_west

  's3://bucket-east/data/sales.csv':
    object: public.sales_east

  # Method 2: Combining files from multiple buckets into one table
  combined_sales:
    files:
      - s3://bucket-west/data/sales.csv           # Single file
      - s3://bucket-east/data/                    # All files in folder
      - s3://bucket-archive/historical/           # All files in folder
      - s3://bucket-legacy/sales/*.csv            # Wildcard pattern
    object: public.all_sales

env:
  SLING_STREAM_URL_COLUMN: true  # Track which file each row came from
```

{% endcode %}

{% hint style="info" %}
**Cross-Bucket Access**: When using full URIs like `s3://bucket-name/path`, make sure your AWS credentials (configured in the connection) have read access to all the buckets referenced. The same principle applies to GCS (`gs://`) and Azure (`https://`) storage.
{% endhint %}
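
For reference, a connection such as `aws_s3` might be defined along the following lines. This is a sketch only, assuming Sling's `env.yaml` connection format. The bucket name and credential values are placeholders, and the exact keys should be checked against the S3 connection documentation. The `bucket` set here is assumed to act as the default for relative paths, while full `s3://` URIs name their bucket explicitly.

{% code title="env.yaml" overflow="wrap" %}

```yaml
connections:
  aws_s3:
    type: s3
    bucket: bucket-west                        # Placeholder: assumed default bucket for relative paths
    access_key_id: "YOUR_ACCESS_KEY_ID"        # Placeholder: must be able to read every referenced bucket
    secret_access_key: "YOUR_SECRET_ACCESS_KEY"  # Placeholder
```

{% endcode %}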

### Cross-Container Loading for Other Cloud Providers

The same technique works with Google Cloud Storage and Azure Blob Storage:

{% tabs %}
{% tab title="Google Cloud Storage" %}

```yaml
source: gcs_conn
target: bigquery

streams:
  # From multiple GCS buckets
  combined_logs:
    files:
      - gs://prod-bucket/logs/2024/           # All files in folder
      - gs://staging-bucket/logs/             # All files in folder
      - gs://dev-bucket/logs/app.json         # Single file
      - gs://archive-bucket/logs/**/*.json    # Recursive wildcard
    object: dataset.all_logs
    source_options:
      format: json
      flatten: true
```

{% endtab %}

{% tab title="Azure Blob Storage" %}

```yaml
source: azure_conn
target: snowflake

streams:
  # From multiple Azure containers
  combined_data:
    files:
      - https://account.blob.core.windows.net/container1/data/        # All files in folder
      - https://account.blob.core.windows.net/container2/exports/     # All files in folder
      - https://account.blob.core.windows.net/archive/2024/*.parquet  # Wildcard pattern
    object: schema.all_data
```

{% endtab %}
{% endtabs %}

## Combining Folders, Wildcards, and Files

A single `files` list can mix folder paths, wildcard patterns, and individual files:

{% code title="replication.yaml" overflow="wrap" %}

```yaml
source: aws_s3
target: snowflake

defaults:
  mode: full-refresh

streams:
  # Load from multiple sources into one table
  all_regions:
    files:
      - s3://bucket/us-west/                  # All files in folder
      - s3://bucket/us-east/daily/            # All files in subfolder
      - s3://bucket/eu-central/*.csv          # Wildcard pattern
      - s3://bucket/apac/**/*.csv             # Recursive wildcard
      - s3://bucket/legacy/important.csv      # Single specific file
    object: warehouse.all_regions
    single: true  # Treat all matching files as one stream
    source_options:
      format: csv

env:
  SLING_STREAM_URL_COLUMN: true
  SLING_THREADS: 5
```

{% endcode %}

