# Iceberg

Apache Iceberg is an open table format for huge analytic datasets. Iceberg adds tables to compute engines including Spark, Trino, PrestoDB, Flink, Hive and Impala using a high-performance table format that works just like a SQL table. See <https://iceberg.apache.org/> for more details.

Sling supports connecting to Iceberg tables through catalog backends including REST catalogs, AWS Glue, and SQL catalogs.

## Setup

The following credentials keys are accepted:

### Common Properties

* `catalog_type` **(required)** -> The catalog type: `rest`, `glue`, or `sql`. Default is `rest`.
* `schema` (optional) -> The default schema to use to read/write data. Default is `main`.

### REST Catalog Configuration

* `rest_uri` **(required for REST)** -> The REST catalog endpoint URI (e.g., `https://s3tables.us-east-1.amazonaws.com/iceberg`, `https://catalog.cloudflarestorage.com/xxxxxxxxx/warehouse`, `http://localhost:8181`).
* `rest_warehouse` (optional) -> Warehouse location for the catalog (e.g., `s3://bucket/warehouse`, `arn:aws:s3tables:region:account-id:bucket/namespace`).
* `rest_token` (optional) -> OAuth token for authentication.
* `rest_oauth_client_id` (optional) -> OAuth client ID for authentication.
* `rest_oauth_client_secret` (optional) -> OAuth client secret for authentication.
* `rest_oauth_scope` (optional) -> OAuth scope for authentication.
* `rest_oauth_server_uri` (optional) -> OAuth server URI for token requests.
* `rest_prefix` (optional) -> API prefix for the REST catalog.
* `rest_metadata_location` (optional) -> Custom metadata location.
* `rest_extra_props` (optional) -> Additional properties as JSON string.
* `rest_sigv4_enable` (optional) -> Enable AWS SigV4 authentication (true/false).
* `rest_sigv4_region` (optional) -> AWS region for SigV4 authentication (with `rest_sigv4_enable=true`).
* `rest_sigv4_service` (optional) -> AWS service name for SigV4 authentication (with `rest_sigv4_enable=true`).

### Glue Catalog Configuration

* `glue_warehouse` **(required for Glue)** -> Warehouse location in S3. e.g. `s3://my-bucket/warehouse`
* `glue_account_id` (optional) -> AWS account ID for Glue catalog.
* `glue_namespace` (optional) -> Namespace in the Glue catalog.
* `glue_extra_props` (optional) -> Extra Glue Properties (object).

### SQL Catalog Configuration

* `sql_catalog_name` (optional) -> Name of the SQL catalog. Default is `sql`.
* `sql_catalog_conn` **(required for SQL)** -> Name of a Sling database connection to use as catalog backend.
* `sql_catalog_init` (optional) -> Whether to initialize catalog tables if they don't exist. Default is `true`.

### Storage Configuration for Glue Catalog or S3Tables or SQL Catalog

For **S3/S3-compatible storage**:

* `s3_access_key_id` (optional) -> AWS access key ID
* `s3_secret_access_key` (optional) -> AWS secret access key
* `s3_session_token` (optional) -> AWS session token
* `s3_region` (optional) -> AWS region
* `s3_profile` (optional) -> AWS profile to use
* `s3_endpoint` (optional) -> S3-compatible endpoint URL (e.g. `http://localhost:9000` for MinIO)

### Using `sling conns`

Here are examples of setting a connection named `ICEBERG`. We must provide the `type=iceberg` property:

{% code overflow="wrap" %}

```bash
# REST catalog with local warehouse
$ sling conns set ICEBERG type=iceberg catalog_type=rest rest_uri=http://localhost:8181

# AWS S3 Tables via REST
$ sling conns set ICEBERG type=iceberg catalog_type=rest rest_warehouse="arn:aws:s3tables:us-east-1:123456789012:bucket/my-namespace" s3_profile=my-profile

# Cloudflare R2 Data Catalog via REST
$ sling conns set ICEBERG type=iceberg catalog_type=rest rest_uri="https://catalog.cloudflarestorage.com/xxxxxxxxx/warehouse" rest_warehouse="3fff6d86c73fcxxxxxxx4cb125f34927_warehouse" rest_token="<rest_token>"

# Glue catalog
$ sling conns set ICEBERG type=iceberg catalog_type=glue glue_warehouse=s3://my-bucket/glue-warehouse s3_access_key_id=AKIAIOSFODNN7EXAMPLE s3_secret_access_key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY s3_region=us-east-1
```

{% endcode %}

### Environment Variable

See [here](https://docs.slingdata.io/connections/datalake-connections/pages/eAdVs2BHCgdr6RS8GoJC#dot-env-file-.env.sling) to learn more about the `.env.sling` file.

{% code overflow="wrap" %}

```bash
# REST catalog local
export ICEBERG='{ 
  type: iceberg, 
  catalog_type: rest,
  rest_uri: "http://localhost:8181",
  rest_token: "<rest_token>",
  rest_warehouse: "<rest_warehouse>"
}'

# REST catalog with OAuth
export ICEBERG='{ 
  type: iceberg, 
  catalog_type: rest,
  rest_uri: "http://localhost:8181",
  rest_oauth_client_id: "my-client-id",
  rest_oauth_client_secret: "my-client-secret",
  rest_oauth_scope: "catalog:write",
  rest_oauth_server_uri: "https://auth.example.com/oauth2/token"
}'

# AWS S3 Tables
export ICEBERG_S3='{ 
  type: iceberg, 
  catalog_type: rest,
  rest_warehouse: "arn:aws:s3tables:us-east-1:123456789012:bucket/my-namespace",
  s3_access_key_id: "AKIAIOSFODNN7EXAMPLE",
  s3_secret_access_key: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
}'

# Iceberg with GCP backend (Apache Gravitino)
export ICEBERG_GCP='{ 
  type: iceberg
  catalog_name: iceberg_catalog
  catalog_type: rest
  rest_uri: http://endpoint.com:9001/iceberg
  rest_warehouse: s3://my-bucket/gravitino/gcs_as_s3
  s3_access_key_id: <s3_access_key_id>
  s3_secret_access_key: <s3_secret_access_key>
  s3_endpoint: https://myhost.storage.googleapis.com
  s3_region: auto
  schema: iceberg_schema
}'

# Glue catalog
export ICEBERG_GLUE='{ 
  type: iceberg, 
  catalog_type: glue,
  glue_warehouse: "s3://my-bucket/glue-warehouse",
  s3_region: "us-east-1",
  s3_profile: "default"
}'

# SQL catalog with PostgreSQL
export ICEBERG='{ 
  type: iceberg
  catalog_type: sql
  sql_catalog_conn: "POSTGRES_DB"
  sql_catalog_name: "iceberg_catalog"
  sql_catalog_init: true
  sql_warehouse: s3://my-bucket/iceberg/warehouse
  s3_access_key_id: AKIAIOSFODNN7EXAMPLE
  s3_secret_access_key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
  s3_region: us-east-1
}'

```

{% endcode %}

### Sling Env File YAML

See [here](https://docs.slingdata.io/connections/datalake-connections/pages/eAdVs2BHCgdr6RS8GoJC#sling-env-file-env.yaml) to learn more about the sling `env.yaml` file.

```yaml
connections:
  ICEBERG:
    type: iceberg
    catalog_type: rest  # or glue, or sql
    catalog_name: iceberg
    schema: main
    
    # REST Catalog Configuration
    rest_uri: http://localhost:8181
    rest_warehouse: s3://my-bucket/warehouse
    rest_token: my-bearer-token
    rest_oauth_client_id: my-client-id
    rest_oauth_client_secret: my-client-secret
    rest_oauth_scope: catalog:write
    rest_oauth_server_uri: https://auth.example.com/oauth2/token
    
    # Glue Catalog Configuration
    catalog_type: glue
    glue_warehouse: s3://my-bucket/glue-warehouse
    s3_access_key_id: AKIAIOSFODNN7EXAMPLE
    s3_secret_access_key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
    s3_region: us-east-1
    
    # SQL Catalog Configuration (alternative to REST/Glue)
    # sql_catalog_conn: POSTGRES_DB
    # sql_catalog_name: iceberg_catalog
    # sql_catalog_init: true
    
    # S3 Configuration (if using S3 storage)
    s3_access_key_id: AKIAIOSFODNN7EXAMPLE
    s3_secret_access_key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
    s3_region: us-east-1
    s3_endpoint: http://localhost:9000  # for MinIO or other S3-compatible

  # Example SQL catalog setup with PostgreSQL backend
  ICEBERG_SQL:
    type: iceberg
    catalog_type: sql
    sql_catalog_conn: POSTGRES_CATALOG  # defined below
    sql_catalog_name: iceberg_catalog
    sql_catalog_init: true
    schema: main

    # S3 Configuration for storage
    sql_warehouse: s3://my-bucket/iceberg/warehouse
    s3_access_key_id: AKIAIOSFODNN7EXAMPLE
    s3_secret_access_key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
    s3_region: us-east-1

  # PostgreSQL connection for SQL catalog backend
  POSTGRES_CATALOG:
    type: postgres
    host: localhost
    port: 5432
    user: postgres
    password: mypassword
    database: iceberg_catalog
```

## Common Usage Examples

### Basic Operations

```bash
# List namespaces (schemas)
sling conns discover ICEBERG

# List tables in a namespace
sling conns discover ICEBERG --schema my_namespace

# Query data
sling run --src-conn ICEBERG --src-stream "SELECT * FROM my_namespace.orders LIMIT 10" --stdout

# Export to CSV
sling run --src-conn ICEBERG --src-stream my_namespace.orders --tgt-object file://./orders.csv
```

### Data Import/Export

```bash
# Import CSV to Iceberg
sling run --src-stream file://./data.csv --tgt-conn ICEBERG --tgt-object my_namespace.new_data

# Import from PostgreSQL
sling run --src-conn POSTGRES_DB --src-stream public.customers --tgt-conn ICEBERG --tgt-object sales.customers

# Export to Parquet files
sling run --src-conn ICEBERG --src-stream sales.orders --tgt-conn AWS_S3 --tgt-object s3://bucket/exports/orders.parquet

# Incremental sync with timestamp
sling run --src-conn POSTGRES_DB --src-stream public.events --tgt-conn ICEBERG --tgt-object events.raw --mode incremental --primary-key id --update-key updated_at
```

### Advanced Queries with DuckDB Integration

```bash
# For complex SQL queries, Sling uses DuckDB with Iceberg extension
# Tables must be qualified with "iceberg_catalog" prefix
# Note: Custom SQL queries via DuckDB are only supported with REST and Glue catalog types
sling run --src-conn ICEBERG --src-stream "SELECT count(*) FROM iceberg_catalog.my_namespace.my_table" --stdout

# Join multiple Iceberg tables
sling run --src-conn ICEBERG --src-stream "
  SELECT o.*, c.customer_name 
  FROM iceberg_catalog.sales.orders o 
  JOIN iceberg_catalog.sales.customers c ON o.customer_id = c.id
" --stdout

# Export with custom SQL
sling run --src-conn ICEBERG --src-stream "
  SELECT date_trunc('month', order_date) as month, sum(amount) as total
  FROM iceberg_catalog.sales.orders 
  GROUP BY 1 ORDER BY 1
" --tgt-object file://./monthly_sales.csv
```

If you are facing issues connecting, please reach out to us at <support@slingdata.io>, on [discord](https://discord.gg/q5xtaSNDvp) or open a Github Issue [here](https://github.com/slingdata-io/sling-cli/issues).


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.slingdata.io/connections/datalake-connections/iceberg.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
