# Iceberg

Apache Iceberg is an open table format for huge analytic datasets. Iceberg adds tables to compute engines including Spark, Trino, PrestoDB, Flink, Hive and Impala using a high-performance table format that works just like a SQL table. See <https://iceberg.apache.org/> for more details.

Sling supports connecting to Iceberg tables through catalog backends including REST catalogs, AWS Glue, and SQL catalogs.

## Setup

The following credentials keys are accepted:

### Common Properties

* `catalog_type` **(required)** -> The catalog type: `rest`, `glue`, or `sql`. Default is `rest`.
* `schema` (optional) -> The default schema to use to read/write data. Default is `main`.

### REST Catalog Configuration

* `rest_uri` **(required for REST)** -> The REST catalog endpoint URI (e.g., `https://s3tables.us-east-1.amazonaws.com/iceberg`, `https://catalog.cloudflarestorage.com/xxxxxxxxx/warehouse`, `http://localhost:8181`).
* `rest_warehouse` (optional) -> Warehouse location for the catalog (e.g., `s3://bucket/warehouse`, `arn:aws:s3tables:region:account-id:bucket/namespace`).
* `rest_token` (optional) -> OAuth token for authentication.
* `rest_oauth_client_id` (optional) -> OAuth client ID for authentication.
* `rest_oauth_client_secret` (optional) -> OAuth client secret for authentication.
* `rest_oauth_scope` (optional) -> OAuth scope for authentication.
* `rest_oauth_server_uri` (optional) -> OAuth server URI for token requests.
* `rest_prefix` (optional) -> API prefix for the REST catalog.
* `rest_metadata_location` (optional) -> Custom metadata location.
* `rest_extra_props` (optional) -> Additional properties as JSON string.
* `rest_sigv4_enable` (optional) -> Enable AWS SigV4 authentication (true/false).
* `rest_sigv4_region` (optional) -> AWS region for SigV4 authentication (with `rest_sigv4_enable=true`).
* `rest_sigv4_service` (optional) -> AWS service name for SigV4 authentication (with `rest_sigv4_enable=true`).

### Glue Catalog Configuration

* `glue_warehouse` **(required for Glue)** -> Warehouse location in S3. e.g. `s3://my-bucket/warehouse`
* `glue_account_id` (optional) -> AWS account ID for Glue catalog.
* `glue_namespace` (optional) -> Namespace in the Glue catalog.
* `glue_extra_props` (optional) -> Extra Glue Properties (object).

### SQL Catalog Configuration

* `sql_catalog_name` (optional) -> Name of the SQL catalog. Default is `sql`.
* `sql_catalog_conn` **(required for SQL)** -> Name of a Sling database connection to use as catalog backend.
* `sql_catalog_init` (optional) -> Whether to initialize catalog tables if they don't exist. Default is `true`.

### Storage Configuration for Glue Catalog or S3Tables or SQL Catalog

For **S3/S3-compatible storage**:

* `s3_access_key_id` (optional) -> AWS access key ID
* `s3_secret_access_key` (optional) -> AWS secret access key
* `s3_session_token` (optional) -> AWS session token
* `s3_region` (optional) -> AWS region
* `s3_profile` (optional) -> AWS profile to use
* `s3_endpoint` (optional) -> S3-compatible endpoint URL (e.g. `http://localhost:9000` for MinIO)

### Using `sling conns`

Here are examples of setting a connection named `ICEBERG`. We must provide the `type=iceberg` property:

{% code overflow="wrap" %}

```bash
# REST catalog with local warehouse
$ sling conns set ICEBERG type=iceberg catalog_type=rest rest_uri=http://localhost:8181

# AWS S3 Tables via REST
$ sling conns set ICEBERG type=iceberg catalog_type=rest rest_warehouse="arn:aws:s3tables:us-east-1:123456789012:bucket/my-namespace" s3_profile=my-profile

# Cloudflare R2 Data Catalog via REST
$ sling conns set ICEBERG type=iceberg catalog_type=rest rest_uri="https://catalog.cloudflarestorage.com/xxxxxxxxx/warehouse" rest_warehouse="3fff6d86c73fcxxxxxxx4cb125f34927_warehouse" rest_token="<rest_token>"

# Glue catalog
$ sling conns set ICEBERG type=iceberg catalog_type=glue glue_warehouse=s3://my-bucket/glue-warehouse s3_access_key_id=AKIAIOSFODNN7EXAMPLE s3_secret_access_key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY s3_region=us-east-1
```

{% endcode %}

### Environment Variable

See [here](https://docs.slingdata.io/sling-cli/environment#dot-env-file-.env.sling) to learn more about the `.env.sling` file.

{% code overflow="wrap" %}

```bash
# REST catalog local
export ICEBERG='{ 
  type: iceberg, 
  catalog_type: rest,
  rest_uri: "http://localhost:8181",
  rest_token: "<rest_token>",
  rest_warehouse: "<rest_warehouse>"
}'

# REST catalog with OAuth
export ICEBERG='{ 
  type: iceberg, 
  catalog_type: rest,
  rest_uri: "http://localhost:8181",
  rest_oauth_client_id: "my-client-id",
  rest_oauth_client_secret: "my-client-secret",
  rest_oauth_scope: "catalog:write",
  rest_oauth_server_uri: "https://auth.example.com/oauth2/token"
}'

# AWS S3 Tables
export ICEBERG_S3='{ 
  type: iceberg, 
  catalog_type: rest,
  rest_warehouse: "arn:aws:s3tables:us-east-1:123456789012:bucket/my-namespace",
  s3_access_key_id: "AKIAIOSFODNN7EXAMPLE",
  s3_secret_access_key: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
}'

# Iceberg with GCP backend (Apache Gravitino)
export ICEBERG_GCP='{ 
  type: iceberg
  catalog_name: iceberg_catalog
  catalog_type: rest
  rest_uri: http://endpoint.com:9001/iceberg
  rest_warehouse: s3://my-bucket/gravitino/gcs_as_s3
  s3_access_key_id: <s3_access_key_id>
  s3_secret_access_key: <s3_secret_access_key>
  s3_endpoint: https://myhost.storage.googleapis.com
  s3_region: auto
  schema: iceberg_schema
}'

# Glue catalog
export ICEBERG_GLUE='{ 
  type: iceberg, 
  catalog_type: glue,
  glue_warehouse: "s3://my-bucket/glue-warehouse",
  s3_region: "us-east-1",
  s3_profile: "default"
}'

# SQL catalog with PostgreSQL
export ICEBERG='{ 
  type: iceberg
  catalog_type: sql
  sql_catalog_conn: "POSTGRES_DB"
  sql_catalog_name: "iceberg_catalog"
  sql_catalog_init: true
  sql_warehouse: s3://my-bucket/iceberg/warehouse
  s3_access_key_id: AKIAIOSFODNN7EXAMPLE
  s3_secret_access_key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
  s3_region: us-east-1
}'

```

{% endcode %}

### Sling Env File YAML

See [here](https://docs.slingdata.io/sling-cli/environment#sling-env-file-env.yaml) to learn more about the sling `env.yaml` file.

```yaml
connections:
  ICEBERG:
    type: iceberg
    catalog_type: rest  # or glue, or sql
    catalog_name: iceberg
    schema: main
    
    # REST Catalog Configuration
    rest_uri: http://localhost:8181
    rest_warehouse: s3://my-bucket/warehouse
    rest_token: my-bearer-token
    rest_oauth_client_id: my-client-id
    rest_oauth_client_secret: my-client-secret
    rest_oauth_scope: catalog:write
    rest_oauth_server_uri: https://auth.example.com/oauth2/token
    
    # Glue Catalog Configuration
    catalog_type: glue
    glue_warehouse: s3://my-bucket/glue-warehouse
    s3_access_key_id: AKIAIOSFODNN7EXAMPLE
    s3_secret_access_key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
    s3_region: us-east-1
    
    # SQL Catalog Configuration (alternative to REST/Glue)
    # sql_catalog_conn: POSTGRES_DB
    # sql_catalog_name: iceberg_catalog
    # sql_catalog_init: true
    
    # S3 Configuration (if using S3 storage)
    s3_access_key_id: AKIAIOSFODNN7EXAMPLE
    s3_secret_access_key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
    s3_region: us-east-1
    s3_endpoint: http://localhost:9000  # for MinIO or other S3-compatible

  # Example SQL catalog setup with PostgreSQL backend
  ICEBERG_SQL:
    type: iceberg
    catalog_type: sql
    sql_catalog_conn: POSTGRES_CATALOG  # defined below
    sql_catalog_name: iceberg_catalog
    sql_catalog_init: true
    schema: main

    # S3 Configuration for storage
    sql_warehouse: s3://my-bucket/iceberg/warehouse
    s3_access_key_id: AKIAIOSFODNN7EXAMPLE
    s3_secret_access_key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
    s3_region: us-east-1

  # PostgreSQL connection for SQL catalog backend
  POSTGRES_CATALOG:
    type: postgres
    host: localhost
    port: 5432
    user: postgres
    password: mypassword
    database: iceberg_catalog
```

## Common Usage Examples

### Basic Operations

```bash
# List namespaces (schemas)
sling conns discover ICEBERG

# List tables in a namespace
sling conns discover ICEBERG --schema my_namespace

# Query data
sling run --src-conn ICEBERG --src-stream "SELECT * FROM my_namespace.orders LIMIT 10" --stdout

# Export to CSV
sling run --src-conn ICEBERG --src-stream my_namespace.orders --tgt-object file://./orders.csv
```

### Data Import/Export

```bash
# Import CSV to Iceberg
sling run --src-stream file://./data.csv --tgt-conn ICEBERG --tgt-object my_namespace.new_data

# Import from PostgreSQL
sling run --src-conn POSTGRES_DB --src-stream public.customers --tgt-conn ICEBERG --tgt-object sales.customers

# Export to Parquet files
sling run --src-conn ICEBERG --src-stream sales.orders --tgt-conn AWS_S3 --tgt-object s3://bucket/exports/orders.parquet

# Incremental sync with timestamp
sling run --src-conn POSTGRES_DB --src-stream public.events --tgt-conn ICEBERG --tgt-object events.raw --mode incremental --primary-key id --update-key updated_at
```

### Advanced Queries with DuckDB Integration

```bash
# For complex SQL queries, Sling uses DuckDB with Iceberg extension
# Tables must be qualified with "iceberg_catalog" prefix
# Note: Custom SQL queries via DuckDB are only supported with REST and Glue catalog types
sling run --src-conn ICEBERG --src-stream "SELECT count(*) FROM iceberg_catalog.my_namespace.my_table" --stdout

# Join multiple Iceberg tables
sling run --src-conn ICEBERG --src-stream "
  SELECT o.*, c.customer_name 
  FROM iceberg_catalog.sales.orders o 
  JOIN iceberg_catalog.sales.customers c ON o.customer_id = c.id
" --stdout

# Export with custom SQL
sling run --src-conn ICEBERG --src-stream "
  SELECT date_trunc('month', order_date) as month, sum(amount) as total
  FROM iceberg_catalog.sales.orders 
  GROUP BY 1 ORDER BY 1
" --tgt-object file://./monthly_sales.csv
```

If you are facing issues connecting, please reach out to us at <support@slingdata.io>, on [discord](https://discord.gg/q5xtaSNDvp) or open a Github Issue [here](https://github.com/slingdata-io/sling-cli/issues).
