Iceberg
Connect & Ingest data from / to Apache Iceberg tables
Apache Iceberg is an open table format for huge analytic datasets. Iceberg adds tables to compute engines including Spark, Trino, PrestoDB, Flink, Hive and Impala using a high-performance table format that works just like a SQL table. See https://iceberg.apache.org/ for more details.
Sling supports connecting to Iceberg tables through catalog backends including REST catalogs, AWS Glue, and SQL catalogs.
Setup
The following credential keys are accepted:
Common Properties
catalog_type (required) -> The catalog type: rest, glue, or sql. Default is rest.
schema (optional) -> The default schema to use to read/write data. Default is main.
REST Catalog Configuration
rest_uri (required for REST) -> The REST catalog endpoint URI (e.g., https://s3tables.us-east-1.amazonaws.com/iceberg, https://catalog.cloudflarestorage.com/xxxxxxxxx/warehouse, http://localhost:8181).
rest_warehouse (optional) -> Warehouse location for the catalog (e.g., s3://bucket/warehouse, arn:aws:s3tables:region:account-id:bucket/namespace).
rest_token (optional) -> OAuth token for authentication.
rest_oauth_client_id (optional) -> OAuth client ID for authentication.
rest_oauth_client_secret (optional) -> OAuth client secret for authentication.
rest_oauth_scope (optional) -> OAuth scope for authentication.
rest_oauth_server_uri (optional) -> OAuth server URI for token requests.
rest_prefix (optional) -> API prefix for the REST catalog.
rest_metadata_location (optional) -> Custom metadata location.
rest_extra_props (optional) -> Additional properties as JSON string.
rest_sigv4_enable (optional) -> Enable AWS SigV4 authentication (true/false).
rest_sigv4_region (optional) -> AWS region for SigV4 authentication (with rest_sigv4_enable=true).
rest_sigv4_service (optional) -> AWS service name for SigV4 authentication (with rest_sigv4_enable=true).
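The SigV4 keys above have no example elsewhere on this page, so here is a minimal sketch of how they might be combined in a connection defined as an environment variable. The connection name ICEBERG_SIGV4, the region, and the service name "s3tables" are assumptions chosen for illustration; check your catalog provider's documentation for the values it expects.

```shell
# Sketch: REST catalog connection with AWS SigV4 request signing enabled.
# ICEBERG_SIGV4 is a hypothetical connection name; the region and the
# service name "s3tables" are assumptions, not values confirmed by Sling.
export ICEBERG_SIGV4='{
  type: iceberg,
  catalog_type: rest,
  rest_uri: "https://s3tables.us-east-1.amazonaws.com/iceberg",
  rest_warehouse: "arn:aws:s3tables:us-east-1:123456789012:bucket/my-namespace",
  rest_sigv4_enable: true,
  rest_sigv4_region: "us-east-1",
  rest_sigv4_service: "s3tables"
}'
```

AWS credentials for signing would still need to come from the storage keys described on this page (for example s3_profile) or from your environment.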
Glue Catalog Configuration
glue_warehouse (required for Glue) -> Warehouse location in S3 (e.g. s3://my-bucket/warehouse).
glue_account_id (optional) -> AWS account ID for Glue catalog.
glue_namespace (optional) -> Namespace in the Glue catalog.
glue_extra_props (optional) -> Extra Glue properties (object).
SQL Catalog Configuration
sql_catalog_name (optional) -> Name of the SQL catalog. Default is sql.
sql_catalog_conn (required for SQL) -> Name of a Sling database connection to use as catalog backend.
sql_catalog_init (optional) -> Whether to initialize catalog tables if they don't exist. Default is true.
Storage Configuration for Glue Catalog, S3 Tables, or SQL Catalog
For S3/S3-compatible storage:
s3_access_key_id (optional) -> AWS access key ID
s3_secret_access_key (optional) -> AWS secret access key
s3_session_token (optional) -> AWS session token
s3_region (optional) -> AWS region
s3_profile (optional) -> AWS profile to use
s3_endpoint (optional) -> S3-compatible endpoint URL (e.g. http://localhost:9000 for MinIO)
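To illustrate the S3-compatible case, the sketch below points a SQL-catalog connection at a local MinIO endpoint through s3_endpoint. The connection name ICEBERG_MINIO and the bucket path are hypothetical, POSTGRES_DB is assumed to be an existing Sling database connection, and sql_warehouse is used the same way as in the SQL catalog examples on this page.

```shell
# Sketch: SQL catalog whose table data lives on an S3-compatible store.
# ICEBERG_MINIO and the bucket path are hypothetical names for illustration.
# Replace the <...> placeholders with your own MinIO credentials.
export ICEBERG_MINIO='{
  type: iceberg,
  catalog_type: sql,
  sql_catalog_conn: "POSTGRES_DB",
  sql_warehouse: "s3://my-bucket/iceberg/warehouse",
  s3_endpoint: "http://localhost:9000",
  s3_access_key_id: "<s3_access_key_id>",
  s3_secret_access_key: "<s3_secret_access_key>",
  s3_region: "us-east-1"
}'
```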
Using sling conns
Here are examples of setting a connection named ICEBERG. We must provide the type=iceberg property:
# REST catalog with local warehouse
$ sling conns set ICEBERG type=iceberg catalog_type=rest rest_uri=http://localhost:8181
# AWS S3 Tables via REST
$ sling conns set ICEBERG type=iceberg catalog_type=rest rest_warehouse="arn:aws:s3tables:us-east-1:123456789012:bucket/my-namespace" s3_profile=my-profile
# Cloudflare R2 Data Catalog via REST
$ sling conns set ICEBERG type=iceberg catalog_type=rest rest_uri="https://catalog.cloudflarestorage.com/xxxxxxxxx/warehouse" rest_warehouse="3fff6d86c73fcxxxxxxx4cb125f34927_warehouse" rest_token="<rest_token>"
# Glue catalog
$ sling conns set ICEBERG type=iceberg catalog_type=glue glue_warehouse=s3://my-bucket/glue-warehouse s3_access_key_id=AKIAIOSFODNN7EXAMPLE s3_secret_access_key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY s3_region=us-east-1
Environment Variable
# REST catalog local
export ICEBERG='{
type: iceberg,
catalog_type: rest,
rest_uri: "http://localhost:8181",
rest_token: "<rest_token>",
rest_warehouse: "<rest_warehouse>"
}'
# REST catalog with OAuth
export ICEBERG='{
type: iceberg,
catalog_type: rest,
rest_uri: "http://localhost:8181",
rest_oauth_client_id: "my-client-id",
rest_oauth_client_secret: "my-client-secret",
rest_oauth_scope: "catalog:write",
rest_oauth_server_uri: "https://auth.example.com/oauth2/token"
}'
# AWS S3 Tables
export ICEBERG_S3='{
type: iceberg,
catalog_type: rest,
rest_warehouse: "arn:aws:s3tables:us-east-1:123456789012:bucket/my-namespace",
s3_access_key_id: "AKIAIOSFODNN7EXAMPLE",
s3_secret_access_key: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
}'
# Iceberg with GCP backend (Apache Gravitino)
export ICEBERG_GCP='{
type: iceberg,
catalog_name: iceberg_catalog,
catalog_type: rest,
rest_uri: "http://endpoint.com:9001/iceberg",
rest_warehouse: "s3://my-bucket/gravitino/gcs_as_s3",
s3_access_key_id: "<s3_access_key_id>",
s3_secret_access_key: "<s3_secret_access_key>",
s3_endpoint: "https://myhost.storage.googleapis.com",
s3_region: auto,
schema: iceberg_schema
}'
# Glue catalog
export ICEBERG_GLUE='{
type: iceberg,
catalog_type: glue,
glue_warehouse: "s3://my-bucket/glue-warehouse",
s3_region: "us-east-1",
s3_profile: "default"
}'
# SQL catalog with PostgreSQL
export ICEBERG='{
type: iceberg,
catalog_type: sql,
sql_catalog_conn: "POSTGRES_DB",
sql_catalog_name: "iceberg_catalog",
sql_catalog_init: true,
sql_warehouse: "s3://my-bucket/iceberg/warehouse",
s3_access_key_id: "AKIAIOSFODNN7EXAMPLE",
s3_secret_access_key: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
s3_region: "us-east-1"
}'
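Once a connection such as ICEBERG has been defined, for example through one of the environment variables above, it can be useful to confirm that Sling can reach the catalog before running any jobs. The sketch below wraps the sling conns test command in a small helper; the function name check_conn is hypothetical, and the helper returns 127 when the sling CLI is not on the PATH.

```shell
# Sketch: test a named Sling connection, failing early if the CLI is missing.
# check_conn is a hypothetical helper name, not part of Sling.
check_conn() {
  name="$1"
  if ! command -v sling >/dev/null 2>&1; then
    echo "sling CLI not found in PATH" >&2
    return 127
  fi
  # Exit status is whatever `sling conns test` returns for this connection.
  sling conns test "$name"
}
```

Calling check_conn ICEBERG then passes through the result of the connection test, so it can be used as a guard in scripts (check_conn ICEBERG && sling run ...).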
Sling Env File YAML
See the Sling documentation to learn more about the sling env.yaml file.
connections:
  ICEBERG:
    type: iceberg
    catalog_type: rest # or glue, or sql
    catalog_name: iceberg
    schema: main

    # REST Catalog Configuration
    rest_uri: http://localhost:8181
    rest_warehouse: s3://my-bucket/warehouse
    rest_token: my-bearer-token
    rest_oauth_client_id: my-client-id
    rest_oauth_client_secret: my-client-secret
    rest_oauth_scope: catalog:write
    rest_oauth_server_uri: https://auth.example.com/oauth2/token

    # Glue Catalog Configuration (alternative to REST/SQL)
    # catalog_type: glue
    # glue_warehouse: s3://my-bucket/glue-warehouse

    # SQL Catalog Configuration (alternative to REST/Glue)
    # sql_catalog_conn: POSTGRES_DB
    # sql_catalog_name: iceberg_catalog
    # sql_catalog_init: true

    # S3 Configuration (if using S3 storage)
    s3_access_key_id: AKIAIOSFODNN7EXAMPLE
    s3_secret_access_key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
    s3_region: us-east-1
    s3_endpoint: http://localhost:9000 # for MinIO or other S3-compatible

  # Example SQL catalog setup with PostgreSQL backend
  ICEBERG_SQL:
    type: iceberg
    catalog_type: sql
    sql_catalog_conn: POSTGRES_CATALOG # defined below
    sql_catalog_name: iceberg_catalog
    sql_catalog_init: true
    schema: main

    # S3 Configuration for storage
    sql_warehouse: s3://my-bucket/iceberg/warehouse
    s3_access_key_id: AKIAIOSFODNN7EXAMPLE
    s3_secret_access_key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
    s3_region: us-east-1

  # PostgreSQL connection for SQL catalog backend
  POSTGRES_CATALOG:
    type: postgres
    host: localhost
    port: 5432
    user: postgres
    password: mypassword
    database: iceberg_catalog

Common Usage Examples
Basic Operations
# List namespaces (schemas)
sling conns discover ICEBERG
# List tables in a namespace
sling conns discover ICEBERG --schema my_namespace
# Query data
sling run --src-conn ICEBERG --src-stream "SELECT * FROM my_namespace.orders LIMIT 10" --stdout
# Export to CSV
sling run --src-conn ICEBERG --src-stream my_namespace.orders --tgt-object file://./orders.csv
Data Import/Export
# Import CSV to Iceberg
sling run --src-stream file://./data.csv --tgt-conn ICEBERG --tgt-object my_namespace.new_data
# Import from PostgreSQL
sling run --src-conn POSTGRES_DB --src-stream public.customers --tgt-conn ICEBERG --tgt-object sales.customers
# Export to Parquet files
sling run --src-conn ICEBERG --src-stream sales.orders --tgt-conn AWS_S3 --tgt-object s3://bucket/exports/orders.parquet
# Incremental sync with timestamp
sling run --src-conn POSTGRES_DB --src-stream public.events --tgt-conn ICEBERG --tgt-object events.raw --mode incremental --primary-key id --update-key updated_at
Advanced Queries with DuckDB Integration
# For complex SQL queries, Sling uses DuckDB with Iceberg extension
# Tables must be qualified with "iceberg_catalog" prefix
# Note: Custom SQL queries via DuckDB are only supported with REST and Glue catalog types
sling run --src-conn ICEBERG --src-stream "SELECT count(*) FROM iceberg_catalog.my_namespace.my_table" --stdout
# Join multiple Iceberg tables
sling run --src-conn ICEBERG --src-stream "
SELECT o.*, c.customer_name
FROM iceberg_catalog.sales.orders o
JOIN iceberg_catalog.sales.customers c ON o.customer_id = c.id
" --stdout
# Export with custom SQL
sling run --src-conn ICEBERG --src-stream "
SELECT date_trunc('month', order_date) as month, sum(amount) as total
FROM iceberg_catalog.sales.orders
GROUP BY 1 ORDER BY 1
" --tgt-object file://./monthly_sales.csv

If you are facing issues connecting, please reach out to us by email, on Discord, or open a GitHub issue.