Iceberg

Connect to Apache Iceberg tables and move data into or out of them

Apache Iceberg is an open table format for huge analytic datasets. Iceberg adds tables to compute engines including Spark, Trino, PrestoDB, Flink, Hive and Impala using a high-performance table format that works just like a SQL table. See https://iceberg.apache.org/ for more details.

Sling supports connecting to Iceberg tables through catalog backends including REST catalogs, AWS Glue, and SQL catalogs.

Setup

The following credentials keys are accepted:

Common Properties

  • catalog_type (required) -> The catalog type: rest, glue, or sql. Default is rest.

  • schema (optional) -> The default schema to use to read/write data. Default is main.

REST Catalog Configuration

  • rest_uri (required for REST) -> The REST catalog endpoint URI (e.g., https://s3tables.us-east-1.amazonaws.com/iceberg, https://catalog.cloudflarestorage.com/xxxxxxxxx/warehouse, http://localhost:8181).

  • rest_warehouse (optional) -> Warehouse location for the catalog (e.g., s3://bucket/warehouse, arn:aws:s3tables:region:account-id:bucket/namespace).

  • rest_token (optional) -> OAuth token for authentication.

  • rest_oauth_client_id (optional) -> OAuth client ID for authentication.

  • rest_oauth_client_secret (optional) -> OAuth client secret for authentication.

  • rest_oauth_scope (optional) -> OAuth scope for authentication.

  • rest_oauth_server_uri (optional) -> OAuth server URI for token requests.

  • rest_prefix (optional) -> API prefix for the REST catalog.

  • rest_metadata_location (optional) -> Custom metadata location.

  • rest_extra_props (optional) -> Additional properties as a JSON string.

  • rest_sigv4_enable (optional) -> Enable AWS SigV4 authentication (true/false).

  • rest_sigv4_region (optional) -> AWS region for SigV4 authentication (with rest_sigv4_enable=true).

  • rest_sigv4_service (optional) -> AWS service name for SigV4 authentication (with rest_sigv4_enable=true).

Glue Catalog Configuration

  • glue_warehouse (required for Glue) -> Warehouse location in S3 (e.g., s3://my-bucket/warehouse).

  • glue_account_id (optional) -> AWS account ID for Glue catalog.

  • glue_namespace (optional) -> Namespace in the Glue catalog.

  • glue_extra_props (optional) -> Additional Glue properties (object).

SQL Catalog Configuration

  • sql_catalog_name (optional) -> Name of the SQL catalog. Default is sql.

  • sql_catalog_conn (required for SQL) -> Name of a Sling database connection to use as catalog backend.

  • sql_catalog_init (optional) -> Whether to initialize catalog tables if they don't exist. Default is true.

Storage Configuration for Glue Catalog, S3 Tables, or SQL Catalog

For S3/S3-compatible storage:

  • s3_access_key_id (optional) -> AWS access key ID

  • s3_secret_access_key (optional) -> AWS secret access key

  • s3_session_token (optional) -> AWS session token

  • s3_region (optional) -> AWS region

  • s3_profile (optional) -> AWS profile to use

  • s3_endpoint (optional) -> S3-compatible endpoint URL (e.g. http://localhost:9000 for MinIO)

Using sling conns

Here are examples of setting a connection named ICEBERG. We must provide the type=iceberg property:
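As a minimal sketch, a REST catalog connection could be registered with sling conns set as shown here. The endpoint and warehouse values are placeholders taken from the property examples above, not a real deployment, and the guard simply skips the commands when the sling binary is not on the PATH.

```shell
# Placeholder values (assumptions): replace with your own catalog endpoint and warehouse.
CONN_NAME=ICEBERG
REST_URI=http://localhost:8181
REST_WAREHOUSE=s3://bucket/warehouse

if command -v sling >/dev/null 2>&1; then
  # type=iceberg is mandatory; the remaining keys follow the property lists above.
  sling conns set "$CONN_NAME" type=iceberg catalog_type=rest \
    rest_uri="$REST_URI" rest_warehouse="$REST_WAREHOUSE"
else
  echo "sling CLI not found on PATH; skipping" >&2
fi
```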

Environment Variable
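A connection can also be supplied as an environment variable whose value is a JSON/YAML payload. The sketch below assumes a local REST catalog backed by S3-compatible storage; every value is a placeholder, and the access keys in particular are illustrative.

```shell
# Placeholder values (assumptions): adjust the endpoint, warehouse, and keys to your setup.
export ICEBERG='{type: iceberg, catalog_type: rest, rest_uri: "http://localhost:8181", rest_warehouse: "s3://bucket/warehouse", s3_access_key_id: "ACCESS_KEY_ID", s3_secret_access_key: "SECRET_ACCESS_KEY", s3_region: "us-east-1"}'
```

The variable name becomes the connection name, so commands run in the same shell session can refer to ICEBERG.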

Sling Env File YAML
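The sketch below writes an example connections block covering the three catalog types. It writes to a scratch file (iceberg.env.example.yaml, a hypothetical name) so that an existing env.yaml is not overwritten; the entries would be copied into the real file by hand. The names ICEBERG_GLUE, ICEBERG_SQL, and POSTGRES are assumptions for illustration, as are all values.

```shell
# Write an example connections block to a scratch file (hypothetical file name).
cat > iceberg.env.example.yaml <<'EOF'
connections:
  # REST catalog (placeholder endpoint and warehouse)
  ICEBERG:
    type: iceberg
    catalog_type: rest
    rest_uri: http://localhost:8181
    rest_warehouse: s3://bucket/warehouse

  # AWS Glue catalog (hypothetical connection name)
  ICEBERG_GLUE:
    type: iceberg
    catalog_type: glue
    glue_warehouse: s3://my-bucket/warehouse
    s3_region: us-east-1

  # SQL catalog backed by another Sling connection (POSTGRES is assumed to exist)
  ICEBERG_SQL:
    type: iceberg
    catalog_type: sql
    sql_catalog_conn: POSTGRES
    s3_endpoint: http://localhost:9000
EOF
```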

See here to learn more about the sling env.yaml file.

Common Usage Examples

Basic Operations
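A minimal sketch of the first checks to run once the ICEBERG connection is defined: test that the catalog is reachable, then list what it exposes. The guard skips the commands when sling is not installed.

```shell
CONN_NAME=ICEBERG

if command -v sling >/dev/null 2>&1; then
  # Verify that the catalog and its storage can be reached.
  sling conns test "$CONN_NAME"
  # List the tables visible through the catalog.
  sling conns discover "$CONN_NAME"
else
  echo "sling CLI not found on PATH; skipping" >&2
fi
```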

Data Import/Export
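The following sketch exports an Iceberg table to a local CSV file and then loads that file into a second Iceberg table. The table names and the file path are hypothetical, and the commands run only when sling is on the PATH and an ICEBERG connection is already configured.

```shell
# Hypothetical names (assumptions): replace with your own namespace, tables, and path.
SRC_TABLE=my_namespace.my_table
TGT_TABLE=my_namespace.my_table_copy
CSV_PATH=/tmp/my_table.csv

if command -v sling >/dev/null 2>&1; then
  # Export: Iceberg table -> local CSV file.
  sling run --src-conn ICEBERG --src-stream "$SRC_TABLE" \
    --tgt-object "file://$CSV_PATH"

  # Import: local CSV file -> Iceberg table, replacing any existing data.
  sling run --src-stream "file://$CSV_PATH" \
    --tgt-conn ICEBERG --tgt-object "$TGT_TABLE" --mode full-refresh
else
  echo "sling CLI not found on PATH; skipping" >&2
fi
```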

Advanced Queries with DuckDB Integration

If you are having trouble connecting, please reach out to us at [email protected], on Discord, or open a GitHub Issue here.
