Iceberg

Connect and ingest data from/to Apache Iceberg tables

Apache Iceberg is an open table format for huge analytic datasets. Iceberg adds tables to compute engines including Spark, Trino, PrestoDB, Flink, Hive and Impala using a high-performance table format that works just like a SQL table. See https://iceberg.apache.org/ for more details.

Sling supports connecting to Iceberg tables through catalog backends including REST catalogs, AWS Glue, and SQL catalogs.

Setup

The following credential keys are accepted:

Common Properties

  • catalog_type (required) -> The catalog type: rest, glue, or sql. Default is rest.

  • schema (optional) -> The default schema to use to read/write data. Default is main.
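The common keys resolve to their documented defaults when omitted. The sketch below illustrates that behavior with a plain Python dict. `apply_common_defaults` is a hypothetical helper written for illustration, not part of Sling.

```python
# Hypothetical helper (not part of Sling): fills in the documented defaults
# for the common Iceberg connection keys and rejects unknown catalog types.
VALID_CATALOG_TYPES = {"rest", "glue", "sql"}


def apply_common_defaults(props: dict) -> dict:
    """Return a copy of props with catalog_type and schema defaults applied."""
    resolved = dict(props)
    resolved.setdefault("catalog_type", "rest")  # documented default: rest
    resolved.setdefault("schema", "main")  # documented default: main
    if resolved["catalog_type"] not in VALID_CATALOG_TYPES:
        raise ValueError(f"unsupported catalog_type: {resolved['catalog_type']!r}")
    return resolved


print(apply_common_defaults({"catalog_type": "glue"}))
# {'catalog_type': 'glue', 'schema': 'main'}
```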

REST Catalog Configuration

  • rest_uri (required for REST) -> The REST catalog endpoint URI (e.g., https://s3tables.us-east-1.amazonaws.com/iceberg, https://catalog.cloudflarestorage.com/xxxxxxxxx/warehouse, http://localhost:8181).

  • rest_warehouse (optional) -> Warehouse location for the catalog (e.g., s3://bucket/warehouse, arn:aws:s3tables:region:account-id:bucket/namespace).

  • rest_token (optional) -> OAuth token for authentication.

  • rest_oauth_client_id (optional) -> OAuth client ID for authentication.

  • rest_oauth_client_secret (optional) -> OAuth client secret for authentication.

  • rest_oauth_scope (optional) -> OAuth scope for authentication.

  • rest_oauth_server_uri (optional) -> OAuth server URI for token requests.

  • rest_prefix (optional) -> API prefix for the REST catalog.

  • rest_metadata_location (optional) -> Custom metadata location.

  • rest_extra_props (optional) -> Additional properties as a JSON string.

  • rest_sigv4_enable (optional) -> Enable AWS SigV4 authentication (true/false).

  • rest_sigv4_region (optional) -> AWS region for SigV4 authentication (applies when rest_sigv4_enable=true).

  • rest_sigv4_service (optional) -> AWS service name for SigV4 authentication (applies when rest_sigv4_enable=true).
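As a rough illustration of how the REST keys combine, the sketch below builds property sets for an AWS S3 Tables endpoint (with SigV4) and for an unauthenticated local REST catalog. The helper `rest_catalog_props` is hypothetical. The account ID and bucket name are placeholders, and `s3tables` as the SigV4 service name is an assumption to verify against the AWS documentation.

```python
import json


def rest_catalog_props(rest_uri, rest_warehouse=None, rest_token=None,
                       sigv4_region=None, sigv4_service=None, extra_props=None):
    """Build the REST catalog keys for an Iceberg connection (sketch)."""
    props = {"type": "iceberg", "catalog_type": "rest", "rest_uri": rest_uri}
    if rest_warehouse:
        props["rest_warehouse"] = rest_warehouse
    if rest_token:
        props["rest_token"] = rest_token
    if sigv4_region:
        # The SigV4 region/service keys only apply with rest_sigv4_enable=true.
        props["rest_sigv4_enable"] = True
        props["rest_sigv4_region"] = sigv4_region
        if sigv4_service:
            props["rest_sigv4_service"] = sigv4_service
    if extra_props:
        # rest_extra_props is documented as a JSON string, not a nested object.
        props["rest_extra_props"] = json.dumps(extra_props)
    return props


# AWS S3 Tables: account ID and bucket name are placeholders, and "s3tables"
# as the SigV4 service name is an assumption.
s3tables = rest_catalog_props(
    "https://s3tables.us-east-1.amazonaws.com/iceberg",
    rest_warehouse="arn:aws:s3tables:us-east-1:123456789012:bucket/example",
    sigv4_region="us-east-1",
    sigv4_service="s3tables",
)

# A local REST catalog with no authentication.
local = rest_catalog_props("http://localhost:8181")
```

Setting the region is treated here as the signal to enable SigV4, which keeps the enable flag and its dependent keys consistent.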

Glue Catalog Configuration

  • glue_warehouse (required for Glue) -> Warehouse location in S3 (e.g., s3://my-bucket/warehouse).

  • glue_account_id (optional) -> AWS account ID for Glue catalog.

  • glue_namespace (optional) -> Namespace in the Glue catalog.

  • glue_extra_props (optional) -> Additional Glue properties (object).
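A minimal sketch of a Glue catalog property set follows. It assumes the warehouse is written with the `s3://` scheme, as in the example above, and uses a hypothetical namespace `analytics`. `glue_catalog_props` is an illustrative helper, not a Sling API.

```python
def glue_catalog_props(glue_warehouse, glue_account_id=None, glue_namespace=None,
                       glue_extra_props=None):
    """Build the Glue catalog keys for an Iceberg connection (sketch)."""
    # Assumption: the warehouse is given with the s3:// scheme, as in the docs.
    if not glue_warehouse.startswith("s3://"):
        raise ValueError("glue_warehouse must be an S3 location such as s3://my-bucket/warehouse")
    props = {"type": "iceberg", "catalog_type": "glue", "glue_warehouse": glue_warehouse}
    if glue_account_id:
        props["glue_account_id"] = glue_account_id
    if glue_namespace:
        props["glue_namespace"] = glue_namespace
    if glue_extra_props:
        # Unlike rest_extra_props, glue_extra_props is documented as an object.
        props["glue_extra_props"] = dict(glue_extra_props)
    return props


glue = glue_catalog_props("s3://my-bucket/warehouse", glue_namespace="analytics")
```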

SQL Catalog Configuration

  • sql_catalog_name (optional) -> Name of the SQL catalog. Default is sql.

  • sql_catalog_conn (required for SQL) -> Name of a Sling database connection to use as catalog backend.

  • sql_catalog_init (optional) -> Whether to initialize catalog tables if they don't exist. Default is true.
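Because `sql_catalog_conn` refers to another Sling connection by name, the SQL catalog setup involves two connections. The sketch below assumes a separately defined connection called `POSTGRES` (a hypothetical name). The helper `sql_catalog_props` and its URL check are illustrative assumptions, not Sling behavior.

```python
def sql_catalog_props(sql_catalog_conn, sql_catalog_name="sql", sql_catalog_init=True):
    """Build the SQL catalog keys for an Iceberg connection (sketch)."""
    # sql_catalog_conn names another Sling connection; it is not a connection URL.
    if "://" in sql_catalog_conn:
        raise ValueError("sql_catalog_conn expects a Sling connection name, not a URL")
    return {
        "type": "iceberg",
        "catalog_type": "sql",
        "sql_catalog_conn": sql_catalog_conn,
        "sql_catalog_name": sql_catalog_name,  # documented default: sql
        "sql_catalog_init": sql_catalog_init,  # documented default: true
    }


# POSTGRES is a hypothetical, separately defined Sling connection.
pg_catalog = sql_catalog_props("POSTGRES")
```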

Storage Configuration for Glue, S3 Tables, or SQL Catalogs

For S3/S3-compatible storage:

  • s3_access_key_id (optional) -> AWS access key ID.

  • s3_secret_access_key (optional) -> AWS secret access key.

  • s3_session_token (optional) -> AWS session token.

  • s3_region (optional) -> AWS region.

  • s3_profile (optional) -> AWS profile to use.

  • s3_endpoint (optional) -> S3-compatible endpoint URL (e.g., http://localhost:9000 for MinIO).
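The storage keys sit alongside the catalog keys in the same connection. The sketch below merges them for a local MinIO setup backed by a SQL catalog. `with_s3_storage` is a hypothetical helper. The `minioadmin` credentials are MinIO's out-of-the-box placeholders and `POSTGRES` is a hypothetical connection name.

```python
STORAGE_KEYS = {
    "s3_access_key_id", "s3_secret_access_key", "s3_session_token",
    "s3_region", "s3_profile", "s3_endpoint",
}


def with_s3_storage(catalog_props: dict, **storage) -> dict:
    """Merge S3 storage keys into catalog properties, skipping unset values."""
    unknown = set(storage) - STORAGE_KEYS
    if unknown:
        raise ValueError(f"unknown storage keys: {sorted(unknown)}")
    merged = dict(catalog_props)
    merged.update({k: v for k, v in storage.items() if v is not None})
    return merged


# MinIO on localhost; the credentials are placeholders.
minio = with_s3_storage(
    {"type": "iceberg", "catalog_type": "sql", "sql_catalog_conn": "POSTGRES"},
    s3_endpoint="http://localhost:9000",
    s3_access_key_id="minioadmin",
    s3_secret_access_key="minioadmin",
    s3_region="us-east-1",
)
```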

Using sling conns

Here are examples of setting a connection named ICEBERG. The type=iceberg property must be provided:

Environment Variable

See here to learn more about the .env.sling file.
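A minimal sketch of defining the ICEBERG connection through an environment variable. It assumes Sling parses a JSON (YAML-compatible) object stored as the variable's value, and uses a placeholder local REST endpoint. The printed `export` line shows the equivalent shell form.

```python
import json
import os

# Connection properties for a local REST catalog (placeholder endpoint).
iceberg_conn = {
    "type": "iceberg",  # required so Sling knows the connection type
    "catalog_type": "rest",
    "rest_uri": "http://localhost:8181",
    "schema": "main",
}

# Assumption: Sling parses a JSON/YAML object stored in the variable value.
os.environ["ICEBERG"] = json.dumps(iceberg_conn)

# The equivalent shell line, e.g. for a profile script:
print(f"export ICEBERG='{os.environ['ICEBERG']}'")
```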

Sling Env File YAML

See here to learn more about the sling env.yaml file.
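The same connection can live in the env.yaml file, which is assumed here to group connections under a top-level `connections:` key. The sketch below renders such an entry with only the standard library. `env_yaml` is a hypothetical helper that relies on JSON scalars and objects also being valid YAML flow syntax.

```python
import json


def env_yaml(connections: dict) -> str:
    """Render connections under a top-level `connections:` key of env.yaml."""
    lines = ["connections:"]
    for name, props in connections.items():
        lines.append(f"  {name}:")
        for key, value in props.items():
            # JSON scalars and objects are valid YAML flow syntax, so
            # json.dumps quotes strings such as URLs safely.
            lines.append(f"    {key}: {json.dumps(value)}")
    return "\n".join(lines) + "\n"


print(env_yaml({
    "ICEBERG": {
        "type": "iceberg",
        "catalog_type": "glue",
        "glue_warehouse": "s3://my-bucket/warehouse",
        "s3_region": "us-east-1",
    }
}))
```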

Common Usage Examples

Basic Operations
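Once ICEBERG is defined, the connection can be checked and explored with the `sling conns` subcommands. The sketch below only builds the argument lists. Running them, for example with `subprocess.run(test_cmd, check=True)`, assumes the `sling` CLI is installed and on PATH. `sling_conns_cmd` is a hypothetical helper.

```python
def sling_conns_cmd(action: str, conn: str = "ICEBERG") -> list:
    """Build the argv for `sling conns test|discover <conn>`."""
    if action not in {"test", "discover"}:
        raise ValueError(f"unsupported action: {action!r}")
    return ["sling", "conns", action, conn]


test_cmd = sling_conns_cmd("test")          # check that the connection works
discover_cmd = sling_conns_cmd("discover")  # list the tables the connection can see
```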

Data Import/Export
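A rough sketch of import and export runs, built as `sling run` argument lists. `POSTGRES`, `public.orders`, `main.orders`, and the local CSV path are hypothetical. The mode names reflect Sling's replication modes as we understand them and should be checked against the current Sling documentation. `sling_run_cmd` is a hypothetical helper.

```python
VALID_MODES = {"full-refresh", "incremental", "truncate", "snapshot", "backfill"}


def sling_run_cmd(src_conn, src_stream, tgt_object, tgt_conn=None, mode="full-refresh"):
    """Build the argv for a single `sling run` transfer (sketch)."""
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported mode: {mode!r}")
    argv = ["sling", "run", "--src-conn", src_conn, "--src-stream", src_stream]
    if tgt_conn:  # omitted when the target is a local file URL
        argv += ["--tgt-conn", tgt_conn]
    argv += ["--tgt-object", tgt_object, "--mode", mode]
    return argv


# Import: a table from a hypothetical POSTGRES connection into Iceberg.
import_cmd = sling_run_cmd("POSTGRES", "public.orders", "main.orders", tgt_conn="ICEBERG")

# Export: an Iceberg table to a local CSV file.
export_cmd = sling_run_cmd("ICEBERG", "main.orders", "file:///tmp/orders.csv")
```

Building argument lists rather than shell strings avoids quoting issues when stream names or SQL contain spaces.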

Advanced Queries with DuckDB Integration

If you are facing issues connecting, please reach out to us at [email protected], on Discord, or open a GitHub Issue here.
