Iceberg
Connect & Ingest data from / to Apache Iceberg tables
Apache Iceberg is an open table format for huge analytic datasets. Iceberg adds tables to compute engines including Spark, Trino, PrestoDB, Flink, Hive and Impala using a high-performance table format that works just like a SQL table. See https://iceberg.apache.org/ for more details.
Sling supports connecting to Iceberg tables through catalog backends including REST catalogs, AWS Glue, and SQL catalogs.
Setup
The following credentials keys are accepted:
Common Properties
catalog_type (required) -> The catalog type: rest, glue, or sql. Default is rest.
schema (optional) -> The default schema to use to read/write data. Default is main.
REST Catalog Configuration
rest_uri (required for REST) -> The REST catalog endpoint URI (e.g., https://s3tables.us-east-1.amazonaws.com/iceberg, https://catalog.cloudflarestorage.com/xxxxxxxxx/warehouse, http://localhost:8181).
rest_warehouse (optional) -> Warehouse location for the catalog (e.g., s3://bucket/warehouse, arn:aws:s3tables:region:account-id:bucket/namespace).
rest_token (optional) -> OAuth token for authentication.
rest_oauth_client_id (optional) -> OAuth client ID for authentication.
rest_oauth_client_secret (optional) -> OAuth client secret for authentication.
rest_oauth_scope (optional) -> OAuth scope for authentication.
rest_oauth_server_uri (optional) -> OAuth server URI for token requests.
rest_prefix (optional) -> API prefix for the REST catalog.
rest_metadata_location (optional) -> Custom metadata location.
rest_extra_props (optional) -> Additional properties as a JSON string.
rest_sigv4_enable (optional) -> Enable AWS SigV4 authentication (true/false).
rest_sigv4_region (optional) -> AWS region for SigV4 authentication (with rest_sigv4_enable=true).
rest_sigv4_service (optional) -> AWS service name for SigV4 authentication (with rest_sigv4_enable=true).
Glue Catalog Configuration
glue_warehouse (required for Glue) -> Warehouse location in S3 (e.g., s3://my-bucket/warehouse).
glue_account_id (optional) -> AWS account ID for the Glue catalog.
glue_namespace (optional) -> Namespace in the Glue catalog.
glue_extra_props (optional) -> Extra Glue properties (object).
SQL Catalog Configuration
sql_catalog_name (optional) -> Name of the SQL catalog. Default is sql.
sql_catalog_conn (required for SQL) -> Name of a Sling database connection to use as the catalog backend.
sql_catalog_init (optional) -> Whether to initialize catalog tables if they don't exist. Default is true.
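The per-catalog requirements above can be summarized in a small pre-flight check. This is a minimal sketch in Python, not part of Sling: `REQUIRED_KEYS` and `missing_keys` are hypothetical names introduced for illustration, and the check only mirrors the "required for ..." notes in the lists above. Sling performs its own validation when the connection is used.

```python
# Required keys per catalog type, taken from the property lists above.
REQUIRED_KEYS = {
    "rest": ["rest_uri"],
    "glue": ["glue_warehouse"],
    "sql": ["sql_catalog_conn"],
}


def missing_keys(conn: dict) -> list:
    """Return the required keys that are absent or empty in a connection dict.

    Hypothetical helper for illustration only; it is not a Sling API.
    """
    catalog_type = conn.get("catalog_type", "rest")  # documented default is rest
    if catalog_type not in REQUIRED_KEYS:
        raise ValueError(f"unsupported catalog_type: {catalog_type!r}")
    return [key for key in REQUIRED_KEYS[catalog_type] if not conn.get(key)]


# A REST connection with its endpoint set, and a Glue connection missing its warehouse.
rest_conn = {"type": "iceberg", "catalog_type": "rest", "rest_uri": "http://localhost:8181"}
glue_conn = {"type": "iceberg", "catalog_type": "glue"}

print(missing_keys(rest_conn))  # []
print(missing_keys(glue_conn))  # ['glue_warehouse']
```

Running a check like this before registering a connection can catch a missing catalog key early, before any request reaches the catalog.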
Storage Configuration for Glue Catalog, S3 Tables, or SQL Catalog
For S3/S3-compatible storage:
s3_access_key_id (optional) -> AWS access key ID
s3_secret_access_key (optional) -> AWS secret access key
s3_session_token (optional) -> AWS session token
s3_region (optional) -> AWS region
s3_profile (optional) -> AWS profile to use
s3_endpoint (optional) -> S3-compatible endpoint URL (e.g., http://localhost:9000 for MinIO)
Using sling conns
Here are examples of setting a connection named ICEBERG. We must provide the type=iceberg property:
Environment Variable
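As a rough illustration of the environment-variable approach, the sketch below builds a connection payload in Python and stores it under the name ICEBERG. It assumes that Sling reads a connection from an environment variable named after the connection and accepts a JSON (YAML-compatible) payload as its value; please confirm the exact format against the Sling environment documentation. The REST endpoint is one of the example URIs listed above, used here as a placeholder.

```python
import json
import os

# Placeholder REST catalog settings; replace with your own endpoint.
conn = {
    "type": "iceberg",  # required so the connection is treated as Iceberg
    "catalog_type": "rest",
    "rest_uri": "http://localhost:8181",
}

# Assumption: the variable name is the connection name and the value is a
# JSON payload holding the credential keys described in the Setup section.
payload = json.dumps(conn)
os.environ["ICEBERG"] = payload  # inherited by processes started from this script

# Print the equivalent shell export line for use in a terminal session.
print(f"export ICEBERG='{payload}'")
```

Setting the variable through `os.environ` only affects this process and its children, so a Sling command would need to be launched from the same script (or the printed export line pasted into the shell) to see it.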
Sling Env File YAML
See here to learn more about the sling env.yaml file.
Common Usage Examples
Basic Operations
Data Import/Export
Advanced Queries with DuckDB Integration
If you are facing issues connecting, please reach out to us by email, on Discord, or open a GitHub issue.