Iceberg
Connect & Ingest data from / to Apache Iceberg tables
Apache Iceberg is an open table format for huge analytic datasets. Iceberg adds tables to compute engines including Spark, Trino, PrestoDB, Flink, Hive and Impala using a high-performance table format that works just like a SQL table. See https://iceberg.apache.org/ for more details.
Sling supports connecting to Iceberg tables through catalog backends including REST catalogs, AWS Glue, and SQL catalogs.
Setup
The following credentials keys are accepted:
Common Properties
catalog_type (required) -> The catalog type: rest, glue, or sql. Default is rest.
schema (optional) -> The default schema to use to read/write data. Default is main.
REST Catalog Configuration
rest_uri (required for REST) -> The REST catalog endpoint URI (e.g., https://s3tables.us-east-1.amazonaws.com/iceberg, https://catalog.cloudflarestorage.com/xxxxxxxxx/warehouse, http://localhost:8181).
rest_warehouse (optional) -> Warehouse location for the catalog (e.g., s3://bucket/warehouse, arn:aws:s3tables:region:account-id:bucket/namespace).
rest_token (optional) -> OAuth token for authentication.
rest_oauth_client_id (optional) -> OAuth client ID for authentication.
rest_oauth_client_secret (optional) -> OAuth client secret for authentication.
rest_oauth_scope (optional) -> OAuth scope for authentication.
rest_oauth_server_uri (optional) -> OAuth server URI for token requests.
rest_prefix (optional) -> API prefix for the REST catalog.
rest_metadata_location (optional) -> Custom metadata location.
rest_extra_props (optional) -> Additional properties as a JSON string.
rest_sigv4_enable (optional) -> Enable AWS SigV4 authentication (true/false).
rest_sigv4_region (optional) -> AWS region for SigV4 authentication (used with rest_sigv4_enable=true).
rest_sigv4_service (optional) -> AWS service name for SigV4 authentication (used with rest_sigv4_enable=true).
Glue Catalog Configuration
glue_warehouse (required for Glue) -> Warehouse location in S3 (e.g., s3://my-bucket/warehouse).
glue_account_id (optional) -> AWS account ID for the Glue catalog.
glue_namespace (optional) -> Namespace in the Glue catalog.
glue_extra_props (optional) -> Extra Glue properties (object).
SQL Catalog Configuration
sql_catalog_name (optional) -> Name of the SQL catalog. Default is sql.
sql_catalog_conn (required for SQL) -> Name of a Sling database connection to use as the catalog backend.
sql_catalog_init (optional) -> Whether to initialize catalog tables if they don't exist. Default is true.
Storage Configuration for Glue Catalog, S3 Tables, or SQL Catalog
For S3/S3-compatible storage:
s3_access_key_id (optional) -> AWS access key ID
s3_secret_access_key (optional) -> AWS secret access key
s3_session_token (optional) -> AWS session token
s3_region (optional) -> AWS region
s3_profile (optional) -> AWS profile to use
s3_endpoint (optional) -> S3-compatible endpoint URL (e.g., http://localhost:9000 for MinIO)
Using sling conns
Here are examples of setting a connection named ICEBERG. We must provide the type=iceberg property:
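As a minimal sketch, a REST catalog connection could be registered with sling conns set by passing the properties from the Setup section as key=value pairs. The endpoint and warehouse values below are placeholders taken from the examples in the property list, not real locations, and the guard only skips the command when the sling binary is not on the PATH.

```shell
# Placeholder values; replace with your own catalog endpoint and warehouse.
REST_URI="http://localhost:8181"
REST_WAREHOUSE="s3://bucket/warehouse"

if command -v sling >/dev/null 2>&1; then
  # Register a connection named ICEBERG; type=iceberg must be provided.
  sling conns set ICEBERG type=iceberg catalog_type=rest \
    rest_uri="$REST_URI" rest_warehouse="$REST_WAREHOUSE"
else
  echo "sling CLI not found on PATH; install it before running this example" >&2
fi
```

For a Glue or SQL catalog, the same command shape should apply with catalog_type=glue or catalog_type=sql and the matching glue_* or sql_* keys.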
Environment Variable
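A minimal sketch of the environment variable form, assuming Sling picks up a connection from a variable named after the connection whose value is a JSON (or YAML) object of the properties listed in Setup. All values here are placeholders.

```shell
# Connection definition as a JSON object; endpoint and warehouse are placeholders.
export ICEBERG='{"type": "iceberg", "catalog_type": "rest", "rest_uri": "http://localhost:8181", "rest_warehouse": "s3://bucket/warehouse"}'

# Print the variable to confirm it is set in the current shell.
printf '%s\n' "$ICEBERG"
```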
See here to learn more about the .env.sling file.
Sling Env File YAML
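The sketch below writes an example env.yaml fragment for a Glue-backed connection to a scratch file, so an existing env.yaml is never overwritten. It assumes connections are declared under a top-level connections key; the bucket name and region are placeholders.

```shell
# Write the example to a temporary file rather than a real env.yaml.
EXAMPLE_FILE="$(mktemp)"

cat > "$EXAMPLE_FILE" <<'EOF'
connections:
  ICEBERG:
    type: iceberg
    catalog_type: glue
    glue_warehouse: s3://my-bucket/warehouse
    s3_region: us-east-1
    # s3_access_key_id and s3_secret_access_key can be added here if needed
EOF

# Show the fragment so it can be reviewed and merged by hand.
cat "$EXAMPLE_FILE"
```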
See here to learn more about the sling env.yaml file.
Common Usage Examples
Basic Operations
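As a rough sketch, basic checks against a configured connection could look like the following. It assumes a connection named ICEBERG has already been set up as described above.

```shell
# Name of the connection defined earlier on this page.
CONN_NAME="ICEBERG"

if command -v sling >/dev/null 2>&1; then
  # Verify that Sling can reach the catalog.
  sling conns test "$CONN_NAME"

  # List the tables visible through the catalog.
  sling conns discover "$CONN_NAME"
else
  echo "sling CLI not found on PATH; install it before running this example" >&2
fi
```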
Data Import/Export
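A minimal sketch of moving data out of and into Iceberg with sling run. The table names main.orders and main.orders_copy and the CSV path are hypothetical, and the commands assume the ICEBERG connection is already configured.

```shell
# Hypothetical table name and local file location.
SRC_TABLE="main.orders"
OUT_FILE="file:///tmp/orders.csv"

if command -v sling >/dev/null 2>&1; then
  # Export: read an Iceberg table and write it to a local CSV file.
  sling run --src-conn ICEBERG --src-stream "$SRC_TABLE" --tgt-object "$OUT_FILE"

  # Import: load the CSV into another Iceberg table.
  sling run --src-stream "$OUT_FILE" --tgt-conn ICEBERG --tgt-object "main.orders_copy"
else
  echo "sling CLI not found on PATH; install it before running this example" >&2
fi
```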
Advanced Queries with DuckDB Integration
If you are facing issues connecting, please reach out to us by email, on Discord, or open a GitHub issue.