Athena

Connect & Ingest data from / to AWS Athena

Amazon Athena is an interactive query service for analyzing data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run. See https://aws.amazon.com/athena/ for more details.

Setup

The following credential keys are accepted:

  • data_location (required) -> S3 Bucket location for table data storage. e.g. s3://athena-bucket/data

  • staging_location (required) -> S3 Bucket location for temporary data and results. e.g. s3://athena-bucket-staging/temp

  • aws_region (required) -> AWS region where your Athena workgroup is located (e.g., us-east-1, eu-west-1).

  • aws_access_key_id (optional) -> AWS access key ID. Can also be provided via AWS_ACCESS_KEY_ID environment variable.

  • aws_secret_access_key (optional) -> AWS secret access key. Can also be provided via AWS_SECRET_ACCESS_KEY environment variable.

  • aws_session_token (optional) -> AWS session token for temporary credentials. Can also be provided via AWS_SESSION_TOKEN environment variable.

  • aws_profile (optional) -> AWS profile name from your credentials file to use for authentication.

  • workgroup (optional) -> Athena workgroup to use. Default is primary.

  • catalog (optional) -> Data catalog to use. Default is AwsDataCatalog.

  • database (optional) -> Default database/schema to use for queries.
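As a rough illustration of how the keys above fit together, the sketch below checks the three required keys and fills in the two documented defaults. The helper name `normalize_athena_props` and the `s3://` prefix check are assumptions made for this example; they are not part of Sling.

```python
# Keys marked (required) in the list above.
REQUIRED_KEYS = ("data_location", "staging_location", "aws_region")

# Defaults documented above for the optional keys.
DEFAULTS = {"workgroup": "primary", "catalog": "AwsDataCatalog"}


def normalize_athena_props(props: dict) -> dict:
    """Hypothetical helper: validate required keys and apply documented defaults."""
    missing = [key for key in REQUIRED_KEYS if not props.get(key)]
    if missing:
        raise ValueError("missing required keys: " + ", ".join(missing))
    # Assumption for this sketch: both locations are given as s3:// URIs.
    for key in ("data_location", "staging_location"):
        if not props[key].startswith("s3://"):
            raise ValueError(f"{key} must be an s3:// URI, got {props[key]!r}")
    # Explicit values in props win over the defaults.
    return {**DEFAULTS, **props}


conn = normalize_athena_props({
    "data_location": "s3://athena-bucket/data",
    "staging_location": "s3://athena-bucket-staging/temp",
    "aws_region": "us-east-1",
})
print(conn["workgroup"], conn["catalog"])
```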

Authentication Methods

Athena supports multiple authentication methods:

  1. Static Credentials: Provide aws_access_key_id and aws_secret_access_key

  2. AWS Profile: Specify a profile name from your AWS credentials file

  3. Default Credential Chain: Uses environment variables, IAM roles, or credential files automatically

  4. Temporary Credentials: Use aws_session_token along with the access keys for temporary access
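To make the four methods concrete, here is a small sketch that reports which method a given set of properties corresponds to. The function `describe_auth_method` is hypothetical, and the order in which it checks the properties is an assumption for illustration only; the precedence Sling actually applies is not stated on this page.

```python
def describe_auth_method(props: dict) -> str:
    """Hypothetical helper: name the authentication method implied by props.

    The check order below is an assumption, not Sling's documented precedence.
    """
    has_keys = bool(
        props.get("aws_access_key_id") and props.get("aws_secret_access_key")
    )
    if has_keys and props.get("aws_session_token"):
        return "temporary credentials"
    if has_keys:
        return "static credentials"
    if props.get("aws_profile"):
        return "aws profile"
    # Nothing explicit was provided: environment variables, IAM roles,
    # or credential files are expected to be picked up automatically.
    return "default credential chain"


print(describe_auth_method({"aws_profile": "analytics"}))
```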

Using sling conns

Here are examples of setting a connection named ATHENA. We must provide the type=athena property:

Environment Variable
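A minimal sketch, assuming Sling reads a connection from an environment variable named after the connection whose value is a JSON (or YAML) object of properties. The snippet only builds the `export` line for a shell; the bucket names and region are the placeholder values from the Setup section, and `export_line` is a hypothetical helper.

```python
import json
import shlex

# Connection properties; type=athena is required, the rest mirror the Setup keys.
athena_conn = {
    "type": "athena",
    "data_location": "s3://athena-bucket/data",
    "staging_location": "s3://athena-bucket-staging/temp",
    "aws_region": "us-east-1",
}


def export_line(name: str, props: dict) -> str:
    """Hypothetical helper: render a shell export statement for a connection."""
    # shlex.quote wraps the JSON payload in single quotes for the shell.
    return f"export {name}={shlex.quote(json.dumps(props))}"


print(export_line("ATHENA", athena_conn))
```

With no access keys or profile in the payload, authentication would fall back to the default credential chain described above.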

Sling Env File YAML

See here to learn more about the sling env.yaml file.

Bulk Operations

For optimal performance with large datasets, Sling can leverage Athena's UNLOAD functionality and S3 integration:

  • Set the staging_location property to enable S3-based bulk operations

  • For exports, Sling uses Athena's UNLOAD command to write query results to S3, then reads the data from there

  • For imports, data is staged in S3 before being loaded into Athena tables
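For reference, an Athena UNLOAD statement has the general shape `UNLOAD (query) TO 's3://...' WITH (format = '...')`. The sketch below builds such a statement as a string. The helper `build_unload`, the example table name, and the choice of PARQUET are assumptions for illustration; the exact statement and file format Sling generates are not specified on this page.

```python
# Output formats accepted by Athena's UNLOAD statement.
ALLOWED_FORMATS = {"PARQUET", "ORC", "AVRO", "JSON", "TEXTFILE"}


def build_unload(query: str, s3_prefix: str, fmt: str = "PARQUET") -> str:
    """Hypothetical helper: render an Athena UNLOAD statement for a query."""
    fmt = fmt.upper()
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported UNLOAD format: {fmt}")
    if not s3_prefix.startswith("s3://"):
        raise ValueError("UNLOAD destination must be an s3:// URI")
    # Normalize to a trailing slash so the destination is an S3 prefix.
    prefix = s3_prefix.rstrip("/") + "/"
    # Drop a trailing semicolon so the query can be wrapped in parentheses.
    inner = query.strip().rstrip(";")
    return f"UNLOAD ({inner}) TO '{prefix}' WITH (format = '{fmt}')"


print(build_unload(
    "SELECT * FROM my_database.my_table;",
    "s3://athena-bucket-staging/temp/export_001",
))
```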

Common Usage Examples

Basic Operations
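A minimal sketch of two basic checks, assuming the `sling` CLI is installed and a connection named ATHENA has been defined as described above. The snippet only assembles the `sling conns test` and `sling conns discover` commands and prints them; it does not run them.

```python
import shlex

CONN_NAME = "ATHENA"

# Verify that the connection can be established.
test_cmd = ["sling", "conns", "test", CONN_NAME]

# List the objects (tables/views) visible through the connection.
discover_cmd = ["sling", "conns", "discover", CONN_NAME]

for cmd in (test_cmd, discover_cmd):
    print(shlex.join(cmd))
```

To execute a command from Python instead of printing it, the list can be passed to `subprocess.run(cmd, check=True)`.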

Data Import/Export
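The sketch below assembles `sling run` commands for moving a table out of and into Athena. It assumes the `sling` CLI is installed; the second connection name `POSTGRES` and the table names are hypothetical placeholders, and `sling_run_args` is a helper written for this example. The commands are printed rather than executed.

```python
import shlex


def sling_run_args(src_conn: str, src_stream: str,
                   tgt_conn: str, tgt_object: str) -> list:
    """Hypothetical helper: build the argument list for a sling run command."""
    return [
        "sling", "run",
        "--src-conn", src_conn,
        "--src-stream", src_stream,
        "--tgt-conn", tgt_conn,
        "--tgt-object", tgt_object,
    ]


# Export: read a table from Athena and write it to another database.
export_cmd = sling_run_args("ATHENA", "my_database.my_table",
                            "POSTGRES", "public.my_table")

# Import: read a table from another database and load it into Athena.
import_cmd = sling_run_args("POSTGRES", "public.my_table",
                            "ATHENA", "my_database.my_table")

for cmd in (export_cmd, import_cmd):
    print(shlex.join(cmd))
```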

If you are facing issues connecting, please reach out to us at [email protected], on Discord, or open a GitHub Issue here.
