Athena
Connect & Ingest data from / to AWS Athena
AWS Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run. See https://aws.amazon.com/athena/ for more details.
Setup
The following credentials keys are accepted:
data_location(required) -> S3 Bucket location for table data storage. e.g.s3://athena-bucket/datastaging_location(required) -> S3 Bucket location for temporary data and results. e.g.s3://athena-bucket-staging/tempaws_region(required) -> AWS region where your Athena workgroup is located (e.g.,us-east-1,eu-west-1).aws_access_key_id(optional) -> AWS access key ID. Can also be provided viaAWS_ACCESS_KEY_IDenvironment variable.aws_secret_access_key(optional) -> AWS secret access key. Can also be provided viaAWS_SECRET_ACCESS_KEYenvironment variable.aws_session_token(optional) -> AWS session token for temporary credentials. Can also be provided viaAWS_SESSION_TOKENenvironment variable.aws_profile(optional) -> AWS profile name from your credentials file to use for authentication.workgroup(optional) -> Athena workgroup to use. Default isprimary.catalog(optional) -> Data catalog to use. Default isAwsDataCatalog.database(optional) -> Default database/schema to use for queries.
Authentication Methods
Athena supports multiple authentication methods:
Static Credentials: Provide
access_key_idandsecret_access_keyAWS Profile: Specify a
profilename from your AWS credentials fileDefault Credential Chain: Uses environment variables, IAM roles, or credential files automatically
Temporary Credentials: Use
session_tokenalong with access keys for temporary access
Using sling conns
sling connsHere are examples of setting a connection named ATHENA. We must provide the type=athena property:
# Using static AWS credentials
$ sling conns set ATHENA type=athena region=us-east-1 access_key_id=AKIAIOSFODNN7EXAMPLE secret_access_key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY data_location=s3://my-bucket/athena-data/
# Using AWS profile
$ sling conns set ATHENA type=athena region=us-east-1 profile=my-profile data_location=s3://my-bucket/athena-data/
# Using default credential chain with custom workgroup
$ sling conns set ATHENA type=athena region=us-east-1 workgroup=analytics-workgroup data_location=s3://my-bucket/athena-data/
# With specific database and output location
$ sling conns set ATHENA type=athena region=us-east-1 database=my_database staging_location=s3://my-bucket/athena-results/ data_location=s3://my-bucket/athena-data/
# Using temporary credentials (session token)
$ sling conns set ATHENA type=athena region=us-east-1 access_key_id=AKIAI... secret_access_key=wJal... session_token=FwoGZXIvYXdzE...Environment Variable
# Configuration with static credentials
export ATHENA='{
type: athena,
region: "us-east-1",
access_key_id: "AKIAIOSFODNN7EXAMPLE",
secret_access_key: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
workgroup: "primary",
catalog: "AwsDataCatalog",
database: "default",
data_location: "s3://my-bucket/athena-data/"
staging_location: "s3://my-bucket/athena-results/"
}'
# Configuration with AWS profile
export ATHENA='{
type: athena,
region: "us-east-1",
profile: "my-aws-profile",
workgroup: "analytics-workgroup",
database: "analytics_db",
data_location: "s3://my-bucket/athena-data/"
staging_location: "s3://my-bucket/athena-results/"
}'
# Windows PowerShell
$env:ATHENA='{
type: athena,
region: "us-east-1",
access_key_id: "AKIAIOSFODNN7EXAMPLE",
secret_access_key: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
workgroup: "primary",
database: "default",
data_location: "s3://my-bucket/athena-data/"
staging_location: "s3://my-bucket/athena-results/"
}'Sling Env File YAML
See here to learn more about the sling env.yaml file.
connections:
ATHENA:
type: athena
region: <region>
access_key_id: <access_key_id>
secret_access_key: <secret_access_key>
data_location: <s3_location>
staging_location: <s3_location>
session_token: <session_token> # optional, for temporary credentials
profile: <aws_profile> # optional, alternative to keys
workgroup: <workgroup> # optional, defaults to 'primary'
catalog: <catalog> # optional, defaults to 'AwsDataCatalog'
database: <database> # optionalBulk Operations
For optimal performance with large datasets, Sling can leverage Athena's UNLOAD functionality and S3 integration:
Set
staging_locationproperty to enable S3-based bulk operationsAthena will use the UNLOAD command to export data to S3, then read from there
For imports, data is staged in S3 before being loaded into Athena tables
Common Usage Examples
Basic Operations
# List databases/catalogs
sling conns discover ATHENA
# List tables in a database
sling conns discover ATHENA --pattern "my_schema.*"
# Query data
sling run --src-conn ATHENA --src-stream "SELECT * FROM my_database.sales_data LIMIT 10" --stdout
# Export table to CSV
sling run --src-conn ATHENA --src-stream my_database.orders --tgt-object file://./orders.csvData Import/Export
# Import CSV to Athena (creates external table)
sling run --src-stream file://./data.csv --tgt-conn ATHENA --tgt-object my_database.new_table
# Import from another database
sling run --src-conn POSTGRES_DB --src-stream public.customers --tgt-conn ATHENA --tgt-object analytics.customers
# Export to S3 Parquet (using Athena UNLOAD)
sling run --src-conn ATHENA --src-stream analytics.sales --tgt-conn AWS_S3 --tgt-object s3://my-bucket/exports/sales.parquet
# Export with partitioning
sling run --src-conn ATHENA --src-stream "SELECT * FROM sales WHERE year = 2023" --tgt-conn AWS_S3 --tgt-object 's3://my-bucket/sales_2023/*.parquet'If you are facing issues connecting, please reach out to us at [email protected], on discord or open a Github Issue here.
Last updated
Was this helpful?