Limit Data Scanning and Excess Query Plans in Amazon Athena
Amazon Athena charges for data scanned by queries and you might pay a higher cost if a user runs a more extensive query than needed. Is there a way the administrator can limit the amount of data scanning?
AWS Athena Workgroups are a flexible solution to split users, teams, applications, or workloads. You might have groups requiring different functions to use Amazon Athena for their analytical queries. Workgroups act as a natural and necessary tool for Amazon Athena management. For example, it can limit the data scanned for individual queries for the specific workgroup.
Here are some of the benefits of using workgroups in Amazon Athena:
- Resource Management: The workgroup is like a resource in AWS. Therefore, you can use an identity-based policy to control access for a specific workgroup at the resource level. It controls the individual's access to operate queries and with which data they interact through workgroups. Consequently, it enhances the safety of your information and shields it from unauthorized access by others performing queries.
- Performance Optimization: You can use the workgroups to split the queries per their requirements. For example, different workgroups for scheduled queries, ad-hoc queries, and production workloads.
- Improved Control and Monitoring: You can use AWS services such as Amazon CloudWatch to view query-related metrics at the workgroup level. Therefore, it lets you track each workgroup's usage and incurred costs. Similarly, you can define thresholds and trigger actions using Amazon SNS at the workgroup. If you tag the workgroup as a cost allocation tag in the Amazon Billing and Cost Management Console, you get each Athena workgroup's cost and usage report.
How to Use Workgroup in Amazon Athena?
By default, each Amazon Athena uses the primary workgroup. Launch the AWS Web Console and Amazon Athena service page. On the Athena home page, click on Workgroups. As shown below, we have a primary workgroup, which contains default permissions for the AWS authenticated users. You cannot delete the primary workgroup.
The Workgroups page shows the following information:
- Workgroup name
- Analytics Engine version: The current version is Athena engine version 3.
- Engine Updates: Automatic
- Created on (timestamp)
Additionally, you can view the workgroup in the top right corner of the Athena query editor.
Setting Up a New Workgroup
Click Create WorkGroup in the Workgroups tab of Amazon Athena.
Enter the following details for a new workgroup.
Workgroup Name and Description
The workgroup name should be unique per region of your AWS account. You cannot modify the workgroup name once it is created. The description is an optional field.
Choose an analytics engine for the workgroup. The available options are:
- Athena SQL: Use the Athena SQL engine to run interactive SQL queries for the data stored in the S3 bucket.
- Apache Spark: Use Apache Spark to create, edit, or run the Jupyter Notebook using Python and Apache Spark. The notebook can contain markdowns, codes, rich media, and comments.
Upgrade Query Engine
Choose Automatic or Manual mode to upgrade the query engine. AWS gives you a notification before upgrading the engine in case of Automatic mode. The default mode is also Automatic, and it is recommended that the latest engine version is used.
Query Result Configuration (Optional)
- Location of query results: Enter the S3 bucket location to save the query results as an object.
- Expected bucket owner: It is the AWS account ID that will be the owner of the results S3 bucket. The bucket owner's account ID should match the ID you specify here. In a mismatch case, results will not be saved in a specified S3 bucket.
- Assign bucket owner full control over query results: It controls whether the AWS S3 query results S3 bucket owner has full control over the new objects stored in the bucket.
- Encrypt Query results:
You can encrypt the query results using one of the following encryption mechanisms:
- SSE_S3: Server-side encryption (SSE) with S3-managed encryption keys.
- SSE_KMS: Server-side encryption (SSE) with AWS KMS-managed keys.
- CSE_KMS: Client-side encryption (CSE) with KMS-managed keys.
- Publish query metrics to CloudWatch: By default, AWS publishes query metrics to AWS CloudWatch for monitoring.
- Override client-side settings: You can use this option to override client-level settings and use the workgroup configuration for all queries.
Per Query Data Usage Control
You can specify a maximum limit of the amount for a query data scan. This setting is applicable per query in a workgroup. The minimum and maximum limits are 10 MB and 7 EB.
If you put a check on the option - No data limit, there will be no control over the amount of data scanned by queries in a workgroup.
Let's create a workgroup and specify the query usage limit as 10MB.
Click on Workgroups under the Administration tab to see that both Primary (default) and new workgroup (MyWorkgroup_mssqltips) are turned on.
Now, go to the Query editor in Amazon Athena and choose the workgroup from the drop-down. You get a notification whenever you switch the workgroup, as shown below.
Let's run a sample query to retrieve data from the AWS CloudTrail:
SELECT eventTime, useridentity.arn, awsRegion FROM cloudtrail_logs_aws_cloudtrail_logs_147081669821_91ab9629 where eventName = 'ConsoleLogin';
The below query performed a data scan of 260.61 KB under the workgroup limit of 10MB. Therefore, the Athena query works successfully. The query will be canceled if the data scan for an individual query exceeds the workgroup limit.
Use CloudWatch Metrics to Monitor Workgroup Query Usage
AWS sends query metrics for all workgroups in the default configuration.
You can use the CloudWatch metric - ProcessedBytes to monitor the workgroup utilization. Launch AWS CloudWatch and click Metrics -> All Metrics, as shown below.
You can view the metric - ProcessedBytes for the workgroup MyWorkgroup_mssqltips on this page. As shown below, you can set the timeframe to see the metric data in a chart format. You can hover the mouse over a data point to check the counter value at a specific point.
- Read AWS Athena documentation on the AWS blog - https://aws.amazon.com/athena/
- Read more AWS tips on MSSQLTips.
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips
Article Last Updated: 2023-11-29