Working with AWS Athena - Configuration, Query Data, Create Tables and Views

By:   |   Updated: 2023-09-26   |   Comments   |   Related: > Amazon AWS


Problem

Amazon Web Services (AWS) is a popular cloud service for deploying infrastructure and applications with scalability, flexibility, and reliability. You can also use serverless computing to run codes without deploying any server. Amazon Athena is one such serverless computing application. This tip will give an introduction to SQL Athena and its use cases.

Solution

To work with the Amazon Athena, you need the following:

  • AWS Credentials: You need an account to log in to AWS with your credentials. If you don't have an account, register for the free-tier account.
  • Sample Data: For this tip, we will use the sample employee datasets from GitHub and query it using Amazon Athena.

Follow the steps below to query CSV datasets using Athena.

AWS Simple Storage Account, AWS S3, is highly scalable cloud storage. Users can upload any data format file in this object storage. To create the S3 bucket, search S3 in AWS services and click Create Bucket. Enter the bucket name, choose the AWS Region, and click Create Bucket.

Create bucket, general configuration

Open the S3 bucket, browse the local directory, and upload the sample file in the S3 bucket. Note the S3 bucket address shown in the summary page below. For this tip's S3 bucket, the destination address is s3://mymssqlbucket.

S3 bucket status

Configuration of Amazon Athena

Search for the Athena service and launch it from the services page.

search for Athena from services page

The launch screen of Amazon Athena shows different tabs: Editor, Recent queries, Saved queries, and Settings.

Athena launch screen

A message appears - "Before you run your first query, you need to set up a query result location in Amazon S3." Click on Edit Settings.

Enter the S3 bucket address to save the query results. If you want query results in an S3 bucket folder, create the folder first and specify the folder name and address as S3://bucket/folder.

Manage settings

Click Save, and it takes you to the Settings page with the updated configuration.

Settings page

Click on the Editor tab. Under the Tables and views area, click Create, then Create a table from data sources and choose the S3 bucket data.

Athena Editor | Data | Tables and views

It opens the Create table from the S3 bucket data page. Enter the required table name and description (optional), as shown below.

Create table from the S3 bucket data page

On the Database configuration page, choose "Create a database" or "Choose an existing database" if you already have an Athena database. If you create a new database, enter a name from 1-128 characters.

Database configuration page

The Data format page requires the following information to be provided:

  • Table type: Amazon Athena uses the Apache Hive Table as the default table type. However, we can choose the values from Apache Hive, Apache Iceberg, or Delta Lake.
  • File format: You can choose TSV, JSON, CSV, Parquet, etc., for the file format. In our case, choose the CSV as the input data file format.
  • SerDe library:
    • LazySimpleSerDe: If the CSV data does not have values enclosed in double quotes
    • OpenCSVSerDe: If the CSV data have values enclosed in double quotes
  • SerDe properties -optional: Choose the field delimiter as a Comma.
Data format

Next is the Column details page. Enter the column names and data types per your input data file. The following screenshot shows a few column details with their name and type.

Column details

Table Properties is an optional page. You can set table properties such as Write compression, Encryption, or choose the header line counts. I specified the value 1 to skip the first row while importing the data from CSV.

Table properties - optional

In the Preview table query, you can view the table script. You can note here that Athena uses a syntax similar to a regular SQL table. It uses the keyword EXTERNAL for the table.

Preview table query

Click Create Table to execute the script and deploy the table in the database. As shown below, the query runtime is 406 ms.

Query results

Querying the External Table in Amazon Athena

We have created the external table in Athena. Athena uses the logical namespace of the tables. These schema and data point to the Amazon S3 bucket for the read-time querying. Athena reads the data using the standard SQL language when you query the data.

Open a new query window and execute the SELECT statement to view all records from the CSV file stored in the S3 bucket. Athena processed the output in 422 ms for the 3.69 KBs of data scanned.

New query window

Check the query stats to see the number of rows processed, input bytes, output bytes, and runtime stats in chart format.

Query stats

Like ANSI SQL, you can use the WHERE clause to filter the data in your SQL queries. For example, let's filter out records for employees whose first name starts with the character D for our data analysis.

Filter data

Similarly, we can use COUNT(*) to check the number of records in the CSV file.

Check the number of records

Creating a Table from Query Results (CTAS)

You can use CREATE TABLE AS SELECT, aka CTAS, in Athena to create new tables using the result of SELECT statements. It is similar to SELECT * INTO in SQL Server.

You can use the CTAS query from the result of any executed Athena queries. For example, I ran the query to find out employees whose salary is greater than 10000.

query results

Now, I want to store this result as a table. Therefore, click on Create -> Table from Query.

Create table from query

Enter the table name, choose the database, and choose the table location(S3 address).

Create table as select

Preview the CTAS query as shown below. Click Create table. It creates the table from the schema of the select statement.

Preview table query

We can also use the CTAS query format and create a table from the query window. For example, let's create a table for those employees whose department id is 90.

CREATE TABLE "mssqltips"."FilterDeptEmp" WITH (
  format = 'parquet',
  external_location = 's3://mymssqlbucket/FilterDeptEmp'
) AS
Select *
from mssqlemployeesdata
where DEPARTMENT_ID= 90
query results

The benefits of CTAS queries are:

  • You do not need to define the schema for the table; it is automatically created from the SELECT statement output.
  • In an extensive database with many rows and columns, you can create a table by selecting the required row and column to minimize data processing. It will be a cost savings for data processing of Athena queries as Athena would scan less data.
  • You can create an empty copy of an existing table.

Create View in Amazon Athena

The view is a logical table that runs the query specified in it. The view helps query a subset of data or combine multiple tables in a single query. It eliminates the complexity of querying for reuse. It hides the underlying table schema and requires no changes in case the underlying table schema doesn't change.

Like the SQL statement, Athena also uses the CREATE VIEW to define a view in the Athena database. Let's create a view that returns the employees whose JOB ID is IT_PROG. The view contains the SELECT statement to filter the records matching criteria.

CREATE OR REPLACE VIEW EmployeeDeptFilter AS
SELECT employee_id, first_name, last_name
FROM mssqlemployeesdata
WHERE Job_ID='IT_PROG';
query results

Users can directly execute the view without knowing the underlying table and column names.

execute the view

You can use Describe View to return the list of columns for the specified view.

Describe view

Similarly, SHOW CREATE VIEW returns the SQL statement used to create the view.

SHOW CREATE VIEW

If you need to drop a view, use the DROP VIEW IF EXISTS statement.

DROP VIEW IF EXISTS  mssqltips.EmployeeDeptFilter
DROP VIEW IF EXISTS
Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Rajendra Gupta Rajendra Gupta is a Consultant DBA with 14+ years of extensive experience in database administration including large critical OLAP, OLTP, Reporting and SharePoint databases.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2023-09-26

Comments For This Article

















get free sql tips
agree to terms