Analysing AWS Application Load Balancer Logs with DuckDB: Unleashing Performance Insights

maryam1013

July 16, 2023 · 3 min read

AWS Application Load Balancers (ALBs) are vital components of modern web applications, distributing traffic and recording detailed access logs you can mine for performance insights. In this post, we share the very simple snippet we use to have DuckDB ingest those logs and make them available for analysis.

 

Before you ask: we have many, many years of scar tissue from multiple analytic databases (BigQuery et al.), so our use of DuckDB doesn’t stem from abject ignorance but rather from the simplicity of exploratory data analysis (and, up to a certain point, the speed) that DuckDB offers.

 

Let’s take a very simple query you’d like to make: figure out which URLs have end-to-end latency that runs afoul of an SLA of 20 ms. And yes, latencies should in general be monitored using percentiles (we sketch one such query further down) – but as you dig deeper, being able to run exploratory queries quickly is oh-so-useful.

 

Prerequisites

 

What you need for this recipe:

1. A DuckDB installation, which you can set up easily following the official documentation.

2. For boring reasons not entirely related to this post, we prefer having the logs available locally on the instance where you run the import. Keeping them on the local filesystem isn’t as storage-intensive as you may think: the logs are highly structured, gzipped by default, and sharded by date. If that’s not an option, you can import directly from S3 into DuckDB (a quick sketch follows this list).
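A minimal sketch of the S3 route, assuming DuckDB’s httpfs extension and placeholder bucket/region names (credentials are set the same way via s3_access_key_id and s3_secret_access_key if your environment needs them):

INSTALL httpfs;
LOAD httpfs;
SET s3_region = 'us-east-1';  -- placeholder region

-- Point the read_csv_auto call from the next section at an S3 glob instead of
-- local files ('my-alb-logs' and the key prefix are placeholders):
CREATE TABLE logs AS
SELECT * FROM read_csv_auto('s3://my-alb-logs/alb/*.log.gz',
    delim=' ' /* plus the types/names arguments shown in the next section */);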

 

Importing AWS ALB Logs into DuckDB

 

One of the cool things about DuckDB is that it can seamlessly import delimited text files, gzipped ones included!

CREATE TABLE logs AS
SELECT * FROM read_csv_auto('*.log.gz',
    delim=' ',
    types={
        'type': 'VARCHAR',
        'time': 'TIMESTAMP',
        'elb': 'VARCHAR',
        'client:port': 'VARCHAR',
        'target:port': 'VARCHAR',
        'request_processing_time': 'FLOAT',
        'target_processing_time': 'FLOAT',
        'response_processing_time': 'FLOAT',
        'elb_status_code': 'INTEGER',
        'target_status_code': 'INTEGER',
        'received_bytes': 'BIGINT',
        'sent_bytes': 'BIGINT',
        'request': 'VARCHAR',
        'user_agent': 'VARCHAR',
        'ssl_cipher': 'VARCHAR',
        'ssl_protocol': 'VARCHAR',
        'target_group_arn': 'VARCHAR',
        'trace_id': 'VARCHAR',
        'domain_name': 'VARCHAR',
        'chosen_cert_arn': 'VARCHAR',
        'matched_rule_priority': 'VARCHAR',
        'request_creation_time': 'TIMESTAMP',
        'actions_executed': 'VARCHAR',
        'redirect_url': 'VARCHAR',
        'error_reason': 'VARCHAR',
        'target_port_list': 'VARCHAR',
        'target_status_code_list': 'VARCHAR',
        'classification': 'VARCHAR',
        'classification_reason': 'VARCHAR'
    },
    names=['type', 'time', 'elb', 'client:port', 'target:port',
           'request_processing_time', 'target_processing_time',
           'response_processing_time', 'elb_status_code', 'target_status_code',
           'received_bytes', 'sent_bytes', 'request', 'user_agent',
           'ssl_cipher', 'ssl_protocol', 'target_group_arn', 'trace_id',
           'domain_name', 'chosen_cert_arn', 'matched_rule_priority',
           'request_creation_time', 'actions_executed', 'redirect_url',
           'error_reason', 'target_port_list', 'target_status_code_list',
           'classification', 'classification_reason']);

 

This statement creates the logs table and imports your log files using the read_csv_auto function; the column names and types mirror the fields AWS documents for ALB access logs.
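As a quick (and purely optional) sanity check, you can confirm the schema and row count before digging in:

DESCRIBE logs;
SELECT count(*) AS n_rows, min("time") AS first_event, max("time") AS last_event FROM logs;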

 

Executing Queries on ALB Logs

Now that your logs are imported into DuckDB, it’s time to extract insights. 

Let’s say you want to identify requests with slow target processing times. Here’s the query to uncover those sluggish culprits:

 

COPY (SELECT * FROM logs WHERE target_processing_time > 0.02) TO 'slow.csv' (HEADER, DELIMITER ',');

 

This query selects rows from the logs table where target_processing_time exceeds 0.02 seconds (ALB records its processing times in seconds, so 0.02 corresponds to the 20 ms SLA above; adjust the threshold as needed). The results are exported to a CSV file named slow.csv with headers and comma delimiters, ready for further analysis or integration with other tools.
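And since we mentioned percentiles earlier: once the logs are in a table, per-URL latency percentiles are just another query. Here’s a minimal sketch, assuming you want the p99 of target_processing_time per URL (the ALB request field looks like 'GET https://example.com/path HTTP/1.1', so split_part pulls out the URL portion):

SELECT split_part(request, ' ', 2) AS url,
       count(*) AS requests,
       approx_quantile(target_processing_time, 0.99) AS p99_target_s
FROM logs
GROUP BY url
ORDER BY p99_target_s DESC
LIMIT 20;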

Another very powerful aspect of this is that the same DuckDB database can be used seamlessly from Python, letting you merge these logs with other data sources from your infrastructure for deeper analysis.

Hope this snippet is useful!
