DataDog is an observability platform that provides a wide range of solutions for monitoring and alerting on cloud-scale applications. I have had the pleasure of working with DataDog for the past few years. I have used it for Application Performance Monitoring (APM), Database Monitoring (DBM) and log aggregation, and I recently started using its Real User Monitoring (RUM) capabilities.
RUM is a performance monitoring process that collects detailed information about an end user's interactions with a given application. You can use it to measure how end users interact with page elements and to track their mouse movements and frustration signals. Combined with APM tracing, RUM gives you a complete view of user activity, service traces and database interactions.
You can use all of this collected data to create dashboards and visualizations in your DataDog console. You can also set up alerts on it, and together these help you troubleshoot your applications very effectively. However, the collected data is subject to data retention policies. These vary from service to service, and I found out that RUM's default data retention is only 1 month. This means your dashboards can't look back further than that, so you can't really perform historical analysis. The DataDog support team can work with you to increase this retention period to 3 months, but that is still not enough for year-over-year analysis.
This is where DataDog's RESTful APIs can help. You can use the RUM events API to collect and download RUM events from DataDog, and then use this data in your own way, including year-over-year and other historical comparisons.
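For reference, searching RUM events is a plain HTTP call, `GET /api/v2/rum/events`, authenticated with the `DD-API-KEY` and `DD-APPLICATION-KEY` headers and paged with a cursor (`page[cursor]`, taken from `meta.page.after` in the previous response). As far as I know, this is the endpoint the Java client's `RumApi` wraps. The sketch below only builds the request with the JDK's `java.net.http` types; the class and method names are hypothetical, and the host assumes the US1 DataDog site (other sites use a different host).

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper; assumes the US1 site (api.datadoghq.com).
final class RumEventsRequest {
    static final String BASE_URL = "https://api.datadoghq.com/api/v2/rum/events";

    private RumEventsRequest() {}

    /** Builds the search URL; brackets, colons and spaces are percent-encoded. */
    static URI buildUri(String query, OffsetDateTime from, OffsetDateTime to,
                        int pageLimit, String cursor) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("filter[query]", query);
        params.put("filter[from]", DateTimeFormatter.ISO_OFFSET_DATE_TIME.format(from));
        params.put("filter[to]", DateTimeFormatter.ISO_OFFSET_DATE_TIME.format(to));
        params.put("sort", "timestamp"); // ascending by event time
        params.put("page[limit]", Integer.toString(pageLimit));
        if (cursor != null) {
            params.put("page[cursor]", cursor); // meta.page.after of the previous page
        }
        String queryString = params.entrySet().stream()
                .map(e -> encode(e.getKey()) + "=" + encode(e.getValue()))
                .collect(Collectors.joining("&"));
        return URI.create(BASE_URL + "?" + queryString);
    }

    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    /** Both the API key and the application key are sent as headers. */
    static HttpRequest buildRequest(URI uri, String apiKey, String appKey) {
        return HttpRequest.newBuilder(uri)
                .header("DD-API-KEY", apiKey)
                .header("DD-APPLICATION-KEY", appKey)
                .header("Accept", "application/json")
                .GET()
                .build();
    }
}
```

You would send the request with `java.net.http.HttpClient` and repeat it with the returned cursor until no `meta.page.after` value comes back; the Java client shown later does that loop for you.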
This write-up focuses on how to achieve this using AWS Lambda to call the API, AWS S3 to store the RUM data, AWS Athena to query it and AWS QuickSight to analyze and visualize it.
Here is a very basic architecture diagram of this setup, followed by the list of its components:
- AWS Lambda Function: a simple Lambda function triggered on a schedule by a time-based CloudWatch event. You can set the frequency to whatever works for you, but you may want to check with DataDog support to make sure you won't get throttled.
- AWS S3 Bucket: the storage for the RUM event data. I created a simple folder structure that organizes events by year, month, and day, so that the data can be partitioned by those three attributes, which makes time-based queries easy.
- AWS Secrets Manager: To access the DataDog API, you need an API key and an application key. Your controller logic (i.e., the Lambda function) uses them to get authorized against the DataDog REST endpoints. AWS Secrets Manager is very easy to work with and is a good place to store them. You can use Lambda environment variables to pass the secrets (or references to them) to the function, where they can be decrypted and used.
- AWS Athena: Athena is a serverless query service (using SQL) that makes it easy to query data in S3 buckets.
- AWS QuickSight: QuickSight is a visualization tool for building highly interactive dashboards. It can query, analyze and visualize data from many different data sources, and it has native integration with AWS Athena.
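The year/month/day folder structure in the S3 bucket can be derived from each event's timestamp. Below is a minimal sketch, assuming the `events` prefix from the Athena table's `LOCATION` (`s3://your_bucket/events`) and one JSON object per event named after the RUM event id; `RumEventKeys` and its methods are hypothetical, not part of any SDK.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

// Hypothetical key builder for the events/yyyy/MM/dd/ folder layout.
final class RumEventKeys {
    // Assumed to match the Athena table LOCATION s3://your_bucket/events
    static final String PREFIX = "events";

    private RumEventKeys() {}

    /** Folder for one day, computed in UTC so day boundaries don't depend on the host time zone. */
    static String dayPrefix(Instant eventTimestamp) {
        LocalDate d = eventTimestamp.atZone(ZoneOffset.UTC).toLocalDate();
        return String.format("%s/%04d/%02d/%02d/", PREFIX,
                d.getYear(), d.getMonthValue(), d.getDayOfMonth());
    }

    /** One object per event; re-saving the same event overwrites it instead of duplicating it. */
    static String objectKey(Instant eventTimestamp, String eventId) {
        return dayPrefix(eventTimestamp) + eventId + ".json";
    }
}
```

For example, an event stamped `2024-03-07T23:59:59Z` with id `abc` lands at `events/2024/03/07/abc.json`. Keep in mind that Athena's `MSCK REPAIR TABLE` only discovers Hive-style `key=value` prefixes (such as `year=2024/`), so with plain `2024/03/07/` folders each partition has to be registered explicitly or handled with partition projection.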
Accessing and Storing DataDog Events
package com.test.service;

import java.io.UnsupportedEncodingException;

import com.datadog.api.client.ApiException;
import com.datadog.api.client.PaginationIterable;
import com.datadog.api.client.v2.api.RumApi;
import com.datadog.api.client.v2.model.RUMEvent;
import com.datadog.api.client.v2.model.RUMSort;
import com.test.RunDate;
import com.test.S3Repository;
import com.test.exception.ServiceException;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;

@Service
@Slf4j
@RequiredArgsConstructor
public class DataDogService {

    private final RumApi rumApi;
    private final S3Repository s3Repository;

    // Number of events requested per page from the RUM events API.
    @Value("${app.datadog.page-limit:100}")
    private int pageLimit;

    // RUM search query. Everything after the first ':' is the default value,
    // so the default must not start with a space.
    @Value("${app.datadog.filter-query:env:dev @context.custom:whatever service:(application1 OR application2)}")
    private String filterQuery;

    public void collect() throws ServiceException {
        try {
            // Collect everything between the previous run and this one.
            final var runDate = new RunDate();
            final var lastRunDate = s3Repository.getLastRunDate();

            RumApi.ListRUMEventsOptionalParameters parameters = new RumApi.ListRUMEventsOptionalParameters();
            parameters.filterFrom(lastRunDate);
            parameters.filterTo(runDate.getRunDate());
            parameters.filterQuery(filterQuery);
            parameters.pageLimit(pageLimit);
            parameters.sort(RUMSort.TIMESTAMP_ASCENDING);

            // The iterable follows the pagination cursor and requests further pages as needed.
            PaginationIterable<RUMEvent> result = rumApi.listRUMEventsWithPagination(parameters);
            for (RUMEvent item : result) {
                s3Repository.saveRumEvent(item);
            }

            // Move the last-run marker forward only after every event has been stored.
            s3Repository.updateLastRunDate(runDate);
        } catch (ApiException | UnsupportedEncodingException e) {
            throw new ServiceException(e.getMessage(), e);
        }
    }
}
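`collect()` requests the window between the last stored run date and now. One edge case worth handling is the very first run (no stored date) or a run after a long outage: anything older than the RUM retention period (1 month by default, as noted earlier) is already gone. The sketch below is a hypothetical helper for choosing that window, assuming you pass in the retention period yourself; it is not part of the DataDog client.

```java
import java.time.Duration;
import java.time.OffsetDateTime;
import java.util.Optional;

/** Hypothetical [from, to) window for one collection run. */
record CollectionWindow(OffsetDateTime from, OffsetDateTime to) {

    static CollectionWindow next(Optional<OffsetDateTime> lastRun,
                                 OffsetDateTime now,
                                 Duration retention) {
        // Events older than the retention period can no longer be fetched.
        OffsetDateTime oldestAvailable = now.minus(retention);
        // Resume from the last run if it is still inside retention, otherwise start at the edge.
        OffsetDateTime from = lastRun
                .filter(last -> last.isAfter(oldestAvailable))
                .orElse(oldestAvailable);
        return new CollectionWindow(from, now);
    }
}
```

With `now` at `2024-03-07T00:00:00Z` and a 30-day retention, a first run starts at `2024-02-06T00:00:00Z`, while a run whose previous marker is `2024-03-06T12:00:00Z` simply resumes from there.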
Creating an Athena Table for S3 Bucket
CREATE EXTERNAL TABLE `data_dog_events`(
  `id` string COMMENT 'from deserializer',
  `type` string COMMENT 'from deserializer',
  `attributes` struct<?> COMMENT 'from deserializer')
PARTITIONED BY (
  `year` string,
  `month` string,
  `day` string)
ROW FORMAT SERDE
  'org.apache.hive.hcatalog.data.JsonSerDe'
WITH SERDEPROPERTIES (
  'case.insensitive'='true')
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://your_bucket/events'
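Because the table is partitioned by `year`, `month` and `day`, Athena only reads folders that are registered as partitions. With plain `yyyy/MM/dd` folders (rather than Hive-style `year=2024/` prefixes), one option is an `ALTER TABLE ... ADD IF NOT EXISTS PARTITION ... LOCATION` statement per day. The sketch below just builds that statement string; `AthenaPartitions` is a hypothetical helper, the table name and bucket are the placeholders from the DDL above, and running the statement (from the Athena console or through the Athena API after each Lambda run) is left out.

```java
import java.time.LocalDate;

// Hypothetical builder for Athena partition DDL; assumes base locations without a trailing slash.
final class AthenaPartitions {
    private AthenaPartitions() {}

    /** Registers one day's folder, e.g. s3://your_bucket/events/2024/03/07/, as a partition. */
    static String addDayPartition(String table, String baseLocation, LocalDate day) {
        String y = String.format("%04d", day.getYear());
        String m = String.format("%02d", day.getMonthValue());
        String d = String.format("%02d", day.getDayOfMonth());
        return "ALTER TABLE " + table + " ADD IF NOT EXISTS PARTITION "
                + "(year='" + y + "', month='" + m + "', day='" + d + "') "
                + "LOCATION '" + baseLocation + "/" + y + "/" + m + "/" + d + "/'";
    }
}
```

`IF NOT EXISTS` makes the statement safe to run after every collection, even several times a day. Partition projection is an alternative that avoids registering partitions at all, at the cost of declaring the partition ranges in the table properties.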
Creating a Dataset and QuickSight Dashboard
With the Athena table as a data source, you can create a QuickSight dataset and build a dashboard that shows, for example:
- Number of Unique Users
- Number of Sessions
- Number of Users versus Number of Sessions
- Device and browser information, and how it is distributed across sessions
- A map visual showing where geographically your users are accessing your applications from
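In QuickSight these metrics are distinct counts over the dataset. To make the definitions concrete, here is a minimal sketch of the same calculations in plain Java over an in-memory list. `RumRow` is a hypothetical, flattened view of a RUM event (user id, session id, browser name), not the shape the DataDog API actually returns.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

/** Hypothetical flattened RUM event row. */
record RumRow(String userId, String sessionId, String browser) {}

final class RumMetrics {
    private RumMetrics() {}

    /** Distinct user ids, ignoring anonymous events without one. */
    static long uniqueUsers(List<RumRow> rows) {
        return rows.stream().map(RumRow::userId).filter(Objects::nonNull).distinct().count();
    }

    /** Distinct session ids; many events belong to the same session. */
    static long sessions(List<RumRow> rows) {
        return rows.stream().map(RumRow::sessionId).filter(Objects::nonNull).distinct().count();
    }

    /** Average number of sessions per unique user. */
    static double sessionsPerUser(List<RumRow> rows) {
        long users = uniqueUsers(rows);
        return users == 0 ? 0.0 : (double) sessions(rows) / users;
    }

    /** Distinct sessions per browser; a missing browser is grouped as "unknown". */
    static Map<String, Long> sessionsByBrowser(List<RumRow> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                r -> r.browser() == null ? "unknown" : r.browser(),
                Collectors.collectingAndThen(
                        Collectors.mapping(RumRow::sessionId, Collectors.toSet()),
                        set -> (long) set.size())));
    }
}
```

Counting distinct session ids, rather than events, matters here: a single session typically produces many view, action and resource events.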