Designing a Real-Time Monitoring System for the AWS Cloud: An Adaptive Dashboard-Based Approach with Prometheus and Grafana

Abstract

This paper presents a comprehensive monitoring system for the AWS cloud environment. The architecture is built on a secure AWS infrastructure that uses Amazon VPC for network segmentation, EC2 instances for hosting services, and Amazon S3 for data storage. The monitoring system integrates Prometheus for metrics collection and storage with Grafana for visualization through interactive dashboards. Performance results show average scrape times of 0.212 seconds and query latencies as low as 0.0021 seconds, enabling near real-time monitoring of over 1,279 metrics collected from 3 targets. Anomaly detection, implemented with the SH-ESD statistical model, achieved an accuracy of 88.85% on a sample of 330 data points: the model correctly identified 231 normal values and detected 29 anomalies during stress testing on EC2 instances. The automated alert system, managed by Alertmanager, sends email notifications as soon as critical thresholds are exceeded. These results support the robustness of the solution, which provides proactive monitoring of the cloud infrastructure with rapid incident detection and response while remaining scalable as operational needs evolve.
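
To illustrate the kind of robust statistic that SH-ESD relies on, the following is a minimal Python sketch that flags outliers using the median and the median absolute deviation (MAD). It is a simplification for illustration only, not the paper's implementation: the full SH-ESD procedure also removes a seasonal component and applies an iterative generalized ESD test, both omitted here. The function name `mad_anomalies`, the threshold of 3.5, and the sample CPU readings are all hypothetical.

```python
from statistics import median


def mad_anomalies(values, threshold=3.5):
    """Return indices of points whose robust z-score exceeds `threshold`.

    Hypothetical helper: uses median and MAD instead of mean and standard
    deviation, so a few extreme points do not mask each other.
    """
    med = median(values)
    abs_dev = [abs(v - med) for v in values]
    mad = median(abs_dev)
    if mad == 0:
        # More than half the points equal the median; no usable scale.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a standard
    # z-score when the data are roughly normal.
    return [i for i, d in enumerate(abs_dev) if 0.6745 * d / mad > threshold]


# Made-up CPU utilisation samples (percent) with two stress-test spikes.
cpu = [12.0, 14.5, 13.2, 15.1, 12.8, 97.3, 13.9, 14.2, 96.8, 13.5]
anomalous = mad_anomalies(cpu)
print(anomalous)
```

With these made-up samples, the two spikes are the only points flagged. In a deployment like the one described, the input series would come from Prometheus metrics rather than a hard-coded list.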
