Posts Tagged APM

Brief Introduction to Observability on AWS

Explore the wide variety of Application Performance Monitoring (APM) and Observability options on AWS

Licensed image: ArtemisDiana/Shutterstock.com
Licensed image: ArtemisDiana/Shutterstock.com

APM and Observability

Observability is “the extent to which the internal states of a system can be inferred from externally available data” (Gartner). The three pillars of observability data are metricslogs, and tracesApplication Performance Monitoring (APM), a term commonly associated with observability, is “software that enables the observation and analysis of application health, performance, and user experience” (Gartner).

Additional features often associated with APM and observability products and services include the following (in alphabetical order):

  • Advanced Threat Protection (ATP)
  • Endpoint Detection and Response (EDR)
  • Incident Detection Response (IDR)
  • Infrastructure Performance Monitoring (IPM)
  • Network Device Monitoring (NDM)
  • Network Performance Monitoring (NPM)
  • OpenTelemetry (OTel)
  • Operational (or Operations) Intelligence Platform
  • Predictive Monitoring (predictive analytics / predictive modeling)
  • Real User Monitoring (RUM)
  • Security Information and Event Management (SIEM)
  • Security Orchestration, Automation, and Response (SOAR)
  • Synthetic Monitoring (directed monitoring / synthetic testing)
  • Threat Visibility and Risk Scoring
  • Unified Security and Observability Platform
  • User Behavior Analytics (UBA)

Not all features are offered by all vendors. Most vendors tend to specialize in one or more areas. Determining which features are essential to your organization before choosing a solution is vital.

AI/ML

Given the growing volume and real-time nature of observability telemetry, many vendors have started incorporating AI and ML into their products and services to improve correlation, anomaly detection, and mitigation capabilities. Understand how these features can reduce operational burden, enhance insights, and simplify complexity.

Decision Factors

APM and observability tooling choices often come down to a “Build vs. Buy” decision for organizations. In the Cloud, this usually means integrating several individual purpose-built products and services, self-managed open-source projects, or investing in an end-to-end APM or unified observability platform. Other decision factors include the need for solutions to support:

  • Hybrid cloud environments (on-premises/Cloud)
  • Multi-cloud environments (Public Cloud, SaaS, Supercloud)
  • Specialized workloads (e.g., Mainframes, HPC, VMware, SAP, SAS)
  • Compliant workloads (e.g., PCI DSS, PII, GDPR, FedRAMP)
  • Edge Computing and IoT/IIoT
  • AI, ML, and Data Analytics monitoring (AIOps, MLOps, DataOps)
  • SaaS observability (SaaS providers who offer monitoring to their end-users as part of their service offering)
  • Custom log formats and protocols

Finally, the 5 V’s of big data: Velocity, Volume, Value, Variety, and Veracity, also influence the choice of APM and observability tooling. The real-time nature of the observability data, the sheer volume of the data, the source and type of data, and the sensitivity of the data, will all guide tooling choices based on features and cost.

Organizations can choose fully-managed native AWS services, AWS Partner products and services, often SaaS, self-managed open-source observability tooling, or a combination of options. Many AWS and Partner products and services are commercial versions of popular open-source software (COSS).

AWS Options

Data Collection, Processing, and Forwarding

  • AWS Distro for OpenTelemetry (ADOT): Open-source APIs, libraries, and agents to collect distributed traces and metrics for application monitoring. ADOT is an open-source distribution of OpenTelemetry, a “high-quality, ubiquitous, and portable telemetry [solution] to enable effective observability.
  • AWS for Fluent Bit: Fluent Bit image with plugins for both CloudWatch Logs and Kinesis Data Firehose. Fluent Bit is an open-source, “super fast, lightweight, and highly scalable logging and metrics processor and forwarder.

Security-focused Monitoring

  • AWS CloudTrail: Helps enable operational and risk auditing, governance, and compliance in your AWS environment. CloudTrail records events, including actions taken in the AWS Management Console, AWS Command Line Interface (CLI), AWS SDKs, and APIs.

Partner Options

According to the 2022 Gartner® Magic Quadrant™ — APM & Observability Report, leading vendors commonly used by AWS customers include the following (in alphabetical order):

Open Source Options

There are countless open-source observability projects to choose from, including the following (in alphabetical order):

  • Elastic: Elastic Stack: Elasticsearch, Kibana, Beats, and Logstash
  • Fluentd: Data collector for unified logging layer
  • Grafana: Platform for monitoring and observability
  • Jaeger: End-to-end distributed tracing
  • Kiali: Configure, visualize, validate, and troubleshoot Istio Service Mesh
  • Loki: Like Prometheus, but for logs
  • OpenSearch: Scalable, flexible, and extensible software suite for search, analytics, and observability applications
  • OpenTelemetry (OTel): Collection of tools, APIs, and SDKs used to instrument, generate, collect, and export telemetry data
  • Prometheus: Monitoring system and time series database
  • Zabbix: Single pane of glass view of your whole IT infrastructure stack

🔔 To keep up with future content, follow Gary Stafford on LinkedIn.


This blog represents my viewpoints and not those of my employer, Amazon Web Services (AWS). All product names, logos, and brands are the property of their respective owners.

, , , , , ,

Leave a comment