Solutions Monitoring Cheat Sheet
Concepts
- “4” golden monitoring signals
- Metrics to choose from
- Request Rate
- Error Rate
- Latency
- Saturation
- Utilization
- Golden Signals variants
- Google SRE
- USE Method
- RED Method
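The signals listed above can be computed from a window of request records. The following is a minimal sketch, not a reference implementation: the `Request` record, its field names, the `workers` parameter and the choice of the 95th percentile are all assumptions made for illustration. Saturation (for example queue depth) is left out because it cannot be derived from request records alone.

```python
from dataclasses import dataclass


@dataclass
class Request:
    duration_ms: float  # hypothetical field: time spent serving the request
    status: int         # hypothetical field: HTTP status code


def percentile(sorted_values, percent):
    """Nearest-rank percentile; `percent` is an integer from 1 to 100."""
    if not sorted_values:
        return 0.0
    rank = (percent * len(sorted_values) + 99) // 100  # ceil(percent * n / 100)
    return sorted_values[rank - 1]


def golden_signals(requests, window_s, workers=1):
    """Summarize one observation window of `window_s` seconds."""
    total = len(requests)
    errors = sum(1 for r in requests if r.status >= 500)  # count 5xx as errors
    durations = sorted(r.duration_ms for r in requests)
    busy_ms = sum(durations)
    return {
        "request_rate": total / window_s,                   # requests per second
        "error_rate": errors / total if total else 0.0,     # fraction of failed requests
        "latency_p95_ms": percentile(durations, 95),
        # Assumed definition: busy time divided by available worker time.
        "utilization": busy_ms / (window_s * 1000 * workers),
    }
```

Request rate, error rate and latency together correspond to the RED method; utilization is one of the three USE quantities.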
- TSZ Compression Technique (Facebook Paper)
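The two core ideas of the TSZ technique can be sketched as follows: timestamps are stored as delta-of-delta values, and each floating-point value is XORed with its predecessor, so regular intervals and repeated values turn into zeros. This is a simplified illustration only. The variable-length bit packing that produces the actual space savings is omitted, and all function names here are hypothetical.

```python
import struct


def delta_of_delta(timestamps):
    """Encode as [first timestamp, first delta, delta-of-delta, ...]."""
    if len(timestamps) < 2:
        return list(timestamps)
    encoded = [timestamps[0], timestamps[1] - timestamps[0]]
    prev_delta = encoded[1]
    for prev, cur in zip(timestamps[1:], timestamps[2:]):
        delta = cur - prev
        encoded.append(delta - prev_delta)  # 0 when the interval is unchanged
        prev_delta = delta
    return encoded


def undo_delta_of_delta(encoded):
    """Inverse of delta_of_delta."""
    if len(encoded) < 2:
        return list(encoded)
    timestamps = [encoded[0], encoded[0] + encoded[1]]
    delta = encoded[1]
    for dod in encoded[2:]:
        delta += dod
        timestamps.append(timestamps[-1] + delta)
    return timestamps


def float_bits(value):
    """IEEE 754 double reinterpreted as a 64-bit unsigned integer."""
    return struct.unpack(">Q", struct.pack(">d", value))[0]


def xor_values(values):
    """First value as raw bits, then XOR of each value with its predecessor."""
    bits = [float_bits(v) for v in values]
    return bits[:1] + [a ^ b for a, b in zip(bits, bits[1:])]
```

For a series scraped every 60 seconds with one late sample, `delta_of_delta([1000, 1060, 1120, 1180, 1241])` returns `[1000, 60, 0, 0, 1]`, and `xor_values` yields 0 wherever a value repeats.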
TSDBs (Time Series DBs)
- M3 (Prometheus, etcd, replication; scale at Uber: 500 million metrics/s, billions stored)
- Thanos (Prometheus, federation)
- Grafana Mimir (Prometheus, scales up to 1 billion active time series)
- InfluxDB (commercial, replication, good scale)
- eXtremeDB (commercial)
- TimescaleDB (Postgres, replication)
- Graphite/Whisper (no replication)
- Prometheus
- DalmatinerDB
- Riak-TS
Alarming / Paging / SMS Notification
All SaaS
- PagerDuty
- VictorOps
- BigPanda
- OpsGenie
- AlertOps
- iLert
DNS, Ping
Network Mapping
Mapping Solutions
Network Forensics
Host-based Service Monitoring
Self-hosted:
- Nagios
- Icinga 2
- check_mk
- Shinken
- Splunk
- Sensu
- GroundWork
SaaS APMs:
- New Relic
- AppDynamics
- Datadog
- Dynatrace
- Stackify Retrace
- Ruxit
- Sysdig Cloud
- Instana
- SignalFx
- Sematext (metrics and logs combined with correlation; InfluxDB API for metrics, Elasticsearch API for logs)
Docker/Kubernetes
See also this review
- Prometheus
- Hawkular
- Datadog (SaaS)
- Sensu
- Scout
- Sysdig Cloud
External Website Monitoring
- gomez.com (now Dynatrace)
- yottaa.com
- monitis.com
- pingdom.com
- Ruxit (RUM)
- uptrends.com