Rules

mail

Last evaluation: 4.261s ago
Evaluation time: 1.18ms

alert: PostfixDown
expr: node_systemd_unit_state{name="postfix.service",state="active"} == 0
for: 2m
labels:
  service: postfix
  severity: critical
annotations:
  description: Postfix mail service has been inactive for more than 2 minutes
  summary: Postfix is down on {{ $labels.instance }}
State: ok | Error: none | Last evaluation: 4.261s ago | Evaluation time: 626.5us

alert: DovecotDown
expr: node_systemd_unit_state{name="dovecot.service",state="active"} == 0
for: 2m
labels:
  service: dovecot
  severity: critical
annotations:
  description: Dovecot IMAP service has been inactive for more than 2 minutes
  summary: Dovecot is down on {{ $labels.instance }}
State: ok | Error: none | Last evaluation: 4.261s ago | Evaluation time: 220.7us

alert: PostfixMailQueueGrowing
expr: node_postfix_queue_size > 50
for: 15m
labels:
  service: postfix
  severity: warning
annotations:
  description: 'Mail queue has {{ $value }} messages (threshold: 50)'
  summary: Postfix mail queue growing on {{ $labels.instance }}
State: ok | Error: none | Last evaluation: 4.261s ago | Evaluation time: 161us

alert: PostfixMailQueueCritical
expr: node_postfix_queue_size > 200
for: 5m
labels:
  service: postfix
  severity: critical
annotations:
  description: Mail queue has {{ $value }} messages — possible delivery failure
  summary: Postfix mail queue critical on {{ $labels.instance }}
State: ok | Error: none | Last evaluation: 4.261s ago | Evaluation time: 127.8us
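
A rule such as PostfixDown can be checked offline with a promtool unit test before it is deployed. The following is a minimal sketch, not the configuration used on this server: the rule file name `mail.rules.yml` and the instance label `mail01:9100` are hypothetical, and the test assumes the rule file contains the PostfixDown rule exactly as listed above.

```yaml
# Hypothetical test file, e.g. mail.rules.test.yml
rule_files:
  - mail.rules.yml        # assumed file name for the "mail" group

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # The unit reports "not active" (0) from t=0 through t=5m.
      - series: 'node_systemd_unit_state{name="postfix.service",state="active",instance="mail01:9100"}'
        values: '0x5'
    alert_rule_test:
      # The rule has "for: 2m", so by t=3m the alert should be firing.
      - eval_time: 3m
        alertname: PostfixDown
        exp_alerts:
          - exp_labels:
              severity: critical
              service: postfix
              name: postfix.service
              state: active
              instance: mail01:9100
            exp_annotations:
              summary: Postfix is down on mail01:9100
              description: Postfix mail service has been inactive for more than 2 minutes
```

Under these assumptions the test would be run with `promtool test rules mail.rules.test.yml`. The expected labels include the series labels (`name`, `state`, `instance`) as well as the labels attached by the rule.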

system

Last evaluation: 22.174s ago
Evaluation time: 12.96ms

alert: HighCPULoad
expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 85
for: 5m
labels:
  severity: warning
annotations:
  description: 'CPU usage is {{ printf "%.1f" $value }}% (threshold: 85%)'
  summary: High CPU load on {{ $labels.instance }}
State: ok | Error: none | Last evaluation: 22.174s ago | Evaluation time: 1.787ms

alert: CriticalCPULoad
expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 95
for: 2m
labels:
  severity: critical
annotations:
  description: 'CPU usage is {{ printf "%.1f" $value }}% (threshold: 95%)'
  summary: Critical CPU load on {{ $labels.instance }}
State: ok | Error: none | Last evaluation: 22.173s ago | Evaluation time: 539.8us

alert: HighMemoryUsage
expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 85
for: 5m
labels:
  severity: warning
annotations:
  description: 'Memory usage is {{ printf "%.1f" $value }}% (threshold: 85%)'
  summary: High memory usage on {{ $labels.instance }}
State: ok | Error: none | Last evaluation: 22.173s ago | Evaluation time: 311.7us

alert: CriticalMemoryUsage
expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 95
for: 2m
labels:
  severity: critical
annotations:
  description: Memory usage is {{ printf "%.1f" $value }}%
  summary: Critical memory usage on {{ $labels.instance }}
State: ok | Error: none | Last evaluation: 22.173s ago | Evaluation time: 541.9us

alert: DiskSpaceWarning
expr: (1 - (node_filesystem_avail_bytes{fstype!~"tmpfs|fuse.lxcfs"} / node_filesystem_size_bytes{fstype!~"tmpfs|fuse.lxcfs"})) * 100 > 80
for: 5m
labels:
  severity: warning
annotations:
  description: Disk {{ $labels.mountpoint }} is {{ printf "%.1f" $value }}% full
  summary: Disk space warning on {{ $labels.instance }}
State: ok | Error: none | Last evaluation: 22.172s ago | Evaluation time: 1.393ms

alert: DiskSpaceCritical
expr: (1 - (node_filesystem_avail_bytes{fstype!~"tmpfs|fuse.lxcfs"} / node_filesystem_size_bytes{fstype!~"tmpfs|fuse.lxcfs"})) * 100 > 90
for: 2m
labels:
  severity: critical
annotations:
  description: Disk {{ $labels.mountpoint }} is {{ printf "%.1f" $value }}% full
  summary: Critical disk space on {{ $labels.instance }}
State: ok | Error: none | Last evaluation: 22.171s ago | Evaluation time: 612.2us

alert: DiskWillFillIn24h
expr: predict_linear(node_filesystem_avail_bytes{fstype!~"tmpfs|fuse.lxcfs"}[6h], 24 * 3600) < 0
for: 30m
labels:
  severity: warning
annotations:
  description: '{{ $labels.mountpoint }} is predicted to be full within 24 hours'
  summary: Disk will fill within 24h on {{ $labels.instance }}
State: ok | Error: none | Last evaluation: 22.171s ago | Evaluation time: 1.512ms

alert: HighLoadAverage
expr: node_load15 / on (instance) count by (instance) (node_cpu_seconds_total{mode="idle"}) > 0.8
for: 10m
labels:
  severity: warning
annotations:
  description: 15-minute load average per CPU is {{ printf "%.2f" $value }}
  summary: High load average on {{ $labels.instance }}
State: ok | Error: none | Last evaluation: 22.17s ago | Evaluation time: 327.2us

alert: ServerDown
expr: up == 0
for: 1m
labels:
  severity: critical
annotations:
  description: '{{ $labels.instance }} ({{ $labels.job }}) has been unreachable for more than 1 minute'
  summary: Instance {{ $labels.instance }} is down
State: ok | Error: none | Last evaluation: 22.17s ago | Evaluation time: 198.9us

alert: UnexpectedReboot
expr: node_time_seconds - node_boot_time_seconds < 300
labels:
  severity: warning
annotations:
  description: Server has been up for less than 5 minutes — possible unexpected reboot
  summary: 'Server rebooted: {{ $labels.instance }}'
State: ok | Error: none | Last evaluation: 22.17s ago | Evaluation time: 255.5us

alert: SystemdServiceFailed
expr: node_systemd_unit_state{state="failed"} == 1
for: 2m
labels:
  severity: warning
annotations:
  description: Service {{ $labels.name }} is in failed state
  summary: Systemd service failed on {{ $labels.instance }}
State: ok | Error: none | Last evaluation: 22.172s ago | Evaluation time: 5.387ms
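
The page shows each rule as rendered by the server, not the file it was loaded from. As a rough sketch of how the "system" group might be laid out on disk, the fragment below wraps one of the listed rules (ServerDown) in the standard `groups` / `name` / `rules` structure. The file path in the comment is hypothetical, and the remaining rules are elided for brevity.

```yaml
# Hypothetical rule file, e.g. /etc/prometheus/rules/system.yml,
# referenced from prometheus.yml under the rule_files key.
groups:
  - name: system
    rules:
      - alert: ServerDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          description: '{{ $labels.instance }} ({{ $labels.job }}) has been unreachable for more than 1 minute'
          summary: Instance {{ $labels.instance }} is down
      # The other rules of the group would follow as further list items.
```

A file in this shape can be validated with `promtool check rules <file>` before Prometheus is reloaded.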

web

Last evaluation: 29.409s ago
Evaluation time: 3.83ms

alert: NginxDown
expr: node_systemd_unit_state{name="nginx.service",state="active"} == 0
for: 1m
labels:
  service: nginx
  severity: critical
annotations:
  description: Nginx web server has been inactive for more than 1 minute
  summary: Nginx is down on {{ $labels.instance }}
State: ok | Error: none | Last evaluation: 29.409s ago | Evaluation time: 2.021ms

alert: ApacheDown
expr: node_systemd_unit_state{name="apache2.service",state="active"} == 0
for: 1m
labels:
  service: apache2
  severity: critical
annotations:
  description: Apache web server has been inactive for more than 1 minute
  summary: Apache is down on {{ $labels.instance }}
State: ok | Error: none | Last evaluation: 29.408s ago | Evaluation time: 653.6us

alert: SSLCertExpiringWarning
expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 30
for: 1h
labels:
  severity: warning
annotations:
  description: Certificate expires in {{ $value | humanizeDuration }}
  summary: 'SSL certificate expiring soon: {{ $labels.instance }}'
State: ok | Error: none | Last evaluation: 29.408s ago | Evaluation time: 594.2us

alert: SSLCertExpiringCritical
expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 7
for: 1h
labels:
  severity: critical
annotations:
  description: Certificate expires in {{ $value | humanizeDuration }} — renew immediately
  summary: 'SSL certificate expiring in 7 days: {{ $labels.instance }}'
State: ok | Error: none | Last evaluation: 29.407s ago | Evaluation time: 457.3us
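
The two certificate rules depend on `probe_ssl_earliest_cert_expiry`, a metric exposed by the Prometheus blackbox exporter. The page does not show how the probes are scraped, so the following is only a sketch of a typical scrape job. It assumes a blackbox exporter listening on its default port on the same host; the job name and the target URL are placeholders.

```yaml
# Sketch of a scrape job for prometheus.yml; all values are assumptions.
scrape_configs:
  - job_name: blackbox_tls            # hypothetical job name
    metrics_path: /probe
    params:
      module: [http_2xx]              # module from the exporter's default config
    static_configs:
      - targets:
          - https://example.com       # placeholder target
    relabel_configs:
      # Pass the target URL to the exporter as the ?target= parameter.
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed URL as the instance label used in the alert summaries.
      - source_labels: [__param_target]
        target_label: instance
      # Send the actual scrape to the blackbox exporter itself.
      - target_label: __address__
        replacement: 127.0.0.1:9115   # assumed exporter address
```

With this relabeling, `{{ $labels.instance }}` in the alert summaries resolves to the probed URL and not to the exporter's address.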