* .maintain/monitoring: Add alert when continuous task ends
Through the `polkadot_tasks_ended_total` Prometheus metric one can tell
when a task ended. Use this metric to alert when specific
known-to-be-continuous tasks end on a node.
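
A minimal sketch of the idea, assuming the counter is scraped as a mapping from task name to its `polkadot_tasks_ended_total` value. The function name `ended_tasks`, the `watched` parameter, and the task names in the usage note are hypothetical and only illustrate the comparison between two scrapes. The real alert is expressed as a Prometheus rule, not as Python.

```python
from typing import Dict, List, Optional, Set


def ended_tasks(
    previous: Dict[str, int],
    current: Dict[str, int],
    watched: Optional[Set[str]] = None,
) -> List[str]:
    """Return the names of tasks whose ended-counter grew between two scrapes.

    `previous` and `current` map a task name to its counter value.
    If `watched` is given, only those task names are considered.
    """
    ended = []
    for name, count in current.items():
        if watched is not None and name not in watched:
            continue
        # A task that was absent in the previous scrape counts from zero.
        # A lower value (e.g. counter reset after a restart) is not flagged.
        if count > previous.get(name, 0):
            ended.append(name)
    return sorted(ended)
```

For example, `ended_tasks({"task-a": 0}, {"task-a": 1})` reports `task-a`, while two identical scrapes report nothing.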
* .maintain/monitoring: Don't hard-code task names
* .maintain/monitoring: Normalize alerting rules
- Start alert names with their component and end with a descriptive
adjective.
- Describe alert duration in `message` with `for more than` across all
alerts.
* .maintain/monitoring: Fix alert tests
The `HighCPUUsage` alert is based on the `cpu_usage_percentage` metric.
Instead of exposing the overall CPU usage in percent, the metric exposes
the per core usage summed over all cores.
This commit removes the alert for two reasons:
1. Substrate itself does not expose the core count, so the
`cpu_usage_percentage` metric cannot be normalized into an overall
percentage to alert on.
2. Alerting based on CPU usage is generic and not specific to Substrate
or blockchains. Thus any generic CPU usage alert suffices.
The transaction queue size alert has been firing with a constant 10
transactions in the queue. Although that may indicate a problem, those
10 transactions are not necessarily the same ones across scrape
intervals.
Instead of alerting on a queue size above 10, alert on two conditions:
1. Monotonically increasing queue size
2. Upper limit queue size reached
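
The two conditions above can be sketched as follows. This is an illustration only, assuming the queue size is available as a list of samples ordered by scrape time. The function names and the `upper_limit` parameter are hypothetical, and "monotonically increasing" is read here as strictly increasing so that a constant queue size does not fire.

```python
from typing import Sequence


def queue_is_increasing(samples: Sequence[int]) -> bool:
    """True if every sample is strictly larger than the one before it."""
    if len(samples) < 2:
        return False
    return all(later > earlier for earlier, later in zip(samples, samples[1:]))


def queue_limit_reached(samples: Sequence[int], upper_limit: int) -> bool:
    """True if the most recent sample is at or above the upper limit."""
    return bool(samples) and samples[-1] >= upper_limit


def should_alert(samples: Sequence[int], upper_limit: int) -> bool:
    """Alert when either condition holds for the observed window."""
    return queue_is_increasing(samples) or queue_limit_reached(samples, upper_limit)
```

With this reading, a window of `[10, 10, 10, 10]` no longer triggers an alert, while `[3, 5, 8, 13]` does.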
Create a place to collaborate on Prometheus alerting rules for
Substrate starting with a basic set of rules covering:
- Resource usage
- Block production
- Block finalization
- Transaction queue
- Networking
- ... Others