* client/authority-discovery: Publish and query on exponential interval
When a node starts up, publishing and querying might fail for various
reasons, for example because the node is not yet fully bootstrapped on
the DHT. Thus one should retry sooner rather than later. On the other
hand, a long-running node is likely well connected, so frequent retries
are not needed. For this reason, use an exponentially increasing
interval for `publish_interval`, `query_interval` and
`priority_group_set_interval` instead of a constant interval.
* client/authority-discovery/src/interval.rs: Add license header
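The exponentially increasing interval described in the first commit above can be sketched as follows. This is a minimal illustration, assuming the interval starts small, doubles after every attempt and is capped at a maximum. The `ExpIncInterval` type, the 2 second start and the 60 second cap are hypothetical and not taken from `interval.rs`.

```rust
use std::time::Duration;

/// Hypothetical interval that doubles after every tick until it reaches `max`.
/// Early retries happen quickly (node still bootstrapping), later ones rarely.
struct ExpIncInterval {
    next: Duration,
    max: Duration,
}

impl ExpIncInterval {
    fn new(start: Duration, max: Duration) -> Self {
        // Clamp the start so the very first delay never exceeds the cap.
        ExpIncInterval { next: std::cmp::min(start, max), max }
    }

    /// Returns the delay to wait before the next attempt and doubles the
    /// following one, saturating at `max`.
    fn next_delay(&mut self) -> Duration {
        let current = self.next;
        self.next = std::cmp::min(self.next * 2, self.max);
        current
    }
}

fn main() {
    let mut interval = ExpIncInterval::new(Duration::from_secs(2), Duration::from_secs(60));
    let delays: Vec<u64> = (0..7).map(|_| interval.next_delay().as_secs()).collect();
    // Delays grow 2, 4, 8, 16, 32 and then stay at the 60 second cap.
    println!("{:?}", delays); // prints [2, 4, 8, 16, 32, 60, 60]
}
```

Capping the interval keeps a long-running node retrying at a bounded rate rather than backing off indefinitely.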
* .maintain/gitlab: Ensure adder collator tests are run on CI
* add_handlebar_template_to_benchmark
- add benchmark-cli arg to take in a handlebar-template file
* update to always use template
* rewrite writer for handlebars
* polish
* pass cmd data
* update docs
* support custom filename output
* Update command.rs
* Create frame-weight-template.hbs
* use a vector to maintain benchmark order
* fix tests
* Custom string serializer, remove feature flag
* update docs
* docs on public objects
* small fix
Co-authored-by: Ezadkiel Marbella <zadkiel.m@gmail.com>
* .maintain/monitoring: Add alert when continuous task ends
Through the `polkadot_tasks_ended_total` Prometheus metric one can tell
when a task ended. Use this metric to alert when specific
known-to-be-continuous tasks end on a node.
* .maintain/monitoring: Don't hard-code task names
* .maintain/monitoring: Normalize alerting rules
- Start alert names with their component and end with the describing
adjective.
- Describe alert duration in `message` with `for more than` across all
alerts.
* .maintain/monitoring: Fix alert tests
* Initiate chaostest cli test suite: singlenodeheight on one dev node
Added chaostest stages in CI
Added new docker/k8s resources and environments to CI
Added new chaos-only tag to gitlab-ci.yml
* Update .maintain/chaostest/src/commands/singlenodeheight/index.js
Co-authored-by: Max Inden <mail@max-inden.de>
* change nameSpace to namespace (one word)
* update chaos ci job to match template
* rename build-pr ci stage to docker [chaos:basic]
* test gitlab-ci [chaos:basic]
* Update .gitlab-ci.yml
* add new build-chaos-only condition
* add *default-vars to singlenodeheight [chaos:basic]
* change build-only to build-rules on substrate jobs [chaos:basic]
* test and change when:on_success to when:always [chaos:basic]
* resolve conflicts and test [chaos:basic]
Co-authored-by: Max Inden <mail@max-inden.de>
Co-authored-by: Denis Pisarev <denis.pisarev@parity.io>
The `HighCPUUsage` alert is based on the `cpu_usage_percentage` metric.
Instead of exposing the overall CPU usage in percent, the metric exposes
the per-core usage summed over all cores.
This commit removes the alert for two reasons:
1. Substrate itself does not expose the core count and thus one cannot
alert based on the `cpu_usage_percentage` metric.
2. Alerting based on CPU usage is generic and not specific to Substrate
or blockchains. Thus any generic CPU usage alert suffices.
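A small worked example of reason 1, assuming the metric is simply the sum of per-core percentages as described above. The function names are hypothetical and only model the arithmetic, not Substrate's metric code.

```rust
/// Hypothetical model of `cpu_usage_percentage`: per-core usage summed over all cores.
fn summed_per_core_usage(per_core: &[f64]) -> f64 {
    per_core.iter().sum()
}

/// Overall usage in percent; computing it requires the core count.
/// Returns `None` for an empty slice instead of dividing by zero.
fn overall_usage_percent(per_core: &[f64]) -> Option<f64> {
    if per_core.is_empty() {
        return None;
    }
    Some(summed_per_core_usage(per_core) / per_core.len() as f64)
}

fn main() {
    let two_saturated_cores = [100.0, 100.0];
    let four_half_busy_cores = [50.0, 50.0, 50.0, 50.0];

    // Both machines report the same summed value of 200 ...
    println!(
        "summed: {} vs {}",
        summed_per_core_usage(&two_saturated_cores),
        summed_per_core_usage(&four_half_busy_cores)
    );
    // ... but only the core count reveals 100% versus 50% overall usage.
    println!(
        "overall: {:?} vs {:?}",
        overall_usage_percent(&two_saturated_cores),
        overall_usage_percent(&four_half_busy_cores)
    );
}
```

A fixed threshold on the summed value therefore means different things on machines with different core counts.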
* Initial commit
Forked at: f54614e256
Parent branch: origin/master
* Remove polkadot companion detection from branch name
Even though it was convenient, it was also error-prone, as there was no
indication whatsoever on the PR that a Polkadot companion branch existed.
* Update SubstrateCli to return String
* Add default implementation for executable_name()
* Use display instead of PathBuf
* Get file_name in default impl of executable_name
* Remove String::from and use .into()
* Use default impl for executable_name()
* Use .as_str() and remove useless .to_string()
* Update only sp-io when running companion build
* Remove unneeded update of sp-io in CI
Co-authored-by: Cecile Tonglet <cecile@parity.io>
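The default `executable_name()` implementation from the commits above (derive the name from the running binary's file name) can be sketched on a hypothetical, trimmed-down trait. The `Cli` trait, the `"node"` fallback and the two structs are assumptions for illustration; the real `SubstrateCli` trait has more items and may differ in detail. This sketch converts the file name with `to_string_lossy()`; the actual commit may convert it differently.

```rust
use std::env;

/// Hypothetical stand-in for the CLI trait, reduced to one item.
trait Cli {
    /// Default: the file name of the running binary, with a hypothetical
    /// fallback when it cannot be determined.
    fn executable_name() -> String {
        env::current_exe()
            .ok()
            .and_then(|path| path.file_name().map(|name| name.to_string_lossy().into_owned()))
            .unwrap_or_else(|| "node".into())
    }
}

/// Relies on the default implementation.
struct DefaultCli;
impl Cli for DefaultCli {}

/// Overrides the default with a fixed name.
struct NamedCli;
impl Cli for NamedCli {
    fn executable_name() -> String {
        "my-node".into()
    }
}

fn main() {
    println!("default: {}", DefaultCli::executable_name());
    println!("override: {}", NamedCli::executable_name());
}
```

With a default in place, implementors only override `executable_name()` when the binary name is not what they want to display.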
The transaction queue size alert has been firing with a constant 10
transactions in the queue. While this may be problematic, those 10
transactions are not necessarily the same across scrape intervals.
Instead of alerting on a size above 10, alert based on two conditions:
1. Monotonically increasing queue size
2. Queue size upper limit reached
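The two conditions above can be illustrated with a sketch over a window of consecutive scrapes. This is not the Prometheus rule itself; it assumes "monotonically increasing" means the size grew on every scrape, and the `queue_alerts` function and the limit of 100 are hypothetical.

```rust
/// Hypothetical evaluation of the two alert conditions over the queue sizes
/// observed in consecutive scrapes (oldest first).
fn queue_alerts(samples: &[u64], limit: u64) -> (bool, bool) {
    // 1. The queue grew on every scrape. A queue that stays at 10 does not fire.
    let increasing = samples.len() > 1 && samples.windows(2).all(|pair| pair[1] > pair[0]);
    // 2. The most recent scrape is at or above the queue's upper limit.
    let limit_reached = samples.last().map_or(false, |&size| size >= limit);
    (increasing, limit_reached)
}

fn main() {
    let limit = 100; // hypothetical upper limit
    println!("{:?}", queue_alerts(&[10, 10, 10, 10], limit)); // (false, false)
    println!("{:?}", queue_alerts(&[10, 20, 35, 60], limit)); // (true, false)
    println!("{:?}", queue_alerts(&[90, 100], limit)); // (true, true)
}
```

The constant-size case that caused the false alarms no longer fires, while a growing or full queue still does.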
Create a place to collaborate on Prometheus alerting rules for
Substrate, starting with a basic set of rules covering:
- Resource usage
- Block production
- Block finalization
- Transaction queue
- Networking
- ... Others
* Run script in strict mode
* Add proper separator between revision and file
* Fix copy paste error
* Do not repeat limit number in error text
* Fix bad revision error
* Do not mask pipe errors
* Fix typo
* Remove unnecessary ... syntax
* Do not fetch all commits of master
* Fetching one commit is enough
* client/authority-discovery: Allow to be run by sentry node
When run as a sentry node, the authority discovery module does not
publish any addresses to the DHT, but still discovers validators and
sentry nodes of validators.
* client/authority-discovery/src/lib: Wrap lines at 100 characters
* client/authority-discovery: Remove TODO and unused import
* client/authority-discovery: Pass role to new unit tests
* client/authority-discovery: Apply suggestions
Co-Authored-By: André Silva <123550+andresilva@users.noreply.github.com>
* bin/node/cli/src/service: Use expressions instead of statements
Co-authored-by: André Silva <123550+andresilva@users.noreply.github.com>
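The sentry-node behaviour described in the authority discovery commit above can be sketched as a role-dependent decision. The `Role` enum and `Round` struct are hypothetical simplifications; the real client passes a richer role type and runs publishing and querying on separate intervals.

```rust
/// Hypothetical, simplified node role.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Role {
    Authority,
    Sentry,
}

/// Actions the discovery worker takes in one periodic round under this sketch.
#[derive(Debug, PartialEq)]
struct Round {
    publish_own_addresses: bool,
    discover_authorities: bool,
}

fn round_for(role: Role) -> Round {
    Round {
        // Only an authority publishes its addresses to the DHT.
        publish_own_addresses: matches!(role, Role::Authority),
        // Both roles keep discovering validators and their sentry nodes.
        discover_authorities: true,
    }
}

fn main() {
    for role in [Role::Authority, Role::Sentry] {
        println!("{:?}: {:?}", role, round_for(role));
    }
}
```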
* Substrate Dashboard example
* Improve README
* Update README_dashboard.md
* Add screenshots
* Minor fix
* Minor fix, image link
* .maintain/sentry-node: Add monitoring to docker-compose stack
With this patch a user can run the following fully configured and
monitored setup with a single command:
`docker-compose -f .maintain/sentry-node/docker-compose.yml up`
- 2 validators in two different network namespaces, connected via one
sentry node.
- Polkadot-js/apps to connect to one of the nodes above.
- Prometheus scraping the 3 Substrate nodes.
- Grafana displaying data from Prometheus with community dashboards.
* .maintain/monitoring/grafana: Change default datasource name
* .maintain/monitoring/grafana: Add metric namespace option
* .maintain/monitoring/grafana: Remove `host` metric from most metrics
* .maintain/monitoring/grafana: Remove underscore from metric_namespace
* .maintain/monitoring: Use `instance` label instead of `hostname`
To identify a scrape target, one should use `instance` and not
`hostname`, as multiple targets might run on the same node.
See https://prometheus.io/docs/concepts/jobs_instances/ for details.
* .maintain/monitoring: Introduce instance variable
* .maintain/monitoring/grafana: Rename substrate_block_height_number
* .maintain/monitoring/grafana: Use instance instead of host in legend
* .maintain/monitoring: Remove node exporter dependency
* .maintain/sentry-node/prometheus: Simplify configuration
* .maintain/monitoring/grafana: Update README and remove images
* .maintain/sentry-node: Improve docs
* .maintain/monitoring/grafana: Use metric_namespace template variable
* Use --sentry from v0.7.29 instead of a reserved-node
* .maintain/sentry-node: Revert sentry-a using validator-b as bootnode
Co-authored-by: DerFredy - @derfredy:matrix.org <derfredy@gmail.com>
Co-authored-by: david <davidd@custom.home>