Working towards migrating the `parity-bridges-common` repo inside
`polkadot-sdk`. This PR upgrades some dependencies in order to align
them with the versions used in `parity-bridges-common`
Related to
https://github.com/paritytech/parity-bridges-common/issues/2538
### What's been done
- `subsystem-bench` has been split into two parts: a cli benchmark
runner and a library.
- The cli runner is quite simple. It just allows us to run `.yaml` based
test sequences. Now it should only be used to run benchmarks during
development.
- The library is used in the cli runner and in regression tests. Some
code is changed to make the library independent of the runner.
- Added first regression tests for availability read and write that
replicate existing test sequences.
### How we run regression tests
- Regression tests are simply rust integration tests without the
harnesses.
- They should only be compiled under the `subsystem-benchmarks` feature
to prevent them from running with other tests.
- This doesn't work when running tests with `nextest` in CI, so
additional filters have been added to the `nextest` runs.
- Each benchmark run takes a different time in the beginning, so we
"warm up" the tests until their CPU usage differs by only 1%.
- After the warm-up, we run the benchmarks a few more times and compare
the average with the exception using a precision.
### What is still wrong?
- I haven't managed to set up approval voting tests. The spread of their
results is too large and can't be narrowed down in a reasonable amount
of time in the warm-up phase.
- The tests start an unconfigurable prometheus endpoint inside, which
causes errors because they use the same 9999 port. I disable it with a
flag, but I think it's better to extract the endpoint launching outside
the test, as we already do with `valgrind` and `pyroscope`. But we still
use `prometheus` inside the tests.
### Future work
* https://github.com/paritytech/polkadot-sdk/issues/3528
* https://github.com/paritytech/polkadot-sdk/issues/3529
* https://github.com/paritytech/polkadot-sdk/issues/3530
* https://github.com/paritytech/polkadot-sdk/issues/3531
---------
Co-authored-by: Alexander Samusev <41779041+alvicsam@users.noreply.github.com>
Lifting some more dependencies to the workspace. Just using the
most-often updated ones for now.
It can be reproduced locally.
```sh
# First you can check if there would be semver incompatible bumps (looks good in this case):
$ zepter transpose dependency lift-to-workspace --ignore-errors syn quote thiserror "regex:^serde.*"
# Then apply the changes:
$ zepter transpose dependency lift-to-workspace --version-resolver=highest syn quote thiserror "regex:^serde.*" --fix
# And format the changes:
$ taplo format --config .config/taplo.toml
```
---------
Signed-off-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>
Changes (partial https://github.com/paritytech/polkadot-sdk/issues/994):
- Set log to `0.4.20` everywhere
- Lift `log` to the workspace
Starting with a simpler one after seeing
https://github.com/paritytech/polkadot-sdk/pull/2065 from @jsdw.
This sets the `default-features` to `false` in the root and then
overwrites that in each create to its original value. This is necessary
since otherwise the `default` features are additive and its impossible
to disable them in the crate again once they are enabled in the
workspace.
I am using a tool to do this, so its mostly a test to see that it works
as expected.
---------
Signed-off-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>
This tool makes it easy to run parachain consensus stress/performance
testing on your development machine or in CI.
## Motivation
The parachain consensus node implementation spans across many modules
which we call subsystems. Each subsystem is responsible for a small part
of logic of the parachain consensus pipeline, but in general the most
load and performance issues are localized in just a few core subsystems
like `availability-recovery`, `approval-voting` or
`dispute-coordinator`. In the absence of such a tool, we would run large
test nets to load/stress test these parts of the system. Setting up and
making sense of the amount of data produced by such a large test is very
expensive, hard to orchestrate and is a huge development time sink.
## PR contents
- CLI tool
- Data Availability Read test
- reusable mockups and components needed so far
- Documentation on how to get started
### Data Availability Read test
An overseer is built with using a real `availability-recovery` susbsytem
instance while dependent subsystems like `av-store`, `network-bridge`
and `runtime-api` are mocked. The network bridge will emulate all the
network peers and their answering to requests.
The test is going to be run for a number of blocks. For each block it
will generate send a “RecoverAvailableData” request for an arbitrary
number of candidates. We wait for the subsystem to respond to all
requests before moving to the next block.
At the same time we collect the usual subsystem metrics and task CPU
metrics and show some nice progress reports while running.
### Here is how the CLI looks like:
```
[2023-11-28T13:06:27Z INFO subsystem_bench::core::display] n_validators = 1000, n_cores = 20, pov_size = 5120 - 5120, error = 3, latency = Some(PeerLatency { min_latency: 1ms, max_latency: 100ms })
[2023-11-28T13:06:27Z INFO subsystem-bench::availability] Generating template candidate index=0 pov_size=5242880
[2023-11-28T13:06:27Z INFO subsystem-bench::availability] Created test environment.
[2023-11-28T13:06:27Z INFO subsystem-bench::availability] Pre-generating 60 candidates.
[2023-11-28T13:06:30Z INFO subsystem-bench::core] Initializing network emulation for 1000 peers.
[2023-11-28T13:06:30Z INFO subsystem-bench::availability] Current block 1/3
[2023-11-28T13:06:30Z INFO substrate_prometheus_endpoint] 〽️ Prometheus exporter started at 127.0.0.1:9999
[2023-11-28T13:06:30Z INFO subsystem_bench::availability] 20 recoveries pending
[2023-11-28T13:06:37Z INFO subsystem_bench::availability] Block time 6262ms
[2023-11-28T13:06:37Z INFO subsystem-bench::availability] Sleeping till end of block (0ms)
[2023-11-28T13:06:37Z INFO subsystem-bench::availability] Current block 2/3
[2023-11-28T13:06:37Z INFO subsystem_bench::availability] 20 recoveries pending
[2023-11-28T13:06:43Z INFO subsystem_bench::availability] Block time 6369ms
[2023-11-28T13:06:43Z INFO subsystem-bench::availability] Sleeping till end of block (0ms)
[2023-11-28T13:06:43Z INFO subsystem-bench::availability] Current block 3/3
[2023-11-28T13:06:43Z INFO subsystem_bench::availability] 20 recoveries pending
[2023-11-28T13:06:49Z INFO subsystem_bench::availability] Block time 6194ms
[2023-11-28T13:06:49Z INFO subsystem-bench::availability] Sleeping till end of block (0ms)
[2023-11-28T13:06:49Z INFO subsystem_bench::availability] All blocks processed in 18829ms
[2023-11-28T13:06:49Z INFO subsystem_bench::availability] Throughput: 102400 KiB/block
[2023-11-28T13:06:49Z INFO subsystem_bench::availability] Block time: 6276 ms
[2023-11-28T13:06:49Z INFO subsystem_bench::availability]
Total received from network: 415 MiB
Total sent to network: 724 KiB
Total subsystem CPU usage 24.00s
CPU usage per block 8.00s
Total test environment CPU usage 0.15s
CPU usage per block 0.05s
```
### Prometheus/Grafana stack in action
<img width="1246" alt="Screenshot 2023-11-28 at 15 11 10"
src="https://github.com/paritytech/polkadot-sdk/assets/54316454/eaa47422-4a5e-4a3a-aaef-14ca644c1574">
<img width="1246" alt="Screenshot 2023-11-28 at 15 12 01"
src="https://github.com/paritytech/polkadot-sdk/assets/54316454/237329d6-1710-4c27-8f67-5fb11d7f66ea">
<img width="1246" alt="Screenshot 2023-11-28 at 15 12 38"
src="https://github.com/paritytech/polkadot-sdk/assets/54316454/a07119e8-c9f1-4810-a1b3-f1b7b01cf357">
---------
Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
We currently use a bit of a hack in `.cargo/config` to make sure that
clippy isn't too annoying by specifying the list of lints.
There is now a stable way to define lints for a workspace. The only down
side is that every crate seems to have to opt into this so there's a
*few* files modified in this PR.
Dependencies:
- [x] PR that upgrades CI to use rust 1.74 is merged.
---------
Co-authored-by: joe petrowski <25483142+joepetrowski@users.noreply.github.com>
Co-authored-by: Branislav Kontur <bkontur@gmail.com>
Co-authored-by: Liam Aharon <liam.aharon@hotmail.com>
Refactors availability-recovery strategies to allow for easily adding
new hotpaths and failover mechanisms.
The new interface allows for chaining multiple `RecoveryStrategy`-es
together, to cleanly express the relationship between them and share
state and code where neccessary/possible:
This was done in order to aid in implementing new hotpaths like
[systematic chunks
recovery](https://github.com/paritytech/polkadot-sdk/issues/598) and
[fetching from approval
checkers](https://github.com/paritytech/polkadot-sdk/issues/575).
Thanks to this design, intermediate state can be shared between the
strategies. For example, if the systematic chunks recovery retrieved
less than the needed amount of chunks, pass them over to the next
FetchChunks strategy, which will only need to recover the remaining
number of chunks.
Draft example of how a systematic chunk recovery strategy would look:
https://github.com/paritytech/polkadot-sdk/commit/667d870bdf1470525d66c13929d5eac7249dd995
(notice how easy it was to add and reuse code)
Note that this PR doesn't itself add any new strategy, it should fully
preserve backwards compatiblity in terms of functionality. Follow-up PRs
to add new strategies will come.
* rust 1.64 enables workspace properties
* add edition, repository and authors.
* of course, update the version in one place.
Co-authored-by: Andronik <write@reusable.software>
* westend: update transaction version
* polkadot: update transaction version
* kusama: update transaction version
* Bump spec_version to 9330
* bump versions to 0.9.33
* Bump crate versions
* Bump spec_version to 9280 for kusama
* Bump spec_version to 9280 for polkadot
* Bump spec_version to 9280 for rococo
* Bump spec_version to 9280 for westend
* update Cargo.lock
Co-authored-by: parity-processbot <>
* seed commit for fatality based errors
* fatality
* first draft of fatality
* cleanup
* differnt approach
* simplify
* first working version for enums, with documentation
* add split
* fix simple split test case
* extend README.md
* update fatality impl
* make tests passed
* apply fatality to first subsystem
* fatality fixes
* use fatality in a subsystem
* fix subsystemg
* fixup proc macro
* fix/test: log::*! do not execute when log handler is missing
* fix spelling
* rename Runtime2 to something sane
* allow nested split with `forward` annotations
* add free license
* enable and fixup all tests
* use external fatality
Makes this more reviewable.
* bump fatality dep
Avoid duplicate expander compilations.
* migrate availability distribution
* more fatality usage
* chore: bump fatality to 0.0.6
* fixup remaining subsystems
* chore: fmt
* make cargo spellcheck happy
* remove single instance of `#[fatal(false)]`
* last quality sweep
* fixup