Working towards migrating the `parity-bridges-common` repo inside
`polkadot-sdk`. This PR upgrades some dependencies in order to align
them with the versions used in `parity-bridges-common`
Related to
https://github.com/paritytech/parity-bridges-common/issues/2538
This PR aims to channel the backpressure of the PVF host's preparation
and execution queues to the candidate validation subsystem consumers.
Related: #708
This tool makes it easy to run parachain consensus stress/performance
testing on your development machine or in CI.
## Motivation
The parachain consensus node implementation spans across many modules
which we call subsystems. Each subsystem is responsible for a small part
of logic of the parachain consensus pipeline, but in general the most
load and performance issues are localized in just a few core subsystems
like `availability-recovery`, `approval-voting` or
`dispute-coordinator`. In the absence of such a tool, we would run large
test nets to load/stress test these parts of the system. Setting up and
making sense of the amount of data produced by such a large test is very
expensive, hard to orchestrate and is a huge development time sink.
## PR contents
- CLI tool
- Data Availability Read test
- reusable mockups and components needed so far
- Documentation on how to get started
### Data Availability Read test
An overseer is built with using a real `availability-recovery` susbsytem
instance while dependent subsystems like `av-store`, `network-bridge`
and `runtime-api` are mocked. The network bridge will emulate all the
network peers and their answering to requests.
The test is going to be run for a number of blocks. For each block it
will generate send a “RecoverAvailableData” request for an arbitrary
number of candidates. We wait for the subsystem to respond to all
requests before moving to the next block.
At the same time we collect the usual subsystem metrics and task CPU
metrics and show some nice progress reports while running.
### Here is how the CLI looks like:
```
[2023-11-28T13:06:27Z INFO subsystem_bench::core::display] n_validators = 1000, n_cores = 20, pov_size = 5120 - 5120, error = 3, latency = Some(PeerLatency { min_latency: 1ms, max_latency: 100ms })
[2023-11-28T13:06:27Z INFO subsystem-bench::availability] Generating template candidate index=0 pov_size=5242880
[2023-11-28T13:06:27Z INFO subsystem-bench::availability] Created test environment.
[2023-11-28T13:06:27Z INFO subsystem-bench::availability] Pre-generating 60 candidates.
[2023-11-28T13:06:30Z INFO subsystem-bench::core] Initializing network emulation for 1000 peers.
[2023-11-28T13:06:30Z INFO subsystem-bench::availability] Current block 1/3
[2023-11-28T13:06:30Z INFO substrate_prometheus_endpoint] 〽️ Prometheus exporter started at 127.0.0.1:9999
[2023-11-28T13:06:30Z INFO subsystem_bench::availability] 20 recoveries pending
[2023-11-28T13:06:37Z INFO subsystem_bench::availability] Block time 6262ms
[2023-11-28T13:06:37Z INFO subsystem-bench::availability] Sleeping till end of block (0ms)
[2023-11-28T13:06:37Z INFO subsystem-bench::availability] Current block 2/3
[2023-11-28T13:06:37Z INFO subsystem_bench::availability] 20 recoveries pending
[2023-11-28T13:06:43Z INFO subsystem_bench::availability] Block time 6369ms
[2023-11-28T13:06:43Z INFO subsystem-bench::availability] Sleeping till end of block (0ms)
[2023-11-28T13:06:43Z INFO subsystem-bench::availability] Current block 3/3
[2023-11-28T13:06:43Z INFO subsystem_bench::availability] 20 recoveries pending
[2023-11-28T13:06:49Z INFO subsystem_bench::availability] Block time 6194ms
[2023-11-28T13:06:49Z INFO subsystem-bench::availability] Sleeping till end of block (0ms)
[2023-11-28T13:06:49Z INFO subsystem_bench::availability] All blocks processed in 18829ms
[2023-11-28T13:06:49Z INFO subsystem_bench::availability] Throughput: 102400 KiB/block
[2023-11-28T13:06:49Z INFO subsystem_bench::availability] Block time: 6276 ms
[2023-11-28T13:06:49Z INFO subsystem_bench::availability]
Total received from network: 415 MiB
Total sent to network: 724 KiB
Total subsystem CPU usage 24.00s
CPU usage per block 8.00s
Total test environment CPU usage 0.15s
CPU usage per block 0.05s
```
### Prometheus/Grafana stack in action
<img width="1246" alt="Screenshot 2023-11-28 at 15 11 10"
src="https://github.com/paritytech/polkadot-sdk/assets/54316454/eaa47422-4a5e-4a3a-aaef-14ca644c1574">
<img width="1246" alt="Screenshot 2023-11-28 at 15 12 01"
src="https://github.com/paritytech/polkadot-sdk/assets/54316454/237329d6-1710-4c27-8f67-5fb11d7f66ea">
<img width="1246" alt="Screenshot 2023-11-28 at 15 12 38"
src="https://github.com/paritytech/polkadot-sdk/assets/54316454/a07119e8-c9f1-4810-a1b3-f1b7b01cf357">
---------
Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
We currently use a bit of a hack in `.cargo/config` to make sure that
clippy isn't too annoying by specifying the list of lints.
There is now a stable way to define lints for a workspace. The only down
side is that every crate seems to have to opt into this so there's a
*few* files modified in this PR.
Dependencies:
- [x] PR that upgrades CI to use rust 1.74 is merged.
---------
Co-authored-by: joe petrowski <25483142+joepetrowski@users.noreply.github.com>
Co-authored-by: Branislav Kontur <bkontur@gmail.com>
Co-authored-by: Liam Aharon <liam.aharon@hotmail.com>
This PR addresses multiple issues pending:
* [x] Update orchestra to the recent version and test how the node
performs
* [x] Add some useful metrics for outbound network bridge
* [x] Try to send incoming network requests to all subsystems without
blocking on some particular subsystem in that loop
* [x] Fix all incompatibilities between orchestra and polkadot code
(e.g. malus node)
* polkadot: propagate UnpinHandle to ActiveLeafUpdate
Also extract the leaf creation for tests
into a common function.
* dispute-coordinator: try pinned blocks for slashin
* apparently 1.72 is smarter than 1.70
* address nits
* rename fresh_leaf to new_leaf
* Rename `polkadot-parachain` to `polkadot-parachain-primitives`
While doing this it also fixes some last `rustdoc` issues and fixes
another Cargo warning related to `pallet-paged-list`.
* Fix compilation
* ".git/.scripts/commands/fmt/fmt.sh"
* Fix XCM docs
---------
Co-authored-by: command-bot <>
* Happy New Year!
* Remove year entierly
Co-authored-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>
* Remove years from copyright notice in the entire repo
---------
Co-authored-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>
* Don't send `ActiveLeaves` from leaves in db on startup in Overseer. Wait for fresh leaves instead.
* Don't pass initial set of leaves to Overseer
* Fix compilation error in subsystem-test-helpers
* rust 1.64 enables workspace properties
* add edition, repository and authors.
* of course, update the version in one place.
Co-authored-by: Andronik <write@reusable.software>
* westend: update transaction version
* polkadot: update transaction version
* kusama: update transaction version
* Bump spec_version to 9330
* bump versions to 0.9.33
* Change best effort queue behaviour in `dispute-coordinator`
Use the same type of queue (`BTreeMap<CandidateComparator,
ParticipationRequest>`) for best effort and priority in
`dispute-coordinator`.
Rework `CandidateComparator` to handle unavailable parent
block numbers.
Best effort queue will order disputes the same way as priority does - by
parent's block height. Disputes on candidates for which the parent's
block number can't be obtained will be treated with the lowest priority.
* Fix tests: Handle `ChainApiMessage::BlockNumber` in `handle_sync_queries`
* Some tests are deadlocking on sending messages via overseer so change `SingleItemSink`to `mpsc::Sender` with a buffer of 1
* Fix a race in test after adding a buffered queue for overseer messages
* Fix the rest of the tests
* Guide update - best-effort queue
* Guide update: clarification about spam votes
* Fix tests in `availability-distribution`
* Update comments
* Add `make_buffered_subsystem_context` in `subsystem-test-helpers`
* Code review feedback
* Code review feedback
* Code review feedback
* Don't add best effort candidate if it is already in priority queue
* Remove an old comment
* Fix insert in best_effort
* Bump crate versions
* Bump spec_version to 9280 for kusama
* Bump spec_version to 9280 for polkadot
* Bump spec_version to 9280 for rococo
* Bump spec_version to 9280 for westend
* update Cargo.lock
Co-authored-by: parity-processbot <>
* foo
* rolling session window
* fixup
* remove use statemetn
* fmt
* split NetworkBridge into two subsystems
Pending cleanup
* split
* chore: reexport OrchestraError as OverseerError
* chore: silence warnings
* fixup tests
* chore: add default timenout of 30s to subsystem test helper ctx handle
* single item channel
* fixins
* fmt
* cleanup
* remove dead code
* remove sync bounds again
* wire up shared state
* deal with some FIXMEs
* use distinct tags
Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
* use tag
Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
* address naming
tx and rx are common in networking and also have an implicit meaning regarding networking
compared to incoming and outgoing which are already used with subsystems themselvesq
* remove unused sync oracle
* remove unneeded state
* fix tests
* chore: fmt
* do not try to register twice
* leak Metrics type
Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
Co-authored-by: Andronik <write@reusable.software>
* remove v0 primitives from polkadot-primitives
* first pass: remove v0
* fix fallout in erasure-coding
* remove v1 primitives, consolidate to v2
* the great import update
* update runtime_api_impl_v1 to v2 as well
* guide: add `Version` request for runtime API
* add version query to runtime API
* reintroduce OldV1SessionInfo in a limited way