Lifting some more dependencies to the workspace. Just using the
most-often updated ones for now.
It can be reproduced locally.
```sh
# First you can check if there would be semver incompatible bumps (looks good in this case):
$ zepter transpose dependency lift-to-workspace --ignore-errors syn quote thiserror "regex:^serde.*"
# Then apply the changes:
$ zepter transpose dependency lift-to-workspace --version-resolver=highest syn quote thiserror "regex:^serde.*" --fix
# And format the changes:
$ taplo format --config .config/taplo.toml
```
---------
Signed-off-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>
Changes (partial https://github.com/paritytech/polkadot-sdk/issues/994):
- Set log to `0.4.20` everywhere
- Lift `log` to the workspace
Starting with a simpler one after seeing
https://github.com/paritytech/polkadot-sdk/pull/2065 from @jsdw.
This sets the `default-features` to `false` in the root and then
overwrites that in each create to its original value. This is necessary
since otherwise the `default` features are additive and its impossible
to disable them in the crate again once they are enabled in the
workspace.
I am using a tool to do this, so its mostly a test to see that it works
as expected.
---------
Signed-off-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>
## Summary
Built on top of the tooling and ideas introduced in
https://github.com/paritytech/polkadot-sdk/pull/2528, this PR introduces
a synthetic benchmark for measuring and assessing the performance
characteristics of the approval-voting and approval-distribution
subsystems.
Currently this allows, us to simulate the behaviours of these systems
based on the following dimensions:
```
TestConfiguration:
# Test 1
- objective: !ApprovalsTest
last_considered_tranche: 89
min_coalesce: 1
max_coalesce: 6
enable_assignments_v2: true
send_till_tranche: 60
stop_when_approved: false
coalesce_tranche_diff: 12
workdir_prefix: "/tmp"
num_no_shows_per_candidate: 0
approval_distribution_expected_tof: 6.0
approval_distribution_cpu_ms: 3.0
approval_voting_cpu_ms: 4.30
n_validators: 500
n_cores: 100
n_included_candidates: 100
min_pov_size: 1120
max_pov_size: 5120
peer_bandwidth: 524288000000
bandwidth: 524288000000
latency:
min_latency:
secs: 0
nanos: 1000000
max_latency:
secs: 0
nanos: 100000000
error: 0
num_blocks: 10
```
## The approach
1. We build a real overseer with the real implementations for
approval-voting and approval-distribution subsystems.
2. For a given network size, for each validator we pre-computed all
potential assignments and approvals it would send, because this a
computation heavy operation this will be cached on a file on disk and be
re-used if the generation parameters don't change.
3. The messages will be sent accordingly to the configured parameters
and those are split into 3 main benchmarking scenarios.
## Benchmarking scenarios
### Best case scenario *approvals_throughput_best_case.yaml*
It send to the approval-distribution only the minimum required tranche
to gathered the needed_approvals, so that a candidate is approved.
### Behaviour in the presence of no-shows *approvals_no_shows.yaml*
It sends the tranche needed to approve a candidate when we have a
maximum of *num_no_shows_per_candidate* tranches with no-shows for each
candidate.
### Maximum throughput *approvals_throughput.yaml*
It sends all the tranches for each block and measures the used CPU and
necessary network bandwidth. by the approval-voting and
approval-distribution subsystem.
## How to run it
```
cargo run -p polkadot-subsystem-bench --release -- test-sequence --path polkadot/node/subsystem-bench/examples/approvals_throughput.yaml
```
## Evaluating performance
### Use the real subsystems metrics
If you follow the steps in
https://github.com/paritytech/polkadot-sdk/tree/master/polkadot/node/subsystem-bench#install-grafana
for installing locally prometheus and grafana, all real metrics for the
`approval-distribution`, `approval-voting` and overseer are available.
E.g:
<img width="2149" alt="Screenshot 2023-12-05 at 11 07 46"
src="https://github.com/paritytech/polkadot-sdk/assets/49718502/cb8ae2dd-178b-4922-bfa4-dc37e572ed38">
<img width="2551" alt="Screenshot 2023-12-05 at 11 09 42"
src="https://github.com/paritytech/polkadot-sdk/assets/49718502/8b4542ba-88b9-46f9-9b70-cc345366081b">
<img width="2154" alt="Screenshot 2023-12-05 at 11 10 15"
src="https://github.com/paritytech/polkadot-sdk/assets/49718502/b8874d8d-632e-443a-9840-14ad8e90c54f">
<img width="2535" alt="Screenshot 2023-12-05 at 11 10 52"
src="https://github.com/paritytech/polkadot-sdk/assets/49718502/779a439f-fd18-4985-bb80-85d5afad78e2">
### Profile with pyroscope
1. Setup pyroscope following the steps in
https://github.com/paritytech/polkadot-sdk/tree/master/polkadot/node/subsystem-bench#install-pyroscope,
then run any of the benchmark scenario with `--profile` as the
arguments.
2. Open the pyroscope dashboard in grafana, e.g:
<img width="2544" alt="Screenshot 2024-01-09 at 17 09 58"
src="https://github.com/paritytech/polkadot-sdk/assets/49718502/58f50c99-a910-4d20-951a-8b16639303d9">
### Useful logs
1. Network bandwidth requirements:
```
Payload bytes received from peers: 503993 KiB total, 50399 KiB/block
Payload bytes sent to peers: 629971 KiB total, 62997 KiB/block
```
2. Cpu usage by the approval-distribution/approval-voting subsystems.
```
approval-distribution CPU usage 84.061s
approval-distribution CPU usage per block 8.406s
approval-voting CPU usage 96.532s
approval-voting CPU usage per block 9.653s
```
3. Time passed until a given block is approved
```
Chain selection approved after 3500 ms hash=0x0101010101010101010101010101010101010101010101010101010101010101
Chain selection approved after 4500 ms hash=0x0202020202020202020202020202020202020202020202020202020202020202
```
### Using benchmark to quantify improvements from
https://github.com/paritytech/polkadot-sdk/pull/1178 +
https://github.com/paritytech/polkadot-sdk/pull/1191
Using a versi-node we compare the scenarios where all new optimisations
are disabled with a scenarios where tranche0 assignments are sent in a
single message and a conservative simulation where the coalescing of
approvals gives us just 50% reduction in the number of messages we send.
Overall, what we see is a speedup of around 30-40% in the time it takes
to process the necessary messages and a 30-40% reduction in the
necessary bandwidth.
#### Best case scenario comparison(minimum required tranches sent).
Unoptimised
```
Number of blocks: 10
Payload bytes received from peers: 53289 KiB total, 5328 KiB/block
Payload bytes sent to peers: 52489 KiB total, 5248 KiB/block
approval-distribution CPU usage 6.732s
approval-distribution CPU usage per block 0.673s
approval-voting CPU usage 9.523s
approval-voting CPU usage per block 0.952s
```
vs Optimisation enabled
```
Number of blocks: 10
Payload bytes received from peers: 32141 KiB total, 3214 KiB/block
Payload bytes sent to peers: 37314 KiB total, 3731 KiB/block
approval-distribution CPU usage 4.658s
approval-distribution CPU usage per block 0.466s
approval-voting CPU usage 6.236s
approval-voting CPU usage per block 0.624s
```
#### Worst case all tranches sent, very unlikely happens when sharding
breaks.
Unoptimised
```
Number of blocks: 10
Payload bytes received from peers: 746393 KiB total, 74639 KiB/block
Payload bytes sent to peers: 729151 KiB total, 72915 KiB/block
approval-distribution CPU usage 118.681s
approval-distribution CPU usage per block 11.868s
approval-voting CPU usage 124.118s
approval-voting CPU usage per block 12.412s
```
vs optimised
```
Number of blocks: 10
Payload bytes received from peers: 503993 KiB total, 50399 KiB/block
Payload bytes sent to peers: 629971 KiB total, 62997 KiB/block
approval-distribution CPU usage 84.061s
approval-distribution CPU usage per block 8.406s
approval-voting CPU usage 96.532s
approval-voting CPU usage per block 9.653s
```
## TODOs
[x] Polish implementation.
[x] Use what we have so far to evaluate
https://github.com/paritytech/polkadot-sdk/pull/1191 before merging.
[x] List of features and additional dimensions we want to use for
benchmarking.
[x] Run benchmark on hardware similar with versi and kusama nodes.
[ ] Add benchmark to be run in CI for catching regression in
performance.
[ ] Rebase on latest changes for network emulation.
---------
Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
Co-authored-by: Andrei Sandu <andrei-mihail@parity.io>
Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
This change is mainly for people running the local variants. They can
directly start with async backing.
---------
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
Co-authored-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
Currently, collators and their alongside nodes spin up a full-scale
overseer running a bunch of subsystems that are not needed if the node
is not a validator. That was considered to be harmless; however, we've
got problems with unused subsystems getting stalled for a reason not
currently known, resulting in the overseer exiting and bringing down the
whole node.
This PR aims to only run needed subsystems on such nodes, replacing the
rest with `DummySubsystem`.
It also enables collator-optimized availability recovery subsystem
implementation.
Partially solves #1730.
Step towards https://github.com/paritytech/polkadot-sdk/issues/1975
As reported
https://github.com/paritytech/polkadot-sdk/issues/1975#issuecomment-1774534225
I'd like to encapsulate crypto related stuff in a dedicated folder.
Currently all cryptographic primitive wrappers are all sparsed in
`substrate/core` which contains "misc core" stuff.
To simplify the process, as the first step with this PR I propose to
move the cryptographic hashing there.
The `substrate/crypto` folder was already created to contains `ec-utils`
crate.
Notes:
- rename `sp-core-hashing` to `sp-crypto-hashing`
- rename `sp-core-hashing-proc-macro` to `sp-crypto-hashing-proc-macro`
- As the crates name is changed I took the freedom to restart fresh from
version 0.1.0 for both crates
---------
Co-authored-by: Robert Hambrock <roberthambrock@gmail.com>
This PR aims to channel the backpressure of the PVF host's preparation
and execution queues to the candidate validation subsystem consumers.
Related: #708
resolve#2157
- [x] fix broken doc links
- [x] fix codec macro typo
https://github.com/paritytech/polkadot-sdk/blob/master/polkadot/node/core/pvf/common/src/error.rs#L81
(see the comment below)
- [x] refactor `ValidationError`, `PrepareError` and related error types
to use `thiserror` crate
## `codec` issue
`codec` macro was mistakenly applied two times to `Kernel` error (so it
was encoded with 10 instead of 11 and the same as `JobDied`). The PR
changes it to 11 because
- it was an initial goal of the code author
- Kernel is less frequent than JobDied so in case of existing error
encoding it is more probable to have 10 as JobDied than Kernel
See https://github.com/paritytech/parity-scale-codec/issues/555
----
polkadot address: 13zCyRG2a1W2ih5SioL8byqmQ6mc8vkgFwQgVzJSdRUUmp46
---------
Co-authored-by: s0me0ne-unkn0wn <48632512+s0me0ne-unkn0wn@users.noreply.github.com>
#1259 was merged into a feature branch, but we've decided to merge
node-side changes for disabling straight into master.
This is a dependency of #1841 and #2637.
---------
Co-authored-by: Tsvetomir Dimitrov <tsvetomir@parity.io>
Fixes a potential memory leak.
`PR_SET_PDEATHSIG` is used to terminate children when the parent dies.
Note that this is subject to a race. There seems to be a raceless
alternative [here](https://stackoverflow.com/a/42498370/6085242), but
the concern is small enough that a bit more complexity doesn't seem
worth it. Left a bit more info in the code comment.
Also fixes: https://github.com/paritytech/polkadot-sdk/issues/1417
- [x] CoreIndex -> AssignmentProvider mapping will be able to change any
time.
- [x] Implement
- [x] Provide Migrations
- [x] Add and fix tests
- [x] Implement bulk assigner logic
- [x] bulk assigner tests
- [x] Port over current assigner to use bulk designer (+ share on-demand
with bulk): top-level assigner has core ranges: legacy, bulk
- [x] Adjust migrations to reflect new assigner structure
- [x] Move migration code to Assignment code directly and make it
recursive (make it possible to skip releases) -> follow up ticket.
- [x] Test migrations
- [x] Add migration PR to runtimes repo -> follow up ticket.
- [x] Wire up with actual UMP messages
- [x] Write PR docs
---------
Co-authored-by: eskimor <eskimor@no-such-url.com>
Co-authored-by: Bradley Olson <34992650+BradleyOlson64@users.noreply.github.com>
Co-authored-by: BradleyOlson64 <lotrftw9@gmail.com>
Co-authored-by: Anton Vilhelm Ásgeirsson <antonva@users.noreply.github.com>
Co-authored-by: antonva <anton.asgeirsson@parity.io>
Co-authored-by: Bastian Köcher <git@kchr.de>
Co-authored-by: Marcin S. <marcin@realemail.net>
Co-authored-by: Bastian Köcher <info@kchr.de>
Co-authored-by: command-bot <>
We currently use a bit of a hack in `.cargo/config` to make sure that
clippy isn't too annoying by specifying the list of lints.
There is now a stable way to define lints for a workspace. The only down
side is that every crate seems to have to opt into this so there's a
*few* files modified in this PR.
Dependencies:
- [x] PR that upgrades CI to use rust 1.74 is merged.
---------
Co-authored-by: joe petrowski <25483142+joepetrowski@users.noreply.github.com>
Co-authored-by: Branislav Kontur <bkontur@gmail.com>
Co-authored-by: Liam Aharon <liam.aharon@hotmail.com>
Initial implementation for the plan discussed here: https://github.com/paritytech/polkadot-sdk/issues/701
Built on top of https://github.com/paritytech/polkadot-sdk/pull/1178
v0: https://github.com/paritytech/polkadot/pull/7554,
## Overall idea
When approval-voting checks a candidate and is ready to advertise the
approval, defer it in a per-relay chain block until we either have
MAX_APPROVAL_COALESCE_COUNT candidates to sign or a candidate has stayed
MAX_APPROVALS_COALESCE_TICKS in the queue, in both cases we sign what
candidates we have available.
This should allow us to reduce the number of approvals messages we have
to create/send/verify. The parameters are configurable, so we should
find some values that balance:
- Security of the network: Delaying broadcasting of an approval
shouldn't but the finality at risk and to make sure that never happens
we won't delay sending a vote if we are past 2/3 from the no-show time.
- Scalability of the network: MAX_APPROVAL_COALESCE_COUNT = 1 &
MAX_APPROVALS_COALESCE_TICKS =0, is what we have now and we know from
the measurements we did on versi, it bottlenecks
approval-distribution/approval-voting when increase significantly the
number of validators and parachains
- Block storage: In case of disputes we have to import this votes on
chain and that increase the necessary storage with
MAX_APPROVAL_COALESCE_COUNT * CandidateHash per vote. Given that
disputes are not the normal way of the network functioning and we will
limit MAX_APPROVAL_COALESCE_COUNT in the single digits numbers, this
should be good enough. Alternatively, we could try to create a better
way to store this on-chain through indirection, if that's needed.
## Other fixes:
- Fixed the fact that we were sending random assignments to
non-validators, that was wrong because those won't do anything with it
and they won't gossip it either because they do not have a grid topology
set, so we would waste the random assignments.
- Added metrics to be able to debug potential no-shows and
mis-processing of approvals/assignments.
## TODO:
- [x] Get feedback, that this is moving in the right direction. @ordian
@sandreim @eskimor @burdges, let me know what you think.
- [x] More and more testing.
- [x] Test in versi.
- [x] Make MAX_APPROVAL_COALESCE_COUNT &
MAX_APPROVAL_COALESCE_WAIT_MILLIS a parachain host configuration.
- [x] Make sure the backwards compatibility works correctly
- [x] Make sure this direction is compatible with other streams of work:
https://github.com/paritytech/polkadot-sdk/issues/635 &
https://github.com/paritytech/polkadot-sdk/issues/742
- [x] Final versi burn-in before merging
---------
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
Using taplo, fixes all our broken and inconsistent toml formatting and
adds CI to keep them tidy.
If people want we can customise the format rules as described here
https://taplo.tamasfe.dev/configuration/formatter-options.html
@ggwpez, I suggest zepter is used only for checking features are
propagated, and leave formatting for taplo to avoid duplicate work and
conflicts.
TODO
- [x] Use `exclude = [...]` syntax in taplo file to ignore zombienet
tests instead of deleting the dir
---------
Signed-off-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>
Co-authored-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>
Co-authored-by: Bastian Köcher <git@kchr.de>
Hey guys, as discussed I've changed the name to a more general one
`PvfExecKind`, is this good or too general?
Creating this as a draft, I still have to fix the tests.
Closes#1585
Kusama address: FkB6QEo8VnV3oifugNj5NeVG3Mvq1zFbrUu4P5YwRoe5mQN
---------
Co-authored-by: command-bot <>
Co-authored-by: Marcin S <marcin@realemail.net>
Adds a `NodeFeatures` bitfield value to the runtime `HostConfiguration`,
with the purpose of coordinating the enabling of node-side features,
such as: https://github.com/paritytech/polkadot-sdk/issues/628 and
https://github.com/paritytech/polkadot-sdk/issues/598.
These are features that require all validators enable them at the same
time, assuming all/most nodes have upgraded their node versions.
This PR doesn't add any feature yet. These are coming in future PRs.
Also adds a runtime API for querying the state of the client features
and an extrinsic for setting/unsetting a feature by its index in the bitfield.
Note: originally part of:
https://github.com/paritytech/polkadot-sdk/pull/1644, but posted as
standalone to be reused by other PRs until the initial PR is merged