https://github.com/paritytech/polkadot-sdk/issues/3130
builds on top of https://github.com/paritytech/polkadot-sdk/pull/3160
Processes the availability cores and builds a record of how many
candidates it should request from prospective-parachains and their
predecessors.
Tries to supply as many candidates as the runtime can back. Note that
the runtime changes to back multiple candidates per para are not yet
done, but this paves the way for it.
The following backing/inclusion policy is assumed:
1. the runtime will never back candidates of the same para which don't
form a chain with the already backed candidates. Even if the others are
still pending availability. We're optimistic that they won't time out
and we don't want to back parachain forks (as the complexity would be
huge).
2. if a candidate is timed out of the core before being included, all of
its successors occupying a core will be evicted.
3. only the candidates which are made available and form a chain
starting from the on-chain para head may be included/enacted and cleared
from the cores. In other words, if para head is at A and the cores are
occupied by B->C->D, and B and D are made available, only B will be
included and its core cleared. C and D will remain on the cores awaiting
for C to be made available or timed out. As point (2) above already
says, if C is timed out, D will also be dropped.
4. The runtime will deduplicate candidates which form a cycle. For
example if the provisioner supplies candidates A->B->A, the runtime will
only back A (as the state output will be the same)
Note that if a candidate is timed out, we don't guarantee that in the
next relay chain block the block author will be able to fill all of the
timed out cores of the para. That increases complexity by a lot.
Instead, the provisioner will supply N candidates where N is the number
of candidates timed out, but doesn't include their successors which will
be also deleted by the runtime. This'll be backfilled in the next relay
chain block.
Adjacent changes:
- Also fixes: https://github.com/paritytech/polkadot-sdk/issues/3141
- For non prospective-parachains, don't supply multiple candidates per
para (we can't have elastic scaling without prospective parachains
enabled). paras_inherent should already sanitise this input but it's
more efficient this way.
Note: all of these changes are backwards-compatible with the
non-elastic-scaling scenario (one core per para).
If approval was in progress we didn't actually restart it, so we end up
in a situation where we distribute our assignment, but we don't
distribute any approval.
---------
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
Fixes https://github.com/paritytech/polkadot-sdk/issues/3144
Builds on top of https://github.com/paritytech/polkadot-sdk/pull/3229
### Summary
Some preparations for Runtime to support elastic scaling, guarded by
config node features bit `FeatureIndex::ElasticScalingMVP`. This PR
introduces a per-candidate `CoreIndex` but does it in a hacky way to
avoid changing `CandidateCommitments`, `CandidateReceipts` primitives
and networking protocols.
#### Including `CoreIndex` in `BackedCandidate`
If the `ElasticScalingMVP` feature bit is enabled then
`BackedCandidate::validator_indices` is extended by 8 bits.
The value stored in these bits represents the assumed core index for the
candidate.
It is temporary solution which works by creating a mapping from
`BackedCandidate` to `CoreIndex` by assuming the `CoreIndex` can be
discovered by checking in which validator group the validator that
signed the statement is.
TODO:
- [x] fix tests
- [x] add new tests
- [x] Bump runtime API for Kusama, so we have that node features thing!
-> https://github.com/polkadot-fellows/runtimes/pull/194
---------
Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
Signed-off-by: alindima <alin@parity.io>
Co-authored-by: alindima <alin@parity.io>
This introduces a check to ensure that the parachain code matches the
validation code stored in the relay chain state. If not, it will print a
warning. This should be mainly useful for parachain builders to make
sure they have setup everything correctly.
First step in implementing
https://github.com/paritytech/polkadot-sdk/issues/3144
### Summary of changes
- switch statement `Table` candidate mapping from `ParaId` to
`CoreIndex`
- introduce experimental `InjectCoreIndex` node feature.
- determine and assume a `CoreIndex` for a candidate based on statement
validator index. If the signature is valid it means validator controls
the validator that index and we can easily map it to a validator
group/core.
- introduce a temporary provisioner fix until we fully enable elastic
scaling in the subystem. The fix ensures we don't fetch the same
backable candidate when calling `get_backable_candidate` for each core.
TODO:
- [x] fix backing tests
- [x] fix statement table tests
- [x] add new test
---------
Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
Signed-off-by: alindima <alin@parity.io>
Co-authored-by: alindima <alin@parity.io>
Lifting some more dependencies to the workspace. Just using the
most-often updated ones for now.
It can be reproduced locally.
```sh
# First you can check if there would be semver incompatible bumps (looks good in this case):
$ zepter transpose dependency lift-to-workspace --ignore-errors syn quote thiserror "regex:^serde.*"
# Then apply the changes:
$ zepter transpose dependency lift-to-workspace --version-resolver=highest syn quote thiserror "regex:^serde.*" --fix
# And format the changes:
$ taplo format --config .config/taplo.toml
```
---------
Signed-off-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>
Changes (partial https://github.com/paritytech/polkadot-sdk/issues/994):
- Set log to `0.4.20` everywhere
- Lift `log` to the workspace
Starting with a simpler one after seeing
https://github.com/paritytech/polkadot-sdk/pull/2065 from @jsdw.
This sets the `default-features` to `false` in the root and then
overwrites that in each create to its original value. This is necessary
since otherwise the `default` features are additive and its impossible
to disable them in the crate again once they are enabled in the
workspace.
I am using a tool to do this, so its mostly a test to see that it works
as expected.
---------
Signed-off-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>
## Summary
Built on top of the tooling and ideas introduced in
https://github.com/paritytech/polkadot-sdk/pull/2528, this PR introduces
a synthetic benchmark for measuring and assessing the performance
characteristics of the approval-voting and approval-distribution
subsystems.
Currently this allows, us to simulate the behaviours of these systems
based on the following dimensions:
```
TestConfiguration:
# Test 1
- objective: !ApprovalsTest
last_considered_tranche: 89
min_coalesce: 1
max_coalesce: 6
enable_assignments_v2: true
send_till_tranche: 60
stop_when_approved: false
coalesce_tranche_diff: 12
workdir_prefix: "/tmp"
num_no_shows_per_candidate: 0
approval_distribution_expected_tof: 6.0
approval_distribution_cpu_ms: 3.0
approval_voting_cpu_ms: 4.30
n_validators: 500
n_cores: 100
n_included_candidates: 100
min_pov_size: 1120
max_pov_size: 5120
peer_bandwidth: 524288000000
bandwidth: 524288000000
latency:
min_latency:
secs: 0
nanos: 1000000
max_latency:
secs: 0
nanos: 100000000
error: 0
num_blocks: 10
```
## The approach
1. We build a real overseer with the real implementations for
approval-voting and approval-distribution subsystems.
2. For a given network size, for each validator we pre-computed all
potential assignments and approvals it would send, because this a
computation heavy operation this will be cached on a file on disk and be
re-used if the generation parameters don't change.
3. The messages will be sent accordingly to the configured parameters
and those are split into 3 main benchmarking scenarios.
## Benchmarking scenarios
### Best case scenario *approvals_throughput_best_case.yaml*
It send to the approval-distribution only the minimum required tranche
to gathered the needed_approvals, so that a candidate is approved.
### Behaviour in the presence of no-shows *approvals_no_shows.yaml*
It sends the tranche needed to approve a candidate when we have a
maximum of *num_no_shows_per_candidate* tranches with no-shows for each
candidate.
### Maximum throughput *approvals_throughput.yaml*
It sends all the tranches for each block and measures the used CPU and
necessary network bandwidth. by the approval-voting and
approval-distribution subsystem.
## How to run it
```
cargo run -p polkadot-subsystem-bench --release -- test-sequence --path polkadot/node/subsystem-bench/examples/approvals_throughput.yaml
```
## Evaluating performance
### Use the real subsystems metrics
If you follow the steps in
https://github.com/paritytech/polkadot-sdk/tree/master/polkadot/node/subsystem-bench#install-grafana
for installing locally prometheus and grafana, all real metrics for the
`approval-distribution`, `approval-voting` and overseer are available.
E.g:
<img width="2149" alt="Screenshot 2023-12-05 at 11 07 46"
src="https://github.com/paritytech/polkadot-sdk/assets/49718502/cb8ae2dd-178b-4922-bfa4-dc37e572ed38">
<img width="2551" alt="Screenshot 2023-12-05 at 11 09 42"
src="https://github.com/paritytech/polkadot-sdk/assets/49718502/8b4542ba-88b9-46f9-9b70-cc345366081b">
<img width="2154" alt="Screenshot 2023-12-05 at 11 10 15"
src="https://github.com/paritytech/polkadot-sdk/assets/49718502/b8874d8d-632e-443a-9840-14ad8e90c54f">
<img width="2535" alt="Screenshot 2023-12-05 at 11 10 52"
src="https://github.com/paritytech/polkadot-sdk/assets/49718502/779a439f-fd18-4985-bb80-85d5afad78e2">
### Profile with pyroscope
1. Setup pyroscope following the steps in
https://github.com/paritytech/polkadot-sdk/tree/master/polkadot/node/subsystem-bench#install-pyroscope,
then run any of the benchmark scenario with `--profile` as the
arguments.
2. Open the pyroscope dashboard in grafana, e.g:
<img width="2544" alt="Screenshot 2024-01-09 at 17 09 58"
src="https://github.com/paritytech/polkadot-sdk/assets/49718502/58f50c99-a910-4d20-951a-8b16639303d9">
### Useful logs
1. Network bandwidth requirements:
```
Payload bytes received from peers: 503993 KiB total, 50399 KiB/block
Payload bytes sent to peers: 629971 KiB total, 62997 KiB/block
```
2. Cpu usage by the approval-distribution/approval-voting subsystems.
```
approval-distribution CPU usage 84.061s
approval-distribution CPU usage per block 8.406s
approval-voting CPU usage 96.532s
approval-voting CPU usage per block 9.653s
```
3. Time passed until a given block is approved
```
Chain selection approved after 3500 ms hash=0x0101010101010101010101010101010101010101010101010101010101010101
Chain selection approved after 4500 ms hash=0x0202020202020202020202020202020202020202020202020202020202020202
```
### Using benchmark to quantify improvements from
https://github.com/paritytech/polkadot-sdk/pull/1178 +
https://github.com/paritytech/polkadot-sdk/pull/1191
Using a versi-node we compare the scenarios where all new optimisations
are disabled with a scenarios where tranche0 assignments are sent in a
single message and a conservative simulation where the coalescing of
approvals gives us just 50% reduction in the number of messages we send.
Overall, what we see is a speedup of around 30-40% in the time it takes
to process the necessary messages and a 30-40% reduction in the
necessary bandwidth.
#### Best case scenario comparison(minimum required tranches sent).
Unoptimised
```
Number of blocks: 10
Payload bytes received from peers: 53289 KiB total, 5328 KiB/block
Payload bytes sent to peers: 52489 KiB total, 5248 KiB/block
approval-distribution CPU usage 6.732s
approval-distribution CPU usage per block 0.673s
approval-voting CPU usage 9.523s
approval-voting CPU usage per block 0.952s
```
vs Optimisation enabled
```
Number of blocks: 10
Payload bytes received from peers: 32141 KiB total, 3214 KiB/block
Payload bytes sent to peers: 37314 KiB total, 3731 KiB/block
approval-distribution CPU usage 4.658s
approval-distribution CPU usage per block 0.466s
approval-voting CPU usage 6.236s
approval-voting CPU usage per block 0.624s
```
#### Worst case all tranches sent, very unlikely happens when sharding
breaks.
Unoptimised
```
Number of blocks: 10
Payload bytes received from peers: 746393 KiB total, 74639 KiB/block
Payload bytes sent to peers: 729151 KiB total, 72915 KiB/block
approval-distribution CPU usage 118.681s
approval-distribution CPU usage per block 11.868s
approval-voting CPU usage 124.118s
approval-voting CPU usage per block 12.412s
```
vs optimised
```
Number of blocks: 10
Payload bytes received from peers: 503993 KiB total, 50399 KiB/block
Payload bytes sent to peers: 629971 KiB total, 62997 KiB/block
approval-distribution CPU usage 84.061s
approval-distribution CPU usage per block 8.406s
approval-voting CPU usage 96.532s
approval-voting CPU usage per block 9.653s
```
## TODOs
[x] Polish implementation.
[x] Use what we have so far to evaluate
https://github.com/paritytech/polkadot-sdk/pull/1191 before merging.
[x] List of features and additional dimensions we want to use for
benchmarking.
[x] Run benchmark on hardware similar with versi and kusama nodes.
[ ] Add benchmark to be run in CI for catching regression in
performance.
[ ] Rebase on latest changes for network emulation.
---------
Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
Co-authored-by: Andrei Sandu <andrei-mihail@parity.io>
Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
This change is mainly for people running the local variants. They can
directly start with async backing.
---------
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
Co-authored-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
Currently, collators and their alongside nodes spin up a full-scale
overseer running a bunch of subsystems that are not needed if the node
is not a validator. That was considered to be harmless; however, we've
got problems with unused subsystems getting stalled for a reason not
currently known, resulting in the overseer exiting and bringing down the
whole node.
This PR aims to only run needed subsystems on such nodes, replacing the
rest with `DummySubsystem`.
It also enables collator-optimized availability recovery subsystem
implementation.
Partially solves #1730.
Step towards https://github.com/paritytech/polkadot-sdk/issues/1975
As reported
https://github.com/paritytech/polkadot-sdk/issues/1975#issuecomment-1774534225
I'd like to encapsulate crypto related stuff in a dedicated folder.
Currently all cryptographic primitive wrappers are all sparsed in
`substrate/core` which contains "misc core" stuff.
To simplify the process, as the first step with this PR I propose to
move the cryptographic hashing there.
The `substrate/crypto` folder was already created to contains `ec-utils`
crate.
Notes:
- rename `sp-core-hashing` to `sp-crypto-hashing`
- rename `sp-core-hashing-proc-macro` to `sp-crypto-hashing-proc-macro`
- As the crates name is changed I took the freedom to restart fresh from
version 0.1.0 for both crates
---------
Co-authored-by: Robert Hambrock <roberthambrock@gmail.com>
This PR aims to channel the backpressure of the PVF host's preparation
and execution queues to the candidate validation subsystem consumers.
Related: #708
resolve#2157
- [x] fix broken doc links
- [x] fix codec macro typo
https://github.com/paritytech/polkadot-sdk/blob/master/polkadot/node/core/pvf/common/src/error.rs#L81
(see the comment below)
- [x] refactor `ValidationError`, `PrepareError` and related error types
to use `thiserror` crate
## `codec` issue
`codec` macro was mistakenly applied two times to `Kernel` error (so it
was encoded with 10 instead of 11 and the same as `JobDied`). The PR
changes it to 11 because
- it was an initial goal of the code author
- Kernel is less frequent than JobDied so in case of existing error
encoding it is more probable to have 10 as JobDied than Kernel
See https://github.com/paritytech/parity-scale-codec/issues/555
----
polkadot address: 13zCyRG2a1W2ih5SioL8byqmQ6mc8vkgFwQgVzJSdRUUmp46
---------
Co-authored-by: s0me0ne-unkn0wn <48632512+s0me0ne-unkn0wn@users.noreply.github.com>
#1259 was merged into a feature branch, but we've decided to merge
node-side changes for disabling straight into master.
This is a dependency of #1841 and #2637.
---------
Co-authored-by: Tsvetomir Dimitrov <tsvetomir@parity.io>
Fixes a potential memory leak.
`PR_SET_PDEATHSIG` is used to terminate children when the parent dies.
Note that this is subject to a race. There seems to be a raceless
alternative [here](https://stackoverflow.com/a/42498370/6085242), but
the concern is small enough that a bit more complexity doesn't seem
worth it. Left a bit more info in the code comment.
Also fixes: https://github.com/paritytech/polkadot-sdk/issues/1417
- [x] CoreIndex -> AssignmentProvider mapping will be able to change any
time.
- [x] Implement
- [x] Provide Migrations
- [x] Add and fix tests
- [x] Implement bulk assigner logic
- [x] bulk assigner tests
- [x] Port over current assigner to use bulk designer (+ share on-demand
with bulk): top-level assigner has core ranges: legacy, bulk
- [x] Adjust migrations to reflect new assigner structure
- [x] Move migration code to Assignment code directly and make it
recursive (make it possible to skip releases) -> follow up ticket.
- [x] Test migrations
- [x] Add migration PR to runtimes repo -> follow up ticket.
- [x] Wire up with actual UMP messages
- [x] Write PR docs
---------
Co-authored-by: eskimor <eskimor@no-such-url.com>
Co-authored-by: Bradley Olson <34992650+BradleyOlson64@users.noreply.github.com>
Co-authored-by: BradleyOlson64 <lotrftw9@gmail.com>
Co-authored-by: Anton Vilhelm Ásgeirsson <antonva@users.noreply.github.com>
Co-authored-by: antonva <anton.asgeirsson@parity.io>
Co-authored-by: Bastian Köcher <git@kchr.de>
Co-authored-by: Marcin S. <marcin@realemail.net>
Co-authored-by: Bastian Köcher <info@kchr.de>
Co-authored-by: command-bot <>
We currently use a bit of a hack in `.cargo/config` to make sure that
clippy isn't too annoying by specifying the list of lints.
There is now a stable way to define lints for a workspace. The only down
side is that every crate seems to have to opt into this so there's a
*few* files modified in this PR.
Dependencies:
- [x] PR that upgrades CI to use rust 1.74 is merged.
---------
Co-authored-by: joe petrowski <25483142+joepetrowski@users.noreply.github.com>
Co-authored-by: Branislav Kontur <bkontur@gmail.com>
Co-authored-by: Liam Aharon <liam.aharon@hotmail.com>
Initial implementation for the plan discussed here: https://github.com/paritytech/polkadot-sdk/issues/701
Built on top of https://github.com/paritytech/polkadot-sdk/pull/1178
v0: https://github.com/paritytech/polkadot/pull/7554,
## Overall idea
When approval-voting checks a candidate and is ready to advertise the
approval, defer it in a per-relay chain block until we either have
MAX_APPROVAL_COALESCE_COUNT candidates to sign or a candidate has stayed
MAX_APPROVALS_COALESCE_TICKS in the queue, in both cases we sign what
candidates we have available.
This should allow us to reduce the number of approvals messages we have
to create/send/verify. The parameters are configurable, so we should
find some values that balance:
- Security of the network: Delaying broadcasting of an approval
shouldn't but the finality at risk and to make sure that never happens
we won't delay sending a vote if we are past 2/3 from the no-show time.
- Scalability of the network: MAX_APPROVAL_COALESCE_COUNT = 1 &
MAX_APPROVALS_COALESCE_TICKS =0, is what we have now and we know from
the measurements we did on versi, it bottlenecks
approval-distribution/approval-voting when increase significantly the
number of validators and parachains
- Block storage: In case of disputes we have to import this votes on
chain and that increase the necessary storage with
MAX_APPROVAL_COALESCE_COUNT * CandidateHash per vote. Given that
disputes are not the normal way of the network functioning and we will
limit MAX_APPROVAL_COALESCE_COUNT in the single digits numbers, this
should be good enough. Alternatively, we could try to create a better
way to store this on-chain through indirection, if that's needed.
## Other fixes:
- Fixed the fact that we were sending random assignments to
non-validators, that was wrong because those won't do anything with it
and they won't gossip it either because they do not have a grid topology
set, so we would waste the random assignments.
- Added metrics to be able to debug potential no-shows and
mis-processing of approvals/assignments.
## TODO:
- [x] Get feedback, that this is moving in the right direction. @ordian
@sandreim @eskimor @burdges, let me know what you think.
- [x] More and more testing.
- [x] Test in versi.
- [x] Make MAX_APPROVAL_COALESCE_COUNT &
MAX_APPROVAL_COALESCE_WAIT_MILLIS a parachain host configuration.
- [x] Make sure the backwards compatibility works correctly
- [x] Make sure this direction is compatible with other streams of work:
https://github.com/paritytech/polkadot-sdk/issues/635 &
https://github.com/paritytech/polkadot-sdk/issues/742
- [x] Final versi burn-in before merging
---------
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
Using taplo, fixes all our broken and inconsistent toml formatting and
adds CI to keep them tidy.
If people want we can customise the format rules as described here
https://taplo.tamasfe.dev/configuration/formatter-options.html
@ggwpez, I suggest zepter is used only for checking features are
propagated, and leave formatting for taplo to avoid duplicate work and
conflicts.
TODO
- [x] Use `exclude = [...]` syntax in taplo file to ignore zombienet
tests instead of deleting the dir
---------
Signed-off-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>
Co-authored-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>
Co-authored-by: Bastian Köcher <git@kchr.de>
Hey guys, as discussed I've changed the name to a more general one
`PvfExecKind`, is this good or too general?
Creating this as a draft, I still have to fix the tests.
Closes#1585
Kusama address: FkB6QEo8VnV3oifugNj5NeVG3Mvq1zFbrUu4P5YwRoe5mQN
---------
Co-authored-by: command-bot <>
Co-authored-by: Marcin S <marcin@realemail.net>