Runtime release 1.2 includes bumping of the ParachainHost APIs up to
v10, so let's move all the released APIs out of vstaging folder, this PR
does not include any logic changes only renaming of the modules and some
moving around.
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
Fixes#3826.
The docs on the `candidates` field of `BlockEntry` were incorrectly
stating that they are sorted by core index. The (incorrect) optimization
was introduced in #3747 based on this assumption. The actual ordering is
based on `CandidateIncluded` events ordering in the runtime. We revert
this optimization here.
- [x] verify the underlying issue
- [x] add a regression test
---------
Co-authored-by: Bastian Köcher <git@kchr.de>
**Update:** Pushed additional changes based on the review comments.
**This pull request fixes various spelling mistakes in this
repository.**
Most of the changes are contained in the first **3** commits:
- `Fix spelling mistakes in comments and docs`
- `Fix spelling mistakes in test names`
- `Fix spelling mistakes in error messages, panic messages, logs and
tracing`
Other source code spelling mistakes are separated into individual
commits for easier reviewing:
- `Fix the spelling of 'authority'`
- `Fix the spelling of 'REASONABLE_HEADERS_IN_JUSTIFICATION_ANCESTRY'`
- `Fix the spelling of 'prev_enqueud_messages'`
- `Fix the spelling of 'endpoint'`
- `Fix the spelling of 'children'`
- `Fix the spelling of 'PenpalSiblingSovereignAccount'`
- `Fix the spelling of 'PenpalSudoAccount'`
- `Fix the spelling of 'insufficient'`
- `Fix the spelling of 'PalletXcmExtrinsicsBenchmark'`
- `Fix the spelling of 'subtracted'`
- `Fix the spelling of 'CandidatePendingAvailability'`
- `Fix the spelling of 'exclusive'`
- `Fix the spelling of 'until'`
- `Fix the spelling of 'discriminator'`
- `Fix the spelling of 'nonexistent'`
- `Fix the spelling of 'subsystem'`
- `Fix the spelling of 'indices'`
- `Fix the spelling of 'committed'`
- `Fix the spelling of 'topology'`
- `Fix the spelling of 'response'`
- `Fix the spelling of 'beneficiary'`
- `Fix the spelling of 'formatted'`
- `Fix the spelling of 'UNKNOWN_PROOF_REQUEST'`
- `Fix the spelling of 'succeeded'`
- `Fix the spelling of 'reopened'`
- `Fix the spelling of 'proposer'`
- `Fix the spelling of 'InstantiationNonce'`
- `Fix the spelling of 'depositor'`
- `Fix the spelling of 'expiration'`
- `Fix the spelling of 'phantom'`
- `Fix the spelling of 'AggregatedKeyValue'`
- `Fix the spelling of 'randomness'`
- `Fix the spelling of 'defendant'`
- `Fix the spelling of 'AquaticMammal'`
- `Fix the spelling of 'transactions'`
- `Fix the spelling of 'PassingTracingSubscriber'`
- `Fix the spelling of 'TxSignaturePayload'`
- `Fix the spelling of 'versioning'`
- `Fix the spelling of 'descendant'`
- `Fix the spelling of 'overridden'`
- `Fix the spelling of 'network'`
Let me know if this structure is adequate.
**Note:** The usage of the words `Merkle`, `Merkelize`, `Merklization`,
`Merkelization`, `Merkleization`, is somewhat inconsistent but I left it
as it is.
~~**Note:** In some places the term `Receival` is used to refer to
message reception, IMO `Reception` is the correct word here, but I left
it as it is.~~
~~**Note:** In some places the term `Overlayed` is used instead of the
more acceptable version `Overlaid` but I also left it as it is.~~
~~**Note:** In some places the term `Applyable` is used instead of the
correct version `Applicable` but I also left it as it is.~~
**Note:** Some usage of British vs American english e.g. `judgement` vs
`judgment`, `initialise` vs `initialize`, `optimise` vs `optimize` etc.
are both present in different places, but I suppose that's
understandable given the number of contributors.
~~**Note:** There is a spelling mistake in `.github/CODEOWNERS` but it
triggers errors in CI when I make changes to it, so I left it as it
is.~~
Small refactoring to reduce the algorithmic complexity of the initial
message distribution in approval voting after a sync from O(n_candidates
^ 2) to O(n_candidates).
If approval was in progress we didn't actually restart it, so we end up
in a situation where we distribute our assignment, but we don't
distribute any approval.
---------
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
## Summary
Built on top of the tooling and ideas introduced in
https://github.com/paritytech/polkadot-sdk/pull/2528, this PR introduces
a synthetic benchmark for measuring and assessing the performance
characteristics of the approval-voting and approval-distribution
subsystems.
Currently this allows, us to simulate the behaviours of these systems
based on the following dimensions:
```
TestConfiguration:
# Test 1
- objective: !ApprovalsTest
last_considered_tranche: 89
min_coalesce: 1
max_coalesce: 6
enable_assignments_v2: true
send_till_tranche: 60
stop_when_approved: false
coalesce_tranche_diff: 12
workdir_prefix: "/tmp"
num_no_shows_per_candidate: 0
approval_distribution_expected_tof: 6.0
approval_distribution_cpu_ms: 3.0
approval_voting_cpu_ms: 4.30
n_validators: 500
n_cores: 100
n_included_candidates: 100
min_pov_size: 1120
max_pov_size: 5120
peer_bandwidth: 524288000000
bandwidth: 524288000000
latency:
min_latency:
secs: 0
nanos: 1000000
max_latency:
secs: 0
nanos: 100000000
error: 0
num_blocks: 10
```
## The approach
1. We build a real overseer with the real implementations for
approval-voting and approval-distribution subsystems.
2. For a given network size, for each validator we pre-computed all
potential assignments and approvals it would send, because this a
computation heavy operation this will be cached on a file on disk and be
re-used if the generation parameters don't change.
3. The messages will be sent accordingly to the configured parameters
and those are split into 3 main benchmarking scenarios.
## Benchmarking scenarios
### Best case scenario *approvals_throughput_best_case.yaml*
It send to the approval-distribution only the minimum required tranche
to gathered the needed_approvals, so that a candidate is approved.
### Behaviour in the presence of no-shows *approvals_no_shows.yaml*
It sends the tranche needed to approve a candidate when we have a
maximum of *num_no_shows_per_candidate* tranches with no-shows for each
candidate.
### Maximum throughput *approvals_throughput.yaml*
It sends all the tranches for each block and measures the used CPU and
necessary network bandwidth. by the approval-voting and
approval-distribution subsystem.
## How to run it
```
cargo run -p polkadot-subsystem-bench --release -- test-sequence --path polkadot/node/subsystem-bench/examples/approvals_throughput.yaml
```
## Evaluating performance
### Use the real subsystems metrics
If you follow the steps in
https://github.com/paritytech/polkadot-sdk/tree/master/polkadot/node/subsystem-bench#install-grafana
for installing locally prometheus and grafana, all real metrics for the
`approval-distribution`, `approval-voting` and overseer are available.
E.g:
<img width="2149" alt="Screenshot 2023-12-05 at 11 07 46"
src="https://github.com/paritytech/polkadot-sdk/assets/49718502/cb8ae2dd-178b-4922-bfa4-dc37e572ed38">
<img width="2551" alt="Screenshot 2023-12-05 at 11 09 42"
src="https://github.com/paritytech/polkadot-sdk/assets/49718502/8b4542ba-88b9-46f9-9b70-cc345366081b">
<img width="2154" alt="Screenshot 2023-12-05 at 11 10 15"
src="https://github.com/paritytech/polkadot-sdk/assets/49718502/b8874d8d-632e-443a-9840-14ad8e90c54f">
<img width="2535" alt="Screenshot 2023-12-05 at 11 10 52"
src="https://github.com/paritytech/polkadot-sdk/assets/49718502/779a439f-fd18-4985-bb80-85d5afad78e2">
### Profile with pyroscope
1. Setup pyroscope following the steps in
https://github.com/paritytech/polkadot-sdk/tree/master/polkadot/node/subsystem-bench#install-pyroscope,
then run any of the benchmark scenario with `--profile` as the
arguments.
2. Open the pyroscope dashboard in grafana, e.g:
<img width="2544" alt="Screenshot 2024-01-09 at 17 09 58"
src="https://github.com/paritytech/polkadot-sdk/assets/49718502/58f50c99-a910-4d20-951a-8b16639303d9">
### Useful logs
1. Network bandwidth requirements:
```
Payload bytes received from peers: 503993 KiB total, 50399 KiB/block
Payload bytes sent to peers: 629971 KiB total, 62997 KiB/block
```
2. Cpu usage by the approval-distribution/approval-voting subsystems.
```
approval-distribution CPU usage 84.061s
approval-distribution CPU usage per block 8.406s
approval-voting CPU usage 96.532s
approval-voting CPU usage per block 9.653s
```
3. Time passed until a given block is approved
```
Chain selection approved after 3500 ms hash=0x0101010101010101010101010101010101010101010101010101010101010101
Chain selection approved after 4500 ms hash=0x0202020202020202020202020202020202020202020202020202020202020202
```
### Using benchmark to quantify improvements from
https://github.com/paritytech/polkadot-sdk/pull/1178 +
https://github.com/paritytech/polkadot-sdk/pull/1191
Using a versi-node we compare the scenarios where all new optimisations
are disabled with a scenarios where tranche0 assignments are sent in a
single message and a conservative simulation where the coalescing of
approvals gives us just 50% reduction in the number of messages we send.
Overall, what we see is a speedup of around 30-40% in the time it takes
to process the necessary messages and a 30-40% reduction in the
necessary bandwidth.
#### Best case scenario comparison(minimum required tranches sent).
Unoptimised
```
Number of blocks: 10
Payload bytes received from peers: 53289 KiB total, 5328 KiB/block
Payload bytes sent to peers: 52489 KiB total, 5248 KiB/block
approval-distribution CPU usage 6.732s
approval-distribution CPU usage per block 0.673s
approval-voting CPU usage 9.523s
approval-voting CPU usage per block 0.952s
```
vs Optimisation enabled
```
Number of blocks: 10
Payload bytes received from peers: 32141 KiB total, 3214 KiB/block
Payload bytes sent to peers: 37314 KiB total, 3731 KiB/block
approval-distribution CPU usage 4.658s
approval-distribution CPU usage per block 0.466s
approval-voting CPU usage 6.236s
approval-voting CPU usage per block 0.624s
```
#### Worst case all tranches sent, very unlikely happens when sharding
breaks.
Unoptimised
```
Number of blocks: 10
Payload bytes received from peers: 746393 KiB total, 74639 KiB/block
Payload bytes sent to peers: 729151 KiB total, 72915 KiB/block
approval-distribution CPU usage 118.681s
approval-distribution CPU usage per block 11.868s
approval-voting CPU usage 124.118s
approval-voting CPU usage per block 12.412s
```
vs optimised
```
Number of blocks: 10
Payload bytes received from peers: 503993 KiB total, 50399 KiB/block
Payload bytes sent to peers: 629971 KiB total, 62997 KiB/block
approval-distribution CPU usage 84.061s
approval-distribution CPU usage per block 8.406s
approval-voting CPU usage 96.532s
approval-voting CPU usage per block 9.653s
```
## TODOs
[x] Polish implementation.
[x] Use what we have so far to evaluate
https://github.com/paritytech/polkadot-sdk/pull/1191 before merging.
[x] List of features and additional dimensions we want to use for
benchmarking.
[x] Run benchmark on hardware similar with versi and kusama nodes.
[ ] Add benchmark to be run in CI for catching regression in
performance.
[ ] Rebase on latest changes for network emulation.
---------
Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
Co-authored-by: Andrei Sandu <andrei-mihail@parity.io>
Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
Initial implementation for the plan discussed here: https://github.com/paritytech/polkadot-sdk/issues/701
Built on top of https://github.com/paritytech/polkadot-sdk/pull/1178
v0: https://github.com/paritytech/polkadot/pull/7554,
## Overall idea
When approval-voting checks a candidate and is ready to advertise the
approval, defer it in a per-relay chain block until we either have
MAX_APPROVAL_COALESCE_COUNT candidates to sign or a candidate has stayed
MAX_APPROVALS_COALESCE_TICKS in the queue, in both cases we sign what
candidates we have available.
This should allow us to reduce the number of approvals messages we have
to create/send/verify. The parameters are configurable, so we should
find some values that balance:
- Security of the network: Delaying broadcasting of an approval
shouldn't but the finality at risk and to make sure that never happens
we won't delay sending a vote if we are past 2/3 from the no-show time.
- Scalability of the network: MAX_APPROVAL_COALESCE_COUNT = 1 &
MAX_APPROVALS_COALESCE_TICKS =0, is what we have now and we know from
the measurements we did on versi, it bottlenecks
approval-distribution/approval-voting when increase significantly the
number of validators and parachains
- Block storage: In case of disputes we have to import this votes on
chain and that increase the necessary storage with
MAX_APPROVAL_COALESCE_COUNT * CandidateHash per vote. Given that
disputes are not the normal way of the network functioning and we will
limit MAX_APPROVAL_COALESCE_COUNT in the single digits numbers, this
should be good enough. Alternatively, we could try to create a better
way to store this on-chain through indirection, if that's needed.
## Other fixes:
- Fixed the fact that we were sending random assignments to
non-validators, that was wrong because those won't do anything with it
and they won't gossip it either because they do not have a grid topology
set, so we would waste the random assignments.
- Added metrics to be able to debug potential no-shows and
mis-processing of approvals/assignments.
## TODO:
- [x] Get feedback, that this is moving in the right direction. @ordian
@sandreim @eskimor @burdges, let me know what you think.
- [x] More and more testing.
- [x] Test in versi.
- [x] Make MAX_APPROVAL_COALESCE_COUNT &
MAX_APPROVAL_COALESCE_WAIT_MILLIS a parachain host configuration.
- [x] Make sure the backwards compatibility works correctly
- [x] Make sure this direction is compatible with other streams of work:
https://github.com/paritytech/polkadot-sdk/issues/635 &
https://github.com/paritytech/polkadot-sdk/issues/742
- [x] Final versi burn-in before merging
---------
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
Hey guys, as discussed I've changed the name to a more general one
`PvfExecKind`, is this good or too general?
Creating this as a draft, I still have to fix the tests.
Closes#1585
Kusama address: FkB6QEo8VnV3oifugNj5NeVG3Mvq1zFbrUu4P5YwRoe5mQN
---------
Co-authored-by: command-bot <>
Co-authored-by: Marcin S <marcin@realemail.net>
**_PR migrated from https://github.com/paritytech/polkadot/pull/6782_**
This PR will upgrade the network protocol to version 3 -> VStaging which
will later be renamed to V3. This version introduces a new kind of
assignment certificate that will be used for tranche0 assignments.
Instead of issuing/importing one tranche0 assignment per candidate,
there will be just one certificate per relay chain block per validator.
However, we will not be sending out the new assignment certificates,
yet. So everything should work exactly as before. Once the majority of
the validators have been upgraded to the new protocol version we will
enable the new certificates (starting at a specific relay chain block)
with a new client update.
There are still a few things that need to be done:
- [x] Use bitfield instead of Vec<CandidateIndex>:
https://github.com/paritytech/polkadot/pull/6802
- [x] Fix existing approval-distribution and approval-voting tests
- [x] Fix bitfield-distribution and statement-distribution tests
- [x] Fix network bridge tests
- [x] Implement todos in the code
- [x] Add tests to cover new code
- [x] Update metrics
- [x] Remove the approval distribution aggression levels: TBD PR
- [x] Parachains DB migration
- [x] Test network protocol upgrade on Versi
- [x] Versi Load test
- [x] Add Zombienet test
- [x] Documentation updates
- [x] Fix for sending DistributeAssignment for each candidate claimed by
a v2 assignment (warning: Importing locally an already known assignment)
- [x] Fix AcceptedDuplicate
- [x] Fix DB migration so that we can still keep old data.
- [x] Final Versi burn in
---------
Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
Co-authored-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
* Remove superflous parameter `overseer_enable_anyways`
We don't need this flag, as we don't need the overseer enabled when the
node isn't a collator or validator.
* Rename `IsCollator` to `IsParachainNode`
`IsParachainNode` is more expressive and also encapsulates the state of
the parachain node being a full node. Some functionality like the
overseer needs to run always when the node runs alongside a parachain
node. The parachain node needs the overseer to e.g. recover PoVs. Other
things like candidate validation or pvf checking are only required for
when the node is running as validator.
* FMT
* Fix CI
* Replace `RollingSessionWindow` with `RuntimeInfo` - initial commit
* Fix tests in import
* Fix the rest of the tests
* Remove dead code
* Fix todos
* Simplify session caching
* Comments for `SessionInfoProvider`
* Separate `SessionInfoProvider` from `State`
* `cache_session_info_for_head` becomes freestanding function
* Remove unneeded `mut` usage
* fn session_info -> fn get_session_info() to avoid name clashes. The function also tries to initialize `SessionInfoProvider`
* Fix SessionInfo retrieval
* Code cleanup
* Don't wrap `SessionInfoProvider` in an `Option`
* Remove `earliest_session()`
* Remove pre-caching -> wip
* Fix some tests and code cleanup
* Fix all tests
* Fixes in tests
* Fix comments, variable names and small style changes
* Fix a warning
* impl From<SessionWindowSize> for NonZeroUsize
* Fix logging for `get_session_info` - remove redundant logs and decrease log level to DEBUG
* Code review feedback
* Storage migration removing `COL_SESSION_WINDOW_DATA` from parachains db
* Remove `col_session_data` usages
* Storage migration clearing columns w/o removing them
* Remove session data column usages from `approval-voting` and `dispute-coordinator` tests
* Add some test cases from `RollingSessionWindow` to `dispute-coordinator` tests
* Fix formatting in initialized.rs
* Fix a corner case in `SessionInfo` caching for `dispute-coordinator`
* Remove `RollingSessionWindow` ;(
* Revert "Fix formatting in initialized.rs"
This reverts commit 0f94664ec9f3a7e3737a30291195990e1e7065fc.
* v2 to v3 migration drops `COL_DISPUTE_COORDINATOR_DATA` instead of clearing it
* Fix `NUM_COLUMNS` in `approval-voting`
* Use `columns::v3::NUM_COLUMNS` when opening db
* Update node/service/src/parachains_db/upgrade.rs
Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
* Don't write in `COL_DISPUTE_COORDINATOR_DATA` for `test_rocksdb_migrate_2_to_3`
* Fix `NUM+COLUMNS` in approval_voting
* Fix formatting
* Fix columns usage
* Clarification comments about the different db versions
---------
Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
* Replace `RollingSessionWindow` with `RuntimeInfo` - initial commit
* Fix tests in import
* Fix the rest of the tests
* Remove dead code
* Fix todos
* Simplify session caching
* Comments for `SessionInfoProvider`
* Separate `SessionInfoProvider` from `State`
* `cache_session_info_for_head` becomes freestanding function
* Remove unneeded `mut` usage
* fn session_info -> fn get_session_info() to avoid name clashes. The function also tries to initialize `SessionInfoProvider`
* Fix SessionInfo retrieval
* Code cleanup
* Don't wrap `SessionInfoProvider` in an `Option`
* Remove `earliest_session()`
* Remove pre-caching -> wip
* Fix some tests and code cleanup
* Fix all tests
* Fixes in tests
* Fix comments, variable names and small style changes
* Fix a warning
* impl From<SessionWindowSize> for NonZeroUsize
* Fix logging for `get_session_info` - remove redundant logs and decrease log level to DEBUG
* Code review feedback
* Happy New Year!
* Remove year entierly
Co-authored-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>
* Remove years from copyright notice in the entire repo
---------
Co-authored-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>
* Pass the PerLeafSpan as mutable reference to handle_new_head function
* cargo +nightly fmt --all
* Add mock span for test
* cargo +nightly fmt --all
* add new-blocks-hashes to span
* ref span in match statement, set span to disabled if not passed
* remove second match clause, make handle_new_head_span mutable
* cargo +nightly fmt --all
* improve tag on error and warning
* add imported blocks and info span
* cargo +nightly fmt --all
* Improve error for imported_blocks_and_info trace
* format tags on get_header_span
* add lost-to-finality tag
* add missing bracket
* - Add bitfield child span
- Add block db insertion span
* - fix update-bitfield span tag
* - Fix type conversion to u64
- Add missing argument
* - Cargo fmt
* - Test add_follows_from
* - Revert as relationship between spans not working correctly
* - use drop to test if parent-child relationship can be re-established
* - remove bitfield span, check if parent-child relationship can be reestablished
* - Remove dangling bitfield span which is not used, to see if parent-child relationship can be re-established
* Another dangling bitfield span
* cargo fmt
* - add imported blocks and info span
- add candidate span per candidate
* add tags before moving block_header to push scope
* - Add db-insertion span
* cargo fmt
* fix types
* * Pass mutable reference to span in handle_new_head
* Change get-header-span tags in handle_new_head
* Create cache-session-info span in handle_new_head
* Create optional argument in determine_new_blocks
* Pass mutable reference to handle_new_head_span in determine_new_blocks in handle_new_head function
* Add candidate-hash, candidate-number, lost-to-finality tags to candidate_span in handle_new_head function
* Manually drop db_insertion_span and remove superfluous tags to it, only keeping approved-bitfields tag
* Add ApprovalVoting stage in jaeger
* * Pass mutable reference to jaeger::Span in stead of PerLeafSpan
* Add block-import span
* *Pass optional_span (optional argument) to determine_new_blocks util function
* * Add num-candidates int tag to block_import_span
* * Add head tag to cache_session_span
* * Create PerLeafSpan in handle_from_overseer (this is required to establish parent-child relationship between approval-voting span, and leaf-activated root span)
* * Add candidate-import-span as child of block-import-span
* Add candidate-hash and num-approval tags to candidate-import-span
* * Fix num-candidate tag to bitvec-len tag in candidate-import-span
* *Fix imported_blocKs_and_info span to create new-block-span as not dealing with candidates
* Consider the future::select! block
* Use HashMap<Hash, jaeger::PerLeafSpan>
* Remove Stage 9
* Add missing spans
* cargo +nightly fmt --all
* Remove optional span argument for determine_new_blocks
* * Remove no-longer needed default PerLeafSpan implementation
* Remove no-longer necessary mock span given re-factoring of handle_new_head() no longer neeing mutable span
* Split validation-result and request-data (availability and validation code) spans into two by dropping request_validation_data_spans
* Remove drop statements for cache_session_info_span
*
* Remove unnecessary span
* Remove another excessively spammy span
* Add missing spans from State in import tests
* Use functional approach to get spans
* - Add functional approach for the approval-voting span
- Add doc on block_numbers given labelling ambiguity
- Add span pruning logic
- Use .add_para_id on validation_result_span
* Replace for hash_set in hash_set_iter with map closure
* cargo +nightly fmt --all
* Change from unconsumed `map` to `.for_each`
* cargo +nightly fmt --all
* Refactor add_para_id to validation_result_span
* cargo +nightly fmt --all
* Remove duplicate tag
* Add missing tag to handle-approved-ancestor span
* Refactor span pruning to only invoke retain once
* Typo in span name
* - Replace unwrap_or with unwrap_or_else due to lazy evaluation of trace-identifier in polkadot_node_jaeger
- Remove some redundant spans
* Add approval-distribution spans
* - Add unwrap_or_else on note-approved-in-chain-selection
- Use child_with_trace_id to add traceID string tag on span (note this does not change the traceID, but just adds a tag)
* cargo +nightly fmt --all
* - Add traceID tags were necessary in approval-voting and availability-distribution
- Always use block-hash tag in stead of relay-parent tag in approval-distribution
* Remove schedule-wakeup span as it will duplicate spans on existing wakeups (which should be a no-op)
* Remove a couple of warnings related to mutability
* Fix failing tests in availability distribution
* Add traceID tag to launch-approval and validation-result
* Reshuffle the validation and validation result spans to where more appropriate and add block-hash tag
* - Add tranche and should-trigger tag to process-wakeup span
- Add candidate-hash and traceID to check-and-import-approval span
* cargo fmt
* - Adjustments after PR comments
* Move span pruning after other pruning logic
* Remove DerefMut - no longer needed
* Relabel request-chunk spans
* - Fix typo in span label
- Add docs for drops
* Add new approval-voting span pruning logic
* Undo removal of !
* cargo fmt
* Add clippy config and remove .cargo from gitignore
* first fixes
* Clippyfied
* Add clippy CI job
* comment out rusty-cachier
* minor
* fix ci
* remove DAG from check-dependent-project
* add DAG to clippy
Co-authored-by: alvicsam <alvicsam@gmail.com>
Was seeing these warnings when running `cargo check --all`:
```
warning: for loop over an `Option`. This is more readably written as an `if let` statement
--> node/core/approval-voting/src/lib.rs:1147:21
|
1147 | for activated in update.activated {
| ^^^^^^^^^^^^^^^^
|
= note: `#[warn(for_loops_over_fallibles)]` on by default
help: to check pattern in a loop use `while let`
|
1147 | while let Some(activated) = update.activated {
| ~~~~~~~~~~~~~~~ ~~~
help: consider using `if let` to clear intent
|
1147 | if let Some(activated) = update.activated {
| ~~~~~~~~~~~~ ~~~
```
My guess is that `activated` used to be a SmallVec or similar, as is
`deactivated`. It was changed to an `Option`, the `for` still compiled (it's
technically correct, just weird), and the compiler didn't catch it until now.
* Don't import backing statements directly
into the dispute coordinator. This also gets rid of a redundant
signature check. Both should have some impact on backing performance.
In general this PR should make us scale better in the number of parachains.
Reasoning (aka why this is fine):
For the signature check: As mentioned, it is a redundant check. The
signature has already been checked at this point. This is even made
obvious by the used types. The smart constructor is not perfect as
discussed [here](https://github.com/paritytech/polkadot/issues/3455),
but is still a reasonable security.
For not importing to the dispute-coordinator: This should be good as the
dispute coordinator does scrape backing votes from chain. This suffices
in practice as a super majority of validators must have seen a backing
fork in order for a candidate to get included and only included
candidates pose a threat to our system. The import from chain is
preferable over direct import of backing votes for two reasons:
1. The import is batched, greatly improving import performance. All
backing votes for a candidate are imported with a single import.
And indeed we were able to see in metrics that importing votes
from chain is fast.
2. We do less work in general as not every candidate for which
statements are gossiped might actually make it on a chain. The
dispute coordinator as with the current implementation would still
import and keep those votes around for six sessions.
While redundancy is good for reliability in the event of bugs, this also
comes at a non negligible cost. The dispute-coordinator right now is the
subsystem with the highest load, despite the fact that it should not be
doing much during mormal operation and it is only getting worse
with more parachains as the load is a direct function of the number of statements.
We'll see on Versi how much of a performance improvement this PR
* Get rid of dead code.
* Dont send approval vote
* Make it pass CI
* Bring back tests for fixing them later.
* Explicit signature check.
* Resurrect approval-voting tests (not fixed yet)
* Send out approval votes in dispute-distribution.
Use BTreeMap for ordered dispute votes.
* Bring back an important warning.
* Fix approval voting tests.
* Don't send out dispute message on import + test
+ Some cleanup.
* Guide changes.
Note that the introduced complexity is actually redundant.
* WIP: guide changes.
* Finish guide changes about dispute-coordinator
conceputally. Requires more proof read still.
Also removed obsolete implementation details, where the code is better
suited as the source of truth.
* Finish guide changes for now.
* Remove own approval vote import logic.
* Implement logic for retrieving approval-votes
into approval-voting and approval-distribution subsystems.
* Update roadmap/implementers-guide/src/node/disputes/dispute-coordinator.md
Co-authored-by: asynchronous rob <rphmeier@gmail.com>
* Review feedback.
In particular: Add note about disputes of non included candidates.
* Incorporate Review Remarks
* Get rid of superfluous space.
* Tidy up import logic a bit.
Logical vote import is now separated, making the code more readable and
maintainable.
Also: Accept import if there is at least one invalid signer that has not
exceeded its spam slots, instead of requiring all of them to not exceed
their limits. This is more correct and a preparation for vote batching.
* We don't need/have empty imports.
* Fix tests and bugs.
* Remove error prone redundancy.
* Import approval votes on dispute initiated/concluded.
* Add test for approval vote import.
* Make guide checker happy (hopefully)
* Another sanity check + better logs.
* Reasoning about boundedness.
* Use `CandidateIndex` as opposed to `CoreIndex`.
* Remove redundant import.
* Review remarks.
* Add metric for calls to request signatures
* More review remarks.
* Add metric on imported approval votes.
* Include candidate hash in logs.
* More trace log
* Break cycle.
* Add some tracing.
* Cleanup allowed messages.
* fmt
* Tracing + timeout for get inherent data.
* Better error.
* Break cycle in all places.
* Clarified comment some more.
* Typo.
* Break cycle approval-distribution - approval-voting.
Co-authored-by: asynchronous rob <rphmeier@gmail.com>
* Revert approval-voting subsystem
* Approval voting revert encapsulated within 'ops' module
* use 'get_stored_blocks' to get lower block height
* Fix error message
* Optionally shrink/delete stored blocks range
* range end number is last block number plus 1
* Apply code review suggestions
* Use tristate enum for block range in backend overlay
* Add clarification comment
* Add comments to private struct
* gossip-support: be explicit about dimensions
* some guide updates
* update network-bridge to distinguish x and y dimensions
* get everything to compile
* beginnings
* some TODOs
* polkadot runtime: use relevant_authorities
* make gossip topologies per-session
* better formatting
* gossip support: use current session validators
* expand in comment
* adjust tests and fix index bug
* add past/present/future connection test and clean up code
* fmt
* network bridge: updated types
* update protocols to new gossip topology message
* guide updates
* add session to BlockApprovalMeta
* add session to block info
* refactor knowledge and remove most unify logic
* start replacing gossip_peers with new SessionTopologies
* add routing information to message state
* add some utilities to SessionTopology
* implement new gossip topology logic
* re-implement unify_with_peer
* distribute assignments according to topology
* finish grid topology implementation
* refactor network bridge slightly
* issue connection requests on all past/present/future
* fmt
* address grumbles
* tighten invariants in unify_with_peer
* implement random propagation
* refactor: extract required routing adjustment logic
* some block-age logic
* aggressively propagate messages when finality is slow
* overhaul aggression system to have 3 levels
* add aggression metrics
* remove aggression L3
* reduce random circulation
* remove PeerData
* get approval tests compiling
* use btree_map in known_by to make deterministic
* Revert "use btree_map in known_by to make deterministic"
This reverts commit 330d65343a7bb6fe4dd0f24bd8dbc15c0cbdbd9d.
* test XY grid propagation
* remove stray println
* test unshared dimension propagation
* add random gossip check
* test unify_with_peer better
* test sending after getting gossip topology
* test L1 aggression on originator
* test L1 aggression for non-originators
* test non-originator aggression L2
* fnt
* ~spellcheck
* fix statement-distribution tests
* fix flaky test
* fix metrics typo
* re-send periodically
* test resending
* typo
Co-authored-by: Bernhard Schuster <bernhard@ahoi.io>
* add more metrics about apd messages
* add back unify_with_peer logs
* make Resend an enum
* be more explicit when resending
* fmt
* fix error
* add a TODO for refactoring
* remove debug metrics
* add some guide stuff
* fmt
* update runtime API in test-runtim
Co-authored-by: Bernhard Schuster <bernhard@ahoi.io>
* Don't wait for dispute coordinator
in backing and approval-voting - we are single threaded there, so this
is blocking everything.
* Add missing import.
* Don't warn on dropped receiver.
* remove v0 primitives from polkadot-primitives
* first pass: remove v0
* fix fallout in erasure-coding
* remove v1 primitives, consolidate to v2
* the great import update
* update runtime_api_impl_v1 to v2 as well
* guide: add `Version` request for runtime API
* add version query to runtime API
* reintroduce OldV1SessionInfo in a limited way
* parity db subsystem without cache and no splitted column
* fmt
* fix path (auto from parity-db fail)
* lru cache for db column with cache
* Revert "lru cache for db column with cache"
This reverts commit ae177bc5e107a075eff6a21f651218ada6599b74.
* Write_lock mutex
* theoric code for bridges
* revert changes
* Revert bridge changes
* fix spec_version
* update parity db
* test purge-db
* Use specific ordered collection with paritydb.
* Revert "Use specific ordered collection with paritydb."
This reverts commit 8b66d0a4ae914cba1af0f44050d45dd6d9327c6b.
* fix chain selection tests.
* remove patch
* fix auto.
* Remove useless exists directory method
* purge chain without parity-db removal
* spellcheck
* renamings and filtering.
* fix assertion
* format
* update parity-db and fmt
* Auto keep using rocksdb when it exists.
* Revert "Auto keep using rocksdb when it exists."
This reverts commit cea49b32ae590bdce31fed5c45f3c028ae0c7564.
* Update kvdb version.