This tool makes it easy to run parachain consensus stress/performance
testing on your development machine or in CI.
## Motivation
The parachain consensus node implementation spans across many modules
which we call subsystems. Each subsystem is responsible for a small part
of logic of the parachain consensus pipeline, but in general the most
load and performance issues are localized in just a few core subsystems
like `availability-recovery`, `approval-voting` or
`dispute-coordinator`. In the absence of such a tool, we would run large
test nets to load/stress test these parts of the system. Setting up and
making sense of the amount of data produced by such a large test is very
expensive, hard to orchestrate and is a huge development time sink.
## PR contents
- CLI tool
- Data Availability Read test
- reusable mockups and components needed so far
- Documentation on how to get started
### Data Availability Read test
An overseer is built with using a real `availability-recovery` susbsytem
instance while dependent subsystems like `av-store`, `network-bridge`
and `runtime-api` are mocked. The network bridge will emulate all the
network peers and their answering to requests.
The test is going to be run for a number of blocks. For each block it
will generate send a “RecoverAvailableData” request for an arbitrary
number of candidates. We wait for the subsystem to respond to all
requests before moving to the next block.
At the same time we collect the usual subsystem metrics and task CPU
metrics and show some nice progress reports while running.
### Here is how the CLI looks like:
```
[2023-11-28T13:06:27Z INFO subsystem_bench::core::display] n_validators = 1000, n_cores = 20, pov_size = 5120 - 5120, error = 3, latency = Some(PeerLatency { min_latency: 1ms, max_latency: 100ms })
[2023-11-28T13:06:27Z INFO subsystem-bench::availability] Generating template candidate index=0 pov_size=5242880
[2023-11-28T13:06:27Z INFO subsystem-bench::availability] Created test environment.
[2023-11-28T13:06:27Z INFO subsystem-bench::availability] Pre-generating 60 candidates.
[2023-11-28T13:06:30Z INFO subsystem-bench::core] Initializing network emulation for 1000 peers.
[2023-11-28T13:06:30Z INFO subsystem-bench::availability] Current block 1/3
[2023-11-28T13:06:30Z INFO substrate_prometheus_endpoint] 〽️ Prometheus exporter started at 127.0.0.1:9999
[2023-11-28T13:06:30Z INFO subsystem_bench::availability] 20 recoveries pending
[2023-11-28T13:06:37Z INFO subsystem_bench::availability] Block time 6262ms
[2023-11-28T13:06:37Z INFO subsystem-bench::availability] Sleeping till end of block (0ms)
[2023-11-28T13:06:37Z INFO subsystem-bench::availability] Current block 2/3
[2023-11-28T13:06:37Z INFO subsystem_bench::availability] 20 recoveries pending
[2023-11-28T13:06:43Z INFO subsystem_bench::availability] Block time 6369ms
[2023-11-28T13:06:43Z INFO subsystem-bench::availability] Sleeping till end of block (0ms)
[2023-11-28T13:06:43Z INFO subsystem-bench::availability] Current block 3/3
[2023-11-28T13:06:43Z INFO subsystem_bench::availability] 20 recoveries pending
[2023-11-28T13:06:49Z INFO subsystem_bench::availability] Block time 6194ms
[2023-11-28T13:06:49Z INFO subsystem-bench::availability] Sleeping till end of block (0ms)
[2023-11-28T13:06:49Z INFO subsystem_bench::availability] All blocks processed in 18829ms
[2023-11-28T13:06:49Z INFO subsystem_bench::availability] Throughput: 102400 KiB/block
[2023-11-28T13:06:49Z INFO subsystem_bench::availability] Block time: 6276 ms
[2023-11-28T13:06:49Z INFO subsystem_bench::availability]
Total received from network: 415 MiB
Total sent to network: 724 KiB
Total subsystem CPU usage 24.00s
CPU usage per block 8.00s
Total test environment CPU usage 0.15s
CPU usage per block 0.05s
```
### Prometheus/Grafana stack in action
<img width="1246" alt="Screenshot 2023-11-28 at 15 11 10"
src="https://github.com/paritytech/polkadot-sdk/assets/54316454/eaa47422-4a5e-4a3a-aaef-14ca644c1574">
<img width="1246" alt="Screenshot 2023-11-28 at 15 12 01"
src="https://github.com/paritytech/polkadot-sdk/assets/54316454/237329d6-1710-4c27-8f67-5fb11d7f66ea">
<img width="1246" alt="Screenshot 2023-11-28 at 15 12 38"
src="https://github.com/paritytech/polkadot-sdk/assets/54316454/a07119e8-c9f1-4810-a1b3-f1b7b01cf357">
---------
Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
We currently use a bit of a hack in `.cargo/config` to make sure that
clippy isn't too annoying by specifying the list of lints.
There is now a stable way to define lints for a workspace. The only down
side is that every crate seems to have to opt into this so there's a
*few* files modified in this PR.
Dependencies:
- [x] PR that upgrades CI to use rust 1.74 is merged.
---------
Co-authored-by: joe petrowski <25483142+joepetrowski@users.noreply.github.com>
Co-authored-by: Branislav Kontur <bkontur@gmail.com>
Co-authored-by: Liam Aharon <liam.aharon@hotmail.com>
Initial implementation for the plan discussed here: https://github.com/paritytech/polkadot-sdk/issues/701
Built on top of https://github.com/paritytech/polkadot-sdk/pull/1178
v0: https://github.com/paritytech/polkadot/pull/7554,
## Overall idea
When approval-voting checks a candidate and is ready to advertise the
approval, defer it in a per-relay chain block until we either have
MAX_APPROVAL_COALESCE_COUNT candidates to sign or a candidate has stayed
MAX_APPROVALS_COALESCE_TICKS in the queue, in both cases we sign what
candidates we have available.
This should allow us to reduce the number of approvals messages we have
to create/send/verify. The parameters are configurable, so we should
find some values that balance:
- Security of the network: Delaying broadcasting of an approval
shouldn't but the finality at risk and to make sure that never happens
we won't delay sending a vote if we are past 2/3 from the no-show time.
- Scalability of the network: MAX_APPROVAL_COALESCE_COUNT = 1 &
MAX_APPROVALS_COALESCE_TICKS =0, is what we have now and we know from
the measurements we did on versi, it bottlenecks
approval-distribution/approval-voting when increase significantly the
number of validators and parachains
- Block storage: In case of disputes we have to import this votes on
chain and that increase the necessary storage with
MAX_APPROVAL_COALESCE_COUNT * CandidateHash per vote. Given that
disputes are not the normal way of the network functioning and we will
limit MAX_APPROVAL_COALESCE_COUNT in the single digits numbers, this
should be good enough. Alternatively, we could try to create a better
way to store this on-chain through indirection, if that's needed.
## Other fixes:
- Fixed the fact that we were sending random assignments to
non-validators, that was wrong because those won't do anything with it
and they won't gossip it either because they do not have a grid topology
set, so we would waste the random assignments.
- Added metrics to be able to debug potential no-shows and
mis-processing of approvals/assignments.
## TODO:
- [x] Get feedback, that this is moving in the right direction. @ordian
@sandreim @eskimor @burdges, let me know what you think.
- [x] More and more testing.
- [x] Test in versi.
- [x] Make MAX_APPROVAL_COALESCE_COUNT &
MAX_APPROVAL_COALESCE_WAIT_MILLIS a parachain host configuration.
- [x] Make sure the backwards compatibility works correctly
- [x] Make sure this direction is compatible with other streams of work:
https://github.com/paritytech/polkadot-sdk/issues/635 &
https://github.com/paritytech/polkadot-sdk/issues/742
- [x] Final versi burn-in before merging
---------
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
This PR introduces a script and some templates to use the prdoc involved
in a release and build:
- the changelog
- a simple draft of audience documentation
Since the prdoc presence was enforced in the middle of the version
1.5.0, not all PRs did come with a `prdoc` file.
This PR creates all the missing `prdoc` files with some minimum content
allowing to properly generate the changelog.
The generated content is **not** suitable for the audience
documentation.
The audience documentation will be possible with the next version, when
all PR come with a proper `prdoc`.
## Assumptions
- the prdoc files for release `vX.Y.Z` have been moved under
`prdoc/X.Y.Z`
- the changelog requires for now for the prdoc files to contain author +
topic. Thos fields are optional.
The build script can be called as:
```
VERSION=X.Y.Z ./scripts/release/build-changelogs.sh
```
Related:
- #1408
---------
Co-authored-by: EgorPopelyaev <egor@parity.io>
Bumps [proc-macro-crate](https://github.com/bkchr/proc-macro-crate) from
1.3.1 to 2.0.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/bkchr/proc-macro-crate/releases">proc-macro-crate's
releases</a>.</em></p>
<blockquote>
<h2>v2.0.0</h2>
<h2>What's Changed</h2>
<ul>
<li>Release 2.0.0 by <a
href="https://github.com/bkchr"><code>@bkchr</code></a> in <a
href="https://redirect.github.com/bkchr/proc-macro-crate/pull/39">bkchr/proc-macro-crate#39</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/bkchr/proc-macro-crate/compare/v1.3.1...v2.0.0">https://github.com/bkchr/proc-macro-crate/compare/v1.3.1...v2.0.0</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/bkchr/proc-macro-crate/commit/cc100ed2bc1ac9bb81812fd435a43e2ef023f355"><code>cc100ed</code></a>
Merge pull request <a
href="https://redirect.github.com/bkchr/proc-macro-crate/issues/39">#39</a>
from bkchr/bkchr-2.0.0</li>
<li><a
href="https://github.com/bkchr/proc-macro-crate/commit/39a7c1844fc4d73ef397a7e8d9419f1f3381aa44"><code>39a7c18</code></a>
Release 2.0.0</li>
<li>See full diff in <a
href="https://github.com/bkchr/proc-macro-crate/compare/v1.3.1...v2.0.0">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore <dependency name> major version` will close this
group update PR and stop Dependabot creating any more for the specific
dependency's major version (unless you unignore this specific
dependency's major version or upgrade to it yourself)
- `@dependabot ignore <dependency name> minor version` will close this
group update PR and stop Dependabot creating any more for the specific
dependency's minor version (unless you unignore this specific
dependency's minor version or upgrade to it yourself)
- `@dependabot ignore <dependency name>` will close this group update PR
and stop Dependabot creating any more for the specific dependency
(unless you unignore this specific dependency or upgrade to it yourself)
- `@dependabot unignore <dependency name>` will remove all of the ignore
conditions of the specified dependency
- `@dependabot unignore <dependency name> <ignore condition>` will
remove the ignore condition of the specified dependency and ignore
conditions
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Using taplo, fixes all our broken and inconsistent toml formatting and
adds CI to keep them tidy.
If people want we can customise the format rules as described here
https://taplo.tamasfe.dev/configuration/formatter-options.html
@ggwpez, I suggest zepter is used only for checking features are
propagated, and leave formatting for taplo to avoid duplicate work and
conflicts.
TODO
- [x] Use `exclude = [...]` syntax in taplo file to ignore zombienet
tests instead of deleting the dir
---------
Signed-off-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>
Co-authored-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>
Co-authored-by: Bastian Köcher <git@kchr.de>
In gossip-support, we shuffled the list of authorities using the
`rand::seq::SliceRandom::shuffle`. This function's behavior is
unspecified beyond being `O(n)` and could change in the future, leading
to network issues between nodes using different shuffling algorithms. In
practice, the implementation was a Fisher-Yates shuffle. This PR
replaces the call with a re-implementation of Fisher-Yates and adds a
test to ensure the behavior is the same between the two at the moment.
Good day,
This PR requests the inclusion of two bootnode entries for Polkadot,
Kusama and Westend as part of LuckyFriday's IBP application. The nodes
are hosted on self-owned hardware in a co-located facility. We've
undertaken connectivity tests ourselves and also from members of the
IBP.
The test commands used are as follows:
```
polkadot --chain westend --base-path /tmp/node --name "Boot" --reserved-only --reserved-nodes "/dns/ibp-boot-westend.luckyfriday.io/tcp/30334/wss/p2p/12D3KooWDg1YEytdwFFNWroFj6gio4YFsMB3miSbHKgdpJteUMB9" --no-hardware-benchmarks
polkadot --chain westend --base-path /tmp/node --name "Boot" --reserved-only --reserved-nodes "/dns/ibp-boot-westend.luckyfriday.io/tcp/30333/p2p/12D3KooWDg1YEytdwFFNWroFj6gio4YFsMB3miSbHKgdpJteUMB9" --no-hardware-benchmarks
polkadot --chain kusama --base-path /tmp/node --name "Boot" --reserved-only --reserved-nodes "/dns/ibp-boot-kusama.luckyfriday.io/tcp/30334/wss/p2p/12D3KooW9vu1GWHBuxyhm7rZgD3fhGZpNajPXFexadvhujWMgwfT" --no-hardware-benchmarks
polkadot --chain kusama --base-path /tmp/node --name "Boot" --reserved-only --reserved-nodes "/dns/ibp-boot-kusama.luckyfriday.io/tcp/30333/p2p/12D3KooW9vu1GWHBuxyhm7rZgD3fhGZpNajPXFexadvhujWMgwfT" --no-hardware-benchmarks
polkadot --chain polkadot --base-path /tmp/node --name "Boot" --reserved-only --reserved-nodes "/dns/ibp-boot-polkadot.luckyfriday.io/tcp/30334/wss/p2p/12D3KooWEjk6QXrZJ26fLpaajisJGHiz6WiQsR8k7mkM9GmWKnRZ" --no-hardware-benchmarks
polkadot --chain polkadot --base-path /tmp/node --name "Boot" --reserved-only --reserved-nodes "/dns/ibp-boot-polkadot.luckyfriday.io/tcp/30333/p2p/12D3KooWEjk6QXrZJ26fLpaajisJGHiz6WiQsR8k7mkM9GmWKnRZ" --no-hardware-benchmarks
```
All tests yielded syncing with 1 peer, a positive result. We have also
backed up our node-key in the event that restoration is required.
I hope that the aforementioned meets the requirement for inclusion and
look forward to a speedy turnaround.
Kind Regards,
Will | Paradox
This commit introduces a new concept called `NotificationService` which
allows Polkadot protocols to communicate with the underlying
notification protocol implementation directly, without routing events
through `NetworkWorker`. This implies that each protocol has its own
service which it uses to communicate with remote peers and that each
`NotificationService` is unique with respect to the underlying
notification protocol, meaning `NotificationService` for the transaction
protocol can only be used to send and receive transaction-related
notifications.
The `NotificationService` concept introduces two additional benefits:
* allow protocols to start using custom handshakes
* allow protocols to accept/reject inbound peers
Previously the validation of inbound connections was solely the
responsibility of `ProtocolController`. This caused issues with light
peers and `SyncingEngine` as `ProtocolController` would accept more
peers than `SyncingEngine` could accept which caused peers to have
differing views of their own states. `SyncingEngine` would reject excess
peers but these rejections were not properly communicated to those peers
causing them to assume that they were accepted.
With `NotificationService`, the local handshake is not sent to remote
peer if peer is rejected which allows it to detect that it was rejected.
This commit also deprecates the use of `NetworkEventStream` for all
notification-related events and going forward only DHT events are
provided through `NetworkEventStream`. If protocols wish to follow each
other's events, they must introduce additional abtractions, as is done
for GRANDPA and transactions protocols by following the syncing protocol
through `SyncEventStream`.
Fixes https://github.com/paritytech/polkadot-sdk/issues/512
Fixes https://github.com/paritytech/polkadot-sdk/issues/514
Fixes https://github.com/paritytech/polkadot-sdk/issues/515
Fixes https://github.com/paritytech/polkadot-sdk/issues/554
Fixes https://github.com/paritytech/polkadot-sdk/issues/556
---
These changes are transferred from
https://github.com/paritytech/substrate/pull/14197 but there are no
functional changes compared to that PR
---------
Co-authored-by: Dmitry Markin <dmitry@markin.tech>
Co-authored-by: Alexandru Vasile <60601340+lexnv@users.noreply.github.com>
Currently the polkadot node will backoff from block authoring if
finality starts lagging. This PR disables this mechanism on production
networks (polkadot and kusama) and adds a flags to optionally force
enabling it.
**Overview:**
Adding an extra malus variant focusing on disputing finalized blocks. It
will:
- wrap around approval-voting
- listen to `OverseerSignal::BlockFinalized` and when encountered start
a dispute for the `dispute_offset`th ancestor
- simply pass through all other messages and signals
Add zombienet tests testing various edgecases:
- disputing freshly finalized blocks
- disputing stale finalized blocks
- disputing eagerly pruned finalized blocks (might be separate PR)
**TODO:**
- [x] Register new malus variant
- [x] Simple pass through wrapper (approval-voting)
- [x] Simple network definition
- [x] Listen to block finalizations
- [x] Fetch ancestor hash
- [x] Fetch session index
- [x] Fetch candidate
- [x] Construct and send dispute message
- [x] zndsl test 1 checking that disputes on fresh finalizations resolve
valid Closes#1365
- [x] zndsl test 2 checking that disputes for too old finalized blocks
are not possible Closes#1364
- [ ] zndsl test 3 checking that disputes for candidates with eagerly
pruned relay parent state are handled correctly #1359 (deferred to a
separate PR)
- [x] Unit tests for new malus variant (testing cli etc)
- [x] Clean/streamline error handling
- [ ] ~~Ensure it tests properly on session boundaries~~
---------
Co-authored-by: Javier Viola <javier@parity.io>
Co-authored-by: Marcin S. <marcin@realemail.net>
Co-authored-by: Tsvetomir Dimitrov <tsvetomir@parity.io>
This moves the macro related re-exports to `__private` to make it more
obvious for downstream users that they are using an internal api.
---------
Co-authored-by: command-bot <>
Hey guys, as discussed I've changed the name to a more general one
`PvfExecKind`, is this good or too general?
Creating this as a draft, I still have to fix the tests.
Closes#1585
Kusama address: FkB6QEo8VnV3oifugNj5NeVG3Mvq1zFbrUu4P5YwRoe5mQN
---------
Co-authored-by: command-bot <>
Co-authored-by: Marcin S <marcin@realemail.net>
Adds a `NodeFeatures` bitfield value to the runtime `HostConfiguration`,
with the purpose of coordinating the enabling of node-side features,
such as: https://github.com/paritytech/polkadot-sdk/issues/628 and
https://github.com/paritytech/polkadot-sdk/issues/598.
These are features that require all validators enable them at the same
time, assuming all/most nodes have upgraded their node versions.
This PR doesn't add any feature yet. These are coming in future PRs.
Also adds a runtime API for querying the state of the client features
and an extrinsic for setting/unsetting a feature by its index in the bitfield.
Note: originally part of:
https://github.com/paritytech/polkadot-sdk/pull/1644, but posted as
standalone to be reused by other PRs until the initial PR is merged
Collators were previously reencoding the available data and checking the
erasure root.
Replace that with just checking the PoV hash, which consumes much less
CPU and takes less time.
We also don't need to check the `PersistedValidationData` hash, as
collators don't use it.
Reason:
https://github.com/paritytech/polkadot-sdk/issues/575#issuecomment-1806572230
After systematic chunks recovery is merged, collators will no longer do
any reed-solomon encoding/decoding, which has proven to be a great CPU
consumer.
Signed-off-by: alindima <alin@parity.io>
One for local networks with `fast-runtime` feature activated (1 minute
sessions) and one without the feature activated that will be the default
that runs with 1 hour long sessions.
**_PR migrated from https://github.com/paritytech/polkadot/pull/6782_**
This PR will upgrade the network protocol to version 3 -> VStaging which
will later be renamed to V3. This version introduces a new kind of
assignment certificate that will be used for tranche0 assignments.
Instead of issuing/importing one tranche0 assignment per candidate,
there will be just one certificate per relay chain block per validator.
However, we will not be sending out the new assignment certificates,
yet. So everything should work exactly as before. Once the majority of
the validators have been upgraded to the new protocol version we will
enable the new certificates (starting at a specific relay chain block)
with a new client update.
There are still a few things that need to be done:
- [x] Use bitfield instead of Vec<CandidateIndex>:
https://github.com/paritytech/polkadot/pull/6802
- [x] Fix existing approval-distribution and approval-voting tests
- [x] Fix bitfield-distribution and statement-distribution tests
- [x] Fix network bridge tests
- [x] Implement todos in the code
- [x] Add tests to cover new code
- [x] Update metrics
- [x] Remove the approval distribution aggression levels: TBD PR
- [x] Parachains DB migration
- [x] Test network protocol upgrade on Versi
- [x] Versi Load test
- [x] Add Zombienet test
- [x] Documentation updates
- [x] Fix for sending DistributeAssignment for each candidate claimed by
a v2 assignment (warning: Importing locally an already known assignment)
- [x] Fix AcceptedDuplicate
- [x] Fix DB migration so that we can still keep old data.
- [x] Final Versi burn in
---------
Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
Co-authored-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>