Commit Graph

50 Commits

Author SHA1 Message Date
s0me0ne-unkn0wn d37a45650e Make candidate validation bounded again (#2125)
This PR aims to channel the backpressure of the PVF host's preparation
and execution queues to the candidate validation subsystem consumers.
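
As a rough illustration of the idea (not the code from this PR), backpressure can be sketched with a bounded queue from the Rust standard library: once the queue is full, the producer is told so instead of the queue growing without limit. `Request`, `bounded_queue` and `submit_until_full` are hypothetical names introduced only for this sketch.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Hypothetical stand-in for a validation request; not a type from the PR.
#[derive(Debug, PartialEq)]
pub struct Request(pub u32);

/// Creates a queue that holds at most `capacity` pending requests.
pub fn bounded_queue(capacity: usize) -> (SyncSender<Request>, Receiver<Request>) {
    sync_channel(capacity)
}

/// Tries to enqueue up to `count` requests and returns how many were
/// accepted before the queue reported that it was full or disconnected.
pub fn submit_until_full(tx: &SyncSender<Request>, count: u32) -> u32 {
    let mut accepted = 0;
    for id in 0..count {
        match tx.try_send(Request(id)) {
            Ok(()) => accepted += 1,
            // The consumer is behind: this is the backpressure signal.
            Err(TrySendError::Full(_)) => break,
            // The consumer is gone, so there is nothing left to feed.
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    accepted
}
```

With a capacity of 2, only two of five submissions are accepted until the consumer takes a request off the queue; a real subsystem would more likely await free capacity than drop work.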

Related: #708
2024-01-21 13:56:44 +00:00
Andrei Sandu 8a6e9ef189 Introduce subsystem benchmarking tool (#2528)
This tool makes it easy to run parachain consensus stress/performance
testing on your development machine or in CI.

## Motivation
The parachain consensus node implementation spans many modules,
which we call subsystems. Each subsystem is responsible for a small part
of the logic of the parachain consensus pipeline, but in general most
load and performance issues are localized in just a few core subsystems
like `availability-recovery`, `approval-voting` or
`dispute-coordinator`. In the absence of such a tool, we would run large
test nets to load/stress test these parts of the system. Setting up such
a large test and making sense of the amount of data it produces is very
expensive, hard to orchestrate and a huge development time sink.

## PR contents
- CLI tool 
- Data Availability Read test
- reusable mockups and components needed so far
- Documentation on how to get started

### Data Availability Read test

An overseer is built using a real `availability-recovery` subsystem
instance while dependent subsystems like `av-store`, `network-bridge`
and `runtime-api` are mocked. The network bridge will emulate all the
network peers and their answering to requests.

The test is going to be run for a number of blocks. For each block it
will generate and send a “RecoverAvailableData” request for an arbitrary
number of candidates. We wait for the subsystem to respond to all
requests before moving to the next block.
At the same time we collect the usual subsystem metrics and task CPU
metrics and show some nice progress reports while running.

### Here is what the CLI looks like:

```
[2023-11-28T13:06:27Z INFO  subsystem_bench::core::display] n_validators = 1000, n_cores = 20, pov_size = 5120 - 5120, error = 3, latency = Some(PeerLatency { min_latency: 1ms, max_latency: 100ms })
[2023-11-28T13:06:27Z INFO  subsystem-bench::availability] Generating template candidate index=0 pov_size=5242880
[2023-11-28T13:06:27Z INFO  subsystem-bench::availability] Created test environment.
[2023-11-28T13:06:27Z INFO  subsystem-bench::availability] Pre-generating 60 candidates.
[2023-11-28T13:06:30Z INFO  subsystem-bench::core] Initializing network emulation for 1000 peers.
[2023-11-28T13:06:30Z INFO  subsystem-bench::availability] Current block 1/3
[2023-11-28T13:06:30Z INFO  substrate_prometheus_endpoint] 〽️ Prometheus exporter started at 127.0.0.1:9999
[2023-11-28T13:06:30Z INFO  subsystem_bench::availability] 20 recoveries pending
[2023-11-28T13:06:37Z INFO  subsystem_bench::availability] Block time 6262ms
[2023-11-28T13:06:37Z INFO  subsystem-bench::availability] Sleeping till end of block (0ms)
[2023-11-28T13:06:37Z INFO  subsystem-bench::availability] Current block 2/3
[2023-11-28T13:06:37Z INFO  subsystem_bench::availability] 20 recoveries pending
[2023-11-28T13:06:43Z INFO  subsystem_bench::availability] Block time 6369ms
[2023-11-28T13:06:43Z INFO  subsystem-bench::availability] Sleeping till end of block (0ms)
[2023-11-28T13:06:43Z INFO  subsystem-bench::availability] Current block 3/3
[2023-11-28T13:06:43Z INFO  subsystem_bench::availability] 20 recoveries pending
[2023-11-28T13:06:49Z INFO  subsystem_bench::availability] Block time 6194ms
[2023-11-28T13:06:49Z INFO  subsystem-bench::availability] Sleeping till end of block (0ms)
[2023-11-28T13:06:49Z INFO  subsystem_bench::availability] All blocks processed in 18829ms
[2023-11-28T13:06:49Z INFO  subsystem_bench::availability] Throughput: 102400 KiB/block
[2023-11-28T13:06:49Z INFO  subsystem_bench::availability] Block time: 6276 ms
[2023-11-28T13:06:49Z INFO  subsystem_bench::availability] 
    
    Total received from network: 415 MiB
    Total sent to network: 724 KiB
    Total subsystem CPU usage 24.00s
    CPU usage per block 8.00s
    Total test environment CPU usage 0.15s
    CPU usage per block 0.05s
```
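
The summary figures in the log above can be cross-checked with simple arithmetic. The sketch below assumes that throughput is the number of cores times the PoV size in KiB and that the per-block values are plain averages over the 3 blocks; these formulas are inferred from the logged numbers, not taken from the tool's source, and the function names are made up for the example.

```rust
/// Data recovered per block, assuming one candidate per core.
pub fn throughput_kib_per_block(n_cores: u64, pov_size_kib: u64) -> u64 {
    n_cores * pov_size_kib
}

/// Average block time in whole milliseconds (integer division).
pub fn avg_block_time_ms(total_ms: u64, blocks: u64) -> u64 {
    total_ms / blocks
}

/// Average CPU seconds spent per block.
pub fn cpu_per_block_secs(total_cpu_secs: f64, blocks: u32) -> f64 {
    total_cpu_secs / f64::from(blocks)
}
```

Plugging in the logged values (20 cores, 5120 KiB PoVs, 18829 ms and 24.00 s of CPU over 3 blocks) gives 102400 KiB/block, 6276 ms and 8.00 s, matching the report.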

### Prometheus/Grafana stack in action
<img width="1246" alt="Screenshot 2023-11-28 at 15 11 10"
src="https://github.com/paritytech/polkadot-sdk/assets/54316454/eaa47422-4a5e-4a3a-aaef-14ca644c1574">
<img width="1246" alt="Screenshot 2023-11-28 at 15 12 01"
src="https://github.com/paritytech/polkadot-sdk/assets/54316454/237329d6-1710-4c27-8f67-5fb11d7f66ea">
<img width="1246" alt="Screenshot 2023-11-28 at 15 12 38"
src="https://github.com/paritytech/polkadot-sdk/assets/54316454/a07119e8-c9f1-4810-a1b3-f1b7b01cf357">

---------

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
2023-12-14 12:57:17 +02:00
ordian 5ca909cc09 polkadot: eradicate LeafStatus (#1565)
Fixes #768.
2023-10-23 16:22:37 +02:00
Vsevolod Stakhov 44dbb73945 Allow to broadcast network messages in parallel (#1409)
This PR addresses multiple pending issues:

* [x] Update orchestra to the recent version and test how the node
performs
* [x] Add some useful metrics for outbound network bridge
* [x] Try to send incoming network requests to all subsystems without
blocking on some particular subsystem in that loop
* [x] Fix all incompatibilities between orchestra and polkadot code
(e.g. malus node)
2023-09-11 20:33:51 +02:00
ordian 15503883e2 polkadot: pin one block per session (#1220)
* polkadot: propagate UnpinHandle to ActiveLeafUpdate

Also extract the leaf creation for tests
into a common function.

* dispute-coordinator: try pinned blocks for slashing

* apparently 1.72 is smarter than 1.70

* address nits

* rename fresh_leaf to new_leaf
2023-09-07 13:24:40 +03:00
Bastian Köcher a33d7922f8 Rename polkadot-parachain to polkadot-parachain-primitives (#1334)
* Rename `polkadot-parachain` to `polkadot-parachain-primitives`

While doing this it also fixes some last `rustdoc` issues and fixes
another Cargo warning related to `pallet-paged-list`.

* Fix compilation

* ".git/.scripts/commands/fmt/fmt.sh"

* Fix XCM docs

---------

Co-authored-by: command-bot <>
2023-08-31 23:53:29 +02:00
Alexander Samusev e49493442a Add CI for monorepo (#1145)
* Add CI for monorepo

* fix frame tests

* Format features

Signed-off-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>

* add note for skipping tests and disable test-linux-stable-all

* Fix tests and compile issues (#1152)

* Fix feature dependent import

Signed-off-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>

* Bump test timeout

Signed-off-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>

* Remove feature gate

Signed-off-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>

* Add resolver 2

Signed-off-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>

* Remove old lockfile

Signed-off-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>

* Format features

Signed-off-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>

---------

Signed-off-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>

* Fix check-dependency-rules

Signed-off-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>

* rm test-runtime

Signed-off-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>

* Actually fix script

Signed-off-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>

* enable cargo-check-each-crate-macos

* Run check-each-crate on 6 machines (#1163)

---------

Signed-off-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>
Co-authored-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>
2023-08-25 16:35:22 +02:00
Oliver Tale-Yazdi 342d720573 Use same fmt and clippy configs as in Substrate (#7611)
* Use same rustfmt.toml as Substrate

Signed-off-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>

* format format file

Signed-off-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>

* Format with new config

Signed-off-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>

* Add Substrate Clippy config

Signed-off-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>

* Print Clippy version in CI

Otherwise it's difficult to reproduce locally.

Signed-off-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>

* Make fmt happy

Signed-off-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>

* Update node/core/pvf/src/error.rs

Co-authored-by: Tsvetomir Dimitrov <tsvetomir@parity.io>

* Update node/core/pvf/src/error.rs

Co-authored-by: Tsvetomir Dimitrov <tsvetomir@parity.io>

---------

Signed-off-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>
Co-authored-by: Tsvetomir Dimitrov <tsvetomir@parity.io>
2023-08-14 14:29:29 +00:00
s0me0ne-unkn0wn 64660ee8d2 Remove years from copyright notes (#7034)
* Happy New Year!

* Remove year entirely

Co-authored-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>

* Remove years from copyright notice in the entire repo

---------

Co-authored-by: Oliver Tale-Yazdi <oliver.tale-yazdi@parity.io>
2023-04-08 20:38:35 +00:00
Davide Galassi 260d073658 Companion for #13683 (#6944)
* Companion for #13683

* Wraps trait is not required

* update lockfile for {"substrate"}

---------

Co-authored-by: parity-processbot <>
2023-03-24 14:43:21 +00:00
Davide Galassi 46c36e5a4f [Companion #13615] Keystore overhaul (#6892)
* Remove not required async calls

* Fixed missing renaming

* make_keystore can be sync

* More fixes

* Trivial nitpicks

* Cherry pick test fix from master

* Fixes after master merge

* update lockfile for {"substrate"}

---------

Co-authored-by: parity-processbot <>
2023-03-17 12:09:15 +00:00
Tsvetomir Dimitrov ed6fa5499c Don't send ActiveLeaves from leaves in db on startup in Overseer (#6727)
* Don't send `ActiveLeaves` from leaves in db on startup in Overseer. Wait for fresh leaves instead.

* Don't pass initial set of leaves to Overseer

* Fix compilation error in subsystem-test-helpers
2023-03-06 15:02:16 +00:00
s0me0ne-unkn0wn 1cb1d03c08 Re-export current primitives in crate root (#6487)
* Re-export current primitives in crate root

* Add missing exports

* restart CI
2023-01-11 11:28:12 +00:00
Tsvetomir Dimitrov ccad411e46 Change best effort queue behaviour in dispute-coordinator (#6275)
* Change best effort queue behaviour in `dispute-coordinator`

Use the same type of queue (`BTreeMap<CandidateComparator,
ParticipationRequest>`) for best effort and priority in
`dispute-coordinator`.

Rework `CandidateComparator` to handle unavailable parent
block numbers.

Best effort queue will order disputes the same way as priority does - by
parent's block height. Disputes on candidates for which the parent's
block number can't be obtained will be treated with the lowest priority.
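
The ordering described above can be sketched as follows. This is a simplified illustration rather than the actual `CandidateComparator`: the field names are hypothetical, and it assumes that a lower parent block height is served first, with a missing block number sorting after every known one.

```rust
use std::cmp::Ordering;
use std::collections::BTreeMap;

/// Simplified, hypothetical comparator; the real type's fields may differ.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Comparator {
    /// `None` when the parent's block number could not be obtained.
    pub parent_block_number: Option<u32>,
    /// Tie-breaker standing in for the candidate hash.
    pub candidate_id: u64,
}

impl Ord for Comparator {
    fn cmp(&self, other: &Self) -> Ordering {
        let by_height = match (self.parent_block_number, other.parent_block_number) {
            (Some(a), Some(b)) => a.cmp(&b),
            // A known height always sorts before an unknown one.
            (Some(_), None) => Ordering::Less,
            (None, Some(_)) => Ordering::Greater,
            (None, None) => Ordering::Equal,
        };
        by_height.then_with(|| self.candidate_id.cmp(&other.candidate_id))
    }
}

impl PartialOrd for Comparator {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

/// Inserts `(parent_block_number, candidate_id)` pairs into a `BTreeMap`
/// queue and returns the candidate ids in the order they would be served.
pub fn processing_order(entries: &[(Option<u32>, u64)]) -> Vec<u64> {
    let mut queue: BTreeMap<Comparator, u64> = BTreeMap::new();
    for &(parent_block_number, candidate_id) in entries {
        queue.insert(Comparator { parent_block_number, candidate_id }, candidate_id);
    }
    queue.into_values().collect()
}
```

Because a `BTreeMap` iterates in key order, the same comparator can back both the priority and the best-effort queue without any extra sorting step.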

* Fix tests: Handle `ChainApiMessage::BlockNumber` in `handle_sync_queries`

* Some tests are deadlocking on sending messages via overseer so change `SingleItemSink` to `mpsc::Sender` with a buffer of 1

* Fix a race in test after adding a buffered queue for overseer messages

* Fix the rest of the tests

* Guide update - best-effort queue

* Guide update: clarification about spam votes

* Fix tests in `availability-distribution`

* Update comments

* Add `make_buffered_subsystem_context` in `subsystem-test-helpers`

* Code review feedback

* Code review feedback

* Code review feedback

* Don't add best effort candidate if it is already in priority queue

* Remove an old comment

* Fix insert in best_effort
2022-11-17 15:41:19 +00:00
Chris Sosnin e4361b6d80 Fix flaky test (#6131)
* Split test + decrease test timeout

* fmt

* spellcheck
2022-10-10 08:06:44 +02:00
Sebastian Kunert 72bde2889f Introduce async runtime calling trait for runtime-api subsystem (#5782)
* Implement OverseerRuntimeClient

* blockchainevents

* Update patches

* Finish merging runtime-api subsystem

* First version that is able to produce blocks

* Make OverseerRuntimeClient async

* Move overseer notification stream forwarding to cumulus

* Remove unused imports

* Add more logging to collator-protocol

* Lockfile

* Use hashes in OverseerRuntimeClient

* Move OverseerRuntimeClient into extra module

* Fix old session info call and make HeadSupportsParachain async

* Improve naming of trait

* Cleanup

* Remove unused From trait implementation

* Remove unwanted debug print

* Move trait to polkadot-node-subsystem-types

* Add sections to runtime client

Co-authored-by: Davide Galassi <davxy@datawok.net>

* Reorder methods

* Fix spelling

* Fix spacing in Cargo.toml

Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>

* Remove unused babe methods

Co-authored-by: Davide Galassi <davxy@datawok.net>
Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>
2022-07-20 10:23:25 +00:00
Bernhard Schuster 3240cb5e4d split NetworkBridge into two subsystems (#5616)
* foo

* rolling session window

* fixup

* remove use statement

* fmt

* split NetworkBridge into two subsystems

Pending cleanup

* split

* chore: reexport OrchestraError as OverseerError

* chore: silence warnings

* fixup tests

* chore: add default timeout of 30s to subsystem test helper ctx handle

* single item channel

* fixins

* fmt

* cleanup

* remove dead code

* remove sync bounds again

* wire up shared state

* deal with some FIXMEs

* use distinct tags

Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>

* use tag

Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>

* address naming

tx and rx are common in networking and also have an implicit meaning regarding networking
compared to incoming and outgoing which are already used with subsystems themselves

* remove unused sync oracle

* remove unneeded state

* fix tests

* chore: fmt

* do not try to register twice

* leak Metrics type

Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
Co-authored-by: Andronik <write@reusable.software>
2022-07-12 16:22:36 +00:00
Bernhard Schuster 450ca2baca overseer becomes orchestra (#5542)
* rename overseer-gen to orchestra

Also drop `gum` and use `tracing`.

* make orchestra compile as standalone

* introduce Spawner trait to split from sp_core

Finalizes the independence of orchestra from polkadot-overseer

* slip of the pen

* other fixins

* remove unused import

* Update node/overseer/orchestra/proc-macro/src/impl_builder.rs

Co-authored-by: Vsevolod Stakhov <vsevolod.stakhov@parity.io>

* Update node/overseer/orchestra/proc-macro/src/impl_builder.rs

Co-authored-by: Vsevolod Stakhov <vsevolod.stakhov@parity.io>

* orchestra everywhere

* leaky data

* Bump scale-info from 2.1.1 to 2.1.2 (#5552)

Bumps [scale-info](https://github.com/paritytech/scale-info) from 2.1.1 to 2.1.2.
- [Release notes](https://github.com/paritytech/scale-info/releases)
- [Changelog](https://github.com/paritytech/scale-info/blob/master/CHANGELOG.md)
- [Commits](https://github.com/paritytech/scale-info/compare/v2.1.1...v2.1.2)

---
updated-dependencies:
- dependency-name: scale-info
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Add missing markdown code block delimiter (#5555)

* bitfield-signing: remove util::jobs usage  (#5523)

* Switch to pooling copy-on-write instantiation strategy for WASM (companion for Substrate#11232) (#5337)

* Switch to pooling copy-on-write instantiation strategy for WASM

* Fix compilation of `polkadot-test-service`

* Update comments

* Move `max_memory_size` to `Semantics`

* Rename `WasmInstantiationStrategy` to `WasmtimeInstantiationStrategy`

* Update a safety comment

* update lockfile for {"substrate"}

Co-authored-by: parity-processbot <>

* Fix build

Co-authored-by: Vsevolod Stakhov <vsevolod.stakhov@parity.io>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Malte Kliemann <mail@maltekliemann.com>
Co-authored-by: Chris Sosnin <48099298+slumber@users.noreply.github.com>
Co-authored-by: Koute <koute@users.noreply.github.com>
2022-05-19 13:42:02 +01:00
Bernhard Schuster 511891dcce refactor+feat: allow subsystems to send only declared messages, generate graphviz (#5314)
Closes #3774
Closes #3826
2022-05-12 17:39:05 +02:00
Robert Habermeier 49f7e5cce4 Finish migration to v2 primitives (#5037)
* remove v0 primitives from polkadot-primitives

* first pass: remove v0

* fix fallout in erasure-coding

* remove v1 primitives, consolidate to v2

* the great import update

* update runtime_api_impl_v1 to v2 as well

* guide: add `Version` request for runtime API

* add version query to runtime API

* reintroduce OldV1SessionInfo in a limited way
2022-03-09 14:01:13 -06:00
sandreim b0f89bbfbc Per subsystem CPU usage tracking (#4239)
* SubsystemContext: add subsystem name str

Signed-off-by: Andrei Sandu <sandu.andrei@gmail.com>

* Overseer builder proc macro changes

* initialize SubsystemContext name field.
* Add subsystem name in TaskKind::launch_task()

Signed-off-by: Andrei Sandu <sandu.andrei@gmail.com>

* Update ToOverseer enum

Signed-off-by: Andrei Sandu <sandu.andrei@gmail.com>

* Assign subsystem names to orphan tasks

Signed-off-by: Andrei Sandu <sandu.andrei@gmail.com>

* cargo fmt

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>

* Rebase changes for new spawn() group param

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>

* Add subsystem constant in JobTrait

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>

* Add subsystem string

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>

* Fix tests

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>

* Fix spawn() calls

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>

* cargo fmt

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>

* Fix

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>

* Fix tests

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>

* fix

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>

* Fix more tests

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>

* Address PR review feedback #1

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>

* Address PR review round 2

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>

* Fixes
- remove JobTrait::Subsystem
- fix tests

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>

* update Cargo.lock

Co-authored-by: Andronik Ordian <write@reusable.software>
2021-11-11 18:53:37 +00:00
Bernhard Schuster c57a1e7934 remove AllSubsystems and AllSubsystemsGen types (#3874)
* introduce the OverseerConnector, use it

* introduce is_relay_chain to RelayChainSelection

* Update node/service/src/lib.rs

Co-authored-by: Andronik Ordian <write@reusable.software>

* avoid the deferred setting of `is_relay_chain` in `RelayChainSelection`

* positive assertion is not mandated, only the negative one, to avoid a stall

* cleanup: overseer residue

* spellcheck

* fixin

* groundwork to obsolete Overseer::new and AllSubsystemsGen proc-macro

* Now all malus & tests can be ported to the builder pattern.

Obsoletes `Overseer::new`, `AllSubsystemsGen` derive macro, `AllSubsystems`.

* spellcheck

* adjust tests, minor fixes

* remove derive macro AllSubsystemsGen

* add forgotten file dummy.rs

* remove residue

* good news everyone!

* spellcheck

* address review comments

* fixup imports

* make it conditional

* fixup docs

* reduce import

* chore: fmt

* chore: fmt

* chore: spellcheck / nlprules

* fixup malus variant-a

* fmt

* fix

* fixins

* pfmt

* fixins

* chore: fmt

* remove expanded overseer generation

* tracing version

* Update node/network/statement-distribution/src/lib.rs

Co-authored-by: Robert Habermeier <rphmeier@gmail.com>

* use future::ready instead

* silence warning

* chore: fmt

Co-authored-by: Andronik Ordian <write@reusable.software>
Co-authored-by: Robert Habermeier <rphmeier@gmail.com>
2021-09-29 14:24:56 +00:00
Bernhard Schuster c9662531b6 remove connected disconnected state, 3rd attempt (#3898)
* overseer: remove mut in connector

* rename SelectRelayChainWFallback -> SelectRelayChain

* split Basics

* introduce the OverseerConnector, use it

* introduce is_relay_chain to RelayChainSelection

* chore: rename var

* avoid dummy import in subsystem

* actually remove Disconnected/Connected enum

* extract DummySubsystem into mod dummy.

* Handle::Connected -> Handle::new

* chore: fmt

* fix test

* select relay chain takes no arg, simplification

* fmt

* Update node/service/src/lib.rs

Co-authored-by: Andronik Ordian <write@reusable.software>

* chore: improve malus tests

* avoid the deferred setting of `is_relay_chain` in `RelayChainSelection`

* positive assertion is not mandated, only the negative one, to avoid a stall

* chore: fmt

* assure the `RelayChainSelection` is not used before the overseer is up and running

Co-authored-by: Andronik Ordian <write@reusable.software>
2021-09-28 15:01:04 +02:00
Bernhard Schuster d711673ee2 Revert "remove connected disconnected state only (#3868)" (#3896)
This reverts commit 5f637c510e.
2021-09-20 13:02:36 +00:00
Bernhard Schuster 5f637c510e remove connected disconnected state only (#3868)
* remove connected disconnected state from overseer

* foo

* split new partial

* fix

* refactor init code to not require an `OverseerHandle` when we don't have an overseer

* intermediate

* fixins

* X

* fixup

* foo

* fixup

* docs

* conditional

* Update node/service/src/lib.rs

* review by ladi
2021-09-17 14:39:33 -05:00
Bernhard Schuster cc8b861271 add dispute metrics, some chores (#3842)
* rename: MsgFilter -> MessageInterceptor

* feat: add dispute metrics

* fixup

* test fixins

* fix metrics

* dummysubsystem export and trait fn fix

* chore: fmt

* undo unwanted changes

* foo

* pfmt

* fixup

* fixup

* revert

* some more

* Update node/malus/Cargo.toml

Co-authored-by: Andronik Ordian <write@reusable.software>

* Update node/core/dispute-coordinator/src/metrics.rs

Co-authored-by: Andronik Ordian <write@reusable.software>

* Update node/core/dispute-coordinator/src/metrics.rs

Co-authored-by: Andronik Ordian <write@reusable.software>

* Update node/core/dispute-coordinator/src/metrics.rs

Co-authored-by: Andronik Ordian <write@reusable.software>

* add license header

* fix lockfile

* new with opts

* fmt

* Update node/core/dispute-coordinator/src/metrics.rs

* feature gate

Co-authored-by: Andronik Ordian <write@reusable.software>
2021-09-16 08:19:51 +00:00
Bernhard Schuster 3cc5a1eee9 feat/overseer: introduce closure init (#3775)
* feat/overseer: introduce closure init

Enables removal of the connected/disconnected overseer state.

* feat/overseer: allow replacement logic to access the original

Allows re-using init-once types, which would otherwise error.

* feat/overseer: introduce external connector

Preparation for removal of `AllSubsystems`
which is another prerequisite for removing
the connect/disconnect state.

* fix/test: replace needs closure

* fixup

* simplify

* mea culpa

* all-subsystems-gen test
2021-09-04 08:07:07 +00:00
Bernhard Schuster bff0ed532f chore: test helper arbitrary ordering for 2 (#3762)
* chore: add arbitrary order macro for more resilient subsystem tests

* move to subsystem-test-helpers
2021-09-01 18:52:59 +02:00
Shawn Tabrizi ff5d56fb76 cargo +nightly fmt (#3540)
* cargo +nightly fmt

* add cargo-fmt check to ci

* update ci

* fmt

* fmt

* skip macro

* ignore bridges
2021-08-02 10:47:33 +00:00
Andronik Ordian bd9b743872 enable disputes (#3478)
* initial integration and migration code

* fix tests

* fix counting test

* assume the current version on missing file

* use SelectRelayChain

* remove duplicate metric

* Update node/service/src/lib.rs

Co-authored-by: Robert Habermeier <rphmeier@gmail.com>

* remove ApprovalCheckingVotingRule

* address my concern

* never mode for StagnantCheckInterval

* REVERTME: some logs

* w00t

* it's ugly but it works

* Revert "REVERTME: some logs"

This reverts commit e210505a2e83e31c381394924500b69277bb042e.

* it's handle, not handler

* fix a few typos

Co-authored-by: Robert Habermeier <rphmeier@gmail.com>
2021-07-26 14:46:31 +02:00
Robert Klotzner b5257b2407 Dispute distribution implementation (#3282)
* Dispute protocol.

* Dispute distribution protocol.

* Get network requests routed.

* WIP: Basic dispute sender logic.

* Basic validator determination logic.

* WIP: Getting things to typecheck.

* Slightly larger timeout.

* More typechecking stuff.

* Cleanup.

* Finished most of the sending logic.

* Handle active leaves updates

- Cleanup dead disputes
- Update sends for new sessions
- Retry on errors

* Pass sessions in already.

* Startup dispute sending.

* Provide incoming decoding facilities

and use them in statement-distribution.

* Relaxed runtime util requirements.

We only need a `SubsystemSender` not a full `SubsystemContext`.

* Better usability of incoming requests.

Make it possible to consume stuff without clones.

* Add basic receiver functionality.

* Cleanup + fixes for sender.

* One more sender fix.

* Start receiver.

* Make sure to send responses back.

* WIP: Exposed authority discovery

* Make tests pass.

* Fully featured receiver.

* Decrease cost of `NotAValidator`.

* Make `RuntimeInfo` LRU cache size configurable.

* Cache more sessions.

* Fix collator protocol.

* Disable metrics for now.

* Make dispute-distribution a proper subsystem.

* Fix naming.

* Code style fixes.

* Factored out 4x copied mock function.

* WIP: Tests.

* Whitespace cleanup.

* Accessor functions.

* More testing.

* More Debug instances.

* Fix busy loop.

* Working tests.

* More tests.

* Cleanup.

* Fix build.

* Basic receiving test.

* Non validator message gets dropped.

* More receiving tests.

* Test nested and subsequent imports.

* Fix spaces.

* Better formatted imports.

* Import cleanup.

* Metrics.

* Message -> MuxedMessage

* Message -> MuxedMessage

* More review remarks.

* Add missing metrics.rs.

* Fix flaky test.

* Dispute coordinator - deliver confirmations.

* Send out `DisputeMessage` on issue local statement.

* Unwire dispute distribution.

* Review remarks.

* Review remarks.

* Better docs.
2021-07-09 04:29:53 +02:00
Bernhard Schuster 3c9104daff refactor overseer into proc-macro based pattern (#2962)
2021-07-08 21:09:26 +02:00
Andronik Ordian 373a545118 make it easier to dbg stalls (#3351)
* make it easier to dbg

* revert channel sizes

* BAnon
2021-07-02 21:09:18 +02:00
Andronik Ordian ffc6f7c731 make ctx.spawn blocking (#3337)
* make spawn sync

* improve error type
2021-06-21 20:43:40 -05:00
Lldenaurois 2abaca3a8c Remove candidate selection (#3148)
* Create validator_side module

* Subsume Candidate Selection

* Add test to ensure candidate backing logic is correct

* Ensure secondings are adequately cleaned up and address test flakiness

* Address Feedback
2021-06-08 14:07:19 -04:00
Robert Habermeier 57038b2e46 Remove real-overseer 🎉 (#2834)
* remove real-overseer

* overseer: only activate leaves which support parachains

* integrate HeadSupportsParachains into service

* remove unneeded line
2021-04-08 20:24:06 +02:00
Robert Habermeier 8ebbe19d10 Split NetworkBridge and break cycles with Unbounded (#2736)
* overseer: pass messages directly between subsystems

* test that message is held on to

* Update node/overseer/src/lib.rs

Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>

* give every subsystem an unbounded sender too

* remove metered_channel::name

1. we don't provide good names
2. these names are never used anywhere

* unused mut

* remove unnecessary &mut

* subsystem unbounded_send

* remove unused MaybeTimer

We have channel size metrics that serve the same purpose better now and the implementation of message timing was pretty ugly.

* remove comment

* split up senders and receivers

* update metrics

* fix tests

* fix test subsystem context

* use SubsystemSender in jobs system now

* refactor of awful jobs code

* expose public `run` on JobSubsystem

* update candidate backing to new jobs & use unbounded

* bitfield signing

* candidate-selection

* provisioner

* approval voting: send unbounded for assignment/approvals

* async not needed

* begin bridge split

* split up network tasks into background worker

* port over network bridge

* Update node/network/bridge/src/lib.rs

Co-authored-by: Andronik Ordian <write@reusable.software>

* rename ValidationWorkerNotifications

Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>
Co-authored-by: Andronik Ordian <write@reusable.software>
2021-03-29 01:18:53 +02:00
Robert Habermeier 5952e790fa Overseer: subsystems communicate directly (#2227)
* overseer: pass messages directly between subsystems

* test that message is held on to

* Update node/overseer/src/lib.rs

Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>

* give every subsystem an unbounded sender too

* remove metered_channel::name

1. we don't provide good names
2. these names are never used anywhere

* unused mut

* remove unnecessary &mut

* subsystem unbounded_send

* remove unused MaybeTimer

We have channel size metrics that serve the same purpose better now and the implementation of message timing was pretty ugly.

* remove comment

* split up senders and receivers

* update metrics

* fix tests

* fix test subsystem context

* fix flaky test

* fix docs

* doc

* use select_biased to favor signals

* Update node/subsystem/src/lib.rs

Co-authored-by: Andronik Ordian <write@reusable.software>

Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>
Co-authored-by: Andronik Ordian <write@reusable.software>
2021-03-28 15:55:10 +00:00
Robert Klotzner 78ac4b7add Whole subsystem test for new availability-distribution (#2552)
* WIP: Whole subsystem test.

* New tests compile.

* Avoid needless runtime queries for no validator nodes.

* Make tx and rx publicly accessible in virtual overseer.

This simplifies mocking in some cases, as tx can be cloned, but rx
cannot.

* Whole subsystem test working.

* Update node/network/availability-distribution/src/session_cache.rs

Co-authored-by: Andronik Ordian <write@reusable.software>

* Update node/network/availability-distribution/src/session_cache.rs

Co-authored-by: Andronik Ordian <write@reusable.software>

* Document better what `None` return value means.

* Get rid of BitVec dependency.

* Update Cargo.lock

* Hopefully fixed implementers guide build.

Co-authored-by: Andronik Ordian <write@reusable.software>
2021-03-03 16:23:15 +01:00
Andronik Ordian 69b103b1d5 overseer: send_msg should not return an error (#1995)
* send_message should not return an error

* Apply suggestions from code review

Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>

* s/send_logging_error/send_and_log_error

Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>
2020-11-23 12:42:14 +01:00
Fedor Sakharov 935fcd1666 Change SpawnedSubsystem type to log subsystem errors (#1878)
* Change SpawnedSubsystem type to log subsystem errors

* Remove clone
2020-10-28 20:57:06 +01:00
Bernhard Schuster f345123748 introduce errors with info (#1834)
2020-10-27 08:10:03 +01:00
Bastian Köcher f2d7b6f5ac Make AllSubsystems usage easier in tests (#1794)
* Make `AllSubsystems` usage easier in tests

This makes `AllSubsystems` easier to use in tests by introducing
new methods:

- `dummy` initializes `AllSubsystems` with all subsystems set to dummy
- `replace_*` replaces any individual subsystem

Besides that, this PR adds a `ForwardSubsystem` that is also useful for
tests. This subsystem forwards all incoming messages to the given channel.

* Update node/overseer/src/lib.rs

Co-authored-by: Andronik Ordian <write@reusable.software>

* Update node/subsystem/src/lib.rs

Co-authored-by: Andronik Ordian <write@reusable.software>

* Update node/subsystem/src/lib.rs

Co-authored-by: Andronik Ordian <write@reusable.software>

* Move ForwardSubsystem and add a test

* Break some lines

Co-authored-by: Andronik Ordian <write@reusable.software>
2020-10-08 11:27:19 +02:00
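
The `dummy` / `replace_*` helpers and the `ForwardSubsystem` described in #1794 can be sketched roughly as follows. This is a simplified illustration, not the overseer's actual code: the `Subsystem` trait, the two-field `AllSubsystems`, the field names, and the use of `std::sync::mpsc` are all assumptions made to keep the example self-contained.

```rust
use std::sync::mpsc::{channel, Sender};

/// Hypothetical, simplified subsystem interface: handle one message at a time.
trait Subsystem<M> {
    fn handle(&mut self, msg: M);
}

/// Ignores every message; stands in for subsystems a test does not care about.
struct DummySubsystem;

impl<M> Subsystem<M> for DummySubsystem {
    fn handle(&mut self, _msg: M) {}
}

/// Forwards every incoming message to the given channel.
struct ForwardSubsystem<M>(Sender<M>);

impl<M> Subsystem<M> for ForwardSubsystem<M> {
    fn handle(&mut self, msg: M) {
        // A dropped receiver is not an error for the purposes of this sketch.
        let _ = self.0.send(msg);
    }
}

/// Two-slot stand-in for `AllSubsystems` (assumed field names).
struct AllSubsystems<A, B> {
    candidate_validation: A,
    availability_store: B,
}

impl AllSubsystems<DummySubsystem, DummySubsystem> {
    /// Start with every slot set to a dummy.
    fn dummy() -> Self {
        AllSubsystems {
            candidate_validation: DummySubsystem,
            availability_store: DummySubsystem,
        }
    }
}

impl<A, B> AllSubsystems<A, B> {
    /// Swap one slot and keep the rest; the slot's type changes with it.
    fn replace_candidate_validation<N>(self, new: N) -> AllSubsystems<N, B> {
        AllSubsystems {
            candidate_validation: new,
            availability_store: self.availability_store,
        }
    }
}

/// Replace one subsystem with a forwarder and observe what it receives.
fn forwarded_messages() -> Vec<u32> {
    let (tx, rx) = channel::<u32>();
    let mut all = AllSubsystems::dummy()
        .replace_candidate_validation(ForwardSubsystem(tx));

    all.candidate_validation.handle(1);
    all.candidate_validation.handle(2);
    all.availability_store.handle(3u32); // swallowed by the dummy

    drop(all); // drops the sender so the receiver iterator terminates
    rx.iter().collect()
}
```

In this sketch only the messages sent to the replaced slot reach the channel, so a test can assert on exactly the traffic of the subsystem under inspection while everything else stays a no-op.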
Fedor Sakharov 98660cbd94 Collator protocol subsystem (#1659)
* WIP

* The initial implementation of the collator side.

* Improve comments

* Multiple collation requests

* Add more tests and comments to validator side

* Add comments, remove dead code

* Apply suggestions from code review

Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>

* Fix build after suggested changes

* Also connect to the next validator group

* Remove a Future impl and move TimeoutExt to util

* Minor nits

* Fix build

* Change FetchCollations back to FetchCollation

* Try this

* Final fixes

* Fix build

Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>
2020-09-10 16:54:59 +03:00
Andronik Ordian e899a3f844 jobs: don't early exit when there are no jobs (#1621)
* jobs: don't early exit when there are no jobs

* utils: fix merged test

* utils: less verbose

* utils: add an assert subsystem is running

* utils: use TimeoutExt from test-helpers

* test-helpers: use TimeoutExt
2020-08-21 16:45:39 +02:00
Fedor Sakharov cc19f13468 use own timeout in tests instead of smol-timeout (#1618) 2020-08-20 20:27:09 +03:00
Peter Goodspeed-Niklaus 54bec3bfc0 implement collation generation subsystem (#1557)
* start sketching out a collation generation subsystem

* invent a basic strategy for double initialization

* clean up warnings

* impl util requests from runtime assuming a context instead of a FromJob sender

* implement collation generation algorithm from guide

* update AllMessages in tests

* fix trivial review comments

* remove another redundant declaration from merge

* filter availability cores by para_id

* handle new activations each in their own async task

* update guide according to the actual current implementation

* add initialization to guide

* add general-purpose subsystem_test_harness helper

* write first handle_new_activations test

* add test that handle_new_activations filters local_validation_data requests

* add (failing) test of collation distribution message sending

* rustfmt

* broken: work on fixing sender test

Unfortunately, for reasons that are not yet clear, despite the public key
and checked data being identical, the signer is not producing an identical
signature. This commit produces this output (among more):

signing with  Public(c4733ab0bbe3ba4c096685d1737a7f498cdbdd167a767d04a21dc7df12b8c858 (5GWHUNm5...))
checking with Public(c4733ab0bbe3ba4c096685d1737a7f498cdbdd167a767d04a21dc7df12b8c858 (5GWHUNm5...))
signed payload:  [4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 10, 0, 0, 0, c7, e5, c0, 64, 7a, db, fe, 44, 81, e5, 51, 11, 79, 9f, a5, 63, 93, 94, 3c, c4, 36, c6, 30, 36, c2, c5, 44, a2, 1b, db, b7, 82, 3, 17, a, 2e, 75, 97, b7, b7, e3, d8, 4c, 5, 39, 1d, 13, 9a, 62, b1, 57, e7, 87, 86, d8, c0, 82, f2, 9d, cf, 4c, 11, 13, 14]
checked payload: [4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 10, 0, 0, 0, c7, e5, c0, 64, 7a, db, fe, 44, 81, e5, 51, 11, 79, 9f, a5, 63, 93, 94, 3c, c4, 36, c6, 30, 36, c2, c5, 44, a2, 1b, db, b7, 82, 3, 17, a, 2e, 75, 97, b7, b7, e3, d8, 4c, 5, 39, 1d, 13, 9a, 62, b1, 57, e7, 87, 86, d8, c0, 82, f2, 9d, cf, 4c, 11, 13, 14]

* fix broken test

* collation function returns commitments hash

It doesn't look like we use the actual commitments data anywhere, and
it's not obvious whether there are any fields of `CandidateCommitments`
not available to the collator, so this commit just assigns the collator
the entire responsibility of generating the hash.

* add missing overseer impls

* calculating erasure coding is polkadot's responsibility, not cumulus

* concurrentize per-relay_parent requests
2020-08-17 14:27:37 +02:00
Andronik Ordian ab1a513265 Add spawn_blocking to SubsystemContext (#1570)
* subsystem: add spawn_blocking to SubsystemContext

* candidate-validation: use spawn_blocking for exhaustive tasks
2020-08-17 12:26:32 +00:00
Bernhard Schuster 4bdfd02f93 impl availability distribution
Closes #1237
2020-08-10 15:02:30 +02:00
Peter Goodspeed-Niklaus 9fda6cb416 break out subsystem-util and subsystem-test-helpers into individual crates (#1553)
* break out subsystem-util and subsystem-test-helpers into individual crates

* cause all packages to check successfully
2020-08-07 16:51:36 +02:00