Commit Graph

30 Commits

Author SHA1 Message Date
sandreim b0f89bbfbc Per subsystem CPU usage tracking (#4239)
* SubsystemContext: add subsystem name str

Signed-off-by: Andrei Sandu <sandu.andrei@gmail.com>

* Overseer builder proc macro changes

* initilize SubsystemContext name field.
* Add subsystem name in TaskKind::launch_task()

Signed-off-by: Andrei Sandu <sandu.andrei@gmail.com>

* Update ToOverseer enum

Signed-off-by: Andrei Sandu <sandu.andrei@gmail.com>

* Assign subsystem names to orphan tasks

Signed-off-by: Andrei Sandu <sandu.andrei@gmail.com>

* cargo fmt

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>

* SubsystemContext: add subsystem name str

Signed-off-by: Andrei Sandu <sandu.andrei@gmail.com>

* Overseer builder proc macro changes

* initilize SubsystemContext name field.
* Add subsystem name in TaskKind::launch_task()

Signed-off-by: Andrei Sandu <sandu.andrei@gmail.com>

* Update ToOverseer enum

Signed-off-by: Andrei Sandu <sandu.andrei@gmail.com>

* Assign subsystem names to orphan tasks

Signed-off-by: Andrei Sandu <sandu.andrei@gmail.com>

* cargo fmt

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>

* Rebase changes for new spawn() group param

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>

* Add subsystem constat in JobTrait

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>

* Add subsystem string

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>

* Fix tests

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>

* Fix spawn() calls

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>

* cargo fmt

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>

* Fix

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>

* Fix tests

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>

* fix

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>

* Fix more tests

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>

* Address PR review feedback #1

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>

* Address PR review round 2

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>

* Fixes
- remove JobTrait::Subsystem
- fix tests

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>

* update Cargo.lock

Co-authored-by: Andronik Ordian <write@reusable.software>
2021-11-11 18:53:37 +00:00
Bernhard Schuster c57a1e7934 remove AllSubsystems and AllSubsystemsGen types (#3874)
* introduce the OverseerConnector, use it

* introduce is_relay_chain to RelayChainSelection

* Update node/service/src/lib.rs

Co-authored-by: Andronik Ordian <write@reusable.software>

* avoid the deferred setting of `is_relay_chain` in `RelayChainSelection`

* positive assertion is not mandated, only the negative one, to avoid a stall

* cleanup: overseer residue

* spellcheck

* fixin

* groundwork to obsolete Overseer::new and AllSubsystemsGen proc-macro

* Now all malus & tests can be ported to the builder pattern.

Obsoletes `Overseer::new`, `AllSubsystemsGen` derive macro, `AllSubsystems`.

* spellcheck

* adjust tests, minor fixes

* remove derive macro AllSubsystemsGen

* add forgotten file dummy.rs

* remove residue

* good news everyone!

* spellcheck

* address review comments

* fixup imports

* make it conditional

* fixup docs

* reduce import

* chore: fmt

* chore: fmt

* chore: spellcheck / nlprules

* fixup malus variant-a

* fmt

* fix

* fixins

* pfmt

* fixins

* chore: fmt

* remove expanded overseer generation

* tracing version

* Update node/network/statement-distribution/src/lib.rs

Co-authored-by: Robert Habermeier <rphmeier@gmail.com>

* use future::ready instead

* silence warning

* chore: fmt

Co-authored-by: Andronik Ordian <write@reusable.software>
Co-authored-by: Robert Habermeier <rphmeier@gmail.com>
2021-09-29 14:24:56 +00:00
Bernhard Schuster c9662531b6 remove connected disconnected state, 3rd attempt (#3898)
* overseer: remove mut in connector

* rename SelectRelayChainWFallback -> SelectRelayChain

* split Basics

* introduce the OverseerConnector, use it

* introduce is_relay_chain to RelayChainSelection

* chore: rename var

* avoid dummy import in subsystem

* actually remove Disconnecte/Connected enum

* extract DummySubsystem into mod dummy.

* Handle::Connected -> Handle::new

* chore: fmt

* fix test

* select relay chain takes no arg, simplification

* fmt

* Update node/service/src/lib.rs

Co-authored-by: Andronik Ordian <write@reusable.software>

* chore: improve malus tests

* avoid the deferred setting of `is_relay_chain` in `RelayChainSelection`

* positive assertion is not mandated, only the negative one, to avoid a stall

* chore: fmt

* assure the `RelayChainSelection` is not used before the overseer is up and running

Co-authored-by: Andronik Ordian <write@reusable.software>
2021-09-28 15:01:04 +02:00
Bernhard Schuster d711673ee2 Revert "remove connected disconnected state only (#3868)" (#3896)
This reverts commit 5f637c510e.
2021-09-20 13:02:36 +00:00
Bernhard Schuster 5f637c510e remove connected disconnected state only (#3868)
* remove connected disconnected state from overseer

* foo

* split new partial

* fix

* refactor init code to not require a `OverseerHandle` when we don't have an overseer

* intermediate

* fixins

* X

* fixup

* foo

* fixup

* docs

* conditional

* Update node/service/src/lib.rs

* review by ladi
2021-09-17 14:39:33 -05:00
Bernhard Schuster cc8b861271 add dispute metrics, some chores (#3842)
* rename: MsgFilter -> MessageInterceptor

* feat: add dispute metrics

* fixup

* test fixins

* fix metrics

* dummysubsystem export and trait fn fix

* chore: fmt

* undo unwanted changes

* foo

* pfmt

* fixup

* fixup

* revert

* some more

* Update node/malus/Cargo.toml

Co-authored-by: Andronik Ordian <write@reusable.software>

* Update node/core/dispute-coordinator/src/metrics.rs

Co-authored-by: Andronik Ordian <write@reusable.software>

* Update node/core/dispute-coordinator/src/metrics.rs

Co-authored-by: Andronik Ordian <write@reusable.software>

* Update node/core/dispute-coordinator/src/metrics.rs

Co-authored-by: Andronik Ordian <write@reusable.software>

* add license header

* fix lockfile

* new with opts

* fmt

* Update node/core/dispute-coordinator/src/metrics.rs

* feature gate

Co-authored-by: Andronik Ordian <write@reusable.software>
2021-09-16 08:19:51 +00:00
Bernhard Schuster 3cc5a1eee9 feat/overseer: introduce closure init (#3775)
* feat/overseer: introduce closure init

Enables removal of the connected/disconnected overseer state.

* feat/overseer: allow replacement logic to access the original

Allows to re-use init-once types, which would otherwise error.

* feat/overseer: introduce external connector

Preparation for removal of `AllSubsystems`
which is another prerequisite for removing
the connect/disconnect state.

* fix/test: replace needs closure

* fixup

* simplify

* mea culpa

* all-subsystems-gen test
2021-09-04 08:07:07 +00:00
Bernhard Schuster bff0ed532f chore: test helper arbitrary ordering for 2 (#3762)
* chore: add arbitrary order macro for more resilient subsystem tests

* move to subsystem-test-helpers
2021-09-01 18:52:59 +02:00
Shawn Tabrizi ff5d56fb76 cargo +nightly fmt (#3540)
* cargo +nightly fmt

* add cargo-fmt check to ci

* update ci

* fmt

* fmt

* skip macro

* ignore bridges
2021-08-02 10:47:33 +00:00
Andronik Ordian bd9b743872 enable disputes (#3478)
* initial integration and migration code

* fix tests

* fix counting test

* assume the current version on missing file

* use SelectRelayChain

* remove duplicate metric

* Update node/service/src/lib.rs

Co-authored-by: Robert Habermeier <rphmeier@gmail.com>

* remove ApprovalCheckingVotingRule

* address my concern

* never mode for StagnantCheckInterval

* REVERTME: some logs

* w00t

* it's ugly but it works

* Revert "REVERTME: some logs"

This reverts commit e210505a2e83e31c381394924500b69277bb042e.

* it's handle, not handler

* fix a few typos

Co-authored-by: Robert Habermeier <rphmeier@gmail.com>
2021-07-26 14:46:31 +02:00
Robert Klotzner b5257b2407 Dispute distribution implementation (#3282)
* Dispute protocol.

* Dispute distribution protocol.

* Get network requests routed.

* WIP: Basic dispute sender logic.

* Basic validator determination logic.

* WIP: Getting things to typecheck.

* Slightly larger timeout.

* More typechecking stuff.

* Cleanup.

* Finished most of the sending logic.

* Handle active leaves updates

- Cleanup dead disputes
- Update sends for new sessions
- Retry on errors

* Pass sessions in already.

* Startup dispute sending.

* Provide incoming decoding facilities

and use them in statement-distribution.

* Relaxed runtime util requirements.

We only need a `SubsystemSender` not a full `SubsystemContext`.

* Better usability of incoming requests.

Make it possible to consume stuff without clones.

* Add basic receiver functionality.

* Cleanup + fixes for sender.

* One more sender fix.

* Start receiver.

* Make sure to send responses back.

* WIP: Exposed authority discovery

* Make tests pass.

* Fully featured receiver.

* Decrease cost of `NotAValidator`.

* Make `RuntimeInfo` LRU cache size configurable.

* Cache more sessions.

* Fix collator protocol.

* Disable metrics for now.

* Make dispute-distribution a proper subsystem.

* Fix naming.

* Code style fixes.

* Factored out 4x copied mock function.

* WIP: Tests.

* Whitespace cleanup.

* Accessor functions.

* More testing.

* More Debug instances.

* Fix busy loop.

* Working tests.

* More tests.

* Cleanup.

* Fix build.

* Basic receiving test.

* Non validator message gets dropped.

* More receiving tests.

* Test nested and subsequent imports.

* Fix spaces.

* Better formatted imports.

* Import cleanup.

* Metrics.

* Message -> MuxedMessage

* Message -> MuxedMessage

* More review remarks.

* Add missing metrics.rs.

* Fix flaky test.

* Dispute coordinator - deliver confirmations.

* Send out `DisputeMessage` on issue local statement.

* Unwire dispute distribution.

* Review remarks.

* Review remarks.

* Better docs.
2021-07-09 04:29:53 +02:00
Bernhard Schuster 3c9104daff refactor overseer into proc-macro based pattern (#2962) 2021-07-08 21:09:26 +02:00
Andronik Ordian 373a545118 make it easier to dbg stalls (#3351)
* make it easier to dbg

* revert channel sizes

* BAnon
2021-07-02 21:09:18 +02:00
Andronik Ordian ffc6f7c731 make ctx.spawn blocking (#3337)
* make spawn sync

* improve error type
2021-06-21 20:43:40 -05:00
Lldenaurois 2abaca3a8c Remove candidate selection (#3148)
* Create validator_side module

* Subsume Candidate Selection

* Add test to ensure candidate backing logic is correct

* Ensure secondings are adequately cleaned up and address test flakyness

* Address Feedback
2021-06-08 14:07:19 -04:00
Robert Habermeier 57038b2e46 Remove real-overseer 🎉 (#2834)
* remove real-overseer

* overseer: only activate leaves which support parachains

* integrate HeadSupportsParachains into service

* remove unneeded line
2021-04-08 20:24:06 +02:00
Robert Habermeier 8ebbe19d10 Split NetworkBridge and break cycles with Unbounded (#2736)
* overseer: pass messages directly between subsystems

* test that message is held on to

* Update node/overseer/src/lib.rs

Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>

* give every subsystem an unbounded sender too

* remove metered_channel::name

1. we don't provide good names
2. these names are never used anywhere

* unused mut

* remove unnecessary &mut

* subsystem unbounded_send

* remove unused MaybeTimer

We have channel size metrics that serve the same purpose better now and the implementation of message timing was pretty ugly.

* remove comment

* split up senders and receivers

* update metrics

* fix tests

* fix test subsystem context

* use SubsystemSender in jobs system now

* refactor of awful jobs code

* expose public `run` on JobSubsystem

* update candidate backing to new jobs & use unbounded

* bitfield signing

* candidate-selection

* provisioner

* approval voting: send unbounded for assignment/approvals

* async not needed

* begin bridge split

* split up network tasks into background worker

* port over network bridge

* Update node/network/bridge/src/lib.rs

Co-authored-by: Andronik Ordian <write@reusable.software>

* rename ValidationWorkerNotifications

Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>
Co-authored-by: Andronik Ordian <write@reusable.software>
2021-03-29 01:18:53 +02:00
Robert Habermeier 5952e790fa Overseer: subsystems communicate directly (#2227)
* overseer: pass messages directly between subsystems

* test that message is held on to

* Update node/overseer/src/lib.rs

Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>

* give every subsystem an unbounded sender too

* remove metered_channel::name

1. we don't provide good names
2. these names are never used anywhere

* unused mut

* remove unnecessary &mut

* subsystem unbounded_send

* remove unused MaybeTimer

We have channel size metrics that serve the same purpose better now and the implementation of message timing was pretty ugly.

* remove comment

* split up senders and receivers

* update metrics

* fix tests

* fix test subsystem context

* fix flaky test

* fix docs

* doc

* use select_biased to favor signals

* Update node/subsystem/src/lib.rs

Co-authored-by: Andronik Ordian <write@reusable.software>

Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>
Co-authored-by: Andronik Ordian <write@reusable.software>
2021-03-28 15:55:10 +00:00
Robert Klotzner 78ac4b7add Whole subsystem test for new availability-distribution (#2552)
* WIP: Whole subsystem test.

* New tests compile.

* Avoid needless runtime queries for no validator nodes.

* Make tx and rx publicly accessible in virtual overseer.

This simplifies mocking in some cases, as tx can be cloned, but rx can
not.

* Whole subsystem test working.

* Update node/network/availability-distribution/src/session_cache.rs

Co-authored-by: Andronik Ordian <write@reusable.software>

* Update node/network/availability-distribution/src/session_cache.rs

Co-authored-by: Andronik Ordian <write@reusable.software>

* Document better what `None` return value means.

* Get rid of BitVec dependency.

* Update Cargo.lock

* Hopefully fixed implementers guide build.

Co-authored-by: Andronik Ordian <write@reusable.software>
2021-03-03 16:23:15 +01:00
Andronik Ordian 69b103b1d5 overseer: send_msg should not return an error (#1995)
* send_message should not return an error

* Apply suggestions from code review

Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>

* s/send_logging_error/send_and_log_error

Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>
2020-11-23 12:42:14 +01:00
Fedor Sakharov 935fcd1666 Change SpawnedSubsystem type to log subsystem errors (#1878)
* Change SpawnedSubsystem type to log subsystem errors

* Remove clone
2020-10-28 20:57:06 +01:00
Bernhard Schuster f345123748 introduce errors with info (#1834) 2020-10-27 08:10:03 +01:00
Bastian Köcher f2d7b6f5ac Make AllSubsystems usage easier in tests (#1794)
* Make `AllSubsystems` usage easier in tests

This makes the usage of `AllSubsystems` easier in tests by introducing
new methods.

- `dummy` initializes `AllSubsystems` with all systems set to dummy
- `replace_*` to replace any subsystem

Besides that this pr adds a `ForwardSubsystem` that is also useful for
tests. This subsystem will forward all incoming messages to the given channel.

* Update node/overseer/src/lib.rs

Co-authored-by: Andronik Ordian <write@reusable.software>

* Update node/subsystem/src/lib.rs

Co-authored-by: Andronik Ordian <write@reusable.software>

* Update node/subsystem/src/lib.rs

Co-authored-by: Andronik Ordian <write@reusable.software>

* Move ForwardSubsystem and add a test

* Break some lines

Co-authored-by: Andronik Ordian <write@reusable.software>
2020-10-08 11:27:19 +02:00
Fedor Sakharov 98660cbd94 Collator protocol subsystem (#1659)
* WIP

* The initial implementation of the collator side.

* Improve comments

* Multiple collation requests

* Add more tests and comments to validator side

* Add comments, remove dead code

* Apply suggestions from code review

Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>

* Fix build after suggested changes

* Also connect to the next validator group

* Remove a Future impl and move TimeoutExt to util

* Minor nits

* Fix build

* Change FetchCollations back to FetchCollation

* Try this

* Final fixes

* Fix build

Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>
2020-09-10 16:54:59 +03:00
Andronik Ordian e899a3f844 jobs: don't early exit when there are no jobs (#1621)
* jobs: don't early exit when there are no jobs

* utils: fix merged test

* utils: less verbose

* utils: add an assert subsystem is running

* utils: use TimeoutExt from test-helpers

* test-helpers: use TimeoutExt
2020-08-21 16:45:39 +02:00
Fedor Sakharov cc19f13468 use own timeout in tests instead of smol-timeout (#1618) 2020-08-20 20:27:09 +03:00
Peter Goodspeed-Niklaus 54bec3bfc0 implement collation generation subsystem (#1557)
* start sketching out a collation generation subsystem

* invent a basic strategy for double initialization

* clean up warnings

* impl util requests from runtime assuming a context instead of a FromJob sender

* implement collation generation algorithm from guide

* update AllMessages in tests

* fix trivial review comments

* remove another redundant declaration from merge

* filter availability cores by para_id

* handle new activations each in their own async task

* update guide according to the actual current implementation

* add initialization to guide

* add general-purpose subsystem_test_harness helper

* write first handle_new_activations test

* add test that handle_new_activations filters local_validation_data requests

* add (failing) test of collation distribution message sending

* rustfmt

* broken: work on fixing sender test

Unfortunately, for reasons that are not yet clear, despite the public key
and checked data being identical, the signer is not producing an identical
signature. This commit produces this output (among more):

signing with  Public(c4733ab0bbe3ba4c096685d1737a7f498cdbdd167a767d04a21dc7df12b8c858 (5GWHUNm5...))
checking with Public(c4733ab0bbe3ba4c096685d1737a7f498cdbdd167a767d04a21dc7df12b8c858 (5GWHUNm5...))
signed payload:  [4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 10, 0, 0, 0, c7, e5, c0, 64, 7a, db, fe, 44, 81, e5, 51, 11, 79, 9f, a5, 63, 93, 94, 3c, c4, 36, c6, 30, 36, c2, c5, 44, a2, 1b, db, b7, 82, 3, 17, a, 2e, 75, 97, b7, b7, e3, d8, 4c, 5, 39, 1d, 13, 9a, 62, b1, 57, e7, 87, 86, d8, c0, 82, f2, 9d, cf, 4c, 11, 13, 14]
checked payload: [4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 10, 0, 0, 0, c7, e5, c0, 64, 7a, db, fe, 44, 81, e5, 51, 11, 79, 9f, a5, 63, 93, 94, 3c, c4, 36, c6, 30, 36, c2, c5, 44, a2, 1b, db, b7, 82, 3, 17, a, 2e, 75, 97, b7, b7, e3, d8, 4c, 5, 39, 1d, 13, 9a, 62, b1, 57, e7, 87, 86, d8, c0, 82, f2, 9d, cf, 4c, 11, 13, 14]

* fix broken test

* collation function returns commitments hash

It doesn't look like we use the actual commitments data anywhere, and
it's not obvious if there are any fields of `CandidateCommitments`
not available to the collator, so this commit just assigns them the
entire responsibility of generating the hash.

* add missing overseer impls

* calculating erasure coding is polkadot's responsibility, not cumulus

* concurrentize per-relay_parent requests
2020-08-17 14:27:37 +02:00
Andronik Ordian ab1a513265 Add spawn_blocking to SubsystemContext (#1570)
* subsystem: add spawn_blocking to SubsystemContext

* candidate-validation: use spawn_blocking for exhaustive tasks
2020-08-17 12:26:32 +00:00
Bernhard Schuster 4bdfd02f93 impl availability distribution
Closes #1237
2020-08-10 15:02:30 +02:00
Peter Goodspeed-Niklaus 9fda6cb416 break out subsystem-util and subsystem-test-helpers into individual crates (#1553)
* break out subsystem-util and subsystem-test-helpers into individual crates

* cause all packages to check successfully
2020-08-07 16:51:36 +02:00