* Don't import backing statements directly
into the dispute coordinator. This also gets rid of a redundant
signature check. Both should have some impact on backing performance.
In general this PR should make us scale better in the number of parachains.
Reasoning (aka why this is fine):
For the signature check: As mentioned, it is a redundant check. The
signature has already been checked at this point. This is even made
obvious by the used types. The smart constructor is not perfect as
discussed [here](https://github.com/paritytech/polkadot/issues/3455),
but is still a reasonable security.
For not importing to the dispute-coordinator: This should be good as the
dispute coordinator does scrape backing votes from chain. This suffices
in practice as a super majority of validators must have seen a backing
fork in order for a candidate to get included and only included
candidates pose a threat to our system. The import from chain is
preferable over direct import of backing votes for two reasons:
1. The import is batched, greatly improving import performance. All
backing votes for a candidate are imported with a single import.
And indeed we were able to see in metrics that importing votes
from chain is fast.
2. We do less work in general as not every candidate for which
statements are gossiped might actually make it on a chain. The
dispute coordinator as with the current implementation would still
import and keep those votes around for six sessions.
While redundancy is good for reliability in the event of bugs, this also
comes at a non negligible cost. The dispute-coordinator right now is the
subsystem with the highest load, despite the fact that it should not be
doing much during mormal operation and it is only getting worse
with more parachains as the load is a direct function of the number of statements.
We'll see on Versi how much of a performance improvement this PR
* Get rid of dead code.
* Dont send approval vote
* Make it pass CI
* Bring back tests for fixing them later.
* Explicit signature check.
* Resurrect approval-voting tests (not fixed yet)
* Send out approval votes in dispute-distribution.
Use BTreeMap for ordered dispute votes.
* Bring back an important warning.
* Fix approval voting tests.
* Don't send out dispute message on import + test
+ Some cleanup.
* Guide changes.
Note that the introduced complexity is actually redundant.
* WIP: guide changes.
* Finish guide changes about dispute-coordinator
conceputally. Requires more proof read still.
Also removed obsolete implementation details, where the code is better
suited as the source of truth.
* Finish guide changes for now.
* Remove own approval vote import logic.
* Implement logic for retrieving approval-votes
into approval-voting and approval-distribution subsystems.
* Update roadmap/implementers-guide/src/node/disputes/dispute-coordinator.md
Co-authored-by: asynchronous rob <rphmeier@gmail.com>
* Review feedback.
In particular: Add note about disputes of non included candidates.
* Incorporate Review Remarks
* Get rid of superfluous space.
* Tidy up import logic a bit.
Logical vote import is now separated, making the code more readable and
maintainable.
Also: Accept import if there is at least one invalid signer that has not
exceeded its spam slots, instead of requiring all of them to not exceed
their limits. This is more correct and a preparation for vote batching.
* We don't need/have empty imports.
* Fix tests and bugs.
* Remove error prone redundancy.
* Import approval votes on dispute initiated/concluded.
* Add test for approval vote import.
* Make guide checker happy (hopefully)
* Another sanity check + better logs.
* Reasoning about boundedness.
* Use `CandidateIndex` as opposed to `CoreIndex`.
* Remove redundant import.
* Review remarks.
* Add metric for calls to request signatures
* More review remarks.
* Add metric on imported approval votes.
* Include candidate hash in logs.
* More trace log
* Break cycle.
* Add some tracing.
* Cleanup allowed messages.
* fmt
* Tracing + timeout for get inherent data.
* Better error.
* Break cycle in all places.
* Clarified comment some more.
* Typo.
* Break cycle approval-distribution - approval-voting.
Co-authored-by: asynchronous rob <rphmeier@gmail.com>
We are awaiting on the oneshot anyways, so we have back pressure. By
using the unbounded channel make log messages like the following less
likely (due to higher priority):
2022-05-30 13:46:38
2022-05-30 11:46:38.565 WARN tokio-runtime-worker parachain::provisioner: failed to assemble or send inherent data err=CanceledBackedCandidates(Canceled)
* Increase message channel size to 2048
Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
* Use unbounded channel for reading data
Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
* provisioner data, one more log
* minor cleanups in provisioner logs
* remove deprecated foo in overseer
Closes#5128
* candidate_hash needs the ? prefix in gum
* demote to trace
* remove v0 primitives from polkadot-primitives
* first pass: remove v0
* fix fallout in erasure-coding
* remove v1 primitives, consolidate to v2
* the great import update
* update runtime_api_impl_v1 to v2 as well
* guide: add `Version` request for runtime API
* add version query to runtime API
* reintroduce OldV1SessionInfo in a limited way
* fix miscalculation of remaining weight
* rename a var
* move out enforcing filtering by dropping inherents
* prepare for dispute statement validity check being split off
* refactor
* refactor, only check disputes we actually want to include
* more refactor and documentation
* refactor and minimize inherent checks
* chore: warnings
* fix a few tests
* fix dedup regression
* fix
* more asserts in tests
* remove some asserts
* chore: fmt
* skip signatures checks, some more
* undo unwatend changes
* Update runtime/parachains/src/paras_inherent/mod.rs
Co-authored-by: sandreim <54316454+sandreim@users.noreply.github.com>
* cleanups, checking CheckedDisputeStatments makes no sense
* integrity, if called create_inherent_inner, it shall do the checks, and not rely on enter_inner
* review comments
* use from impl rather than into
* remove outdated comment
* adjust tests accordingly
* assure no weight is lost
* address review comments
* remove unused import
* split error into two and document
* use assurance, O(n)
* Revert "adjust tests accordingly"
This reverts commit 3cc9a3c449f82db38cea22c48f4a21876603374b.
* fix comment
* fix sorting
* comment
Co-authored-by: sandreim <54316454+sandreim@users.noreply.github.com>
* enable disputes, for all known chains but polkadot
* chore: fmt
* don't propagate disputes either
* review
* remove disputes feature
* remove superfluous line
* Update node/service/src/lib.rs
Co-authored-by: Andronik Ordian <write@reusable.software>
* fixup
* allow being a dummy
* rialto
* add an enum, to make things work better
* overseer
* fix test
* comments
* move condition out
* excess arg
Co-authored-by: Andronik Ordian <write@reusable.software>
* disputes: Allow batch queries in dispute-coordinator
This commit moves to batch queries when responding to QueryCandidateVotes
messages. This simplifies the code in the provisioner and dispute-coordinator
by no longer requiring to make use of a FuturesOrdered when awaiting multiple
quries. Instead, the provisioner need only request the batch itself.
* node/approval-voting: Address Feedback to fail on query element missing.
* Address feedback
* Fix implementer's guide
* dispute types
* add Debug to dispute primitives in std and InherentData
* use ParachainsInherentData on node-side
* change inclusion_inherent to paras_inherent
* RuntimeDebug
* add type parameter to PersistedValidationData users
* fix test client
* spaces
* fix collation-generation test
* fix provisioner tests
* remove references to inclusion inherent
* overseer: pass messages directly between subsystems
* test that message is held on to
* Update node/overseer/src/lib.rs
Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>
* give every subsystem an unbounded sender too
* remove metered_channel::name
1. we don't provide good names
2. these names are never used anywhere
* unused mut
* remove unnecessary &mut
* subsystem unbounded_send
* remove unused MaybeTimer
We have channel size metrics that serve the same purpose better now and the implementation of message timing was pretty ugly.
* remove comment
* split up senders and receivers
* update metrics
* fix tests
* fix test subsystem context
* use SubsystemSender in jobs system now
* refactor of awful jobs code
* expose public `run` on JobSubsystem
* update candidate backing to new jobs & use unbounded
* bitfield signing
* candidate-selection
* provisioner
* approval voting: send unbounded for assignment/approvals
* async not needed
* begin bridge split
* split up network tasks into background worker
* port over network bridge
* Update node/network/bridge/src/lib.rs
Co-authored-by: Andronik Ordian <write@reusable.software>
* rename ValidationWorkerNotifications
Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>
Co-authored-by: Andronik Ordian <write@reusable.software>
* WIP
* availability distribution, still very wip.
Work on the requesting side of things.
* Some docs on what I intend to do.
* Checkpoint of session cache implementation
as I will likely replace it with something smarter.
* More work, mostly on cache
and getting things to type check.
* Only derive MallocSizeOf and Debug for std.
* availability-distribution: Cache feature complete.
* Sketch out logic in `FetchTask` for actual fetching.
- Compile fixes.
- Cleanup.
* Format cleanup.
* More format fixes.
* Almost feature complete `fetch_task`.
Missing:
- Check for cancel
- Actual querying of peer ids.
* Finish FetchTask so far.
* Directly use AuthorityDiscoveryId in protocol and cache.
* Resolve `AuthorityDiscoveryId` on sending requests.
* Rework fetch_task
- also make it impossible to check the wrong chunk index.
- Export needed function in validator_discovery.
* From<u32> implementation for `ValidatorIndex`.
* Fixes and more integration work.
* Make session cache proper lru cache.
* Use proper lru cache.
* Requester finished.
* ProtocolState -> Requester
Also make sure to not fetch our own chunk.
* Cleanup + fixes.
* Remove unused functions
- FetchTask::is_finished
- SessionCache::fetch_session_info
* availability-distribution responding side.
* Cleanup + Fixes.
* More fixes.
* More fixes.
adder-collator is running!
* Some docs.
* Docs.
* Fix reporting of bad guys.
* Fix tests
* Make all tests compile.
* Fix test.
* Cleanup + get rid of some warnings.
* state -> requester
* Mostly doc fixes.
* Fix test suite.
* Get rid of now redundant message types.
* WIP
* Rob's review remarks.
* Fix test suite.
* core.relay_parent -> leaf for session request.
* Style fix.
* Decrease request timeout.
* Cleanup obsolete errors.
* Metrics + don't fail on non fatal errors.
* requester.rs -> requester/mod.rs
* Panic on invalid BadValidator report.
* Fix indentation.
* Use typed default timeout constant.
* Make channel size 0, as each sender gets one slot anyways.
* Fix incorrect metrics initialization.
* Fix build after merge.
* More fixes.
* Hopefully valid metrics names.
* Better metrics names.
* Some tests that already work.
* Slightly better docs.
* Some more tests.
* Fix network bridge test.
* add tracing to approval voting
* notify if session info is not working
* add dispute period to chain specs
* propagate genesis session to parachains runtime
* use `on_genesis_session`
* protect against zero cores in computation
* tweak voting rule to be based off of best and add logs
* genesis configuration should use VRF slots only
* swallow more keystore errors
* add some docs
* make validation-worker args non-optional and update clap
* better tracing for bitfield signing and provisioner
* pass amount of bits in bitfields to inclusion instead of recomputing
* debug -> warn for some logs
* better tracing for availability recovery
* a little av-store tracing
* bridge: forward availability recovery messages
* add missing try_from impl
* some more tracing
* improve approval distribution tracing
* guide: hold onto pending approval messages until NewBlocks
* Hold onto pending approval messages until NewBlocks
* guide: adjust comment
* process all actions for one wakeup at a time
* vec
* fix network bridge test
* replace randomness-collective-flip with Babe
* remove PairNotFound
* Add one Jaeger span per relay parent
This adds one Jaeger span per relay parent, instead of always creating
new spans per relay parent. This should improve the UI view, because
subsystems are now grouped below one common span.
* Fix doc tests
* Replace `PerLeaveSpan` to `PerLeafSpan`
* More renaming
* Moare
* Update node/subsystem/src/lib.rs
Co-authored-by: Andronik Ordian <write@reusable.software>
* Skip the spans
* Increase `spec_version`
Co-authored-by: Andronik Ordian <write@reusable.software>
* guide: non-semantic changes
* guide: update per the issue description
* GetBackedCandidates operates on multiple hashes now
* GetBackedCandidates still needs a relay parent
* implement changes specified in guide
* distinguish between various occasions for canceled oneshots
* add tracing info to getbackedcandidates
* REVERT ME: add tracing messages for GetBackedCandidates
Note that these messages are only sometimes actually passed on to the
candidate backing subsystem, with the consequence that it is
unexpectedly frequent that the provisioner fails to create its
provisionable data.
* REVERT ME: more tracing logging
* REVERT ME: log when CandidateBackingJob receives any message at all
* REVERT ME: log when send_msg sends a message to a job
* fix candidate-backing tests
* streamline GetBackedCandidates
This uses table.attested_candidate instead of table.get_candidate, because
it's not obvious how to get a BackedCandidate from just a
CommittedCandidateReceipt.
* REVERT ME: more logging tracing job lifespans
* promote warning about job premature demise
* don't terminate CandiateBackingJob::run_loop in event of failure to process message
* Revert "REVERT ME: more logging tracing job lifespans"
This reverts commit 7365f2fb3dec988d95cfcd317eba75587fe7fd16.
* Revert "REVERT ME: log when send_msg sends a message to a job"
This reverts commit 58e46aad038e6517d6d56390c8be65b046a21884.
* Revert "REVERT ME: log when CandidateBackingJob receives any message at all"
This reverts commit 0d6f38413c7c66b5e9e81dabc587906fa9f82656.
* Revert "REVERT ME: more tracing logging"
This reverts commit 675fd2628e84d1596965280e7314155ef21b28e6.
* Revert "REVERT ME: add tracing messages for GetBackedCandidates"
This reverts commit e09e156493430b33b6c8ab4b5cedb3f2f91afd51.
* formatting
* add logging message to CandidateBackingJob::run_loop start
* REVERT ME: add tracing to candidate-backing job creation
* run candidatebacking loop even if no assignment
* use unique error variants for each canceled oneshot
* Revert "REVERT ME: add tracing to candidate-backing job creation"
This reverts commit 8ce5f4f0bd7186dade134b118751480f72ea1fd6.
* try_runtime_api more to reduce silent exits
* add sanity check that returned backed candidates preserve ordering
* remove redundant err attribute
* Simplify subsystem jobs
This pr simplifies the subsystem jobs interface. Instead of requiring an
extra message that is used to signal that a job should be ended, a job
now ends when the receiver returns `None`. Besides that it changes the
interface to enforce that messages to a job provide a relay parent.
* Drop ToJobTrait
* Remove FromJob
We always convert this message to FromJobCommand anyway.