* Don't import backing statements directly
into the dispute coordinator. This also gets rid of a redundant
signature check. Both should have some impact on backing performance.
In general this PR should make us scale better in the number of parachains.
Reasoning (aka why this is fine):
For the signature check: As mentioned, it is a redundant check. The
signature has already been checked at this point. This is even made
obvious by the used types. The smart constructor is not perfect as
discussed [here](https://github.com/paritytech/polkadot/issues/3455),
but still provides reasonable security.
For not importing to the dispute-coordinator: This should be good as the
dispute coordinator does scrape backing votes from chain. This suffices
in practice as a supermajority of validators must have seen a backing
fork in order for a candidate to get included and only included
candidates pose a threat to our system. The import from chain is
preferable over direct import of backing votes for two reasons:
1. The import is batched, greatly improving import performance. All
backing votes for a candidate are imported with a single import.
And indeed we were able to see in metrics that importing votes
from chain is fast.
2. We do less work in general as not every candidate for which
statements are gossiped might actually make it on a chain. The
dispute coordinator as with the current implementation would still
import and keep those votes around for six sessions.
While redundancy is good for reliability in the event of bugs, this also
comes at a non-negligible cost. The dispute-coordinator right now is the
subsystem with the highest load, despite the fact that it should not be
doing much during normal operation, and it is only getting worse
with more parachains as the load is a direct function of the number of statements.
We'll see on Versi how much of a performance improvement this PR brings.
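The batching argument above can be illustrated with a minimal sketch. The types and function names here (`BackingVote`, `batch_by_candidate`, `import_calls_needed`) are hypothetical stand-ins and not the dispute-coordinator's actual API; the point is only that votes scraped from chain can be grouped so that each candidate needs one import instead of one import per vote.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-ins for the real types; illustrative only.
type CandidateHash = u64;
type ValidatorIndex = u32;

/// A backing vote as it might be scraped from a block.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BackingVote {
    pub candidate: CandidateHash,
    pub validator: ValidatorIndex,
}

/// Group scraped votes per candidate, so that each candidate can be
/// handed over in a single import.
pub fn batch_by_candidate(
    votes: Vec<BackingVote>,
) -> BTreeMap<CandidateHash, Vec<ValidatorIndex>> {
    let mut batches: BTreeMap<CandidateHash, Vec<ValidatorIndex>> = BTreeMap::new();
    for vote in votes {
        batches.entry(vote.candidate).or_default().push(vote.validator);
    }
    batches
}

/// Number of imports needed with batching: one per candidate,
/// rather than one per individual vote.
pub fn import_calls_needed(votes: Vec<BackingVote>) -> usize {
    batch_by_candidate(votes).len()
}
```

With three votes spread over two candidates, this sketch performs two imports where a per-vote import would perform three.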
* Get rid of dead code.
* Don't send approval vote
* Make it pass CI
* Bring back tests for fixing them later.
* Explicit signature check.
* Resurrect approval-voting tests (not fixed yet)
* Send out approval votes in dispute-distribution.
Use BTreeMap for ordered dispute votes.
* Bring back an important warning.
* Fix approval voting tests.
* Don't send out dispute message on import + test
+ Some cleanup.
* Guide changes.
Note that the introduced complexity is actually redundant.
* WIP: guide changes.
* Finish guide changes about dispute-coordinator
conceptually. Still requires more proofreading.
Also removed obsolete implementation details, where the code is better
suited as the source of truth.
* Finish guide changes for now.
* Remove own approval vote import logic.
* Implement logic for retrieving approval-votes
into approval-voting and approval-distribution subsystems.
* Update roadmap/implementers-guide/src/node/disputes/dispute-coordinator.md
Co-authored-by: asynchronous rob <rphmeier@gmail.com>
* Review feedback.
In particular: Add note about disputes of non included candidates.
* Incorporate Review Remarks
* Get rid of superfluous space.
* Tidy up import logic a bit.
Logical vote import is now separated, making the code more readable and
maintainable.
Also: Accept import if there is at least one invalid signer that has not
exceeded its spam slots, instead of requiring all of them to not exceed
their limits. This is more correct and a preparation for vote batching.
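The change in acceptance rule described above amounts to switching from an "all signers" check to an "any signer" check. The following is a rough sketch only: `MAX_SPAM_VOTES`, the counter map, and both function names are assumptions made for illustration, and "not exceeded" is modelled here as "still has a free slot".

```rust
use std::collections::HashMap;

type ValidatorIndex = u32;

// Hypothetical per-validator limit; the real value is not stated in this log.
const MAX_SPAM_VOTES: u32 = 50;

/// True if the validator still has a free spam slot.
fn has_free_slot(spam_counts: &HashMap<ValidatorIndex, u32>, validator: ValidatorIndex) -> bool {
    spam_counts.get(&validator).copied().unwrap_or(0) < MAX_SPAM_VOTES
}

/// Old rule: every invalid signer must be within its limit.
pub fn accept_import_old(
    spam_counts: &HashMap<ValidatorIndex, u32>,
    invalid_signers: &[ValidatorIndex],
) -> bool {
    invalid_signers.iter().all(|v| has_free_slot(spam_counts, *v))
}

/// New rule: one invalid signer within its limit is enough.
pub fn accept_import_new(
    spam_counts: &HashMap<ValidatorIndex, u32>,
    invalid_signers: &[ValidatorIndex],
) -> bool {
    invalid_signers.iter().any(|v| has_free_slot(spam_counts, *v))
}
```

Under the new rule, a batch of votes is not rejected merely because one of its signers has used up its slots, which is what makes the rule compatible with batched imports.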
* We don't need/have empty imports.
* Fix tests and bugs.
* Remove error prone redundancy.
* Import approval votes on dispute initiated/concluded.
* Add test for approval vote import.
* Make guide checker happy (hopefully)
* Another sanity check + better logs.
* Reasoning about boundedness.
* Use `CandidateIndex` as opposed to `CoreIndex`.
* Remove redundant import.
* Review remarks.
* Add metric for calls to request signatures
* More review remarks.
* Add metric on imported approval votes.
* Include candidate hash in logs.
* More trace log
* Break cycle.
* Add some tracing.
* Cleanup allowed messages.
* fmt
* Tracing + timeout for get inherent data.
* Better error.
* Break cycle in all places.
* Clarified comment some more.
* Typo.
* Break cycle approval-distribution - approval-voting.
Co-authored-by: asynchronous rob <rphmeier@gmail.com>
* foo
* rolling session window
* fixup
* remove use statement
* fmt
* split NetworkBridge into two subsystems
Pending cleanup
* split
* chore: reexport OrchestraError as OverseerError
* chore: silence warnings
* fixup tests
* chore: add default timeout of 30s to subsystem test helper ctx handle
* single item channel
* fixins
* fmt
* cleanup
* remove dead code
* remove sync bounds again
* wire up shared state
* deal with some FIXMEs
* use distinct tags
Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
* use tag
Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
* address naming
tx and rx are common in networking and also have an implicit meaning regarding networking
compared to incoming and outgoing which are already used with subsystems themselves
* remove unused sync oracle
* remove unneeded state
* fix tests
* chore: fmt
* do not try to register twice
* leak Metrics type
Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
Co-authored-by: Andronik <write@reusable.software>
* Move NewGossipTopology -> SessionGridTopology outside as this implementation is shared
* Add method to return peers difference between topologies
* Implement basic grid topology usage for the bitfield distribution
* Fix tests
* Oops, fix tests
* Add some tests for random routing
* Add a unit test for topology distribution
* Store the current and the previous topology to match sessions boundaries
* Update tests
* Update node/network/bitfield-distribution/src/lib.rs
Co-authored-by: Andronik <write@reusable.software>
* Update node/network/protocol/src/grid_topology.rs
Co-authored-by: Andronik <write@reusable.software>
* Update node/network/bitfield-distribution/src/lib.rs
Co-authored-by: Andronik <write@reusable.software>
* Add some debug
* Fix tests as HashSet order is undefined
* Move session bounded topology to the common code part
* Fix tests
* Allow to select routing by peer index
* Implement grid topology in the statement distribution subsystem
* Fix tests compilation
* Fix test
* Refactor API slightly
* Address review comments
* Reduce runtime error logging severity
* Update node/network/protocol/src/grid_topology.rs
Co-authored-by: Bernhard Schuster <bernhard@ahoi.io>
* Update node/network/bitfield-distribution/src/tests.rs
Co-authored-by: Bernhard Schuster <bernhard@ahoi.io>
* Fmt run
* Use named struct
* Fix logging stuff
* One more accidental fmt damage
* Increase active queue size and add metrics
Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
* Revert "Increase active queue size and add metrics"
This reverts commit c4f48e8bded6dfeb9c62814ba2f8d815c34b04cf.
* Use validator index to choose the routing strategy
Noted by: @rphmeier
* Fix test after distribution logic fix
Co-authored-by: Andronik <write@reusable.software>
Co-authored-by: Bernhard Schuster <bernhard@ahoi.io>
Co-authored-by: Andrei Sandu <andrei-mihail@parity.io>
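For readers unfamiliar with the grid topology used in the commits above, here is a simplified sketch of the idea: validators are laid out row by row in a roughly square grid, and each validator routes to the peers sharing its row or column. The function names and the use of plain `usize` indices are assumptions for illustration; this is not the code in `grid_topology.rs`. A small helper also shows one way to compute the peer difference between two topologies.

```rust
use std::collections::BTreeSet;

/// Row and column neighbors of `index` when `len` validators are laid out
/// row by row in a grid of width floor(sqrt(len)).
/// Returns `None` if `index` is out of range.
pub fn grid_neighbors(index: usize, len: usize) -> Option<(BTreeSet<usize>, BTreeSet<usize>)> {
    if index >= len {
        return None;
    }
    let width = ((len as f64).sqrt() as usize).max(1);
    let row = index / width;
    let column = index % width;

    // Everyone in the same row, excluding ourselves.
    let row_neighbors: BTreeSet<usize> = (row * width..usize::min(row * width + width, len))
        .filter(|i| *i != index)
        .collect();
    // Everyone in the same column, excluding ourselves.
    let column_neighbors: BTreeSet<usize> = (column..len)
        .step_by(width)
        .filter(|i| *i != index)
        .collect();

    Some((row_neighbors, column_neighbors))
}

/// Peers present in `new` but not in `old`, e.g. across a session boundary.
pub fn peers_diff(old: &BTreeSet<usize>, new: &BTreeSet<usize>) -> BTreeSet<usize> {
    new.difference(old).copied().collect()
}
```

Using ordered sets keeps the result deterministic, which sidesteps the undefined `HashSet` iteration order mentioned in the test fixes above.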
* Double grandpa gossip duration.
* Make resend period slightly larger.
So it won't get triggered by additional grandpa delay.
* Bump other values as well.
* Don't change gossip duration on Polkadot.
(and Westend as it is meant to be a testbed for Polkadot)
* Initial attempt to extract grid topology related code
* Use shared code in the approval distribution subsystem
* Fix spellcheck issues
* Move aggression stuff back to the approval-distribution subsystem
* Cargo fmt
* explicitly tag network requests with version
* fmt
* make PeerSet more aware of versioning
* some generalization of the network bridge to support upgrades
* walk back some renaming
* walk back some version stuff
* extract version from fallback
* remove V1 from NetworkBridgeUpdate
* add accidentally-removed timer
* implement focusing for versioned messages
* fmt
* fix up network bridge & tests
* remove inaccurate version check in bridge
* remove some TODO [now]s
* fix fallout in statement distribution
* fmt
* fallout in gossip-support
* fix fallout in collator-protocol
* fix fallout in bitfield-distribution
* fix fallout in approval-distribution
* fmt
* use never!
* fmt
* gossip-support: be explicit about dimensions
* some guide updates
* update network-bridge to distinguish x and y dimensions
* get everything to compile
* beginnings
* some TODOs
* polkadot runtime: use relevant_authorities
* make gossip topologies per-session
* better formatting
* gossip support: use current session validators
* expand in comment
* adjust tests and fix index bug
* add past/present/future connection test and clean up code
* fmt
* network bridge: updated types
* update protocols to new gossip topology message
* guide updates
* add session to BlockApprovalMeta
* add session to block info
* refactor knowledge and remove most unify logic
* start replacing gossip_peers with new SessionTopologies
* add routing information to message state
* add some utilities to SessionTopology
* implement new gossip topology logic
* re-implement unify_with_peer
* distribute assignments according to topology
* finish grid topology implementation
* refactor network bridge slightly
* issue connection requests on all past/present/future
* fmt
* address grumbles
* tighten invariants in unify_with_peer
* implement random propagation
* refactor: extract required routing adjustment logic
* some block-age logic
* aggressively propagate messages when finality is slow
* overhaul aggression system to have 3 levels
* add aggression metrics
* remove aggression L3
* reduce random circulation
* remove PeerData
* get approval tests compiling
* use btree_map in known_by to make deterministic
* Revert "use btree_map in known_by to make deterministic"
This reverts commit 330d65343a7bb6fe4dd0f24bd8dbc15c0cbdbd9d.
* test XY grid propagation
* remove stray println
* test unshared dimension propagation
* add random gossip check
* test unify_with_peer better
* test sending after getting gossip topology
* test L1 aggression on originator
* test L1 aggression for non-originators
* test non-originator aggression L2
* fmt
* ~spellcheck
* fix statement-distribution tests
* fix flaky test
* fix metrics typo
* re-send periodically
* test resending
* typo
Co-authored-by: Bernhard Schuster <bernhard@ahoi.io>
* add more metrics about apd messages
* add back unify_with_peer logs
* make Resend an enum
* be more explicit when resending
* fmt
* fix error
* add a TODO for refactoring
* remove debug metrics
* add some guide stuff
* fmt
* update runtime API in test-runtime
Co-authored-by: Bernhard Schuster <bernhard@ahoi.io>
This issue happens when a peer sends a good but already known Seconded statement and the statement-distribution code does not update the statements_received field in the peer_knowledge structure. Subsequently, a Valid statement causes an out-of-view message to be emitted incorrectly, which leads to reputation loss.
This PR also introduces a concept of passing the specific pseudo-random generator to subsystems to make it easier to write deterministic tests. This functionality is not really necessary for the specific issue and unit test but it can be useful for other tests and subsystems.
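The idea of handing a specific pseudo-random generator to a subsystem can be sketched as follows. Everything here is hypothetical: `RandomSource`, `SeededRng`, and `Subsystem` are illustrative names, and the generator is a plain xorshift kept dependency-free for the sketch, not whatever generator the subsystems actually use.

```rust
/// Hypothetical minimal random source; not the trait used by the real subsystems.
pub trait RandomSource {
    fn next_u64(&mut self) -> u64;
}

/// Deterministic xorshift64 generator, intended for tests only.
pub struct SeededRng(u64);

impl SeededRng {
    pub fn new(seed: u64) -> Self {
        // xorshift must not start from an all-zero state.
        SeededRng(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }
}

impl RandomSource for SeededRng {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// A subsystem-like struct that owns the generator it was constructed with.
pub struct Subsystem<R: RandomSource> {
    rng: R,
}

impl<R: RandomSource> Subsystem<R> {
    pub fn new(rng: R) -> Self {
        Subsystem { rng }
    }

    /// Pick one peer index out of `peer_count`, or `None` if there are no peers.
    pub fn pick_peer(&mut self, peer_count: usize) -> Option<usize> {
        if peer_count == 0 {
            return None;
        }
        Some((self.rng.next_u64() % peer_count as u64) as usize)
    }
}
```

Two instances built with the same seed make identical choices, so a test can assert on exact routing decisions; production code would pass in an entropy-seeded generator instead.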
* Try to fix out-of-view messages in approval distribution
Suggested by: @ordian
* Cargo fmt
* Add a unit test for the proposed fix
* Spelling fix
* Use a simpler approach to fix the race condition as suggested by @rphmeier
* Cargo fmt run
* remove v0 primitives from polkadot-primitives
* first pass: remove v0
* fix fallout in erasure-coding
* remove v1 primitives, consolidate to v2
* the great import update
* update runtime_api_impl_v1 to v2 as well
* guide: add `Version` request for runtime API
* add version query to runtime API
* reintroduce OldV1SessionInfo in a limited way
* Companion PR for removing Prometheus metrics prefix
* Was missing some metrics
* Fix missing renames
* Fix test
* Fixes
* Update test
* Update Substrate
* Second time
* remove prefix from integration test for zombienet
* update zombienet image
* Update Substrate
Co-authored-by: Bastian Köcher <info@kchr.de>
Co-authored-by: Javier Viola <pepoviola@gmail.com>
* Simplify some Option / Result / ? operator patterns
When they identically match a combinator on those types.
Tool-aided by [comby-rust](https://github.com/huitseeker/comby-rust).
* adjust review comments
Co-authored-by: Shawn Tabrizi <shawntabrizi@gmail.com>
* Factor out runtime module into utils.
* Add maybe_authority information to `PeerConnected` event.
We already gather this information in authority discovery, so we might
as well share it with others.
This opens up an easy path to trigger validators differently from normal
nodes, e.g. for prioritization. This change has become more important
now that we just connect to all validators and therefore just have a
long peer list without any information about those nodes.
* Test fix.
* extract database from av-store itself
* generalize approval-voting over database type
* modes (without handling) and pruning old wakeups
* rework approval importing
* add our_approval_sig to ApprovalEntry
* import assignment
* guide updates for check-full-approval changes
* some aux functions
* send messages when becoming active.
* guide: network bridge sends view updates only when done syncing
* network bridge: send view updates only when done syncing
* tests for new network-bridge behavior
* add a test for updating approval entry with sig
* fix some warnings
* test load-all-blocks
* instantiate new parachains DB
* fix network-bridge empty view updates
* tweak
* fix wasm build, i think
* Update node/core/approval-voting/src/lib.rs
Co-authored-by: Andronik Ordian <write@reusable.software>
* add some versioning to parachains_db
* warnings
* fix merge changes
* remove versioning again
Co-authored-by: Andronik Ordian <write@reusable.software>
* approval-distribution: limit the amount of packets on unify
* guide: fix a typo
* compilation fix
* grammar
* Update roadmap/implementers-guide/src/node/approval/approval-distribution.md
Co-authored-by: David <dvdplm@gmail.com>
* more grammar
* propagate only local assignments/approvals after a certain depth
* increase the threshold
* guides update
Co-authored-by: David <dvdplm@gmail.com>
* add tracing to approval voting
* notify if session info is not working
* add dispute period to chain specs
* propagate genesis session to parachains runtime
* use `on_genesis_session`
* protect against zero cores in computation
* tweak voting rule to be based off of best and add logs
* genesis configuration should use VRF slots only
* swallow more keystore errors
* add some docs
* make validation-worker args non-optional and update clap
* better tracing for bitfield signing and provisioner
* pass amount of bits in bitfields to inclusion instead of recomputing
* debug -> warn for some logs
* better tracing for availability recovery
* a little av-store tracing
* bridge: forward availability recovery messages
* add missing try_from impl
* some more tracing
* improve approval distribution tracing
* guide: hold onto pending approval messages until NewBlocks
* Hold onto pending approval messages until NewBlocks
* guide: adjust comment
* process all actions for one wakeup at a time
* vec
* fix network bridge test
* replace randomness-collective-flip with Babe
* remove PairNotFound
* feat/view: assure heads in a view are sorted
Allows O(n) comparisons, adds an alternate equiv relation
which takes O(n^2) for integrity verification.
Ref #2133
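Why sorting helps can be shown with a small sketch, assuming a simplified `View` over `u64` hashes (the real hash type and struct layout are not reproduced here). Once the heads are sorted at construction time, two views holding the same heads compare equal with a single linear pass, regardless of the order in which the heads were supplied.

```rust
// Hypothetical stand-in for the real block hash type.
type Hash = u64;

/// Simplified view whose heads are private and kept sorted.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct View {
    heads: Vec<Hash>,
}

impl View {
    /// Sort once on construction so later comparisons are linear.
    pub fn new(heads: impl IntoIterator<Item = Hash>) -> Self {
        let mut heads: Vec<Hash> = heads.into_iter().collect();
        heads.sort();
        View { heads }
    }

    pub fn len(&self) -> usize {
        self.heads.len()
    }

    pub fn is_empty(&self) -> bool {
        self.heads.is_empty()
    }

    /// Sorted storage also allows a binary search for membership.
    pub fn contains(&self, hash: &Hash) -> bool {
        self.heads.binary_search(hash).is_ok()
    }
}
```

The derived `PartialEq` compares the two vectors element by element, which is the O(n) comparison; without the sorted invariant, an order-insensitive comparison of two plain vectors would need nested scans.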
* revert: remove custom PartialEq impl, there are no duplicates
* fix: do not sort the live_heads, that alters the local view
* refactor/view: heads should not be public
* chore/spellcheck: add unfinalized
* fix/view: add missing len() and is_empty() fns
* quirk
* vec is not view
* Update node/network/approval-distribution/src/tests.rs
Co-authored-by: Andronik Ordian <write@reusable.software>
* Update node/network/bridge/src/lib.rs
Co-authored-by: Andronik Ordian <write@reusable.software>
* Update node/network/protocol/src/lib.rs
Co-authored-by: Andronik Ordian <write@reusable.software>
* fixup comment
* fix botched test
Co-authored-by: Andronik Ordian <write@reusable.software>
* refactor/reputation: unify the values used
* chore/rep: rename Annoy* to Cost*, make duplicate message Cost*Repeated
* fix/reputation: lost and found, convert at the boundary to substrate
* refactor/rep: move conversion to base reputation one level down, left conversions
* fix/rep: order of magnitude adjustments
Thanks pierre!
* remove spaces
* chore/rep: give rationale for order of magnitude
* refactor/rep: move UnifiedReputationChange to separate file
* fix/rep: order of magnitudes correction
* skeleton
* skeleton aux-schema module
* start approval types
* start aux schema with aux store
* doc
* finish basic types
* start approval types
* doc
* finish basic types
* write out schema types
* add debug and codec impls to approval types
* add debug and codec impls to approval types
also add some key computation
* add debug and codec impls to approval types
* getters for block and candidate entries
* grumbles
* remove unused AssignmentId
* load_decode utility
* implement DB clearing
* function for adding new block entry to aux store
* start `canonicalize` implementation
* more skeleton
* finish implementing canonicalize
* tag TODO
* implement a test AuxStore
* add allow(unused)
* basic loading and deleting test
* block_entry test function
* add a test for `add_block_entry`
* ensure range is exclusive at end
* test clear()
* test that add_block sets children
* add a test for canonicalize
* extract Pre-digest from header
* utilities for extracting RelayVRFStory from the header-chain
* add approval voting message types
* approval distribution message type
* subsystem skeleton
* state struct
* add futures-timer
* prepare service for babe slot duration
* more skeleton
* better integrate AuxStore
* RelayVRF -> RelayVRFStory
* canonicalize
* implement some tick functionality
* guide: tweaks
* check_approval
* more tweaks and helpers
* guide: add core index to candidate event
* primitives: add core index to candidate event
* runtime: add core index to candidate events
* head handling (session window)
* implement `determine_new_blocks`
* add TODO
* change error type on functions
* compute RelayVRFModulo assignments
* compute RelayVRFDelay assignments
* fix delay tranche calc
* assignment checking
* pluralize
* some dummy code for fetching assignments
* guide: add babe epoch runtime API
* implement a current_epoch() runtime API
* compute assignments
* candidate events get backing group
* import blocks and assignments into DB
* push block approval meta
* add message types, no overseer integration yet
* notify approval distribution of new blocks
* refactor import into separate functions
* impl tranches_to_approve
* guide: improve function signatures
* guide: remove Tick from ApprovalEntry
* trigger and broadcast assignment
* most of approval launching
* remove byteorder crate
* load blocks back to finality, except on startup
* check unchecked assignments
* add claimed core to approval voting message
* fix checks
* assign only to backing group
* remove import_checked_assignment from guide
* newline
* import assignments
* abstract out a bit
* check and import approvals
* check full approvals from assignment import too
* comment
* create a Transaction utility
* must_use
* use transaction in `check_full_approvals`
* wire up wakeups
* add Ord to CandidateHash
* wakeup refactoring
* return candidate info from add_block_entry
* schedule wakeups
* background task: do candidate validation
* forward candidate validation requests
* issue approval votes when requested
* clean up a couple TODOs
* fix up session caching
* clean up last unimplemented!() items
* fix remaining warnings
* remove TODO
* implement handle_approved_ancestor
* update Cargo.lock
* fix runtime API tests
* guide: cleanup assignment checking
* use claimed candidate index instead of core
* extract time to a trait
* tests module
* write a mock clock for testing
* allow swapping out the clock
* make abstract over assignment criteria
* add some skeleton tests and simplify params
* fix backing group check
* do backing group check inside check_assignment_cert
* write some empty test functions to implement
* add a test for non-backing
* test that produced checks pass
* some empty test ideas
* runtime/inclusion: remove outdated TODO
* fix compilation
* av-store: fix tests
* dummy cert
* criteria tests
* move `TestStore` to main tests file
* fix unused warning
* test harness beginnings
* resolve slots renaming fallout
* more compilation fixes
* wip: extract pure data into a separate module
* wip: extract pure data into a separate module
* move types completely to v1
* add persisted_entries
* add conversion trait impls
* clean up some warnings
* extract import logic to own module
* schedule wakeups
* experiment with Actions
* uncomment approval-checking
* separate module for approval checking utilities
* port more code to use actions
* get approval pipeline using actions
* all logic is uncommented
* main loop processes actions
* all loop logic uncommented
* separate function for handling actions
* remove last unimplemented item
* clean up warnings
* State gives read-only access to underlying DB
* tests for approval checking
* tests for approval criteria
* skeleton test module for import
* list of import tests to do
* some test glue code
* test reject bad assignment
* test slot too far in future
* test reject assignment with unknown candidate
* remove loads_blocks tests
* determine_new_blocks back to finalized & harness
* more coverage for determining new blocks
* make `imported_block_info` have less reliance on State
* candidate_info tests
* tests for session caching
* remove println
* extricate DB and main TestStores
* rewrite approval checking logic to counteract early delays
* move state out of function
* update approval-checking tests
* tweak wakeups & scheduling logic
* rename check_full_approvals
* test that assignment import updates candidate
* some approval import tests
* some tests for check_and_apply_approval
* add 'full' qualifier to avoid confusion
* extract should-trigger logic to separate function
* some tests for all triggering
* tests for when we trigger assignments
* test wakeups
* add block utilities for testing
* some more tests for approval updates
* approved_ancestor tests
* new action type for launch approval
* process-wakeup tests
* clean up some warnings
* fix in_future test
* approval checking tests
* tighten up too-far-in-future
* special-case genesis when caching sessions
* fix bitfield len
Co-authored-by: Andronik Ordian <write@reusable.software>
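Several commits above extract time into a trait and add a mock clock so tests can control it. A minimal sketch of that pattern follows; the trait shape, the `MockClock` API, the `is_in_future` helper, and the 500 ms tick length are all assumptions for illustration and may differ from the subsystem's actual definitions.

```rust
use std::sync::{Arc, Mutex};
use std::time::{SystemTime, UNIX_EPOCH};

pub type Tick = u64;

// Assumed tick length for this sketch.
const TICK_DURATION_MILLIS: u64 = 500;

/// Abstraction over time so that logic can be tested without sleeping.
pub trait Clock {
    fn tick_now(&self) -> Tick;
}

/// Clock backed by the system time.
pub struct SystemClock;

impl Clock for SystemClock {
    fn tick_now(&self) -> Tick {
        let millis = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_millis() as u64)
            .unwrap_or(0);
        millis / TICK_DURATION_MILLIS
    }
}

/// Clock whose current tick is set explicitly by the test.
#[derive(Clone, Default)]
pub struct MockClock {
    tick: Arc<Mutex<Tick>>,
}

impl MockClock {
    pub fn set_tick(&self, tick: Tick) {
        *self.tick.lock().unwrap() = tick;
    }
}

impl Clock for MockClock {
    fn tick_now(&self) -> Tick {
        *self.tick.lock().unwrap()
    }
}

/// Example of logic that depends only on the trait.
pub fn is_in_future(clock: &dyn Clock, tick: Tick) -> bool {
    tick > clock.tick_now()
}
```

Because `MockClock` is cheaply cloneable and shares its state, a test can keep one handle to advance time while the code under test holds another.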
* Move NetworkBridgeEvent to subsystem::messages.
It is not protocol related at all, it is in fact only part of the
subsystem communication as it gets wrapped into messages of each
subsystem.
* Request/response infrastructure is taking shape.
WIP: Does not compile.
* Multiplexer variant not supported by Rust's type system.
* request_response::request type checks.
* Cleanup.
* Minor fixes for request_response.
* Implement request sending + move multiplexer.
Request multiplexer is moved to bridge, as there the implementation is
more straightforward as we can specialize on `AllMessages` for the
multiplexing target.
Sending of requests is mostly complete, apart from a few `From`
instances. Receiving is also almost done; initialization needs to be
fixed and the multiplexer needs to be invoked.
* Remove obsolete multiplexer.
* Initialize bridge with multiplexer.
* Finish generic request sending/receiving.
Subsystems are now able to receive and send requests and responses via
the overseer.
* Doc update.
* Fixes.
* Link issue for not yet implemented code.
* Fixes suggested by @ordian - thanks!
- start encoding at 0
- don't crash on zero protocols
- don't panic on not yet implemented request handling
* Update node/network/protocol/src/request_response/v1.rs
Use index 0 instead of 1.
Co-authored-by: Andronik Ordian <write@reusable.software>
* Update node/network/protocol/src/request_response.rs
Co-authored-by: Andronik Ordian <write@reusable.software>
* Fix existing tests.
* Better avoidance of division by zero errors.
* Doc fixes.
* send_request -> start_request.
* Fix missing renamings.
* Update substrate.
* Pass TryConnect instead of true.
* Actually import `IfDisconnected`.
* Fix wrong import.
* Update node/network/bridge/src/lib.rs
typo
Co-authored-by: Pierre Krieger <pierre.krieger1708@gmail.com>
* Update node/network/bridge/src/multiplexer.rs
Remove redundant import.
Co-authored-by: Pierre Krieger <pierre.krieger1708@gmail.com>
* Stop doing tracing from within `From` instance.
Thanks for the catch @tomaka!
* Get rid of redundant import.
* Formatting cleanup.
* Fix tests.
* Add link to issue.
* Clarify comments some more.
* Fix tests.
* Formatting fix.
* tabs
* Fix link
Co-authored-by: Bernhard Schuster <bernhard@ahoi.io>
* Use map_err.
Co-authored-by: Bernhard Schuster <bernhard@ahoi.io>
* Improvements inspired by suggestions by @drahnr.
- Channel size is now determined by function.
- Explicitly scope NetworkService::start_request.
Co-authored-by: Andronik Ordian <write@reusable.software>
Co-authored-by: Pierre Krieger <pierre.krieger1708@gmail.com>
Co-authored-by: Bernhard Schuster <bernhard@ahoi.io>
* initial impl approval distribution
* initial tests and fixes
* batching seems difficult: different peers have different needs
* bridge: fix test after merge
* some guide updates
* only send assignments to peers who know about the block
* fix a test, add approvals test
* simplify
* do not send assignment to peers for finalized blocks
* guide: protocol input and output
* one more test
* more comments, logs, initial metrics
* fix a typo
* one more thing: early return when reimporting a thing locally