Commit Graph

130 Commits

Author SHA1 Message Date
Robert Habermeier e48c687504 Implement Approval Voting Subsystem (#2112)
* skeleton

* skeleton aux-schema module

* start approval types

* start aux schema with aux store

* doc

* finish basic types

* start approval types

* doc

* finish basic types

* write out schema types

* add debug and codec impls to approval types

* add debug and codec impls to approval types

also add some key computation

* add debug and codec impls to approval types

* getters for block and candidate entries

* grumbles

* remove unused AssignmentId

* load_decode utility

* implement DB clearing

* function for adding new block entry to aux store

* start `canonicalize` implementation

* more skeleton

* finish implementing canonicalize

* tag TODO

* implement a test AuxStore

* add allow(unused)

* basic loading and deleting test

* block_entry test function

* add a test for `add_block_entry`

* ensure range is exclusive at end

* test clear()

* test that add_block sets children

* add a test for canonicalize

* extract Pre-digest from header

* utilities for extracting RelayVRFStory from the header-chain

* add approval voting message types

* approval distribution message type

* subsystem skeleton

* state struct

* add futures-timer

* prepare service for babe slot duration

* more skeleton

* better integrate AuxStore

* RelayVRF -> RelayVRFStory

* canonicalize

* implement some tick functionality

* guide: tweaks

* check_approval

* more tweaks and helpers

* guide: add core index to candidate event

* primitives: add core index to candidate event

* runtime: add core index to candidate events

* head handling (session window)

* implement `determine_new_blocks`

* add TODO

* change error type on functions

* compute RelayVRFModulo assignments

* compute RelayVRFDelay assignments

* fix delay tranche calc

* assignment checking

* pluralize

* some dummy code for fetching assignments

* guide: add babe epoch runtime API

* implement a current_epoch() runtime API

* compute assignments

* candidate events get backing group

* import blocks and assignments into DB

* push block approval meta

* add message types, no overseer integration yet

* notify approval distribution of new blocks

* refactor import into separate functions

* impl tranches_to_approve

* guide: improve function signatures

* guide: remove Tick from ApprovalEntry

* trigger and broadcast assignment

* most of approval launching

* remove byteorder crate

* load blocks back to finality, except on startup

* check unchecked assignments

* add claimed core to approval voting message

* fix checks

* assign only to backing group

* remove import_checked_assignment from guide

* newline

* import assignments

* abstract out a bit

* check and import approvals

* check full approvals from assignment import too

* comment

* create a Transaction utility

* must_use

* use transaction in `check_full_approvals`

* wire up wakeups

* add Ord to CandidateHash

* wakeup refactoring

* return candidate info from add_block_entry

* schedule wakeups

* background task: do candidate validation

* forward candidate validation requests

* issue approval votes when requested

* clean up a couple TODOs

* fix up session caching

* clean up last unimplemented!() items

* fix remaining warnings

* remove TODO

* implement handle_approved_ancestor

* update Cargo.lock

* fix runtime API tests

* guide: cleanup assignment checking

* use claimed candidate index instead of core

* extract time to a trait

* tests module

* write a mock clock for testing

* allow swapping out the clock

* make abstract over assignment criteria

* add some skeleton tests and simplify params

* fix backing group check

* do backing group check inside check_assignment_cert

* write some empty test functions to implement

* add a test for non-backing

* test that produced checks pass

* some empty test ideas

* runtime/inclusion: remove outdated TODO

* fix compilation

* av-store: fix tests

* dummy cert

* criteria tests

* move `TestStore` to main tests file

* fix unused warning

* test harness beginnings

* resolve slots renaming fallout

* more compilation fixes

* wip: extract pure data into a separate module

* wip: extract pure data into a separate module

* move types completely to v1

* add persisted_entries

* add conversion trait impls

* clean up some warnings

* extract import logic to own module

* schedule wakeups

* experiment with Actions

* uncomment approval-checking

* separate module for approval checking utilities

* port more code to use actions

* get approval pipeline using actions

* all logic is uncommented

* main loop processes actions

* all loop logic uncommented

* separate function for handling actions

* remove last unimplemented item

* clean up warnings

* State gives read-only access to underlying DB

* tests for approval checking

* tests for approval criteria

* skeleton test module for import

* list of import tests to do

* some test glue code

* test reject bad assignment

* test slot too far in future

* test reject assignment with unknown candidate

* remove loads_blocks tests

* determine_new_blocks back to finalized & harness

* more coverage for determining new blocks

* make `imported_block_info` have less reliance on State

* candidate_info tests

* tests for session caching

* remove println

* extricate DB and main TestStores

* rewrite approval checking logic to counteract early delays

* move state out of function

* update approval-checking tests

* tweak wakeups & scheduling logic

* rename check_full_approvals

* test that assignment import updates candidate

* some approval import tests

* some tests for check_and_apply_approval

* add 'full' qualifier to avoid confusion

* extract should-trigger logic to separate function

* some tests for all triggering

* tests for when we trigger assignments

* test wakeups

* add block utilities for testing

* some more tests for approval updates

* approved_ancestor tests

* new action type for launch approval

* process-wakeup tests

* clean up some warnings

* fix in_future test

* approval checking tests

* tighten up too-far-in-future

* special-case genesis when caching sessions

* fix bitfield len

Co-authored-by: Andronik Ordian <write@reusable.software>
2021-02-11 10:21:47 -06:00
Andronik Ordian 6981a1c366 validator_discovery: pass PeerSet to the request (#2372)
* validator_discovery: pass PeerSet to the request

* validator_discovery: track PeerSet of connected peers

* validator_discovery: fix tests

* validator_discovery: fix long line

* some fixes

* some validator_discovery logs

* log validator discovery request

* Also connect to validators on `DistributePoV`.

* validator_discovery: store the whole state per peer_set

* bump spec versions in kusama, polkadot and westend

* Correcting doc.

* validator_discovery: bump channel capacity

* pov-distribution: some cleanup

* this should fix the test, but it does not

* I just got some brain damage while fixing this

Why are you even reading this???

* wrap long line

* address some review nits

Co-authored-by: Robert Klotzner <robert.klotzner@gmx.at>
2021-02-08 08:57:59 +01:00
Robert Klotzner 0cb1ccd122 Generic request/response infrastructure for Polkadot (#2352)
* Move NetworkBridgeEvent to subsystem::messages.

It is not protocol related at all, it is in fact only part of the
subsystem communication as it gets wrapped into messages of each
subsystem.

* Request/response infrastructure is taking shape.

WIP: Does not compile.

* Multiplexer variant not supported by Rusts type system.

* request_response::request type checks.

* Cleanup.

* Minor fixes for request_response.

* Implement request sending + move multiplexer.

Request multiplexer is moved to bridge, as there the implementation is
more straight forward as we can specialize on `AllMessages` for the
multiplexing target.

Sending of requests is mostly complete, apart from a few `From`
instances. Receiving is also almost done, initializtion needs to be
fixed and the multiplexer needs to be invoked.

* Remove obsolete multiplexer.

* Initialize bridge with multiplexer.

* Finish generic request sending/receiving.

Subsystems are now able to receive and send requests and responses via
the overseer.

* Doc update.

* Fixes.

* Link issue for not yet implemented code.

* Fixes suggested by @ordian - thanks!

- start encoding at 0
- don't crash on zero protocols
- don't panic on not yet implemented request handling

* Update node/network/protocol/src/request_response/v1.rs

Use index 0 instead of 1.

Co-authored-by: Andronik Ordian <write@reusable.software>

* Update node/network/protocol/src/request_response.rs

Co-authored-by: Andronik Ordian <write@reusable.software>

* Fix existing tests.

* Better avoidance of division by zoro errors.

* Doc fixes.

* send_request -> start_request.

* Fix missing renamings.

* Update substrate.

* Pass TryConnect instead of true.

* Actually import `IfDisconnected`.

* Fix wrong import.

* Update node/network/bridge/src/lib.rs

typo

Co-authored-by: Pierre Krieger <pierre.krieger1708@gmail.com>

* Update node/network/bridge/src/multiplexer.rs

Remove redundant import.

Co-authored-by: Pierre Krieger <pierre.krieger1708@gmail.com>

* Stop doing tracing from within `From` instance.

Thanks for the catch @tomaka!

* Get rid of redundant import.

* Formatting cleanup.

* Fix tests.

* Add link to issue.

* Clarify comments some more.

* Fix tests.

* Formatting fix.

* tabs

* Fix link

Co-authored-by: Bernhard Schuster <bernhard@ahoi.io>

* Use map_err.

Co-authored-by: Bernhard Schuster <bernhard@ahoi.io>

* Improvements inspired by suggestions by @drahnr.

- Channel size is now determined by function.
- Explicitely scope NetworkService::start_request.

Co-authored-by: Andronik Ordian <write@reusable.software>
Co-authored-by: Pierre Krieger <pierre.krieger1708@gmail.com>
Co-authored-by: Bernhard Schuster <bernhard@ahoi.io>
2021-02-03 20:21:09 +00:00
Peter Goodspeed-Niklaus 25cfb884af misbehavior: report multiple offenses per validator as necessary (#2222)
* use proper descriptive generic type names

* cleanup

* Table stores a list of detected misbehavior per authority

* add Table::drain_misbehaviors_for

* WIP: unify misbehavior types; report multiple misbehaviors per validator

Code checks, but tests don't yet pass.

* update drain_misbehaviors: return authority id as well as specific misbehavior

* enable unchecked construction of Signed structs in tests

* remove test-features feature & unnecessary generic

* fix backing tests

This took a while to figure out, because where we'd previously been
passing around `SignedFullStatement`s, we now needed to construct
those on the fly within the test, to take advantage of the signature-
checking in the constructor. That, in turn, necessitated changing the
iterable type of `drain_misbehaviors` to return the validator index,
and passing that validator index along within the misbehavior report.

Once that was sorted, however, it became relatively straightforward:
just needed to add appropriate methods to deconstruct the misbehavior
reports, and then we could construct the signed statements directly.

* fix bad merge
2021-01-28 15:19:05 +01:00
Andronik Ordian 3f1e1a6ff7 impl approval distribution (#2160)
* initial impl approval distribution

* initial tests and fixes

* batching seems difficult: different peers have different needs

* bridge: fix test after merge

* some guide updates

* only send assignments to peers who know about the block

* fix a test, add approvals test

* simplify

* do not send assignment to peers for finalized blocks

* guide: protocol input and output

* one more test

* more comments, logs, initial metrics

* fix a typo

* one more thing: early return when reimporting a thing locally
2021-01-25 18:14:32 -05:00
Sergei Shulepov 226af6a877 Remove TransientValidationData (#2272)
* collation-generation: use persisted validation data

* node: remote FullValidationData API

* runtime: remove FullValidationData API

* backing tests: use persisted validation data

* FullCandidateReceipt: use persisted validation data

This is not a big change since this type is not used anywhere

* Remove ValidationData and TransientValidationData

Also update the guide
2021-01-18 18:57:09 -05:00
Fedor Sakharov 90a686266f Availability recovery subsystem (#2122)
* Adds message types

* Add code skeleton

* Adds subsystem code.

* Adds a first test

* Adds interaction result to availability_lru

* Use LruCache instead of a HashMap

* Whitespaces to tabs

* Do not ignore errors

* Change error type

* Add a timeout to chunk requests

* Add custom errors and log them

* Adds replace_availability_recovery method

* recovery_threshold computed by erasure crate

* change core to std

* adds docs to error type

* Adds a test for invalid reconstruction

* refactors interaction run into multiple methods

* Cleanup AwaitedChunks

* Even more fixes

* Test that recovery with wrong root is an error

* Break to launch another requests

* Styling fixes

* Add SessionIndex to API

* Proper relay parents for MakeRequest

* Remove validator_discovery and use message

* Remove a stream on exhaustion

* On cleanup free the request streams

* Fix merge and refactor
2021-01-15 02:06:25 +00:00
Robert Habermeier 9dee08af2f Batch messages to network bridge and introduce a timeout to SubsystemContext::send_message (#2197)
* guide: batch network messages

* bridge: batch

* av-dist: batch outgoing messages

* time-out message sends in subsystem context

* Update node/subsystem/src/messages.rs

Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>

* Revert "time-out message sends in subsystem context"

This reverts commit d49be62557ec37c8a350b93718acad723df704ef.

* Update node/network/availability-distribution/src/lib.rs

Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>
2021-01-07 16:01:03 -05:00
Fedor Sakharov b97f52a4c8 Optimizations of av-store (#2223)
* Store all chunks and in a single transaction

* Adds chunks LRU to store

* Add pruning records metrics

* Use honest cache instead of LRU

* Remove unnecessary optional cache

* Fix review nits that are fixable
2021-01-07 21:00:30 +00:00
Bastian Köcher 5be092894e Add one Jaeger span per relay parent (#2196)
* Add one Jaeger span per relay parent

This adds one Jaeger span per relay parent, instead of always creating
new spans per relay parent. This should improve the UI view, because
subsystems are now grouped below one common span.

* Fix doc tests

* Replace `PerLeaveSpan` to `PerLeafSpan`

* More renaming

* Moare

* Update node/subsystem/src/lib.rs

Co-authored-by: Andronik Ordian <write@reusable.software>

* Skip the spans

* Increase `spec_version`

Co-authored-by: Andronik Ordian <write@reusable.software>
2021-01-05 15:09:25 +01:00
Robert Habermeier 41102a6ff9 Improve Network Spans (#2169)
* utility functions for erasure-coding threshold

* add candidate-hash tag to candidate jaeger spans

* debug implementation for jaeger span

* add a span to each live candidate in availability dist.

* availability span covers only our piece

* fix tests

* keep span alive slightly longer

* remove spammy bitfield-gossip-received log

* Revert "remove spammy bitfield-gossip-received log"

This reverts commit 831a2db506d66f64ea516af3caf891e8643f5c43.

* add claimed validator to bitfield-gossip span

* add peer-id to handle-incoming span

* add peer-id to availability distribution span

* Update node/network/availability-distribution/src/lib.rs

Co-authored-by: Bernhard Schuster <bernhard@ahoi.io>

* Update erasure-coding/src/lib.rs

Co-authored-by: Bernhard Schuster <bernhard@ahoi.io>

* Update node/subsystem/src/jaeger.rs

Co-authored-by: Bernhard Schuster <bernhard@ahoi.io>

Co-authored-by: Bernhard Schuster <bernhard@ahoi.io>
2021-01-04 17:29:53 +00:00
Robert Habermeier 01d271caeb Fix statement distribution: forward statements to other peers. (#2146)
* add candidate hash statement circulation span

* add relay-parent to hash-span

* Some typos and misspellings in docs I found, during my studies. (#2144)

* Fix stale link to overseer docs

* Some typos and mispellings in docs/comments

I found during studying how Polkadot works.

* Rococo V1 (#2141)

* Update to latest master and use 30 minutes sessions

* add bootnodes to chainspec

* Update Substrate

* Update chain-spec

* Update Cargo.lock

* GENESIS

* Change session length to one hour

* Bump spec_version to not fuck anything up ;)

Co-authored-by: Erin Grasmick <erin@parity.io>

* avoid creating duplicate unbacked spans when we see extra statements (#2145)

* improve jaeger spans for statement distribution

* tweak and add failing test for repropagation

* make a change that gets the test passing

* guide: clarify

* remove semicolon

Co-authored-by: Robert Klotzner <eskimor@users.noreply.github.com>
Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>
Co-authored-by: Erin Grasmick <erin@parity.io>
2020-12-20 18:52:07 -05:00
Robert Habermeier e95be77eb6 overseer: observe stalled subsystems and shut down (#2148)
* overseer: observe stalled subsystems and shut down

* notify on send_message failure as well
2020-12-20 20:30:02 +00:00
Pierre Krieger a141a6fbc9 Fix Jaeger service name (#2140) 2020-12-18 16:09:33 +00:00
Bastian Köcher d0c97539e4 Fix bug and further optimizations in availability distribution (#2104)
* Fix bug and further optimizations in availability distribution

- There was a bug that resulted in only getting one candidate per block
as the candidates were put into the hashmap with the relay block hash as
key. The solution for this is to use the candidate hash and the relay
block hash as key.
- We stored received/sent messages with the candidate hash and chunk
index as key. The candidate hash wasn't required in this case, as the
messages are already stored per candidate.

* Update node/core/bitfield-signing/src/lib.rs

Co-authored-by: Robert Habermeier <rphmeier@gmail.com>

* Remove the reverse map

* major refactor of receipts & query_live

* finish refactoring

remove ancestory mapping,

improve relay-parent cleanup & receipts-cache cleanup,
add descriptor to `PerCandidate`

* rename and rewrite query_pending_availability

* add a bunch of consistency tests

* Add some last changes

* xy

* fz

* Make it compile again

* Fix one test

* Fix logging

* Remove some buggy code

* Make tests work again

* Move stuff around

* Remove dbg

* Remove state from test_harness

* More refactor and new test

* New test and fixes

* Move metric

* Remove "duplicated code"

* Fix tests

* New test

* Change break to continue

* Update node/core/av-store/src/lib.rs

* Update node/core/av-store/src/lib.rs

* Update node/core/bitfield-signing/src/lib.rs

Co-authored-by: Fedor Sakharov <fedor.sakharov@gmail.com>

* update guide to match live_candidates changes

* add comment

* fix bitfield signing

Co-authored-by: Robert Habermeier <rphmeier@gmail.com>
Co-authored-by: Bernhard Schuster <bernhard@ahoi.io>
Co-authored-by: Fedor Sakharov <fedor.sakharov@gmail.com>
2020-12-17 19:09:17 +00:00
Andronik Ordian 3f5156e866 refactor View to include finalized_number (#2128)
* refactor View to include finalized_number

* guide: update the NetworkBridge on BlockFinalized

* av-store: fix the tests

* actually fix tests

* grumbles

* ignore macro doctest

* use Hash::repeat_bytes more consistently

* broadcast empty leaves updates as well

* fix issuing view updates on empty leaves updates
2020-12-17 12:50:58 -05:00
Pierre Krieger e6e3bda3f0 Improve Jaeger errors and debugging experience (#2127)
* Improve Jaeger errors and debugging experience

* Bind on 0.0.0.0:0 instead
2020-12-17 14:38:56 +00:00
Bernhard Schuster a5fe710cc6 initial jaeger integration (#2047)
Co-authored-by: Pierre Krieger <pierre.krieger1708@gmail.com>
Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>
Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>
2020-12-11 17:38:55 +01:00
Bernhard Schuster 35c71bf315 addition error definitions (#2107)
* remove low information density error doc comments

* another round of error dancing

* fix compilation

* remove stale `None` argument

* adjust test, minor slip in command

* only add AvailabilityError for full node features

* another None where none shuld be
2020-12-10 16:57:36 +01:00
Peter Goodspeed-Niklaus e7e9605f87 do not store backed candidates in the provisioner (#1909)
* guide: non-semantic changes

* guide: update per the issue description

* GetBackedCandidates operates on multiple hashes now

* GetBackedCandidates still needs a relay parent

* implement changes specified in guide

* distinguish between various occasions for canceled oneshots

* add tracing info to getbackedcandidates

* REVERT ME: add tracing messages for GetBackedCandidates

Note that these messages are only sometimes actually passed on to the
candidate backing subsystem, with the consequence that it is
unexpectedly frequent that the provisioner fails to create its
provisionable data.

* REVERT ME: more tracing logging

* REVERT ME: log when CandidateBackingJob receives any message at all

* REVERT ME: log when send_msg sends a message to a job

* fix candidate-backing tests

* streamline GetBackedCandidates

This uses table.attested_candidate instead of table.get_candidate, because
it's not obvious how to get a BackedCandidate from just a
CommittedCandidateReceipt.

* REVERT ME: more logging tracing job lifespans

* promote warning about job premature demise

* don't terminate CandiateBackingJob::run_loop in event of failure to process message

* Revert "REVERT ME: more logging tracing job lifespans"

This reverts commit 7365f2fb3dec988d95cfcd317eba75587fe7fd16.

* Revert "REVERT ME: log when send_msg sends a message to a job"

This reverts commit 58e46aad038e6517d6d56390c8be65b046a21884.

* Revert "REVERT ME: log when CandidateBackingJob receives any message at all"

This reverts commit 0d6f38413c7c66b5e9e81dabc587906fa9f82656.

* Revert "REVERT ME: more tracing logging"

This reverts commit 675fd2628e84d1596965280e7314155ef21b28e6.

* Revert "REVERT ME: add tracing messages for GetBackedCandidates"

This reverts commit e09e156493430b33b6c8ab4b5cedb3f2f91afd51.

* formatting

* add logging message to CandidateBackingJob::run_loop start

* REVERT ME: add tracing to candidate-backing job creation

* run candidatebacking loop even if no assignment

* use unique error variants for each canceled oneshot

* Revert "REVERT ME: add tracing to candidate-backing job creation"

This reverts commit 8ce5f4f0bd7186dade134b118751480f72ea1fd6.

* try_runtime_api more to reduce silent exits

* add sanity check that returned backed candidates preserve ordering

* remove redundant err attribute
2020-12-04 11:24:59 +01:00
Bastian Köcher 536dceb4f6 Simplify subsystem jobs (#2037)
* Simplify subsystem jobs

This pr simplifies the subsystem jobs interface. Instead of requiring an
extra message that is used to signal that a job should be ended, a job
now ends when the receiver returns `None`. Besides that it changes the
interface to enforce that messages to a job provide a relay parent.

* Drop ToJobTrait

* Remove FromJob

We always convert this message to FromJobCommand anyway.
2020-11-30 17:01:26 +01:00
Robert Habermeier 0a79d663e4 Move erasure root out of candidate commitments and into descriptor (#2010)
* guide: move erasure-root to candidate descriptor

* primitives: move erasure root to descriptor

* guide: unify candidate commitments and validation outputs

* primitives: unify validation outputs and candidate commitments

* parachains-runtime: fix fallout

* runtimes: fix fallout

* collation generation: fix fallout

* fix stray reference in primitives

* fix fallout in node-primitives

* fix remaining fallout in collation generation

* fix fallout in candidate validation

* fix fallout in runtime API subsystem

* fix fallout in subsystem messages

* fix fallout in candidate backing

* fix fallout in availability distribution

* don't clone

* clone

Co-authored-by: Sergei Shulepov <sergei@parity.io>
2020-11-27 16:39:42 +00:00
Andronik Ordian 39a12b68f6 past-session validator discovery APIs (#2009)
* guide: fix formatting for SessionInfo module

* primitives: SessionInfo type

* punt on approval keys

* ah, revert the type alias

* session info runtime module skeleton

* update the guide

* runtime/configuration: sync with the guide

* runtime/configuration: setters for newly added fields

* runtime/configuration: set codec indexes

* runtime/configuration: update test

* primitives: fix SessionInfo definition

* runtime/session_info: initial impl

* runtime/session_info: use initializer for session handling (wip)

* runtime/session_info: mock authority discovery trait

* guide: update the initializer's order

* runtime/session_info: tests skeleton

* runtime/session_info: store n_delay_tranches in Configuration

* runtime/session_info: punt on approval keys

* runtime/session_info: add some basic tests

* Update primitives/src/v1.rs

* small fixes

* remove codec index annotation on structs

* fix off-by-one error

* validator_discovery: accept a session index

* runtime: replace validator_discovery api with session_info

* Update runtime/parachains/src/session_info.rs

Co-authored-by: Sergei Shulepov <sergei@parity.io>

* runtime/session_info: add a comment about missing entries

* runtime/session_info: define the keys

* util: expose connect_to_past_session_validators

* util: allow session_info requests for jobs

* runtime-api: add mock test for session_info

* collator-protocol: add session_index to test state

* util: fix error message for runtime error

* fix compilation

* fix tests after merge with master

Co-authored-by: Sergei Shulepov <sergei@parity.io>
2020-11-26 11:02:50 +00:00
Bastian Köcher 4ce744818c Some code cleanup in overseer (#2008)
* Some code cleanup in overseer

- Switches to select! in the overseer run loop to be more fair about
message processing between the different sources.
- Added a check to only send `ActiveLeaves` if the update actually
contains any data.

* Move the check

* Restore old behavior

* Simplify message sending and signal sending to subsystems

* Update node/subsystem/src/lib.rs
2020-11-25 09:27:50 +00:00
Andronik Ordian 69b103b1d5 overseer: send_msg should not return an error (#1995)
* send_message should not return an error

* Apply suggestions from code review

Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>

* s/send_logging_error/send_and_log_error

Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>
2020-11-23 12:42:14 +01:00
Andronik Ordian 97f5bd9047 cleanup validator discovery (#1992)
* use snake_case for log targets

* remove unused continue

* validator_discovery: when disconnecting, use all addresses

* validator_discovery: simplify request revokation

* fix a typo
2020-11-20 18:34:57 +00:00
Peter Goodspeed-Niklaus e49989971d Add tracing support to node (#1940)
* drop in tracing to replace log

* add structured logging to trace messages

* add structured logging to debug messages

* add structured logging to info messages

* add structured logging to warn messages

* add structured logging to error messages

* normalize spacing and Display vs Debug

* add instrumentation to the various 'fn run'

* use explicit tracing module throughout

* fix availability distribution test

* don't double-print errors

* remove further redundancy from logs

* fix test errors

* fix more test errors

* remove unused kv_log_macro

* fix unused variable

* add tracing spans to collation generation

* add tracing spans to av-store

* add tracing spans to backing

* add tracing spans to bitfield-signing

* add tracing spans to candidate-selection

* add tracing spans to candidate-validation

* add tracing spans to chain-api

* add tracing spans to provisioner

* add tracing spans to runtime-api

* add tracing spans to availability-distribution

* add tracing spans to bitfield-distribution

* add tracing spans to network-bridge

* add tracing spans to collator-protocol

* add tracing spans to pov-distribution

* add tracing spans to statement-distribution

* add tracing spans to overseer

* cleanup
2020-11-20 12:02:04 +01:00
Sergei Shulepov fb6b94eaf0 Don't just swallow message in dummy-subsystem, but log (#1935)
* Don't just swallow error in dummy-subsystem, but log

* Add target to log.
2020-11-10 12:17:27 +01:00
Sergei Shulepov c96f8cfcca Implement HRMP (#1900)
* HRMP: Update the impl guide

* HRMP: Incorporate the channel notifications into the guide

* HRMP: Renaming in the impl guide

* HRMP: Constrain the maximum number of HRMP messages per candidate

This commit addresses the HRMP part of https://github.com/paritytech/polkadot/issues/1869

* XCM: Introduce HRMP related message types

* HRMP: Data structures and plumbing

* HRMP: Configuration

* HRMP: Data layout

* HRMP: Acceptance & Enactment

* HRMP: Test base logic

* Update adder collator

* HRMP: Runtime API for accessing inbound messages

Also, removing some redundant fully-qualified names.

* HRMP: Add diagnostic logging in acceptance criteria

* HRMP: Additional tests

* Self-review fixes

* save test refactorings for the next time

* Missed a return statement.

* a formatting blip

* Add missing logic for appending HRMP digests

* Remove the channel contents vectors which became empty

* Tighten HRMP channel digests invariants.

* Apply suggestions from code review

Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>

* Remove a note about sorting for channel id

* Add missing rustdocs to the configuration

* Clarify and update the invariant for HrmpChannelDigests

* Make the onboarding invariant less sloppy

Namely, introduce `Paras::is_valid_para` (in fact, it already is present
in the implementation) and hook up the invariant to that.

Note that this says "within a session" because I don't want to make it
super strict on the session boundary. The logic on the session boundary
should be extremely careful.

* Make `CandidateCheckContext` use T::BlockNumber for hrmp_watermark

Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>
2020-11-06 15:35:36 +00:00
Bastian Köcher 640264f38b Make CandidateHash a real type (#1916)
* Make `CandidateHash` a real type

This pr adds a new type `CandidateHash` that is used instead of the
opaque `Hash` type. This helps to ensure on the type system level that
we are passing the correct types.

This pr also fixes wrong usage of `relay_parent` as `candidate_hash`
when communicating with the av storage.

* Update core-primitives/src/lib.rs

Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>

* Wrap the lines

Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>
2020-11-05 16:28:45 +01:00
Bastian Köcher 04da99de58 Moare fixes for parachains (#1911)
* Moare fixes for parachains

- Sending data to a job should always contain a relay parent. Done this
for the provisioner
- Fixed the `select_availability_bitfields` function. It was assuming we
have one core per validator, while we only have one core per parachain.
- Drive by async "rewrite" in proposer

* Make tests compile

* Update primitives/src/v1.rs

Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>

Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>
2020-11-03 18:16:47 +01:00
Robert Habermeier 1f4121c444 Runtime API for historical validation code (#1893)
* fix: ensure candidate validation gets code based on occupied core assumption

* guide: runtime API for historical validation code

* add historical runtime API

* integrate into runtime API subsystem

* remove blocked TODO

* fix service build: enable notifications protocol only under real overseer

* Update node/subsystem/src/messages.rs

Co-authored-by: Sergei Shulepov <sergei@parity.io>

* fix compilation

Co-authored-by: Robert Habermeier <robert@Roberts-MacBook-Pro.local>
Co-authored-by: Sergei Shulepov <sergei@parity.io>
2020-11-02 12:19:53 -06:00
Fedor Sakharov 935fcd1666 Change SpawnedSubsystem type to log subsystem errors (#1878)
* Change SpawnedSubsystem type to log subsystem errors

* Remove clone
2020-10-28 20:57:06 +01:00
Sergei Shulepov 9903bca259 Downward Message Processing implementation (#1859)
* DMP: data structures and plumbing

* DMP: Implement DMP logic in the router module

DMP: Integrate DMP parts into the inclusion module

* DMP: Introduce the max size limit for the size of a downward message

* DMP: Runtime API for accessing inbound messages

* OCD

Small clean ups

* DMP: fix the naming of the error

* DMP: add caution about a non-existent recipient
2020-10-28 10:41:42 +00:00
Peter Goodspeed-Niklaus 1a25c41277 start working on building the real overseer (#1795)
* start working on building the real overseer

Unfortunately, this fails to compile right now due to an upstream
failure to compile which is probably brought on by a recent upgrade
to rustc v1.47.

* fill in AllSubsystems internal constructors

* replace fn make_metrics with Metrics::attempt_to_register

* update to account for #1740

* remove Metrics::register, rename Metrics::attempt_to_register

* add 'static bounds to real_overseer type params

* pass authority_discovery and network_service to real_overseer

It's not straightforwardly obvious that this is the best way to handle
the case when there is no authority discovery service, but it seems
to be the best option available at the moment.

* select a proper database configuration for the availability store db

* use subdirectory for av-store database path

* apply Basti's patch which avoids needing to parameterize everything on Block

* simplify path extraction

* get all tests to compile

* Fix Prometheus double-registry error

for debugging purposes, added this to node/subsystem-util/src/lib.rs:472-476:

```rust
Some(registry) => Self::try_register(registry).map_err(|err| {
	eprintln!("PrometheusError calling {}::register: {:?}", std::any::type_name::<Self>(), err);
	err
}),
```

That pointed out where the registration was failing, which led to
this fix. The test still doesn't pass, but it now fails in a new
and different way!

* authorities must have authority discovery, but not necessarily overseer handlers

* fix broken SpawnedSubsystem impls

detailed logging determined that using the `Box::new` style of
future generation, the `self.run` method was never being called,
leading to dropped receivers / closed senders for those subsystems,
causing the overseer to shut down immediately.

This is not the final fix needed to get things working properly,
but it's a good start.

* use prometheus properly

Prometheus lets us register simple counters, which aren't very
interesting. It also allows us to register CounterVecs, which are.
With a CounterVec, you can provide a set of labels, which can
later be used to filter the counts.

We were using them wrong, though. This pattern was repeated in a
variety of places in the code:

```rust
// panics with an cardinality mismatch
let my_counter = register(CounterVec::new(opts, &["succeeded", "failed"])?, registry)?;
my_counter.with_label_values(&["succeeded"]).inc()
```

The problem is that the labels provided in the constructor are not
the set of legal values which can be annotated, but a set of individual
label names which can have individual, arbitrary values.

This commit fixes that.

* get av-store subsystem to actually run properly and not die on first signal

* typo fix: incomming -> incoming

* don't disable authority discovery in test nodes

* Fix rococo-v1 missing session keys

* Update node/core/av-store/Cargo.toml

* try dummying out av-store on non-full-nodes

* overseer and subsystems are required only for full nodes

* Reduce the amount of warnings on browser target

* Fix two more warnings

* InclusionInherent should actually have an Inherent module on rococo

* Ancestry: don't return genesis' parent hash

* Update Cargo.lock

* fix broken test

* update test script: specify chainspec as script argument

* Apply suggestions from code review

Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>

* Update node/service/src/lib.rs

Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>

* node/service/src/lib: Return error via ? operator

* post-merge blues

* add is_collator flag

* prevent occasional av-store test panic

* simplify fix; expand application

* run authority_discovery in Role::Discover when collating

* distinguish between proposer closed channel errors

* add IsCollator enum, remove is_collator CLI flag

* improve formatting

* remove nop loop

* Fix some stuff

Co-authored-by: Andronik Ordian <write@reusable.software>
Co-authored-by: Bastian Köcher <git@kchr.de>
Co-authored-by: Fedor Sakharov <fedor.sakharov@gmail.com>
Co-authored-by: Robert Habermeier <robert@Roberts-MBP.lan1>
Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>
Co-authored-by: Max Inden <mail@max-inden.de>
2020-10-28 10:26:50 +00:00
Bernhard Schuster f345123748 introduce errors with info (#1834) 2020-10-27 08:10:03 +01:00
Sergei Shulepov abb282dfd0 Runtime API for checking validation outputs (#1842)
* annoying whitespaces

* update guide

Add `CheckValidationOutputs` runtime api and also change the
candidate-validation stuff

* promote ValidationOutputs to global primitives

i.e. move it from node specific primitives to global v1 primitives. This
will be needed when we share it later in the runtime inclusion module

* refactor acceptance checks in the inclusion module

factor out the common code to share it during the block inclusion and
for the forthcoming CheckValidationOutputs runtime api.

Also note that the acceptance criteria was updated to incorporate checks
that exist now in candidate-validation

* plumb the runtime api outside

* extract validation_data from ValidationOutputs

* use runtime-api to check validation outputs

apart from that refactor, update docs and tidy a bit

* Update the maxium code size

This is to fix a test that performs an upgrade.
2020-10-24 01:48:36 -05:00
Fedor Sakharov ec49d81307 Availability store pruning (#1820)
* Initial commit

* Move tests to separate module

* Move timestamps to the newtype

* Change idx name

* Use Duration for consts and update chunk records

* Ordering::Equal

* Update node/core/av-store/src/lib.rs

Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>

* put_ methods do the array sorting

* Fix get_block_number

* Change StoreChunk message type and relay parent method

* Add chunk tests

* Fix block number computation for StoreChunk

* Duration instead of u64 everywhere

* Add a clarifying comment

Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>
2020-10-24 06:03:57 +00:00
Bastian Köcher f2d7b6f5ac Make AllSubsystems usage easier in tests (#1794)
* Make `AllSubsystems` usage easier in tests

This makes the usage of `AllSubsystems` easier in tests by introducing
new methods.

- `dummy` initializes `AllSubsystems` with all systems set to dummy
- `replace_*` to replace any subsystem

Besides that this pr adds a `ForwardSubsystem` that is also useful for
tests. This subsystem will forward all incoming messages to the given channel.

* Update node/overseer/src/lib.rs

Co-authored-by: Andronik Ordian <write@reusable.software>

* Update node/subsystem/src/lib.rs

Co-authored-by: Andronik Ordian <write@reusable.software>

* Update node/subsystem/src/lib.rs

Co-authored-by: Andronik Ordian <write@reusable.software>

* Move ForwardSubsystem and add a test

* Break some lines

Co-authored-by: Andronik Ordian <write@reusable.software>
2020-10-08 11:27:19 +02:00
Andronik Ordian d00bdfef08 chain-api subsystem: implement BlockHeader messsage (#1778)
* chain-api subsystem: implement BlockHeader messsage

* update the guide
2020-10-06 16:46:54 +02:00
Andronik Ordian ca89e3edbe NetworkBridge: validator (authorities) discovery api (#1699)
* stupid, but it compiles

* redo

* cleanup

* add ValidatorDiscovery to msgs

* sketch network bridge code

* ConnectToAuthorities instead of validators

* more stuff

* cleanup

* more stuff

* complete ConnectToAuthoritiesState

* Update node/network/bridge/src/lib.rs

Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>

* Collator protocol subsystem (#1659)

* WIP

* The initial implementation of the collator side.

* Improve comments

* Multiple collation requests

* Add more tests and comments to validator side

* Add comments, remove dead code

* Apply suggestions from code review

Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>

* Fix build after suggested changes

* Also connect to the next validator group

* Remove a Future impl and move TimeoutExt to util

* Minor nits

* Fix build

* Change FetchCollations back to FetchCollation

* Try this

* Final fixes

* Fix build

Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>

* handle multiple in-flight connection requests

* handle cancelled requests

* Update node/core/runtime-api/src/lib.rs

Co-authored-by: Bernhard Schuster <bernhard@ahoi.io>

* redo it again

* more stuff

* redo it again

* update comments

* workaround Future is not Send

* fix trailing spaces

* clarify comments

* bridge: fix compilation in tests

* update more comments

* small fixes

* port collator protocol to new validator discovery api

* collator tests compile

* collator tests pass

* do not revoke a request when the stream receiver is closed

* make revoking opt-in

* fix is_fulfilled

* handle request revokation in collator

* tests

* wait for validator connections asyncronously

* fix compilation

* relabel my todos

* apply Fedor's patch

* resolve reconnection TODO

* resolve revoking TODO

* resolve channel capacity TODO

* resolve peer cloning TODO

* resolve peer disconnected TODO

* resolve PeerSet TODO

* wip tests

* more tests

* resolve Arc TODO

* rename pending to non_revoked

* one more test

* extract utility function into util crate

* fix compilation in tests

* Apply suggestions from code review

Co-authored-by: Fedor Sakharov <fedor.sakharov@gmail.com>

* revert pin_project removal

* fix while let loop

* Revert "revert pin_project removal"

This reverts commit ae7f529d8de982ef66c3007dd1ff74c6ddce80d2.

* fix compilation

* Update node/subsystem/src/messages.rs

* docs on pub items

* guide updates

* remove a TODO

* small guide update

* fix a typo

* link to the issue

* validator discovery: on_request docs

Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>
Co-authored-by: Fedor Sakharov <fedor.sakharov@gmail.com>
Co-authored-by: Bernhard Schuster <bernhard@ahoi.io>
2020-10-06 13:34:57 +02:00
Bastian Köcher 5d8ae8d024 Derive From for AllMessages and simplify send_msg (#1774) 2020-10-01 16:22:15 +00:00
Andronik Ordian de05bec4d6 move Metrics to utils (#1765) 2020-09-29 11:42:20 +00:00
Fedor Sakharov 9756b2d676 Register listeners in statement distribution (#1759)
* Register listeners in statement distribution

* Review fixes
2020-09-28 13:55:59 +00:00
Fedor Sakharov 98660cbd94 Collator protocol subsystem (#1659)
* WIP

* The initial implementation of the collator side.

* Improve comments

* Multiple collation requests

* Add more tests and comments to validator side

* Add comments, remove dead code

* Apply suggestions from code review

Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>

* Fix build after suggested changes

* Also connect to the next validator group

* Remove a Future impl and move TimeoutExt to util

* Minor nits

* Fix build

* Change FetchCollations back to FetchCollation

* Try this

* Final fixes

* Fix build

Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>
2020-09-10 16:54:59 +03:00
Peter Goodspeed-Niklaus d1b1c17285 implement candidate selection subsystem (#1645)
* choose the straightforward candidate selection algorithm for now

* add draft implementation of candidate selection

* fix typo in summary

* more properly report misbehaving collators

* describe how CandidateSelection subsystem becomes aware of candidates

* revise candidate selection / collator protocol interaction pattern

* implement rest of candidate selection per the guide

* review: resolve nits

* start writing test suite, harness

* implement first test

* add second test

* implement third test

Co-authored-by: Bernhard Schuster <bernhard@ahoi.io>
2020-09-08 09:48:48 +00:00
Robert Habermeier 262574fc49 Implement validation data refactor (#1585)
* update primitives

* correct parent_head field

* make hrmp field pub

* refactor validation data: runtime

* refactor validation data: messages

* add arguments to full_validation_data runtime API

* port runtime API

* mostly port over candidate validation

* remove some parameters from ValidationParams

* guide: update candidate validation

* update candidate outputs

* update ValidationOutputs in primitives

* port over candidate validation

* add a new test for no-transient behavior

* update util runtime API wrappers

* candidate backing

* fix missing imports

* change some fields of validation data around

* runtime API impl

* update candidate validation

* fix backing tests

* grumbles from review

* fix av-store tests

* fix some more crates

* fix provisioner tests

* fix availability distribution tests

* port collation-generation to new validation data

* fix overseer tests

* Update roadmap/implementers-guide/src/node/utility/candidate-validation.md

Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>

Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>
2020-08-18 14:41:40 +02:00
Andronik Ordian e7ead40255 initial prometheus metrics (#1536)
* service-new: cosmetic changes

* overseer: draft of prometheus metrics

* metrics: update active_leaves metrics

* metrics: extract into functions

* metrics: resolve XXX

* metrics: it's ugly, but it works

* Bump Substrate

* metrics: move a bunch of code around

* Bumb substrate again

* metrics: fix a warning

* fix a warning in runtime

* metrics: statements signed

* metrics: statements impl RegisterMetrics

* metrics: refactor Metrics trait

* metrics: add Metrics assoc type to JobTrait

* metrics: move Metrics trait to util

* metrics: fix overseer

* metrics: fix backing

* metrics: fix candidate validation

* metrics: derive Default

* metrics: docs

* metrics: add stubs for other subsystems

* metrics: add more stubs and fix compilation

* metrics: fix doctest

* metrics: move to subsystem

* metrics: fix candidate validation

* metrics: bitfield signing

* metrics: av store

* metrics: chain API

* metrics: runtime API

* metrics: stub for avad

* metrics: candidates seconded

* metrics: ok I gave up

* metrics: provisioner

* metrics: remove a clone by requiring Metrics: Sync

* metrics: YAGNI

* metrics: remove another TODO

* metrics: for later

* metrics: add parachain_ prefix

* metrics: s/signed_statement/signed_statements

* utils: add a comment for job metrics

* metrics: address review comments

* metrics: oops

* metrics: make sure to save files before commit 😅

* use _total suffix for requests metrics

Co-authored-by: Max Inden <mail@max-inden.de>

* metrics: add tests for overseer

* update Cargo.lock

* overseer: add a test for CollationGeneration

* collation-generation: impl metrics

* collation-generation: use kebab-case for name

* collation-generation: add a constructor

Co-authored-by: Gav Wood <gavin@parity.io>
Co-authored-by: Ashley Ruglys <ashley.ruglys@gmail.com>
Co-authored-by: Max Inden <mail@max-inden.de>
2020-08-18 11:18:54 +02:00
Andronik Ordian dffefac54e move AssignmentKind and CoreAssigment to scheduler (#1571) 2020-08-17 19:26:13 +02:00
Peter Goodspeed-Niklaus 54bec3bfc0 implement collation generation subsystem (#1557)
* start sketching out a collation generation subsystem

* invent a basic strategy for double initialization

* clean up warnings

* impl util requests from runtime assuming a context instead of a FromJob sender

* implement collation generation algorithm from guide

* update AllMessages in tests

* fix trivial review comments

* remove another redundant declaration from merge

* filter availability cores by para_id

* handle new activations each in their own async task

* update guide according to the actual current implementation

* add initialization to guide

* add general-purpose subsystem_test_harness helper

* write first handle_new_activations test

* add test that handle_new_activations filters local_validation_data requests

* add (failing) test of collation distribution message sending

* rustfmt

* broken: work on fixing sender test

Unfortunately, for reasons that are not yet clear, despite the public key
and checked data being identical, the signer is not producing an identical
signature. This commit produces this output (among more):

signing with  Public(c4733ab0bbe3ba4c096685d1737a7f498cdbdd167a767d04a21dc7df12b8c858 (5GWHUNm5...))
checking with Public(c4733ab0bbe3ba4c096685d1737a7f498cdbdd167a767d04a21dc7df12b8c858 (5GWHUNm5...))
signed payload:  [4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 10, 0, 0, 0, c7, e5, c0, 64, 7a, db, fe, 44, 81, e5, 51, 11, 79, 9f, a5, 63, 93, 94, 3c, c4, 36, c6, 30, 36, c2, c5, 44, a2, 1b, db, b7, 82, 3, 17, a, 2e, 75, 97, b7, b7, e3, d8, 4c, 5, 39, 1d, 13, 9a, 62, b1, 57, e7, 87, 86, d8, c0, 82, f2, 9d, cf, 4c, 11, 13, 14]
checked payload: [4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 10, 0, 0, 0, c7, e5, c0, 64, 7a, db, fe, 44, 81, e5, 51, 11, 79, 9f, a5, 63, 93, 94, 3c, c4, 36, c6, 30, 36, c2, c5, 44, a2, 1b, db, b7, 82, 3, 17, a, 2e, 75, 97, b7, b7, e3, d8, 4c, 5, 39, 1d, 13, 9a, 62, b1, 57, e7, 87, 86, d8, c0, 82, f2, 9d, cf, 4c, 11, 13, 14]

* fix broken test

* collation function returns commitments hash

It doesn't look like we use the actual commitments data anywhere, and
it's not obvious if there are any fields of `CandidateCommitments`
not available to the collator, so this commit just assigns them the
entire responsibility of generating the hash.

* add missing overseer impls

* calculating erasure coding is polkadot's responsibility, not cumulus

* concurrentize per-relay_parent requests
2020-08-17 14:27:37 +02:00