Commit Graph

172 Commits

Author SHA1 Message Date
Andronik Ordian 9ac35d9f2b gossip: choose a random subset on send instead of limiting connections (#2776)
* gossip: choose random subset on send

* naming bikeshed
2021-03-30 20:59:53 +02:00
Andronik Ordian a3115401c3 network-bridge: elevate log level for connections (#2772) 2021-03-30 20:01:57 +02:00
Robert Habermeier 08d5b268a0 Retry availability until the receiver of the request is dropped (#2763)
* guide updates

* keep interactions alive until receivers drop

* retry indefinitely

* cancel approval tasks on finality

* use swap_remove instead of remove
2021-03-30 17:33:38 +02:00
Robert Klotzner 6514e00144 Add tags to pov-fetcher. (#2768)
* Add tags to pov-fetcher.

* Add stage as well.

* Get rid of redundant tags.
2021-03-30 15:07:07 +02:00
Andronik Ordian bdee5a3923 approval-distribution: add an assertion (#2761) 2021-03-30 14:18:34 +02:00
Robert Klotzner 0bc42785b4 availability-distribution: Retry failed fetches on next block. (#2762)
* availability-distribution: Retry on fail on next block.

Retry failed fetches on next block when still pending availability.

* Update node/network/availability-distribution/src/requester/fetch_task/mod.rs

Co-authored-by: Andronik Ordian <write@reusable.software>

* Fix existing tests.

* Add test for trying all validators.

* Add test for testing retries.

Co-authored-by: Andronik Ordian <write@reusable.software>
2021-03-30 00:28:43 +02:00
Robert Habermeier e906598e94 tracing for pending_known map (#2755)
* tracing for pending_known map

* fix condition in retain

* add block hash to pending tracing
2021-03-29 21:38:03 +02:00
Robert Habermeier 54074d2d76 send assignments even when we have an approval (#2757) 2021-03-29 16:34:14 +00:00
Robert Klotzner 0a9fe852df Move non runtime related stuff into node/primitives (#2743)
* Remove stuff out of the runtime that does not belong there.

There might be more, but it is a start.

* White space fixes.

* Fix tests.

* Leave whitespace in ui tests alone.

* Add back zstd for no reason.

* Fix browser wasm (hopefully)
2021-03-29 02:15:44 +02:00
Robert Habermeier 8ebbe19d10 Split NetworkBridge and break cycles with Unbounded (#2736)
* overseer: pass messages directly between subsystems

* test that message is held on to

* Update node/overseer/src/lib.rs

Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>

* give every subsystem an unbounded sender too

* remove metered_channel::name

1. we don't provide good names
2. these names are never used anywhere

* unused mut

* remove unnecessary &mut

* subsystem unbounded_send

* remove unused MaybeTimer

We have channel size metrics that serve the same purpose better now and the implementation of message timing was pretty ugly.

* remove comment

* split up senders and receivers

* update metrics

* fix tests

* fix test subsystem context

* use SubsystemSender in jobs system now

* refactor of awful jobs code

* expose public `run` on JobSubsystem

* update candidate backing to new jobs & use unbounded

* bitfield signing

* candidate-selection

* provisioner

* approval voting: send unbounded for assignment/approvals

* async not needed

* begin bridge split

* split up network tasks into background worker

* port over network bridge

* Update node/network/bridge/src/lib.rs

Co-authored-by: Andronik Ordian <write@reusable.software>

* rename ValidationWorkerNotifications

Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>
Co-authored-by: Andronik Ordian <write@reusable.software>
2021-03-29 01:18:53 +02:00
Andronik Ordian 6f464a360f approval-distribution: limit the amount of assignments on unify (#2737)
* approval-distribution: limit the amount of packets on unify

* guide: fix a typo

* compilation fix

* grammar

* Update roadmap/implementers-guide/src/node/approval/approval-distribution.md

Co-authored-by: David <dvdplm@gmail.com>

* more grammar

* propagate only local assignments/approvals after a certain depth

* increase the threshold

* guides update

Co-authored-by: David <dvdplm@gmail.com>
2021-03-28 23:30:06 +02:00
Pierre Krieger e3dc9024ce Call NetworkService::add_known_address before sending a request (#2726)
* Call NetworkService::add_known_address before sending a request

* Better doc

* Update Substrate

* Update Substrate

* Restore the import 🤷‍♀️ I don't know why it compiles locally

* imports correctly

Co-authored-by: Robert Habermeier <rphmeier@gmail.com>
2021-03-28 16:01:49 +00:00
Robert Habermeier 5952e790fa Overseer: subsystems communicate directly (#2227)
* overseer: pass messages directly between subsystems

* test that message is held on to

* Update node/overseer/src/lib.rs

Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>

* give every subsystem an unbounded sender too

* remove metered_channel::name

1. we don't provide good names
2. these names are never used anywhere

* unused mut

* remove unnecessary &mut

* subsystem unbounded_send

* remove unused MaybeTimer

We have channel size metrics that serve the same purpose better now and the implementation of message timing was pretty ugly.

* remove comment

* split up senders and receivers

* update metrics

* fix tests

* fix test subsystem context

* fix flaky test

* fix docs

* doc

* use select_biased to favor signals

* Update node/subsystem/src/lib.rs

Co-authored-by: Andronik Ordian <write@reusable.software>

Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>
Co-authored-by: Andronik Ordian <write@reusable.software>
2021-03-28 15:55:10 +00:00
Robert Klotzner c6f07d8f31 Request based PoV distribution (#2640)
* Indentation fix.

* Prepare request-response for PoV fetching.

* Drop old PoV distribution.

* WIP: Fetch PoV directly from backing.

* Backing compiles.

* Runtime access and connection management for PoV distribution.

* Get rid of seemingly dead code.

* Implement PoV fetching.

Backing does not yet use it.

* Don't send `ConnectToValidators` for empty list.

* Even better - no need to check over and over again.

* PoV fetching implemented.

+ Typechecks
+ Should work

Missing:

- Guide
- Tests
- Do fallback fetching in case fetching from seconding validator fails.

* Check PoV hash upon reception.

* Implement retry of PoV fetching in backing.

* Avoid pointless validation spawning.

* Add jaeger span to pov requesting.

* Add back tracing.

* Review remarks.

* Whitespace.

* Whitespace again.

* Cleanup + fix tests.

* Log to log target in overseer.

* Fix more tests.

* Don't fail if group cannot be found.

* Simple test for PoV fetcher.

* Handle missing group membership better.

* Add test for retry functionality.

* Fix flaky test.

* Spaces again.

* Guide updates.

* Spaces.
2021-03-28 17:11:38 +02:00
Andronik Ordian dce20644c8 approval-distribution: moar metrics (#2734) 2021-03-28 00:23:32 +01:00
Andronik Ordian 71f1985172 approval-distribution: moar logs (#2732) 2021-03-27 23:21:25 +01:00
Robert Habermeier c503fbc2a0 duplicate logging fix (#2729)
* duplicate logging fix

* remove duplicate peer IDs
2021-03-27 16:17:35 +01:00
Robert Klotzner 6ea6299bca Reduce network bridge logging verbosity (#2717)
* Those should really be trace.

- Very spammy
- And they in fact trace the execution
- Should not be enabled lightly - will slow network bridge down.

* Make report peers debug again.
2021-03-27 00:19:43 +01:00
Robert Habermeier 064df81ee4 Add block number to activated leaves and associated fixes (#2718)
* add number to `ActivatedLeavesUpdate`

* update subsystem util and overseer

* use new ActivatedLeaf everywhere

* sort view

* sorted and limited view in network bridge

* use live block hash only if it's newer

* grumples
2021-03-26 13:06:40 +01:00
Robert Habermeier 8a396c678f Port availability recovery to use req/res (#2694)
* add AvailableDataFetchingRequest

* rename AvailabilityFetchingRequest to ChunkFetchingRequest

* rename AvailabilityFetchingResponse to Chunk_

* add AvailableDataFetching request

* add available data fetching request to availability recovery message

* remove availability recovery message

* fix

* update network bridge

* port availability recovery to request/response

* use validators.len(), not shuffling

* fix availability recovery tests

* update guide

* Update node/network/availability-recovery/src/lib.rs

Co-authored-by: Bernhard Schuster <bernhard@ahoi.io>

* Update node/network/availability-recovery/src/lib.rs

Co-authored-by: Arkadiy Paronyan <arkady.paronyan@gmail.com>

* remove println

Co-authored-by: Bernhard Schuster <bernhard@ahoi.io>
Co-authored-by: Arkadiy Paronyan <arkady.paronyan@gmail.com>
2021-03-25 15:34:24 +01:00
André Silva bfbb078525 collator-protocol: add message authentication (#2635)
* collator: authenticate collator protocol messages

* fix tests compilation

* node: verify collator protocol signatures in tests

* collator: fix tests

* implementers-guide: update CollatorProtocol messages

* collator: add test for verification of collator protocol signatures

* node: remove fixmes

* node: remove signature from advertisecollation message

* node: add magic constant to Declare message signature payload
2021-03-24 22:13:32 +01:00
Arkadiy Paronyan de85c05102 Tweaked logging (#2695)
* Tweaked logging

* Debug for Statement
2021-03-24 18:06:44 +00:00
Arkadiy Paronyan d78e2fbf86 Additional logging (#2693) 2021-03-24 16:24:54 +00:00
Robert Klotzner fa11c6d785 Unify maximum supported PoV size a bit. (#2691)
* Unify maximum supported PoV size a bit.

* Use MAX_POV_SIZE also in `HostConfiguration`.

* Fix types.
2021-03-24 15:48:36 +01:00
Robert Habermeier b8867d71bc Evict inactive peers from the collator protocol peer-set (#2680)
* malicious reputation cost is fatal

* make ReportBad a malicious cost

* futures control-flow for cleaning up inactive collator peers

* guide: network bridge updates

* add `PeerDisconnected` message

* guide: update

* reverse order

* remember to match

* implement disconnect peer in network bridge

* implement disconnect_inactive_peers

* test

* remove println

* don't hardcore policy

* add fuse outside of loop

* use default eviction policy
2021-03-24 13:32:28 +01:00
Robert Klotzner 0f8b6f2f6e Bigger is better. (#2687)
* Bigger is better.

Made all request response sizes 10 times bigger.

* The smaller the better.

* Update comment.

* Ah, bigger is still better.

Max PoV size for rococo is around 50Meg, compression ratio is about 3.4.
With 30 Meg we should be fine, even with crypto kitties in the PoV.
2021-03-24 13:29:59 +01:00
Arkadiy Paronyan 5929d1ef15 Additional logging for polkadot network protocols (#2684)
* Additional logging for polkadot network protocols

* Additional log

* Update node/network/bitfield-distribution/src/lib.rs

Co-authored-by: Robert Habermeier <rphmeier@gmail.com>

* Update node/network/availability-distribution/src/responder.rs

* Added additional chunk info

* Added additional peer info

Co-authored-by: Robert Habermeier <rphmeier@gmail.com>
2021-03-24 11:55:50 +00:00
Bastian Köcher edb36153b1 Improve logging (#2669)
* Improve logging

* Review feedback

* Fix some warning and some further logging changes
2021-03-23 11:57:59 +01:00
Pierre Krieger bbc3ad3cfc Add debug messaes to the bridge actions (#2668) 2021-03-23 10:52:30 +01:00
Bernhard Schuster ea6294fa79 restructure polkadot-node-jaeger (#2642) 2021-03-19 16:51:16 +01:00
Robert Klotzner 59640a38bc Don't accept incoming connections for collators (#2644)
* Don't accept incoming connections for collators

on the `Collation` peer set.

* Better docs.
2021-03-19 07:20:38 +00:00
Bastian Köcher 15ae5dd410 Improve the logging (#2645) 2021-03-18 23:28:43 +00:00
Robert Klotzner 503e2b74f9 Request based collation fetching (#2621)
* Introduce collation fetching protocol

also move to mod.rs

* Allow `PeerId`s in requests to network bridge.

* Fix availability distribution tests.

* Move CompressedPoV to primitives.

* Request based collator protocol: validator side

- Missing: tests
- Collator side
- don't connect, if not connected

* Fixes.

* Basic request based collator side.

* Minor fix on collator side.

* Don't connect in requests in collation protocol.

Also some cleanup.

* Fix PoV distribution

* Bump substrate

* Add back metrics + whitespace fixes.

* Add back missing spans.

* More cleanup.

* Guide update.

* Fix tests

* Handle results in tests.

* Fix weird compilation issue.

* Add missing )

* Get rid of dead code.

* Get rid of redundant import.

* Fix runtime build.

* Cleanup.

* Fix wasm build.

* Format fixes.

Thanks @andronik !
2021-03-18 09:06:36 +01:00
Robert Habermeier 94d50afd4e Backing and collator protocol traces including para-id (#2620)
* improve backing/provisioner spans

* span for collation requests

* add para_id to unbacked candidate spans

* differentiate validation-construction and find-assignment in selection

* better find-assignment spans

* organize unbacked-candidate spans directly under job root

* Update node/core/provisioner/src/lib.rs

Co-authored-by: Andronik Ordian <write@reusable.software>

Co-authored-by: Andronik Ordian <write@reusable.software>
2021-03-14 16:51:14 +00:00
Andronik Ordian a543b1d6c3 availability distribution: don't early return on runtime errors (#2606)
* availability distribution: don't early return on runtime errors

* log error

* extract runtime api error from Error

* uh

* oh
2021-03-11 12:47:56 -06:00
Robert Habermeier b105d9acc0 more tracing for av-store (#2604)
* more tracing for av-store

* Update node/core/av-store/src/lib.rs

Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>

* Update node/core/av-store/src/lib.rs

Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>

* Update node/core/av-store/src/lib.rs

Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>

* Update tracing everywhere

* Fix build

* More fixes

* Push cargo.lock

* Update

Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>
Co-authored-by: Bastian Köcher <info@kchr.de>
2021-03-11 13:12:34 +01:00
Robert Habermeier 9331e06eda remove statement::invalid (#2597) 2021-03-10 10:31:17 -06:00
Andronik Ordian baa691deb1 prefix parachain log targets with parachain:: (#2600)
* prefix parachain log targets with parachain::

* even more consistent
2021-03-10 17:07:56 +01:00
Robert Habermeier 30e4a67f0c Add some magic to signed statements and approval votes (#2585)
* add a magic number to backing statements encoded

* fix fallout in statement table

* fix some fallout in backing

* add magic to approval votes

* remove last references to Candidate variant

* update size-hint
2021-03-09 17:17:30 +00:00
Robert Klotzner c0347f026a Jaeger spans for availability distribution (#2559)
* Logging functionality for spans.

* Jaeger spans for availability distribution.

* Fix instrumentation to use log target properly.

* Add some tracing instrumentation macros.

* Use int_tags instead of logs.

* Add span per iteration.

* Remove span::log functionality.

* Fix instrumentation log target for real.

* Add jaeger span to responding side as well.

* Revert "Fix instrumentation log target for real."

This reverts commit e1c2a2e6ff6f257e702f07d8a77c2668af92b0ef.

* Revert "Fix instrumentation to use log target properly."

This reverts commit 7caa0bd1acc6fe9727bb3a91851560d756c40ab8.

* target -> subsystem in instrumentatio macro

target is not correct either, and the correct way of using a top level
target = LOG_TARGET does not work, as the macro expects a string literal
and gets confused by the constant `LOG_TARGET`.

* Use kebab-case for spa names.

Co-authored-by: Andronik Ordian <write@reusable.software>

Co-authored-by: Andronik Ordian <write@reusable.software>
2021-03-04 17:03:24 +00:00
Robert Klotzner 78ac4b7add Whole subsystem test for new availability-distribution (#2552)
* WIP: Whole subsystem test.

* New tests compile.

* Avoid needless runtime queries for no validator nodes.

* Make tx and rx publicly accessible in virtual overseer.

This simplifies mocking in some cases, as tx can be cloned, but rx can
not.

* Whole subsystem test working.

* Update node/network/availability-distribution/src/session_cache.rs

Co-authored-by: Andronik Ordian <write@reusable.software>

* Update node/network/availability-distribution/src/session_cache.rs

Co-authored-by: Andronik Ordian <write@reusable.software>

* Document better what `None` return value means.

* Get rid of BitVec dependency.

* Update Cargo.lock

* Hopefully fixed implementers guide build.

Co-authored-by: Andronik Ordian <write@reusable.software>
2021-03-03 16:23:15 +01:00
Andronik Ordian 4c1de66d5d subsystem for issuing background connection requests (#2538)
* initial subsystem for issuing connection requests

* finish the initial impl

* integrate with the overseer

* rename to gossip-support

* fix renamings leftover

* remove run_inner

* fix compilation

* random subset of sqrt
2021-03-02 10:40:06 +00:00
Andronik Ordian 7530a0c432 validator_discovery: some logs to help identify authority discovery issues (#2533) 2021-02-26 12:06:23 -06:00
Robert Klotzner 48409e5548 Request based availability distribution (#2423)
* WIP

* availability distribution, still very wip.

Work on the requesting side of things.

* Some docs on what I intend to do.

* Checkpoint of session cache implementation

as I will likely replace it with something smarter.

* More work, mostly on cache

and getting things to type check.

* Only derive MallocSizeOf and Debug for std.

* availability-distribution: Cache feature complete.

* Sketch out logic in `FetchTask` for actual fetching.

- Compile fixes.
- Cleanup.

* Format cleanup.

* More format fixes.

* Almost feature complete `fetch_task`.

Missing:

- Check for cancel
- Actual querying of peer ids.

* Finish FetchTask so far.

* Directly use AuthorityDiscoveryId in protocol and cache.

* Resolve `AuthorityDiscoveryId` on sending requests.

* Rework fetch_task

- also make it impossible to check the wrong chunk index.
- Export needed function in validator_discovery.

* From<u32> implementation for `ValidatorIndex`.

* Fixes and more integration work.

* Make session cache proper lru cache.

* Use proper lru cache.

* Requester finished.

* ProtocolState -> Requester

Also make sure to not fetch our own chunk.

* Cleanup + fixes.

* Remove unused functions

- FetchTask::is_finished
- SessionCache::fetch_session_info

* availability-distribution responding side.

* Cleanup + Fixes.

* More fixes.

* More fixes.

adder-collator is running!

* Some docs.

* Docs.

* Fix reporting of bad guys.

* Fix tests

* Make all tests compile.

* Fix test.

* Cleanup + get rid of some warnings.

* state -> requester

* Mostly doc fixes.

* Fix test suite.

* Get rid of now redundant message types.

* WIP

* Rob's review remarks.

* Fix test suite.

* core.relay_parent -> leaf for session request.

* Style fix.

* Decrease request timeout.

* Cleanup obsolete errors.

* Metrics + don't fail on non fatal errors.

* requester.rs -> requester/mod.rs

* Panic on invalid BadValidator report.

* Fix indentation.

* Use typed default timeout constant.

* Make channel size 0, as each sender gets one slot anyways.

* Fix incorrect metrics initialization.

* Fix build after merge.

* More fixes.

* Hopefully valid metrics names.

* Better metrics names.

* Some tests that already work.

* Slightly better docs.

* Some more tests.

* Fix network bridge test.
2021-02-26 11:58:07 -06:00
Bernhard Schuster 31327eb0c7 test: add unit test to catch missing distribution to subsystems faster (#2495)
* test: add unit test to catch missing distribution to subsystems faster

* add a simple count

* introduce proc macro to generate dispatch type

* refactor

* refactor

* chore: add license

* fixup unit test

* fixup merge

* better errors

* better fmt

* fix error spans

* better docs

* better error messages

* ui test foo

* Update node/subsystem/dispatch-gen/src/lib.rs

Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>

* Update node/network/bridge/src/lib.rs

Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>

* Update node/subsystem/Cargo.toml

Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>

* Update node/subsystem/dispatch-gen/src/lib.rs

Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>

* Update node/subsystem/dispatch-gen/src/lib.rs

Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>

* Update node/network/bridge/src/lib.rs

Co-authored-by: Andronik Ordian <write@reusable.software>

* fix compilation

* use find_map

* drop the silly 2, use _inner instead

* Update node/network/bridge/src/lib.rs

Co-authored-by: Andronik Ordian <write@reusable.software>

* Update node/subsystem/dispatch-gen/src/lib.rs

Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>

* nail deps down

* more into()

* flatten

* missing use statement

* fix messages order

Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>
Co-authored-by: Andronik Ordian <write@reusable.software>
2021-02-26 08:10:41 +00:00
Robert Habermeier 5683f11628 fix master: use view.iter() instead of view.heads (#2510) 2021-02-23 21:02:16 +00:00
Robert Habermeier 3300b53306 Approval Checking Improvements Omnibus (#2480)
* add tracing to approval voting

* notify if session info is not working

* add dispute period to chain specs

* propagate genesis session to parachains runtime

* use `on_genesis_session`

* protect against zero cores in computation

* tweak voting rule to be based off of best and add logs

* genesis configuration should use VRF slots only

* swallow more keystore errors

* add some docs

* make validation-worker args non-optional and update clap

* better tracing for bitfield signing and provisioner

* pass amount of bits in bitfields to inclusion instead of recomputing

* debug -> warn for some logs

* better tracing for availability recovery

* a little av-store tracing

* bridge: forward availability recovery messages

* add missing try_from impl

* some more tracing

* improve approval distribution tracing

* guide: hold onto pending approval messages until NewBlocks

* Hold onto pending approval messages until NewBlocks

* guide: adjust comment

* process all actions for one wakeup at a time

* vec

* fix network bridge test

* replace randomness-collective-flip with Babe

* remove PairNotFound
2021-02-23 14:12:28 -06:00
Bernhard Schuster e3f776abed feat/view: assure heads in a view are sorted (#2493)
* feat/view: assure heads in a view are sorted

Allows O(n) comparisons, adds an alternate equiv relation
which takes O(n^2) for integrity verification.

Ref #2133

* revert: remove custom PartialEq impl, there are no duplicates

* fix: do not sort the live_heads, that alters the local view

* refactor/view: heads should not be public

* chore/spellcheck: add unfinalized

* fix/view: add missing len() and is_empty() fns

* quirk

* vec is not view

* Update node/network/approval-distribution/src/tests.rs

Co-authored-by: Andronik Ordian <write@reusable.software>

* Update node/network/bridge/src/lib.rs

Co-authored-by: Andronik Ordian <write@reusable.software>

* Update node/network/protocol/src/lib.rs

Co-authored-by: Andronik Ordian <write@reusable.software>

* fixup comment

* fix botched test

Co-authored-by: Andronik Ordian <write@reusable.software>
2021-02-23 15:39:57 +00:00
Bastian Köcher 2584c121fb Substrate companion for #8163 (#2492)
* Substrate companion for #8163

https://github.com/paritytech/substrate/pull/8163

* "Update Substrate"

Co-authored-by: parity-processbot <>
2021-02-22 14:52:43 +00:00
Bernhard Schuster 49c6aa9a76 feat/jaeger: more spans, more stages (#2477)
* feat/jaeger: more spans, more stages

Stage numbers are still arbitrarily picked.

* feat/jaeger: additional spans

* chore/spellcheck: improve the dictionary

* fix/jaeger JaegerSpan -> jaeger::Span
2021-02-19 14:19:43 +00:00