585 Commits

Author SHA1 Message Date
Robert Habermeier ec5ad35e14 Network bridge metrics (#2818)
* add metrics (unused) to network bridge

* fix test compilation

* trigger metrics messages

* add some more metrics

* track sent and received notifications

* restore metrics import

* integrate into service

* Update node/network/bridge/src/lib.rs

Co-authored-by: Andronik Ordian <write@reusable.software>

* Update node/network/bridge/src/lib.rs

Co-authored-by: Andronik Ordian <write@reusable.software>

Co-authored-by: Andronik Ordian <write@reusable.software>
2021-04-05 01:07:05 +02:00
Andronik Ordian 4df29e71ab bitfield-dist: fix state update on gossip (#2817)
* bitfield-dist: fix state update on gossip

* fixes

* doc fixes

* oops

* 2 lines of code change
2021-04-04 22:25:40 +00:00
Robert Habermeier bfc8f4fcf3 Collators: Declare to all peers (#2816)
* fix tests

* add test for rejecting declares on collators

* fix bad test
2021-04-04 16:59:00 +00:00
Robert Habermeier 11b8e4c821 Collation protocol: stricter validators (#2810)
* guide: declare one para as a collator

* add ParaId to Declare messages and clean up

* fix build

* fix the testerinos

* begin adding keystore to collator-protocol

* remove request_x_ctx

* add core_for_group

* add bump_rotation

* add some more helpers to subsystem-util

* change signing_key API to take ref

* determine current and next para assignments

* disconnect collators who are not on current or next para

* add collator peer count metric

* notes for later

* some fixes

* add data & keystore to test state

* add a test utility for answering runtime API requests

* fix existing collator tests

* add new tests

* remove sc_keystore

* update cargo lock

Co-authored-by: Andronik Ordian <write@reusable.software>
2021-04-03 21:48:58 +02:00
Andronik Ordian 94b0ccc8f1 approval-distribution: split peer knowledge into sent and received (#2809)
* approval-distribution: split peer knowledge into sent and received

* guide updates

* fixes

* revert doc changes
2021-04-03 04:29:15 +02:00
Andronik Ordian 98082c5326 gossip: move authorities request to runtime api subsystem (#2798)
2021-04-01 23:51:01 +02:00
Robert Habermeier 57b56770e0 Approval Voting improvements (#2781)
* extract database from av-store itself

* generalize approval-voting over database type

* modes (without handling) and pruning old wakeups

* rework approval importing

* add our_approval_sig to ApprovalEntry

* import assignment

* guide updates for check-full-approval changes

* some aux functions

* send messages when becoming active.

* guide: network bridge sends view updates only when done syncing

* network bridge: send view updates only when done syncing

* tests for new network-bridge behavior

* add a test for updating approval entry with sig

* fix some warnings

* test load-all-blocks

* instantiate new parachains DB

* fix network-bridge empty view updates

* tweak

* fix wasm build, i think

* Update node/core/approval-voting/src/lib.rs

Co-authored-by: Andronik Ordian <write@reusable.software>

* add some versioning to parachains_db

* warnings

* fix merge changes

* remove versioning again

Co-authored-by: Andronik Ordian <write@reusable.software>
2021-04-01 17:33:52 +00:00
Pierre Krieger 01badafba6 Companion PR for substrate#8510 (#2795)
* Companion PR for substrate#8510

* update Substrate

Co-authored-by: parity-processbot <>
2021-04-01 19:33:43 +02:00
Andronik Ordian 7a2e1ef6c1 gossip: do not try to connect if we are not validators (#2786)
* gossip: do not issue a connection request if we are not a validator

* guide updates

* use all relevant authorities when issuing a request

* use AuthorityDiscoveryApi instead

* update comments to the status quo
2021-04-01 18:11:43 +02:00
Robert Habermeier 5da762e728 Avoid querying the local validator in availability recovery (#2792)
* guide: don't request availability data from ourselves

* add QueryAllChunks message

* implement QueryAllChunks

* remove unused relay_parent from StoreChunk

* test QueryAllChunks

* fast paths make short roads

* test early exit behavior
2021-04-01 15:57:41 +02:00
Andronik Ordian caebf642dd statement-distribution: do not use OurViewChange (#2790)
* quickfix for statement-distribution

* some logs
2021-03-31 23:28:17 +02:00
Robert Klotzner eb6786ad05 Better timeout values now that we are going to be connected to all nodes. (#2778)
* Better timeout values.

* Fix typo.

* Fix validator bandwidth.

* Fix compilation.
2021-03-31 22:34:12 +02:00
Robert Habermeier e65cad69ec Fix future-polling loop in availability and add a better early-exit (#2779)
* onto the front

* fix early exit for waiting for requests

* add logging back
2021-03-31 17:35:17 +02:00
Andronik Ordian 9ac35d9f2b gossip: choose a random subset on send instead of limiting connections (#2776)
* gossip: choose random subset on send

* naming bikeshed
2021-03-30 20:59:53 +02:00
Andronik Ordian a3115401c3 network-bridge: elevate log level for connections (#2772)
2021-03-30 20:01:57 +02:00
Robert Habermeier 08d5b268a0 Retry availability until the receiver of the request is dropped (#2763)
* guide updates

* keep interactions alive until receivers drop

* retry indefinitely

* cancel approval tasks on finality

* use swap_remove instead of remove
2021-03-30 17:33:38 +02:00
Robert Klotzner 6514e00144 Add tags to pov-fetcher. (#2768)
* Add tags to pov-fetcher.

* Add stage as well.

* Get rid of redundant tags.
2021-03-30 15:07:07 +02:00
Andronik Ordian bdee5a3923 approval-distribution: add an assertion (#2761)
2021-03-30 14:18:34 +02:00
Robert Klotzner 0bc42785b4 availability-distribution: Retry failed fetches on next block. (#2762)
* availability-distribution: Retry on fail on next block.

Retry failed fetches on next block when still pending availability.

* Update node/network/availability-distribution/src/requester/fetch_task/mod.rs

Co-authored-by: Andronik Ordian <write@reusable.software>

* Fix existing tests.

* Add test for trying all validators.

* Add test for testing retries.

Co-authored-by: Andronik Ordian <write@reusable.software>
2021-03-30 00:28:43 +02:00
Robert Habermeier e906598e94 tracing for pending_known map (#2755)
* tracing for pending_known map

* fix condition in retain

* add block hash to pending tracing
2021-03-29 21:38:03 +02:00
Robert Habermeier 54074d2d76 send assignments even when we have an approval (#2757)
2021-03-29 16:34:14 +00:00
Robert Klotzner 0a9fe852df Move non runtime related stuff into node/primitives (#2743)
* Remove stuff out of the runtime that does not belong there.

There might be more, but it is a start.

* White space fixes.

* Fix tests.

* Leave whitespace in ui tests alone.

* Add back zstd for no reason.

* Fix browser wasm (hopefully)
2021-03-29 02:15:44 +02:00
Robert Habermeier 8ebbe19d10 Split NetworkBridge and break cycles with Unbounded (#2736)
* overseer: pass messages directly between subsystems

* test that message is held on to

* Update node/overseer/src/lib.rs

Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>

* give every subsystem an unbounded sender too

* remove metered_channel::name

1. we don't provide good names
2. these names are never used anywhere

* unused mut

* remove unnecessary &mut

* subsystem unbounded_send

* remove unused MaybeTimer

We have channel size metrics that serve the same purpose better now and the implementation of message timing was pretty ugly.

* remove comment

* split up senders and receivers

* update metrics

* fix tests

* fix test subsystem context

* use SubsystemSender in jobs system now

* refactor of awful jobs code

* expose public `run` on JobSubsystem

* update candidate backing to new jobs & use unbounded

* bitfield signing

* candidate-selection

* provisioner

* approval voting: send unbounded for assignment/approvals

* async not needed

* begin bridge split

* split up network tasks into background worker

* port over network bridge

* Update node/network/bridge/src/lib.rs

Co-authored-by: Andronik Ordian <write@reusable.software>

* rename ValidationWorkerNotifications

Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>
Co-authored-by: Andronik Ordian <write@reusable.software>
2021-03-29 01:18:53 +02:00
Andronik Ordian 6f464a360f approval-distribution: limit the amount of assignments on unify (#2737)
* approval-distribution: limit the amount of packets on unify

* guide: fix a typo

* compilation fix

* grammar

* Update roadmap/implementers-guide/src/node/approval/approval-distribution.md

Co-authored-by: David <dvdplm@gmail.com>

* more grammar

* propagate only local assignments/approvals after a certain depth

* increase the threshold

* guides update

Co-authored-by: David <dvdplm@gmail.com>
2021-03-28 23:30:06 +02:00
Pierre Krieger e3dc9024ce Call NetworkService::add_known_address before sending a request (#2726)
* Call NetworkService::add_known_address before sending a request

* Better doc

* Update Substrate

* Update Substrate

* Restore the import 🤷‍♀️ I don't know why it compiles locally

* imports correctly

Co-authored-by: Robert Habermeier <rphmeier@gmail.com>
2021-03-28 16:01:49 +00:00
Robert Habermeier 5952e790fa Overseer: subsystems communicate directly (#2227)
* overseer: pass messages directly between subsystems

* test that message is held on to

* Update node/overseer/src/lib.rs

Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>

* give every subsystem an unbounded sender too

* remove metered_channel::name

1. we don't provide good names
2. these names are never used anywhere

* unused mut

* remove unnecessary &mut

* subsystem unbounded_send

* remove unused MaybeTimer

We have channel size metrics that serve the same purpose better now and the implementation of message timing was pretty ugly.

* remove comment

* split up senders and receivers

* update metrics

* fix tests

* fix test subsystem context

* fix flaky test

* fix docs

* doc

* use select_biased to favor signals

* Update node/subsystem/src/lib.rs

Co-authored-by: Andronik Ordian <write@reusable.software>

Co-authored-by: Peter Goodspeed-Niklaus <coriolinus@users.noreply.github.com>
Co-authored-by: Andronik Ordian <write@reusable.software>
2021-03-28 15:55:10 +00:00
Robert Klotzner c6f07d8f31 Request based PoV distribution (#2640)
* Indentation fix.

* Prepare request-response for PoV fetching.

* Drop old PoV distribution.

* WIP: Fetch PoV directly from backing.

* Backing compiles.

* Runtime access and connection management for PoV distribution.

* Get rid of seemingly dead code.

* Implement PoV fetching.

Backing does not yet use it.

* Don't send `ConnectToValidators` for empty list.

* Even better - no need to check over and over again.

* PoV fetching implemented.

+ Typechecks
+ Should work

Missing:

- Guide
- Tests
- Do fallback fetching in case fetching from seconding validator fails.

* Check PoV hash upon reception.

* Implement retry of PoV fetching in backing.

* Avoid pointless validation spawning.

* Add jaeger span to pov requesting.

* Add back tracing.

* Review remarks.

* Whitespace.

* Whitespace again.

* Cleanup + fix tests.

* Log to log target in overseer.

* Fix more tests.

* Don't fail if group cannot be found.

* Simple test for PoV fetcher.

* Handle missing group membership better.

* Add test for retry functionality.

* Fix flaky test.

* Spaces again.

* Guide updates.

* Spaces.
2021-03-28 17:11:38 +02:00
Andronik Ordian dce20644c8 approval-distribution: moar metrics (#2734)
2021-03-28 00:23:32 +01:00
Andronik Ordian 71f1985172 approval-distribution: moar logs (#2732)
2021-03-27 23:21:25 +01:00
Robert Habermeier c503fbc2a0 duplicate logging fix (#2729)
* duplicate logging fix

* remove duplicate peer IDs
2021-03-27 16:17:35 +01:00
Robert Klotzner 6ea6299bca Reduce network bridge logging verbosity (#2717)
* Those should really be trace.

- Very spammy
- And they in fact trace the execution
- Should not be enabled lightly - will slow network bridge down.

* Make report peers debug again.
2021-03-27 00:19:43 +01:00
Robert Habermeier 064df81ee4 Add block number to activated leaves and associated fixes (#2718)
* add number to `ActivatedLeavesUpdate`

* update subsystem util and overseer

* use new ActivatedLeaf everywhere

* sort view

* sorted and limited view in network bridge

* use live block hash only if it's newer

* grumples
2021-03-26 13:06:40 +01:00
Robert Habermeier 8a396c678f Port availability recovery to use req/res (#2694)
* add AvailableDataFetchingRequest

* rename AvailabilityFetchingRequest to ChunkFetchingRequest

* rename AvailabilityFetchingResponse to Chunk_

* add AvailableDataFetching request

* add available data fetching request to availability recovery message

* remove availability recovery message

* fix

* update network bridge

* port availability recovery to request/response

* use validators.len(), not shuffling

* fix availability recovery tests

* update guide

* Update node/network/availability-recovery/src/lib.rs

Co-authored-by: Bernhard Schuster <bernhard@ahoi.io>

* Update node/network/availability-recovery/src/lib.rs

Co-authored-by: Arkadiy Paronyan <arkady.paronyan@gmail.com>

* remove println

Co-authored-by: Bernhard Schuster <bernhard@ahoi.io>
Co-authored-by: Arkadiy Paronyan <arkady.paronyan@gmail.com>
2021-03-25 15:34:24 +01:00
André Silva bfbb078525 collator-protocol: add message authentication (#2635)
* collator: authenticate collator protocol messages

* fix tests compilation

* node: verify collator protocol signatures in tests

* collator: fix tests

* implementers-guide: update CollatorProtocol messages

* collator: add test for verification of collator protocol signatures

* node: remove fixmes

* node: remove signature from advertisecollation message

* node: add magic constant to Declare message signature payload
2021-03-24 22:13:32 +01:00
Arkadiy Paronyan de85c05102 Tweaked logging (#2695)
* Tweaked logging

* Debug for Statement
2021-03-24 18:06:44 +00:00
Arkadiy Paronyan d78e2fbf86 Additional logging (#2693)
2021-03-24 16:24:54 +00:00
Robert Klotzner fa11c6d785 Unify maximum supported PoV size a bit. (#2691)
* Unify maximum supported PoV size a bit.

* Use MAX_POV_SIZE also in `HostConfiguration`.

* Fix types.
2021-03-24 15:48:36 +01:00
Robert Habermeier b8867d71bc Evict inactive peers from the collator protocol peer-set (#2680)
* malicious reputation cost is fatal

* make ReportBad a malicious cost

* futures control-flow for cleaning up inactive collator peers

* guide: network bridge updates

* add `PeerDisconnected` message

* guide: update

* reverse order

* remember to match

* implement disconnect peer in network bridge

* implement disconnect_inactive_peers

* test

* remove println

* don't hardcode policy

* add fuse outside of loop

* use default eviction policy
2021-03-24 13:32:28 +01:00
Robert Klotzner 0f8b6f2f6e Bigger is better. (#2687)
* Bigger is better.

Made all request response sizes 10 times bigger.

* The smaller the better.

* Update comment.

* Ah, bigger is still better.

Max PoV size for rococo is around 50Meg, compression ratio is about 3.4.
With 30 Meg we should be fine, even with crypto kitties in the PoV.
2021-03-24 13:29:59 +01:00
Arkadiy Paronyan 5929d1ef15 Additional logging for polkadot network protocols (#2684)
* Additional logging for polkadot network protocols

* Additional log

* Update node/network/bitfield-distribution/src/lib.rs

Co-authored-by: Robert Habermeier <rphmeier@gmail.com>

* Update node/network/availability-distribution/src/responder.rs

* Added additional chunk info

* Added additional peer info

Co-authored-by: Robert Habermeier <rphmeier@gmail.com>
2021-03-24 11:55:50 +00:00
Bastian Köcher edb36153b1 Improve logging (#2669)
* Improve logging

* Review feedback

* Fix some warning and some further logging changes
2021-03-23 11:57:59 +01:00
Pierre Krieger bbc3ad3cfc Add debug messages to the bridge actions (#2668)
2021-03-23 10:52:30 +01:00
Bernhard Schuster ea6294fa79 restructure polkadot-node-jaeger (#2642)
2021-03-19 16:51:16 +01:00
Robert Klotzner 59640a38bc Don't accept incoming connections for collators (#2644)
* Don't accept incoming connections for collators

on the `Collation` peer set.

* Better docs.
2021-03-19 07:20:38 +00:00
Bastian Köcher 15ae5dd410 Improve the logging (#2645)
2021-03-18 23:28:43 +00:00
Robert Klotzner 503e2b74f9 Request based collation fetching (#2621)
* Introduce collation fetching protocol

also move to mod.rs

* Allow `PeerId`s in requests to network bridge.

* Fix availability distribution tests.

* Move CompressedPoV to primitives.

* Request based collator protocol: validator side

- Missing: tests
- Collator side
- don't connect, if not connected

* Fixes.

* Basic request based collator side.

* Minor fix on collator side.

* Don't connect in requests in collation protocol.

Also some cleanup.

* Fix PoV distribution

* Bump substrate

* Add back metrics + whitespace fixes.

* Add back missing spans.

* More cleanup.

* Guide update.

* Fix tests

* Handle results in tests.

* Fix weird compilation issue.

* Add missing )

* Get rid of dead code.

* Get rid of redundant import.

* Fix runtime build.

* Cleanup.

* Fix wasm build.

* Format fixes.

Thanks @andronik !
2021-03-18 09:06:36 +01:00
Robert Habermeier 94d50afd4e Backing and collator protocol traces including para-id (#2620)
* improve backing/provisioner spans

* span for collation requests

* add para_id to unbacked candidate spans

* differentiate validation-construction and find-assignment in selection

* better find-assignment spans

* organize unbacked-candidate spans directly under job root

* Update node/core/provisioner/src/lib.rs

Co-authored-by: Andronik Ordian <write@reusable.software>

Co-authored-by: Andronik Ordian <write@reusable.software>
2021-03-14 16:51:14 +00:00
Andronik Ordian a543b1d6c3 availability distribution: don't early return on runtime errors (#2606)
* availability distribution: don't early return on runtime errors

* log error

* extract runtime api error from Error

* uh

* oh
2021-03-11 12:47:56 -06:00
Robert Habermeier b105d9acc0 more tracing for av-store (#2604)
* more tracing for av-store

* Update node/core/av-store/src/lib.rs

Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>

* Update node/core/av-store/src/lib.rs

Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>

* Update node/core/av-store/src/lib.rs

Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>

* Update tracing everywhere

* Fix build

* More fixes

* Push cargo.lock

* Update

Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>
Co-authored-by: Bastian Köcher <info@kchr.de>
2021-03-11 13:12:34 +01:00
Robert Habermeier 9331e06eda remove statement::invalid (#2597)
2021-03-10 10:31:17 -06:00