Request based availability distribution (#2423)

* WIP

* availability distribution, still very wip.

Work on the requesting side of things.

* Some docs on what I intend to do.

* Checkpoint of session cache implementation

as I will likely replace it with something smarter.

* More work, mostly on cache

and getting things to type check.

* Only derive MallocSizeOf and Debug for std.

* availability-distribution: Cache feature complete.

* Sketch out logic in `FetchTask` for actual fetching.

- Compile fixes.
- Cleanup.

* Format cleanup.

* More format fixes.

* Almost feature complete `fetch_task`.

Missing:

- Check for cancel
- Actual querying of peer ids.

* Finish FetchTask so far.

* Directly use AuthorityDiscoveryId in protocol and cache.

* Resolve `AuthorityDiscoveryId` on sending requests.

* Rework fetch_task

- also make it impossible to check the wrong chunk index.
- Export needed function in validator_discovery.

* From<u32> implementation for `ValidatorIndex`.

* Fixes and more integration work.

* Make session cache proper lru cache.

* Use proper lru cache.

* Requester finished.

* ProtocolState -> Requester

Also make sure to not fetch our own chunk.

* Cleanup + fixes.

* Remove unused functions

- FetchTask::is_finished
- SessionCache::fetch_session_info

* availability-distribution responding side.

* Cleanup + Fixes.

* More fixes.

* More fixes.

adder-collator is running!

* Some docs.

* Docs.

* Fix reporting of bad guys.

* Fix tests

* Make all tests compile.

* Fix test.

* Cleanup + get rid of some warnings.

* state -> requester

* Mostly doc fixes.

* Fix test suite.

* Get rid of now redundant message types.

* WIP

* Rob's review remarks.

* Fix test suite.

* core.relay_parent -> leaf for session request.

* Style fix.

* Decrease request timeout.

* Cleanup obsolete errors.

* Metrics + don't fail on non fatal errors.

* requester.rs -> requester/mod.rs

* Panic on invalid BadValidator report.

* Fix indentation.

* Use typed default timeout constant.

* Make channel size 0, as each sender gets one slot anyways.

* Fix incorrect metrics initialization.

* Fix build after merge.

* More fixes.

* Hopefully valid metrics names.

* Better metrics names.

* Some tests that already work.

* Slightly better docs.

* Some more tests.

* Fix network bridge test.
This commit is contained in:
Robert Klotzner
2021-02-26 18:58:07 +01:00
committed by GitHub
parent 241b1f12a7
commit 48409e5548
45 changed files with 2037 additions and 1523 deletions
+10 -13
View File
@@ -117,7 +117,7 @@ impl<N, AD> NetworkBridge<N, AD> {
impl<Net, AD, Context> Subsystem<Context> for NetworkBridge<Net, AD>
where
Net: Network + validator_discovery::Network,
Net: Network + validator_discovery::Network + Sync,
AD: validator_discovery::AuthorityDiscovery,
Context: SubsystemContext<Message=NetworkBridgeMessage>,
{
@@ -237,7 +237,10 @@ where
Action::SendRequests(reqs) => {
for req in reqs {
bridge.network_service.start_request(req);
bridge
.network_service
.start_request(&mut bridge.authority_discovery_service, req)
.await;
}
},
@@ -605,7 +608,6 @@ mod tests {
use polkadot_subsystem::{ActiveLeavesUpdate, FromOverseer, OverseerSignal};
use polkadot_subsystem::messages::{
AvailabilityDistributionMessage,
AvailabilityRecoveryMessage,
ApprovalDistributionMessage,
BitfieldDistributionMessage,
@@ -624,6 +626,7 @@ mod tests {
use polkadot_node_network_protocol::{ObservedRole, request_response::request::Requests};
use crate::network::{Network, NetworkAction};
use crate::validator_discovery::AuthorityDiscovery;
// The subsystem's view of the network - only supports a single call to `event_stream`.
struct TestNetwork {
@@ -663,6 +666,7 @@ mod tests {
)
}
#[async_trait]
impl Network for TestNetwork {
fn event_stream(&mut self) -> BoxStream<'static, NetworkEvent> {
self.net_events.lock()
@@ -677,7 +681,7 @@ mod tests {
Box::pin((&mut self.action_tx).sink_map_err(Into::into))
}
fn start_request(&self, _: Requests) {
async fn start_request<AD: AuthorityDiscovery>(&self, _: &mut AD, _: Requests) {
}
}
@@ -801,13 +805,6 @@ mod tests {
) if e == event.focus().expect("could not focus message")
);
assert_matches!(
virtual_overseer.recv().await,
AllMessages::AvailabilityDistribution(
AvailabilityDistributionMessage::NetworkBridgeUpdateV1(e)
) if e == event.focus().expect("could not focus message")
);
assert_matches!(
virtual_overseer.recv().await,
AllMessages::AvailabilityRecovery(
@@ -1527,7 +1524,7 @@ mod tests {
fn spread_event_to_subsystems_is_up_to_date() {
// Number of subsystems expected to be interested in a network event,
// and hence the network event broadcasted to.
const EXPECTED_COUNT: usize = 6;
const EXPECTED_COUNT: usize = 5;
let mut cnt = 0_usize;
for msg in AllMessages::dispatch_iter(NetworkBridgeEvent::PeerDisconnected(PeerId::random())) {
@@ -1538,7 +1535,7 @@ mod tests {
AllMessages::ChainApi(_) => unreachable!("Not interested in network events"),
AllMessages::CollatorProtocol(_) => unreachable!("Not interested in network events"),
AllMessages::StatementDistribution(_) => { cnt += 1; }
AllMessages::AvailabilityDistribution(_) => { cnt += 1; }
AllMessages::AvailabilityDistribution(_) => unreachable!("Not interested in network events"),
AllMessages::AvailabilityRecovery(_) => { cnt += 1; }
AllMessages::BitfieldDistribution(_) => { cnt += 1; }
AllMessages::BitfieldSigning(_) => unreachable!("Not interested in network events"),
+45 -5
View File
@@ -17,6 +17,7 @@
use std::pin::Pin;
use std::sync::Arc;
use async_trait::async_trait;
use futures::future::BoxFuture;
use futures::prelude::*;
use futures::stream::BoxStream;
@@ -24,7 +25,7 @@ use futures::stream::BoxStream;
use parity_scale_codec::Encode;
use sc_network::Event as NetworkEvent;
use sc_network::{NetworkService, IfDisconnected};
use sc_network::{IfDisconnected, NetworkService, OutboundFailure, RequestFailure};
use polkadot_node_network_protocol::{
peer_set::PeerSet,
@@ -34,6 +35,8 @@ use polkadot_node_network_protocol::{
use polkadot_primitives::v1::{Block, Hash};
use polkadot_subsystem::{SubsystemError, SubsystemResult};
use crate::validator_discovery::{peer_id_from_multiaddr, AuthorityDiscovery};
use super::LOG_TARGET;
/// Send a message to the network.
@@ -92,6 +95,7 @@ pub enum NetworkAction {
}
/// An abstraction over networking for the purposes of this subsystem.
#[async_trait]
pub trait Network: Send + 'static {
/// Get a stream of all events occurring on the network. This may include events unrelated
/// to the Polkadot protocol - the user of this function should filter only for events related
@@ -105,7 +109,11 @@ pub trait Network: Send + 'static {
) -> Pin<Box<dyn Sink<NetworkAction, Error = SubsystemError> + Send + 'a>>;
/// Send a request to a remote peer.
fn start_request(&self, req: Requests);
async fn start_request<AD: AuthorityDiscovery>(
&self,
authority_discovery: &mut AD,
req: Requests,
);
/// Report a given peer as either beneficial (+) or costly (-) according to the given scalar.
fn report_peer(
@@ -137,6 +145,7 @@ pub trait Network: Send + 'static {
}
}
#[async_trait]
impl Network for Arc<NetworkService<Block, Hash>> {
fn event_stream(&mut self) -> BoxStream<'static, NetworkEvent> {
NetworkService::event_stream(self, "polkadot-network-bridge").boxed()
@@ -189,7 +198,11 @@ impl Network for Arc<NetworkService<Block, Hash>> {
Box::pin(ActionSink(&**self))
}
fn start_request(&self, req: Requests) {
async fn start_request<AD: AuthorityDiscovery>(
&self,
authority_discovery: &mut AD,
req: Requests,
) {
let (
protocol,
OutgoingRequest {
@@ -199,8 +212,35 @@ impl Network for Arc<NetworkService<Block, Hash>> {
},
) = req.encode_request();
NetworkService::start_request(&*self,
peer,
let peer_id = authority_discovery
.get_addresses_by_authority_id(peer)
.await
.and_then(|addrs| {
addrs
.into_iter()
.find_map(|addr| peer_id_from_multiaddr(&addr))
});
let peer_id = match peer_id {
None => {
tracing::debug!(target: LOG_TARGET, "Discovering authority failed");
match pending_response
.send(Err(RequestFailure::Network(OutboundFailure::DialFailure)))
{
Err(_) => tracing::debug!(
target: LOG_TARGET,
"Sending failed request response failed."
),
Ok(_) => {}
}
return;
}
Some(peer_id) => peer_id,
};
NetworkService::start_request(
&*self,
peer_id,
protocol.into_protocol_name(),
payload,
pending_response,
@@ -131,7 +131,7 @@ fn on_revoke(map: &mut HashMap<AuthorityDiscoveryId, u64>, id: AuthorityDiscover
None
}
fn peer_id_from_multiaddr(addr: &Multiaddr) -> Option<PeerId> {
pub(crate) fn peer_id_from_multiaddr(addr: &Multiaddr) -> Option<PeerId> {
addr.iter().last().and_then(|protocol| if let Protocol::P2p(multihash) = protocol {
PeerId::from_multihash(multihash).ok()
} else {