Request based availability distribution (#2423)

* WIP

* availability distribution, still very wip.

Work on the requesting side of things.

* Some docs on what I intend to do.

* Checkpoint of session cache implementation

as I will likely replace it with something smarter.

* More work, mostly on cache

and getting things to type check.

* Only derive MallocSizeOf and Debug for std.

* availability-distribution: Cache feature complete.

* Sketch out logic in `FetchTask` for actual fetching.

- Compile fixes.
- Cleanup.

* Format cleanup.

* More format fixes.

* Almost feature complete `fetch_task`.

Missing:

- Check for cancel
- Actual querying of peer ids.

* Finish FetchTask so far.

* Directly use AuthorityDiscoveryId in protocol and cache.

* Resolve `AuthorityDiscoveryId` on sending requests.

* Rework fetch_task

- also make it impossible to check the wrong chunk index.
- Export needed function in validator_discovery.

* From<u32> implementation for `ValidatorIndex`.

* Fixes and more integration work.

* Make session cache proper lru cache.

* Use proper lru cache.

* Requester finished.

* ProtocolState -> Requester

Also make sure to not fetch our own chunk.

* Cleanup + fixes.

* Remove unused functions

- FetchTask::is_finished
- SessionCache::fetch_session_info

* availability-distribution responding side.

* Cleanup + Fixes.

* More fixes.

* More fixes.

adder-collator is running!

* Some docs.

* Docs.

* Fix reporting of bad guys.

* Fix tests

* Make all tests compile.

* Fix test.

* Cleanup + get rid of some warnings.

* state -> requester

* Mostly doc fixes.

* Fix test suite.

* Get rid of now redundant message types.

* WIP

* Rob's review remarks.

* Fix test suite.

* core.relay_parent -> leaf for session request.

* Style fix.

* Decrease request timeout.

* Cleanup obsolete errors.

* Metrics + don't fail on non fatal errors.

* requester.rs -> requester/mod.rs

* Panic on invalid BadValidator report.

* Fix indentation.

* Use typed default timeout constant.

* Make channel size 0, as each sender gets one slot anyways.

* Fix incorrect metrics initialization.

* Fix build after merge.

* More fixes.

* Hopefully valid metrics names.

* Better metrics names.

* Some tests that already work.

* Slightly better docs.

* Some more tests.

* Fix network bridge test.
This commit is contained in:
Robert Klotzner
2021-02-26 18:58:07 +01:00
committed by GitHub
parent 241b1f12a7
commit 48409e5548
45 changed files with 2037 additions and 1523 deletions
@@ -0,0 +1,97 @@
// Copyright 2021 Parity Technologies (UK) Ltd.
// This file is part of Polkadot.
// Polkadot is free software: you can redistribute it and/or modify
// it under the terms of the GNU General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.
// Polkadot is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU General Public License for more details.
// You should have received a copy of the GNU General Public License
// along with Polkadot. If not, see <http://www.gnu.org/licenses/>.
//! Answer requests for availability chunks.
use futures::channel::oneshot;
use polkadot_node_network_protocol::request_response::{request::IncomingRequest, v1};
use polkadot_primitives::v1::{CandidateHash, ErasureChunk, ValidatorIndex};
use polkadot_subsystem::{
messages::{AllMessages, AvailabilityStoreMessage},
SubsystemContext,
};
use crate::error::{Error, Result};
use crate::{LOG_TARGET, metrics::{Metrics, SUCCEEDED, FAILED, NOT_FOUND}};
/// Variant of `answer_request` that does Prometheus metric and logging on errors.
///
/// Any errors of `answer_request` will simply be logged.
pub async fn answer_request_log<Context>(
ctx: &mut Context,
req: IncomingRequest<v1::AvailabilityFetchingRequest>,
metrics: &Metrics,
) -> ()
where
Context: SubsystemContext,
{
let res = answer_request(ctx, req).await;
match res {
Ok(result) =>
metrics.on_served(if result {SUCCEEDED} else {NOT_FOUND}),
Err(err) => {
tracing::warn!(
target: LOG_TARGET,
err= ?err,
"Serving chunk failed with error"
);
metrics.on_served(FAILED);
}
}
}
/// Answer an incoming chunk request by querying the av store.
///
/// Returns: Ok(true) if chunk was found and served.
pub async fn answer_request<Context>(
ctx: &mut Context,
req: IncomingRequest<v1::AvailabilityFetchingRequest>,
) -> Result<bool>
where
Context: SubsystemContext,
{
let chunk = query_chunk(ctx, req.payload.candidate_hash, req.payload.index).await?;
let result = chunk.is_some();
let response = match chunk {
None => v1::AvailabilityFetchingResponse::NoSuchChunk,
Some(chunk) => v1::AvailabilityFetchingResponse::Chunk(chunk.into()),
};
req.send_response(response).map_err(|_| Error::SendResponse)?;
Ok(result)
}
/// Query chunk from the availability store.
#[tracing::instrument(level = "trace", skip(ctx), fields(subsystem = LOG_TARGET))]
async fn query_chunk<Context>(
ctx: &mut Context,
candidate_hash: CandidateHash,
validator_index: ValidatorIndex,
) -> Result<Option<ErasureChunk>>
where
Context: SubsystemContext,
{
let (tx, rx) = oneshot::channel();
ctx.send_message(AllMessages::AvailabilityStore(
AvailabilityStoreMessage::QueryChunk(candidate_hash, validator_index, tx),
))
.await;
rx.await.map_err(|e| Error::QueryChunkResponseChannel(e))
}