Fixes the dead lock when any of the channels get at capacity. (#5297)

The PVF host is designed to avoid spawning tasks to minimize knowledge
of outer code. Using `async_std::task::spawn` (or Tokio's counterpart)
deemed unacceptable, `SpawnNamed` undesirable. Instead there is only one
task returned that is spawned by the candidate-validation subsystem.
The tasks from the sub-components are polled by that root task.

However, the way the tasks are bundled was incorrect. There was a giant
select that was polling those tasks. Particularly, that implies that as soon as
one of the arms of that select goes into await those sub-tasks stop
getting polled. This is a recipe for a deadlock which indeed happened
here.

Specifically, the deadlock happened during sending messages to the
execute queue by calling
[`send_execute`](https://github.com/paritytech/polkadot/blob/a68d9be35656dcd96e378fd9dd3d613af754d48a/node/core/pvf/src/host.rs#L601).
When the channel to the queue reaches the capacity, the control flow is
suspended until the queue handles those messages. Since this code is
essentially reached from [one of the select
arms](https://github.com/paritytech/polkadot/blob/a68d9be35656dcd96e378fd9dd3d613af754d48a/node/core/pvf/src/host.rs#L371),
the queue won't be given the control and thus no further progress can be
made.

This problem is solved by bundling the tasks one level higher instead,
by `selecting` over those long-running tasks.

We also stop treating returning from those long-running tasks as error
conditions, since that can happen during legit shutdown.
This commit is contained in:
Sergei Shulepov
2022-04-09 12:57:34 +02:00
committed by GitHub
parent a68d9be356
commit b89dc00ec0
2 changed files with 83 additions and 70 deletions
+30
View File
@@ -112,3 +112,33 @@ async fn execute_bad_on_parent() {
.await
.unwrap_err();
}
#[async_std::test]
async fn stress_spawn() {
let host = std::sync::Arc::new(TestHost::new());
async fn execute(host: std::sync::Arc<TestHost>) {
let parent_head = HeadData { number: 0, parent_hash: [0; 32], post_state: hash_state(0) };
let block_data = BlockData { state: 0, add: 512 };
let ret = host
.validate_candidate(
adder::wasm_binary_unwrap(),
ValidationParams {
parent_head: GenericHeadData(parent_head.encode()),
block_data: GenericBlockData(block_data.encode()),
relay_parent_number: 1,
relay_parent_storage_root: Default::default(),
},
)
.await
.unwrap();
let new_head = HeadData::decode(&mut &ret.head_data.0[..]).unwrap();
assert_eq!(new_head.number, 1);
assert_eq!(new_head.parent_hash, parent_head.hash());
assert_eq!(new_head.post_state, hash_state(512));
}
futures::future::join_all((0..100).map(|_| execute(host.clone()))).await;
}