Executor Environment parameterization (#6161)

* Re-apply changes without Diener, rebase to the latest master

* Cache pruning

* Bit-pack InstantiationStrategy

* Move ExecutorParams version inside the structure itself

* Rework runtime API and executor parameters storage

* Pass executor parameters through backing subsystem

* Update Cargo.lock

* Introduce `ExecutorParams` to approval voting subsys

* Introduce `ExecutorParams` to dispute coordinator

* `cargo fmt`

* Simplify requests from backing subsys

* Fix tests

* Replace manual config cloning with `.clone()`

* Move constants to module

* Parametrize executor performing PVF pre-check

* Fix Malus

* Fix test runtime

* Introduce session executor params as a constant defined by session info
pallet

* Use Parity SCALE codec instead of hand-crafted binary encoding

* Get rid of constants; Add docs

* Get rid of constants

* Minor typo

* Fix Malus after rebase

* `cargo fmt`

* Use transparent SCALE encoding instead of explicit

* Clean up

* Get rid of relay parent to session index mapping

* Join environment type and version in a single enum element

* Use default execution parameters if running an old runtime

* `unwrap()` -> `expect()`

* Correct API version

* Constants are back in town

* Use constants for execution environment types

* Artifact separation, first try

* Get rid of explicit version

* PVF execution queue worker separation

* Worker handshake

* Global renaming

* Minor fixes resolving discussions

* Two-stage requesting of executor params to make use of runtime API cache

* Proper error handling in pvf-checker

* Executor params storage bootstrapping

* Propagate migration to v3 network runtimes

* Fix storage versioning

* Ensure `ExecutorParams` serialization determinism; Add comments

* Rename constants to make things a bit more deterministic
Get rid of stale code

* Tidy up a structure of active PVFs

* Minor formatting

* Fix comment

* Add try-runtime hooks

* Add storage version write on upgrade

Co-authored-by: Andronik <write@reusable.software>

* Add pre- and post-upgrade assertions

* Require to specify environment type; Remove redundant `impl`s

* Add `ExecutorParamHash` creation from `H256`

* Fix candidate validation subsys tests

* Return splittable error from executor params request fn

* Revert "Return splittable error from executor params request fn"

This reverts commit a0b274177d8bb2f6e13c066741892ecd2e72a456.

* Decompose approval voting metrics

* Use more relevant errors

* Minor formatting fix

* Assert a valid environment type instead of checking

* Fix `try-runtime` hooks

* After-merge fixes

* Add migration logs

* Remove dead code

* Fix tests

* Fix tests

* Back to the strongly typed implementation

* Promote strong types to executor interface

* Remove stale comment

* Move executor params to `SessionInfo`: primitives and runtime

* Move executor params to `SessionInfo`: node

* Try to bump primitives and API version

* Get rid of `MallocSizeOf`

* Bump target API version to v4

* Make use of session index already in place

* Back to v3

* Fix all the tests

* Add migrations to all the runtimes

* Make use of existing `SessionInfo` in approval voting subsys

* Rename `TARGET` -> `LOG_TARGET`

* Bump all the primitives to v3

* Fix Rococo ParachainHost API version

* Use `RollingSessionWindow` to acquire `ExecutorParams` in disputes

* Fix nits from discussions; add comments

* Re-evaluate queue logic

* Rework job assignment in execution queue

* Add documentation

* Use `RuntimeInfo` to obtain `SessionInfo` (with blackjack and caching)

* Couple `Pvf` with `ExecutorParams` wherever possible

* Put members of `PvfWithExecutorParams` under `Arc` for cheap cloning

* Fix comment

* Fix CI tests

* Fix clippy warnings

* Address nits from discussions

* Add a placeholder for raw data

* Fix non exhaustive match

* Remove redundant reexports and fix imports

* Keep only necessary semantic features, as discussed

* Rework `RuntimeInfo` to support mock implementation for tests

* Remove unneeded bound

* `cargo fmt`

* Revert "Remove unneeded bound"

This reverts commit 932463f26b00ce290e1e61848eb9328632ef8a61.

* Fix PVF host tests

* Fix PVF checker tests

* Fix overseer declarations

* Simplify tests

* `MAX_KEEP_WAITING` timeout based on `BACKING_EXECUTION_TIMEOUT`

* Add a unit test for varying executor parameters

* Minor fixes from discussions

* Add prechecking max. memory parameter (see paritytech/srlabs_findings#110)

* Fix and improve a test

* Remove `ExecutionEnvironment` and `RawData`

* New primitives versioning in parachain host API

* `disputes()` implementation for Kusama and Polkadot

* Move `ExecutorParams` from `vstaging` to stable primitives

* Move disputes from `vstaging` to stable implementation

* Fix `try-runtime`

* Fixes after merge

* Move `ExecutorParams` to the bottom of `SessionInfo`

* Revert "Move executor params to `SessionInfo`: primitives and runtime"

This reverts commit dfcfb85fefd1c5be6c8a8f72dc09fd1809cfa9ce.

* Always use fresh activated live hash in pvf precheck
(re-apply 34b09a4c20de17e7926ed942cd0d657d18f743fa)

* Fixing tests (broken commit)

* Fix candidate validation tests

* Fix PVF host test

* Minor fixes

* Address discussions

* Restore migration

* Fix `use` to only include what is needed instead of `*`

* Add comment to never touch `DEFAULT_CONFIG`

* Update migration to set default `ExecutorParams` for `dispute_period`
sessions back

* Use `earliest_stored_session` instead of calculations

* Nit

* Add logs

* Treat any runtime error as `NotSupported` again

* Always return default executor params if not available

* Revert "Always return default executor params if not available"

This reverts commit b58ac4482ef444c67a9852d5776550d08e312f30.

* Add paritytech/substrate#9997 workaround

* `cargo fmt`

* Remove migration (again!)

* Bump executor params to API v4 (backport from #6698)
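
The "two-stage requesting" item above can be read as: first resolve the session index for a relay parent, then fetch the executor parameters for that session, so that repeated requests within one session can be served from a cache. The following is only a minimal sketch of that reading, not the code from this commit. `RelayParent`, `Runtime`, `MockRuntime`, `ParamsCache` and the `Vec<(u8, u64)>` layout of `ExecutorParams` are all hypothetical stand-ins.

```rust
use std::collections::HashMap;

/// Hypothetical stand-ins for the real primitive types.
pub type RelayParent = [u8; 32];
pub type SessionIndex = u32;

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ExecutorParams(pub Vec<(u8, u64)>);

/// Hypothetical runtime interface exposing the two stages.
pub trait Runtime {
    fn session_index_for_child(&self, relay_parent: &RelayParent) -> SessionIndex;
    fn session_executor_params(&self, session: SessionIndex) -> Option<ExecutorParams>;
}

/// Caches executor params per session index (assumed design, for illustration).
pub struct ParamsCache {
    by_session: HashMap<SessionIndex, ExecutorParams>,
    /// Counts how often stage two actually hit the runtime.
    pub param_fetches: usize,
}

impl ParamsCache {
    pub fn new() -> Self {
        Self { by_session: HashMap::new(), param_fetches: 0 }
    }

    pub fn get(&mut self, rt: &impl Runtime, relay_parent: &RelayParent) -> Option<ExecutorParams> {
        // Stage one: relay parent -> session index.
        let session = rt.session_index_for_child(relay_parent);
        // Stage two: session index -> params, served from the cache when possible.
        if let Some(params) = self.by_session.get(&session) {
            return Some(params.clone());
        }
        self.param_fetches += 1;
        let params = rt.session_executor_params(session)?;
        self.by_session.insert(session, params.clone());
        Some(params)
    }
}

/// Toy runtime: ten blocks per session, keyed by the first hash byte.
pub struct MockRuntime;

impl Runtime for MockRuntime {
    fn session_index_for_child(&self, relay_parent: &RelayParent) -> SessionIndex {
        SessionIndex::from(relay_parent[0] / 10)
    }
    fn session_executor_params(&self, session: SessionIndex) -> Option<ExecutorParams> {
        Some(ExecutorParams(vec![(1, u64::from(session))]))
    }
}

fn main() {
    let rt = MockRuntime;
    let mut cache = ParamsCache::new();
    let a = cache.get(&rt, &[3u8; 32]);
    let b = cache.get(&rt, &[7u8; 32]); // same toy session as `a`
    let c = cache.get(&rt, &[12u8; 32]); // next toy session
    println!("{:?} {:?} {:?}, fetches: {}", a, b, c, cache.param_fetches);
}
```

Keying the cache by session index rather than by relay parent is the point of the split: many relay parents map to one session, so the second stage is hit once per session in this sketch.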

---------

Co-authored-by: Andronik <write@reusable.software>
s0me0ne-unkn0wn
2023-02-15 12:26:09 +01:00
committed by GitHub
parent 7f6b8e6df9
commit dd0a556665
40 changed files with 1243 additions and 330 deletions
+15 -4
@@ -25,6 +25,7 @@ use always_assert::never;
use futures::{
channel::mpsc, future::BoxFuture, stream::FuturesUnordered, Future, FutureExt, StreamExt,
};
+use polkadot_primitives::vstaging::ExecutorParams;
use slotmap::HopSlotMap;
use std::{
fmt,
@@ -69,6 +70,7 @@ pub enum ToPool {
worker: Worker,
code: Arc<Vec<u8>>,
artifact_path: PathBuf,
+executor_params: ExecutorParams,
preparation_timeout: Duration,
},
}
@@ -214,7 +216,7 @@ fn handle_to_pool(
metrics.prepare_worker().on_begin_spawn();
mux.push(spawn_worker_task(program_path.to_owned(), spawn_timeout).boxed());
},
-ToPool::StartWork { worker, code, artifact_path, preparation_timeout } => {
+ToPool::StartWork { worker, code, artifact_path, executor_params, preparation_timeout } => {
if let Some(data) = spawned.get_mut(worker) {
if let Some(idle) = data.idle.take() {
let preparation_timer = metrics.time_preparation();
@@ -226,6 +228,7 @@ fn handle_to_pool(
code,
cache_path.to_owned(),
artifact_path,
+executor_params,
preparation_timeout,
preparation_timer,
)
@@ -275,12 +278,20 @@ async fn start_work_task<Timer>(
code: Arc<Vec<u8>>,
cache_path: PathBuf,
artifact_path: PathBuf,
+executor_params: ExecutorParams,
preparation_timeout: Duration,
_preparation_timer: Option<Timer>,
) -> PoolEvent {
-let outcome =
-worker::start_work(&metrics, idle, code, &cache_path, artifact_path, preparation_timeout)
-.await;
+let outcome = worker::start_work(
+&metrics,
+idle,
+code,
+&cache_path,
+artifact_path,
+executor_params,
+preparation_timeout,
+)
+.await;
PoolEvent::StartWork(worker, outcome)
}
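
The hunks above thread `executor_params` from the `ToPool::StartWork` message through `handle_to_pool` into `worker::start_work`. As a rough illustration of that plumbing, here is a minimal, simplified sketch. The `ExecutorParams` layout, the reduced `ToPool` enum (no worker handle or artifact path) and the string-returning `start_work` are assumptions made for the example, not the types from this commit.

```rust
use std::sync::{mpsc, Arc};
use std::time::Duration;

/// Hypothetical stand-in for `ExecutorParams`: an ordered list of (tag, value) pairs.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ExecutorParams(pub Vec<(u8, u64)>);

/// Simplified counterpart of `ToPool`; worker handle and artifact path are omitted.
#[derive(Debug)]
pub enum ToPool {
    StartWork {
        code: Arc<Vec<u8>>,
        executor_params: ExecutorParams,
        preparation_timeout: Duration,
    },
}

/// Stand-in for the worker call: just reports what it was asked to prepare.
pub fn start_work(code: &[u8], executor_params: &ExecutorParams, timeout: Duration) -> String {
    format!(
        "{} code bytes, {} params, timeout {}s",
        code.len(),
        executor_params.0.len(),
        timeout.as_secs()
    )
}

/// Destructures the message and forwards every field, including the new one.
pub fn handle_to_pool(msg: ToPool) -> String {
    match msg {
        ToPool::StartWork { code, executor_params, preparation_timeout } =>
            start_work(&code, &executor_params, preparation_timeout),
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    tx.send(ToPool::StartWork {
        code: Arc::new(vec![0u8; 3]),
        executor_params: ExecutorParams(vec![(1, 8192)]),
        preparation_timeout: Duration::from_secs(60),
    })
    .unwrap();
    let msg = rx.recv().unwrap();
    println!("{}", handle_to_pool(msg));
}
```

Because the match arm names every field of the variant, adding a field to the message forces each consumer to handle it, which is the pattern the `handle_to_pool` hunk follows.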
+61 -28
@@ -17,7 +17,10 @@
//! A queue that handles requests for PVF preparation.
use super::pool::{self, Worker};
-use crate::{artifacts::ArtifactId, metrics::Metrics, PrepareResult, Priority, Pvf, LOG_TARGET};
+use crate::{
+artifacts::ArtifactId, metrics::Metrics, PrepareResult, Priority, PvfWithExecutorParams,
+LOG_TARGET,
+};
use always_assert::{always, never};
use futures::{channel::mpsc, stream::StreamExt as _, Future, SinkExt};
use std::{
@@ -33,7 +36,11 @@ pub enum ToQueue {
///
/// Note that it is incorrect to enqueue the same PVF again without first receiving the
/// [`FromQueue`] response.
-Enqueue { priority: Priority, pvf: Pvf, preparation_timeout: Duration },
+Enqueue {
+priority: Priority,
+pvf_with_params: PvfWithExecutorParams,
+preparation_timeout: Duration,
+},
}
/// A response from queue.
@@ -78,7 +85,7 @@ slotmap::new_key_type! { pub struct Job; }
struct JobData {
/// The priority of this job. Can be bumped.
priority: Priority,
-pvf: Pvf,
+pvf_with_params: PvfWithExecutorParams,
/// The timeout for the preparation job.
preparation_timeout: Duration,
worker: Option<Worker>,
@@ -208,8 +215,8 @@ impl Queue {
async fn handle_to_queue(queue: &mut Queue, to_queue: ToQueue) -> Result<(), Fatal> {
match to_queue {
-ToQueue::Enqueue { priority, pvf, preparation_timeout } => {
-handle_enqueue(queue, priority, pvf, preparation_timeout).await?;
+ToQueue::Enqueue { priority, pvf_with_params, preparation_timeout } => {
+handle_enqueue(queue, priority, pvf_with_params, preparation_timeout).await?;
},
}
Ok(())
@@ -218,19 +225,19 @@ async fn handle_to_queue(queue: &mut Queue, to_queue: ToQueue) -> Result<(), Fat
async fn handle_enqueue(
queue: &mut Queue,
priority: Priority,
-pvf: Pvf,
+pvf_with_params: PvfWithExecutorParams,
preparation_timeout: Duration,
) -> Result<(), Fatal> {
gum::debug!(
target: LOG_TARGET,
-validation_code_hash = ?pvf.code_hash,
+validation_code_hash = ?pvf_with_params.code_hash(),
?priority,
?preparation_timeout,
"PVF is enqueued for preparation.",
);
queue.metrics.prepare_enqueued();
-let artifact_id = pvf.as_artifact_id();
+let artifact_id = pvf_with_params.as_artifact_id();
if never!(
queue.artifact_id_to_job.contains_key(&artifact_id),
"second Enqueue sent for a known artifact"
@@ -247,7 +254,10 @@ async fn handle_enqueue(
return Ok(())
}
-let job = queue.jobs.insert(JobData { priority, pvf, preparation_timeout, worker: None });
+let job =
+queue
+.jobs
+.insert(JobData { priority, pvf_with_params, preparation_timeout, worker: None });
queue.artifact_id_to_job.insert(artifact_id, job);
if let Some(available) = find_idle_worker(queue) {
@@ -338,7 +348,7 @@ async fn handle_worker_concluded(
// this can't be None;
// qed.
let job_data = never_none!(queue.jobs.remove(job));
-let artifact_id = job_data.pvf.as_artifact_id();
+let artifact_id = job_data.pvf_with_params.as_artifact_id();
queue.artifact_id_to_job.remove(&artifact_id);
@@ -424,7 +434,7 @@ async fn spawn_extra_worker(queue: &mut Queue, critical: bool) -> Result<(), Fat
async fn assign(queue: &mut Queue, worker: Worker, job: Job) -> Result<(), Fatal> {
let job_data = &mut queue.jobs[job];
-let artifact_id = job_data.pvf.as_artifact_id();
+let artifact_id = job_data.pvf_with_params.as_artifact_id();
let artifact_path = artifact_id.path(&queue.cache_path);
job_data.worker = Some(worker);
@@ -435,8 +445,9 @@ async fn assign(queue: &mut Queue, worker: Worker, job: Job) -> Result<(), Fatal
&mut queue.to_pool_tx,
pool::ToPool::StartWork {
worker,
-code: job_data.pvf.code.clone(),
+code: job_data.pvf_with_params.code(),
artifact_path,
+executor_params: job_data.pvf_with_params.executor_params(),
preparation_timeout: job_data.preparation_timeout,
},
)
@@ -503,8 +514,8 @@ mod tests {
use std::task::Poll;
/// Creates a new PVF which artifact id can be uniquely identified by the given number.
-fn pvf(descriminator: u32) -> Pvf {
-Pvf::from_discriminator(descriminator)
+fn pvf_with_params(descriminator: u32) -> PvfWithExecutorParams {
+PvfWithExecutorParams::from_discriminator(descriminator)
}
async fn run_until<R>(
@@ -613,7 +624,7 @@ mod tests {
test.send_queue(ToQueue::Enqueue {
priority: Priority::Normal,
-pvf: pvf(1),
+pvf_with_params: pvf_with_params(1),
preparation_timeout: PRECHECK_PREPARATION_TIMEOUT,
});
assert_eq!(test.poll_and_recv_to_pool().await, pool::ToPool::Spawn);
@@ -626,7 +637,10 @@ mod tests {
result: Ok(PrepareStats::default()),
});
-assert_eq!(test.poll_and_recv_from_queue().await.artifact_id, pvf(1).as_artifact_id());
+assert_eq!(
+test.poll_and_recv_from_queue().await.artifact_id,
+pvf_with_params(1).as_artifact_id()
+);
}
#[tokio::test]
@@ -635,12 +649,20 @@ mod tests {
let priority = Priority::Normal;
let preparation_timeout = PRECHECK_PREPARATION_TIMEOUT;
-test.send_queue(ToQueue::Enqueue { priority, pvf: pvf(1), preparation_timeout });
-test.send_queue(ToQueue::Enqueue { priority, pvf: pvf(2), preparation_timeout });
+test.send_queue(ToQueue::Enqueue {
+priority,
+pvf_with_params: PvfWithExecutorParams::from_discriminator(1),
+preparation_timeout,
+});
+test.send_queue(ToQueue::Enqueue {
+priority,
+pvf_with_params: PvfWithExecutorParams::from_discriminator(2),
+preparation_timeout,
+});
// Start a non-precheck preparation for this one.
test.send_queue(ToQueue::Enqueue {
priority,
-pvf: pvf(3),
+pvf_with_params: PvfWithExecutorParams::from_discriminator(3),
preparation_timeout: LENIENT_PREPARATION_TIMEOUT,
});
@@ -669,7 +691,7 @@ mod tests {
// Enqueue a critical job.
test.send_queue(ToQueue::Enqueue {
priority: Priority::Critical,
-pvf: pvf(4),
+pvf_with_params: PvfWithExecutorParams::from_discriminator(4),
preparation_timeout,
});
@@ -685,7 +707,7 @@ mod tests {
test.send_queue(ToQueue::Enqueue {
priority: Priority::Normal,
-pvf: pvf(1),
+pvf_with_params: PvfWithExecutorParams::from_discriminator(1),
preparation_timeout,
});
assert_eq!(test.poll_and_recv_to_pool().await, pool::ToPool::Spawn);
@@ -696,7 +718,7 @@ mod tests {
// Enqueue a critical job, which warrants spawning over the soft limit.
test.send_queue(ToQueue::Enqueue {
priority: Priority::Critical,
-pvf: pvf(2),
+pvf_with_params: PvfWithExecutorParams::from_discriminator(2),
preparation_timeout,
});
assert_eq!(test.poll_and_recv_to_pool().await, pool::ToPool::Spawn);
@@ -722,12 +744,20 @@ mod tests {
let priority = Priority::Normal;
let preparation_timeout = PRECHECK_PREPARATION_TIMEOUT;
-test.send_queue(ToQueue::Enqueue { priority, pvf: pvf(1), preparation_timeout });
-test.send_queue(ToQueue::Enqueue { priority, pvf: pvf(2), preparation_timeout });
+test.send_queue(ToQueue::Enqueue {
+priority,
+pvf_with_params: PvfWithExecutorParams::from_discriminator(1),
+preparation_timeout,
+});
+test.send_queue(ToQueue::Enqueue {
+priority,
+pvf_with_params: PvfWithExecutorParams::from_discriminator(2),
+preparation_timeout,
+});
// Start a non-precheck preparation for this one.
test.send_queue(ToQueue::Enqueue {
priority,
-pvf: pvf(3),
+pvf_with_params: PvfWithExecutorParams::from_discriminator(3),
preparation_timeout: LENIENT_PREPARATION_TIMEOUT,
});
@@ -753,7 +783,10 @@ mod tests {
// Since there is still work, the queue requested one extra worker to spawn to handle the
// remaining enqueued work items.
assert_eq!(test.poll_and_recv_to_pool().await, pool::ToPool::Spawn);
-assert_eq!(test.poll_and_recv_from_queue().await.artifact_id, pvf(1).as_artifact_id());
+assert_eq!(
+test.poll_and_recv_from_queue().await.artifact_id,
+pvf_with_params(1).as_artifact_id()
+);
}
#[tokio::test]
@@ -762,7 +795,7 @@ mod tests {
test.send_queue(ToQueue::Enqueue {
priority: Priority::Normal,
-pvf: pvf(1),
+pvf_with_params: PvfWithExecutorParams::from_discriminator(1),
preparation_timeout: PRECHECK_PREPARATION_TIMEOUT,
});
@@ -787,7 +820,7 @@ mod tests {
test.send_queue(ToQueue::Enqueue {
priority: Priority::Normal,
-pvf: pvf(1),
+pvf_with_params: PvfWithExecutorParams::from_discriminator(1),
preparation_timeout: PRECHECK_PREPARATION_TIMEOUT,
});
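
The queue changes above replace `Pvf` with `PvfWithExecutorParams` and call `code()`, `code_hash()`, `executor_params()` and `as_artifact_id()` on it. The commit message also mentions putting its members under `Arc` for cheap cloning. The sketch below shows one way such a type could look. It is an illustration under assumptions, not the definition from this commit: the field layout, the `ArtifactId` shape, the `Vec<(u8, u64)>` params and the use of `DefaultHasher` (a non-cryptographic stand-in for a real code hash) are all hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::Arc;

/// Hypothetical stand-in for `ExecutorParams`.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct ExecutorParams(pub Vec<(u8, u64)>);

/// Assumed shape: an artifact is identified by the code and by the params it was built with.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct ArtifactId {
    pub code_hash: u64,
    pub executor_params_hash: u64,
}

/// Non-cryptographic stand-in hash, for illustration only.
fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Both heavy members sit behind `Arc`, so cloning only bumps reference counts.
#[derive(Clone, Debug)]
pub struct PvfWithExecutorParams {
    code: Arc<Vec<u8>>,
    code_hash: u64,
    executor_params: Arc<ExecutorParams>,
}

impl PvfWithExecutorParams {
    pub fn new(code: Vec<u8>, executor_params: ExecutorParams) -> Self {
        let code_hash = hash_of(&code);
        Self { code: Arc::new(code), code_hash, executor_params: Arc::new(executor_params) }
    }

    /// Shares the code buffer instead of copying it.
    pub fn code(&self) -> Arc<Vec<u8>> {
        Arc::clone(&self.code)
    }

    pub fn code_hash(&self) -> u64 {
        self.code_hash
    }

    /// Returns an owned copy of the parameters.
    pub fn executor_params(&self) -> ExecutorParams {
        (*self.executor_params).clone()
    }

    pub fn as_artifact_id(&self) -> ArtifactId {
        ArtifactId {
            code_hash: self.code_hash,
            executor_params_hash: hash_of(&*self.executor_params),
        }
    }
}

fn main() {
    let a = PvfWithExecutorParams::new(vec![1, 2, 3], ExecutorParams(vec![(1, 8192)]));
    let b = PvfWithExecutorParams::new(vec![1, 2, 3], ExecutorParams(vec![(1, 16384)]));
    // Same code, different parameters: the artifacts must not be confused.
    println!("same code hash: {}", a.code_hash() == b.code_hash());
    println!("same artifact:  {}", a.as_artifact_id() == b.as_artifact_id());
}
```

Deriving the artifact identity from both hashes is what lets one PVF be prepared separately for different executor parameters, in line with the "Artifact separation" item in the commit message.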
+26 -7
@@ -34,6 +34,7 @@ use crate::{
use cpu_time::ProcessTime;
use futures::{pin_mut, select_biased, FutureExt};
use parity_scale_codec::{Decode, Encode};
+use polkadot_primitives::vstaging::ExecutorParams;
use sp_core::hexdisplay::HexDisplay;
use std::{
panic,
@@ -85,6 +86,7 @@ pub async fn start_work(
code: Arc<Vec<u8>>,
cache_path: &Path,
artifact_path: PathBuf,
+executor_params: ExecutorParams,
preparation_timeout: Duration,
) -> Outcome {
let IdleWorker { stream, pid } = worker;
@@ -97,7 +99,9 @@ pub async fn start_work(
);
with_tmp_file(stream, pid, cache_path, |tmp_file, mut stream| async move {
-if let Err(err) = send_request(&mut stream, code, &tmp_file, preparation_timeout).await {
+if let Err(err) =
+send_request(&mut stream, code, &tmp_file, &executor_params, preparation_timeout).await
+{
gum::warn!(
target: LOG_TARGET,
worker_pid = %pid,
@@ -271,15 +275,19 @@ async fn send_request(
stream: &mut UnixStream,
code: Arc<Vec<u8>>,
tmp_file: &Path,
+executor_params: &ExecutorParams,
preparation_timeout: Duration,
) -> io::Result<()> {
framed_send(stream, &code).await?;
framed_send(stream, path_to_bytes(tmp_file)).await?;
+framed_send(stream, &executor_params.encode()).await?;
framed_send(stream, &preparation_timeout.encode()).await?;
Ok(())
}
-async fn recv_request(stream: &mut UnixStream) -> io::Result<(Vec<u8>, PathBuf, Duration)> {
+async fn recv_request(
+stream: &mut UnixStream,
+) -> io::Result<(Vec<u8>, PathBuf, ExecutorParams, Duration)> {
let code = framed_recv(stream).await?;
let tmp_file = framed_recv(stream).await?;
let tmp_file = bytes_to_path(&tmp_file).ok_or_else(|| {
@@ -288,6 +296,13 @@ async fn recv_request(stream: &mut UnixStream) -> io::Result<(Vec<u8>, PathBuf,
"prepare pvf recv_request: non utf-8 artifact path".to_string(),
)
})?;
+let executor_params_enc = framed_recv(stream).await?;
+let executor_params = ExecutorParams::decode(&mut &executor_params_enc[..]).map_err(|_| {
+io::Error::new(
+io::ErrorKind::Other,
+"prepare pvf recv_request: failed to decode ExecutorParams".to_string(),
+)
+})?;
let preparation_timeout = framed_recv(stream).await?;
let preparation_timeout = Duration::decode(&mut &preparation_timeout[..]).map_err(|e| {
io::Error::new(
@@ -295,7 +310,7 @@ async fn recv_request(stream: &mut UnixStream) -> io::Result<(Vec<u8>, PathBuf,
format!("prepare pvf recv_request: failed to decode duration: {:?}", e),
)
})?;
-Ok((code, tmp_file, preparation_timeout))
+Ok((code, tmp_file, executor_params, preparation_timeout))
}
async fn send_response(stream: &mut UnixStream, result: PrepareResult) -> io::Result<()> {
@@ -347,7 +362,8 @@ pub fn worker_entrypoint(socket_path: &str) {
worker_event_loop("prepare", socket_path, |rt_handle, mut stream| async move {
loop {
let worker_pid = std::process::id();
-let (code, dest, preparation_timeout) = recv_request(&mut stream).await?;
+let (code, dest, executor_params, preparation_timeout) =
+recv_request(&mut stream).await?;
gum::debug!(
target: LOG_TARGET,
%worker_pid,
@@ -372,7 +388,7 @@ pub fn worker_entrypoint(socket_path: &str) {
// Spawn another thread for preparation.
let prepare_fut = rt_handle
.spawn_blocking(move || {
-let result = prepare_artifact(&code);
+let result = prepare_artifact(&code, executor_params);
// Get the `ru_maxrss` stat. If supported, call getrusage for the thread.
#[cfg(target_os = "linux")]
@@ -454,14 +470,17 @@ pub fn worker_entrypoint(socket_path: &str) {
});
}
-fn prepare_artifact(code: &[u8]) -> Result<CompiledArtifact, PrepareError> {
+fn prepare_artifact(
+code: &[u8],
+executor_params: ExecutorParams,
+) -> Result<CompiledArtifact, PrepareError> {
panic::catch_unwind(|| {
let blob = match crate::executor_intf::prevalidate(code) {
Err(err) => return Err(PrepareError::Prevalidation(format!("{:?}", err))),
Ok(b) => b,
};
-match crate::executor_intf::prepare(blob) {
+match crate::executor_intf::prepare(blob, executor_params) {
Ok(compiled_artifact) => Ok(CompiledArtifact::new(compiled_artifact)),
Err(err) => Err(PrepareError::Preparation(format!("{:?}", err))),
}
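
The worker hunks above add one more frame to the host-to-worker request: the encoded `ExecutorParams` is sent between the temporary file path and the timeout, and `recv_request` must read the frames in the same order. The following synchronous sketch illustrates that ordering over any `Read`/`Write` pair. It rests on several assumptions: the 8-byte little-endian length prefix, the 9-bytes-per-entry params encoding and the millisecond timeout encoding are invented for the example (the commit itself uses SCALE encoding and an async `UnixStream`), and the path frame is left out for brevity.

```rust
use std::io::{self, Cursor, Read, Write};
use std::time::Duration;

/// Hypothetical stand-in for `ExecutorParams` with a toy fixed-width encoding.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ExecutorParams(pub Vec<(u8, u64)>);

impl ExecutorParams {
    pub fn encode(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(self.0.len() * 9);
        for (tag, value) in &self.0 {
            out.push(*tag);
            out.extend_from_slice(&value.to_le_bytes());
        }
        out
    }

    pub fn decode(bytes: &[u8]) -> io::Result<Self> {
        if bytes.len() % 9 != 0 {
            return Err(io::Error::new(io::ErrorKind::InvalidData, "failed to decode ExecutorParams"));
        }
        let entries = bytes.chunks_exact(9).map(|chunk| {
            let mut value = [0u8; 8];
            value.copy_from_slice(&chunk[1..]);
            (chunk[0], u64::from_le_bytes(value))
        });
        Ok(Self(entries.collect()))
    }
}

/// Writes one frame: assumed 8-byte little-endian length, then the payload.
pub fn framed_send(w: &mut impl Write, buf: &[u8]) -> io::Result<()> {
    w.write_all(&(buf.len() as u64).to_le_bytes())?;
    w.write_all(buf)
}

/// Reads one frame written by `framed_send`.
pub fn framed_recv(r: &mut impl Read) -> io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 8];
    r.read_exact(&mut len_buf)?;
    let mut buf = vec![0u8; u64::from_le_bytes(len_buf) as usize];
    r.read_exact(&mut buf)?;
    Ok(buf)
}

/// Frame order: code, executor params, timeout.
pub fn send_request(w: &mut impl Write, code: &[u8], params: &ExecutorParams, timeout: Duration) -> io::Result<()> {
    framed_send(w, code)?;
    framed_send(w, &params.encode())?;
    framed_send(w, &(timeout.as_millis() as u64).to_le_bytes())
}

/// Must read the frames in exactly the order `send_request` wrote them.
pub fn recv_request(r: &mut impl Read) -> io::Result<(Vec<u8>, ExecutorParams, Duration)> {
    let code = framed_recv(r)?;
    let params = ExecutorParams::decode(&framed_recv(r)?)?;
    let timeout_enc = framed_recv(r)?;
    if timeout_enc.len() != 8 {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "failed to decode duration"));
    }
    let mut millis = [0u8; 8];
    millis.copy_from_slice(&timeout_enc);
    Ok((code, params, Duration::from_millis(u64::from_le_bytes(millis))))
}

fn main() -> io::Result<()> {
    let params = ExecutorParams(vec![(1, 8192)]);
    let mut wire = Vec::new();
    send_request(&mut wire, b"wasm", &params, Duration::from_secs(60))?;
    let (code, decoded, timeout) = recv_request(&mut Cursor::new(wire))?;
    println!("{} code bytes, {:?}, {:?}", code.len(), decoded, timeout);
    Ok(())
}
```

Since the frames carry no field tags, both sides have to be changed together, which is why `send_request`, `recv_request` and the tuple destructured in `worker_entrypoint` all change in the same commit.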