Retry failed PVF execution (AmbiguousWorkerDeath) (#6235)

* Fix a couple of typos

* Retry failed PVF execution

PVF execution that fails due to AmbiguousWorkerDeath should be retried once.
This should reduce the occurrence of failures due to transient conditions.
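
A minimal sketch of the retry-once behaviour described above, not the actual implementation in `node/core/candidate-validation/src/lib.rs`. The `ValidationError` enum, its `Invalid` variant, and the `execute_with_retry` helper are hypothetical names introduced for illustration; only `AmbiguousWorkerDeath` comes from this commit.

```rust
/// Hypothetical error type; only `AmbiguousWorkerDeath` is named in this commit.
#[derive(Debug, Clone, PartialEq)]
enum ValidationError {
    /// The worker died and the cause could not be determined (possibly transient).
    AmbiguousWorkerDeath,
    /// Any other failure; assumed here to be deterministic and not worth retrying.
    Invalid(String),
}

/// Runs `execute` and retries it exactly once if the first attempt fails
/// with `AmbiguousWorkerDeath`. Every other outcome is returned unchanged.
fn execute_with_retry<T, F>(mut execute: F) -> Result<T, ValidationError>
where
    F: FnMut() -> Result<T, ValidationError>,
{
    match execute() {
        // Possibly transient: give the execution one more chance.
        Err(ValidationError::AmbiguousWorkerDeath) => execute(),
        // Success or a non-retryable error: pass it through.
        other => other,
    }
}

fn main() {
    // Simulated execution that fails ambiguously on the first attempt only.
    let mut attempts = 0;
    let result = execute_with_retry(|| {
        attempts += 1;
        if attempts == 1 {
            Err(ValidationError::AmbiguousWorkerDeath)
        } else {
            Ok(42u32)
        }
    });
    assert_eq!(result, Ok(42));
    assert_eq!(attempts, 2);

    // A non-ambiguous error is not retried.
    let mut attempts = 0;
    let result: Result<u32, _> = execute_with_retry(|| {
        attempts += 1;
        Err(ValidationError::Invalid("bad candidate".to_string()))
    });
    assert_eq!(result, Err(ValidationError::Invalid("bad candidate".to_string())));
    assert_eq!(attempts, 1);
}
```

Capping the retry at one attempt bounds the extra work: if the second attempt also ends in `AmbiguousWorkerDeath`, that error is returned to the caller.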

Closes #6195

* Address a couple of nits

* Write tests; refactor (add `validate_candidate_with_retry`)

* Update node/core/candidate-validation/src/lib.rs

Co-authored-by: Andronik <write@reusable.software>

Co-authored-by: eskimor <eskimor@users.noreply.github.com>
Marcin S
2022-11-08 15:36:36 -05:00
committed by GitHub
parent c261353380
commit 6d7f33e612
5 changed files with 185 additions and 31 deletions
+3 -3
@@ -252,8 +252,8 @@ fn handle_job_finish(
"execute worker concluded",
);
-// First we send the result. It may fail due the other end of the channel being dropped, that's
-// legitimate and we don't treat that as an error.
+// First we send the result. It may fail due to the other end of the channel being dropped,
+// that's legitimate and we don't treat that as an error.
let _ = result_tx.send(result);
// Then, we should deal with the worker:
@@ -305,7 +305,7 @@ async fn spawn_worker_task(program_path: PathBuf, spawn_timeout: Duration) -> Qu
Err(err) => {
gum::warn!(target: LOG_TARGET, "failed to spawn an execute worker: {:?}", err);
-// Assume that the failure intermittent and retry after a delay.
+// Assume that the failure is intermittent and retry after a delay.
Delay::new(Duration::from_secs(3)).await;
},
}