Automatic Example Collator (#67)

* add polkadot build script

* Add scripting to bring up a simple alice-bob example net

Demonstrated to produce blocks, but as of right now there's still
trouble getting it to respond to external queries on its ports.

* enable external rpc access to the nodes

Also shrink the build context by excluding some extraneous data.

* Ensure external RPC access works

Also set default branch appropriately, and have the stop command
clean itself up more thoroughly.

* Add multi-stage dockerfile for building the cumulus-test-parachain-collator

- Exclude the docker/ directory from build context because we're
  never going to build recursively, and this prevents spurious
  cache misses
- build the parachain collator in three stages. The build stage
  is discarded; the collator stage has a wrapper script to simplify
  generating the right bootnodes flags, and the default stage
  has just the binary in a small runtime.
- build_collator.sh collects appropriate build flags for the dockerfile
- inject_bootnodes.sh discovers the testnet node IDs and inserts them
  into the arguments list for cumulus-test-parachain-collator

* Add services which generate genesis state, run the collator

- Ignore the scripts directory to reduce spurious cache misses.
- Move inject_bootnodes.sh from the scripts directory into the root:
  It can't stay in the scripts directory, because that's ignored;
  I didn't want to invent _another_ top-level subdirectory for it.
  That decision could certainly be appealed, though.
- Move docker-compose.yml, add dc.sh, modify *_collator.sh: by
  taking docker-compose.yml out of the root directory, we can
  further reduce cache misses. However, docker-compose normally
  has a strong expectation that docker-compose.yml exist in the
  project root; it takes a moderately complicated invocation to
  override that expectation. That override is encoded in dc.sh;
  the updates to the other scripts are just to use the override.

The expectation as of now is that scripts/run_collator.sh runs
both chain nodes and the collator, generates the genesis state
into a volume with a transient container, and runs the collator
as specified in the repo README.

Upcoming work: Steps 5 and 6 from the readme.

* Launch the collator node

The biggest change here is adding the testing_net network to the
collator node's networks list. This lets it successfully connect
to the alice and bob nodes, which in turn lets it get their node IDs,
which was the blocker for a long time.

Remove httpie in favor of curl: makes for a smaller docker image,
and has fewer weird failure modes within docker.

Unfortunately this doesn't yet actually connect to the relay chain
nodes; that's the next area to figure out.

* enable external websocket access to indexer nodes

* Reorganize for improved caching, again

- Manually enumerate the set of source directories to copy when building.
  This bloats the cache a bit, but means that rebuilds on script changes
  don't bust that cache, which saves a _lot_ of time.
- Un-.dockerignore the scripts directory; it's small and will no longer
  trigger cache misses.
- Move inject_bootnodes.sh back into scripts directory for better organization.
- inject_bootnodes.sh: use rpc port for rpc call and p2p port for
  generating the bootnode string. I'm not 100% sure this is correct,
  but upwards of 80% at least.
- docker-compose.yml: reorganize the launch commands such that alice
  and bob still present the same external port mapping to the world,
  but within the docker-compose network, they both use the same
  (standard) p2p, rpc, and websocket ports. This makes life easier
  for inject_bootnodes.sh
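
The port split described for inject_bootnodes.sh can be sketched roughly as follows. This is an illustration, not the script itself: `bootnode_addr` is a hypothetical helper, and `PEER_ID_PLACEHOLDER` stands in for a real node ID. The point is that only the p2p port appears in the bootnode string; the rpc port is used solely to ask the node for its ID.

```sh
#!/bin/sh
# Hypothetical helper (not part of the repo): format a bootnode address
# from an IP, a p2p port, and a peer ID.
bootnode_addr () {
    ip="$1"
    port="$2"
    peer_id="$3"
    if [ -z "$ip" ] || [ -z "$port" ] || [ -z "$peer_id" ]; then
        echo >&2 "usage: bootnode_addr IP P2P_PORT PEER_ID"
        return 1
    fi
    echo "/ip4/$ip/tcp/$port/p2p/$peer_id"
}

# PEER_ID_PLACEHOLDER stands in for the ID returned by the rpc call.
bootnode_addr 172.28.1.1 30333 PEER_ID_PLACEHOLDER
```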

The collator node still doesn't actually connect, but I think this
commit still represents real progress in that direction.

* Get the collator talking to the indexer nodes

In the end, it was four characters: -- and two = signs in the
launch arguments. They turn out to be critical characters for
correct operation, though!
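
A rough sketch of where those characters sit in the argument list (an illustration under assumptions, not the actual launch command): the arguments passed in come first, then `--`, then one `--bootnodes=ADDR` word per bootnode. `build_args` and the `ADDR_*` values are hypothetical placeholders.

```sh
#!/bin/sh
# Hypothetical helper: print the final argument list, one argument per line.
# Incoming arguments come first, then "--", then one --bootnodes=ADDR word
# for each space-separated address in $BOOTNODES.
build_args () {
    set -- "$@" "--"
    for addr in $BOOTNODES; do
        set -- "$@" "--bootnodes=$addr"
    done
    printf '%s\n' "$@"
}

# Placeholder addresses; real ones are discovered at runtime.
BOOTNODES="ADDR_ALICE ADDR_BOB"
build_args --base-path=/data
```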

Next up: automating step 5.

* Add runtime stage to collect runtime wasm blob into volume

We can't just copy the blob in the builder stage because the volumes
aren't available at that point.

Rewrite build_collator.sh into build_docker.sh and update for generality.

* WIP: add registrar service and partial work to actually register the collator

This is likely to be discarded; the Python library in use is 3rd party
and not well documented, while the official polkadot-js repo has a
CLI tool: https://github.com/polkadot-js/tools/tree/master/packages/api-cli

* Add a parachain registrar which should properly register the parachain

Doesn't work at the moment because it depends on two api-cli features
which I added today, which have not yet made it out into a published
release.

Next up: figure out how to add the `api-cli` at its `master` branch,
then run tests to ensure the collator is producing blocks. Then,
automate the block production tests.

* BROKEN attempt to demo registrar communication with the blockchain

This is a really weird bug. After running `scripts/run_collator.sh`,
which brings everything up, it's perfectly possible to get into
a state very much like what the registrar is in, and communicate
with the blockchain without issue:

```sh
$ docker run --rm --net cumulus_testing_net para-reg:latest polkadot-js-api --ws ws://172.28.1.1:9944 query.sudo.key
Thu 20 Feb 2020 12:19:20 PM CET
{
  "key": "5GrwvaEF5zXb26Fz9rcQpDWS57CtERHpNehXCPcNoHGKutQY"
}
```

However, the registrar itself, doing the same thing from within
`register_para.sh`, is failing to find the right place in the network:

```
/runtime/cumulus_test_parachain_runtime.compact.wasm found after 0 seconds
/genesis/genesis-state found after 0 seconds
2020-02-20 10:43:22          API-WS: disconnected from ws://172.28.1.1:9944 code: '1006' reason: 'connection failed'
_Event {
  type: 'error',
  isTrusted: false,
  _yaeti: true,
  target: W3CWebSocket {
    _listeners: {},
    addEventListener: [Function: _addEventListener],
    removeEventListener: [Function: _removeEventListener],
    dispatchEvent: [Function: _dispatchEvent],
    _url: 'ws://172.28.1.1:9944',
    _readyState: 3,
    _protocol: undefined,
    _extensions: '',
    _bufferedAmount: 0,
    _binaryType: 'arraybuffer',
    _connection: undefined,
    _client: WebSocketClient {
      _events: [Object: null prototype] {},
      _eventsCount: 0,
      _maxListeners: undefined,
      config: [Object],
      _req: null,
      protocols: [],
      origin: undefined,
      url: [Url],
      secure: false,
      base64nonce: 'aJ6J3pYDz8l5owVWHGbzHg==',
      [Symbol(kCapture)]: false
    },
    onclose: [Function (anonymous)],
    onerror: [Function (anonymous)],
    onmessage: [Function (anonymous)],
    onopen: [Function (anonymous)]
  },
  cancelable: true,
  stopImmediatePropagation: [Function (anonymous)]
}
```

They should be connected to the same network, running the same
image, doing the same call. The only difference is the file
existence checks, which really shouldn't be affecting the network
state at all.

Pushing this commit to ask for outside opinions on it, because
this is very weird and I clearly don't understand some part of
what's happening.

* Fix broken parachain registrar

The problem was that the registrar container was coming up too fast,
so the Alice node wasn't yet ready to receive connections. Using
a well-known wait script fixes the issue.
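
The fix amounts to retrying until the target is ready instead of connecting once. A minimal sketch of that pattern, assuming nothing about the actual wait script beyond its purpose: `wait_until` is a hypothetical helper that retries an arbitrary command once per second, up to a limit.

```sh
#!/bin/sh
# Hypothetical helper: run a command repeatedly until it succeeds.
# Usage: wait_until MAX_ATTEMPTS COMMAND [ARGS...]
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
wait_until () {
    attempts="$1"
    shift
    count=0
    while [ "$count" -lt "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        count=$((count + 1))
        sleep 1
    done
    return 1
}

# `true` always succeeds, so this returns on the first attempt.
wait_until 10 true && echo "ready"
```

In the registrar's case the retried command would be a connection check against the Alice node, run before the registration transaction is sent.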

Next up: verify that the collator is in fact building blocks.

* fixes which cause the collator to correctly produce new parachain blocks

It didn't take much! The biggest issue was that the genesis state
was previously being double-encoded.

* add documentation for running the parachain automatically

* Add health check to collator

* minor scripting improvements

* Apply suggestions from code review

Co-Authored-By: Bastian Köcher <bkchr@users.noreply.github.com>

* Docker: copy the whole workspace in one go

Pro: future-proofing against the time we add or remove a directory
Con: changing any file in the workspace busts Rust's build cache,
     which takes a long time.

Co-authored-by: Bastian Köcher <bkchr@users.noreply.github.com>
Peter Goodspeed-Niklaus
2020-02-21 16:20:22 +01:00
committed by GitHub
parent 669ea4864f
commit 5678c8a188
13 changed files with 457 additions and 1 deletions
+8
@@ -0,0 +1,8 @@
.git
**/target/
**/*.txt
**/*.md
/docker/
# dotfiles in the repo root
/.*
+37 -1
@@ -20,7 +20,7 @@ A planned Polkadot collator for the parachain.
## Running a collator
-1. Checkout polkadot at `cumulus-branch`.
+1. Checkout polkadot at `96f5dc510ef770fd5c5ab57a90565bb5819bbbea`.
2. Run `Alice` and `Bob`:
@@ -56,3 +56,39 @@ A planned Polkadot collator for the parachain.
them to the relay chain.
6. Now the `collator` should build blocks and the relay-chain should include them. You can check that the `parachain-header` for parachain `100` is changing.
### Running the collator automatically
To simplify the above process, you can run steps 1-5 above automatically:
```sh
export BRANCH=96f5dc510ef770fd5c5ab57a90565bb5819bbbea
scripts/build_polkadot.sh
scripts/run_collator.sh
```
This will churn for several minutes, but should end with docker reporting that several containers have successfully been brought up.
To run step 6, first set up an alias which gives you quick access to the polkadot-js CLI:
```sh
docker build -f docker/parachain-registrar.dockerfile --target pjs -t parachain-registrar:pjs .
alias pjs='docker run --rm --net cumulus_testing_net parachain-registrar:pjs --ws ws://172.28.1.1:9944'
```
Those steps should complete very quickly. At that point, you can do things like:
```sh
$ pjs query.parachains.heads 100
{
"heads": "0xe1efbf8cc2e1304da927986f4cd6964ce0888ce3995948bf71fe427b1a9d39b02101d2dac9c5342d7e8c4f4de2f5277ef860b3a518c1cd823b9a8cee175dce11bf7f57c5016e8a60a6cec16244b2cbf81a67a1dc7a825c288fc694997bc70e2d456400"
}
```
The collator includes its own health check, which you can inspect with
```sh
docker inspect --format='{{json .State.Health}}' cumulus_collator_1
```
The check runs every 5 minutes, and takes about a minute to complete each time. Most of that time is spent sleeping; it remains a very lightweight process.
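
The idea behind the health check can be sketched as below: sample a value, sleep, sample again, and report healthy only if the value changed. This is an illustrative sketch; `value_changed` is a hypothetical helper, and the demo samples the system clock instead of querying a node.

```sh
#!/bin/sh
# Hypothetical helper: succeed only if COMMAND's output differs between two
# samples taken INTERVAL seconds apart.
# Usage: value_changed INTERVAL COMMAND [ARGS...]
value_changed () {
    interval="$1"
    shift
    start="$("$@")"
    sleep "$interval"
    end="$("$@")"
    [ "$start" != "$end" ]
}

# The epoch second always advances across a one-second sleep.
value_changed 1 date +%s && echo "healthy"
```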
+129
@@ -0,0 +1,129 @@
version: '3.7'
services:
  node_alice:
    image: "polkadot:${BRANCH:-cumulus-branch}"
    ports:
      - "30333:30333"
      - "9933:9933"
      - "9944:9944"
    volumes:
      - "polkadot-data-alice:/data"
      - type: bind
        source: ./test/parachain/res/polkadot_chainspec.json
        target: /chainspec.json
        read_only: true
    command: >
      polkadot
      --chain=/chainspec.json
      --base-path=/data
      --port 30333
      --rpc-port 9933
      --ws-port 9944
      --rpc-external
      --rpc-cors all
      --ws-external
      --alice
    networks:
      testing_net:
        ipv4_address: 172.28.1.1
        aliases:
          - alice
  node_bob:
    image: "polkadot:${BRANCH:-cumulus-branch}"
    ports:
      - "30344:30333"
      - "9935:9933"
      - "9945:9944"
    volumes:
      - "polkadot-data-bob:/data"
      - type: bind
        source: ./test/parachain/res/polkadot_chainspec.json
        target: /chainspec.json
        read_only: true
    command: >
      polkadot
      --chain=/chainspec.json
      --base-path=/data
      --port 30333
      --rpc-port 9933
      --ws-port 9944
      --rpc-external
      --ws-external
      --rpc-cors all
      --bob
    networks:
      testing_net:
        ipv4_address: 172.28.1.2
        aliases:
          - bob
  genesis_state:
    build:
      context: .
      dockerfile: ./docker/test-parachain-collator.dockerfile
    image: "ctpc:latest"
    volumes:
      - "genesis-state:/data"
    command: >
      cumulus-test-parachain-collator
      export-genesis-state
      /data/genesis-state
  collator:
    build:
      context: .
      dockerfile: ./docker/test-parachain-collator.dockerfile
      target: collator
    image: "ctpc:collator"
    volumes:
      - "collator-data:/data"
    depends_on:
      - node_alice
      - node_bob
    command: >
      inject_bootnodes.sh
      --base-path=/data
    networks:
      testing_net:
  runtime:
    build:
      context: .
      dockerfile: ./docker/test-parachain-collator.dockerfile
      target: runtime
    image: "ctpc:runtime"
    volumes:
      - "parachain-runtime:/runtime"
  registrar:
    build:
      context: .
      dockerfile: ./docker/parachain-registrar.dockerfile
    image: para-reg:latest
    volumes:
      - "genesis-state:/genesis"
      - "parachain-runtime:/runtime"
    depends_on:
      - node_alice
      - runtime
      - genesis_state
    networks:
      testing_net:
volumes:
  polkadot-data-alice:
  polkadot-data-bob:
  collator-data:
  genesis-state:
  parachain-runtime:
networks:
  testing_net:
    ipam:
      driver: default
      config:
        - subnet: 172.28.0.0/16
@@ -0,0 +1,27 @@
FROM node:latest AS pjs
# It would be great to depend on a more stable tag, but we need some
# as-yet-unreleased features.
RUN yarn global add @polkadot/api-cli@0.10.0-beta.14
ENTRYPOINT [ "polkadot-js-api" ]
CMD [ "--version" ]
# To use the pjs build stage to access the blockchain from the host machine:
#
# docker build -f docker/parachain-registrar.dockerfile --target pjs -t parachain-registrar:pjs .
# alias pjs='docker run --rm --net cumulus_testing_net parachain-registrar:pjs --ws ws://172.28.1.1:9944'
#
# Then, as long as the chain is running, you can use the polkadot-js-api CLI like:
#
# pjs query.sudo.key
FROM pjs
RUN apt-get update && apt-get install curl netcat -y && \
curl -sSo /wait-for-it.sh https://raw.githubusercontent.com/vishnubob/wait-for-it/master/wait-for-it.sh && \
chmod +x /wait-for-it.sh
# the only thing left to do is to actually run the transaction.
COPY ./scripts/register_para.sh /usr/bin
# unset the previous stage's entrypoint
ENTRYPOINT []
CMD [ "/usr/bin/register_para.sh" ]
@@ -0,0 +1,61 @@
FROM rust:buster as builder
RUN apt-get update && apt-get install time clang libclang-dev llvm -y
RUN rustup toolchain install nightly
RUN rustup target add wasm32-unknown-unknown --toolchain nightly
RUN command -v wasm-gc || cargo +nightly install --git https://github.com/alexcrichton/wasm-gc --force
WORKDIR /paritytech/cumulus
# Copy the whole workspace in one go. This is future-proof against directories
# being added or removed, but it busts the cache every time any file in the
# workspace changes, including non-source files like inject_bootnodes.sh and
# non-`.dockerignore`'d transient files (*.log and friends). There is no way to
# exclude particular files, or write a negative rule, using Docker's COPY
# syntax, which derives from go's filepath.Match rules.
#
# The alternative is to enumerate the source directories in separate COPY
# instructions. They can't be combined into a single big COPY operation like
# `COPY collator consensus network runtime test Cargo.* .`, because in that case
# docker will copy the _contents_ of each directory into the image workdir,
# not the actual directory.
COPY . .
RUN cargo build --release -p cumulus-test-parachain-collator
# the collator stage is normally built once, cached, and then ignored, but can
# be specified with the --target build flag. This adds some extra tooling to the
# image, which is required for a launcher script. The script simply adds two
# arguments to the list passed in:
#
# --bootnodes /ip4/127.0.0.1/tcp/30333/p2p/PEER_ID
#
# with the appropriate ip and ID for both Alice and Bob
FROM debian:buster-slim as collator
RUN apt-get update && apt-get install jq curl bash -y && \
curl -sSo /wait-for-it.sh https://raw.githubusercontent.com/vishnubob/wait-for-it/master/wait-for-it.sh && \
chmod +x /wait-for-it.sh && \
curl -sL https://deb.nodesource.com/setup_12.x | bash - && \
apt-get install -y nodejs && \
npm install --global yarn && \
yarn global add @polkadot/api-cli@0.10.0-beta.14
COPY --from=builder \
/paritytech/cumulus/target/release/cumulus-test-parachain-collator /usr/bin
COPY ./scripts/inject_bootnodes.sh /usr/bin
CMD ["/usr/bin/inject_bootnodes.sh"]
COPY ./scripts/healthcheck.sh /usr/bin/
HEALTHCHECK --interval=300s --timeout=75s --start-period=30s --retries=3 \
CMD ["/usr/bin/healthcheck.sh"]
# the runtime stage is normally built once, cached, and ignored, but can be
# specified with the --target build flag. This just preserves one of the builder's
# outputs, which can then be moved into a volume at runtime
FROM debian:buster-slim as runtime
COPY --from=builder \
/paritytech/cumulus/target/release/wbuild/cumulus-test-parachain-runtime/cumulus_test_parachain_runtime.compact.wasm \
/var/opt/
CMD ["cp", "-v", "/var/opt/cumulus_test_parachain_runtime.compact.wasm", "/runtime/"]
FROM debian:buster-slim
COPY --from=builder \
/paritytech/cumulus/target/release/cumulus-test-parachain-collator /usr/bin
CMD ["/usr/bin/cumulus-test-parachain-collator"]
+21
@@ -0,0 +1,21 @@
#!/usr/bin/env bash
set -e
cd "$(cd "$(dirname "$0")" && git rev-parse --show-toplevel)"
dockerfile="$1"
if [ -z "$dockerfile" ]; then
    dockerfile="./docker/test-parachain-collator.dockerfile"
else
    shift 1
fi
image_name="$(basename "$dockerfile" | rev | cut -d. -f2- | rev)"
echo "building $dockerfile as $image_name..."
time docker build \
    -f "$dockerfile" \
    -t "$image_name":latest \
    "$@" \
    .
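
The image name computed above is the dockerfile's base name with its final extension removed: `rev | cut -d. -f2- | rev` drops the last dot-separated field. An equivalent sketch using only shell parameter expansion is shown below; `image_name_for` is a hypothetical helper name, not something the script defines.

```sh
#!/bin/sh
# Hypothetical helper: derive an image name from a dockerfile path by taking
# the base name and stripping the last "." and everything after it.
image_name_for () {
    name="$(basename "$1")"
    echo "${name%.*}"
}

image_name_for ./docker/test-parachain-collator.dockerfile
# prints: test-parachain-collator
```

A name without any dot is returned unchanged, which matches the behavior of the `cut` pipeline.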
+23
@@ -0,0 +1,23 @@
#!/usr/bin/env bash
set -e
cumulus_repo=$(cd "$(dirname "$0")" && git rev-parse --show-toplevel)
polkadot_repo=$(dirname "$cumulus_repo")/polkadot
if [ ! -d "$polkadot_repo/.git" ]; then
    echo "please clone polkadot in parallel to this repo:"
    echo " (cd .. && git clone git@github.com:paritytech/polkadot.git)"
    exit 1
fi
if [ -z "$BRANCH" ]; then
    BRANCH=cumulus-branch
fi
cd "$polkadot_repo"
git fetch
git checkout "$BRANCH"
time docker build \
    -f ./docker/Dockerfile \
    --build-arg PROFILE=release \
    -t polkadot:"$BRANCH" .
+10
@@ -0,0 +1,10 @@
#!/usr/bin/env bash
# helper function to run docker-compose using the docker/docker-compose.yml file while
# retaining a context from the root of the repository
set -e
dc () {
    cd "$(cd "$(dirname "$0")" && git rev-parse --show-toplevel)"
    docker-compose -f - "$@" < docker/docker-compose.yml
}
+14
@@ -0,0 +1,14 @@
#!/usr/bin/env bash
set -e
head () {
    polkadot-js-api --ws ws://172.28.1.1:9944 query.parachains.heads 100 |\
        jq -r .heads
}
start=$(head)
sleep 60
end=$(head)
[ "$start" != "$end" ]
+50
@@ -0,0 +1,50 @@
#!/usr/bin/env bash
# this script runs the cumulus-test-parachain-collator after fetching
# appropriate bootnode IDs
#
# this is _not_ a general-purpose script; it is closely tied to the
# docker/docker-compose.yml
set -e -o pipefail
ctpc="/usr/bin/cumulus-test-parachain-collator"
if [ ! -x "$ctpc" ]; then
    echo "FATAL: $ctpc does not exist or is not executable"
    exit 1
fi
# name the variable with the incoming args so it isn't overwritten later by function calls
args=( "$@" )
alice="172.28.1.1"
bob="172.28.1.2"
p2p_port="30333"
rpc_port="9933"
get_id () {
    node="$1"
    /wait-for-it.sh "$node:$rpc_port" -t 10 -s -- \
        curl -sS \
            -H 'Content-Type: application/json' \
            --data '{"id":1,"jsonrpc":"2.0","method":"system_networkState"}' \
            "$node:$rpc_port" |\
    jq -r '.result.peerId'
}
bootnode () {
    node="$1"
    id=$(get_id "$node")
    if [ -z "$id" ]; then
        echo >&2 "failed to get id for $node"
        exit 1
    fi
    echo "/ip4/$node/tcp/$p2p_port/p2p/$id"
}
args+=( "--" "--bootnodes=$(bootnode "$alice")" "--bootnodes=$(bootnode "$bob")" )
set -x
"$ctpc" "${args[@]}"
+58
@@ -0,0 +1,58 @@
#!/usr/bin/env bash
set -e -o pipefail
sizeof () {
    stat --printf="%s" "$1"
}
wait_for_file () {
    # Wait for a file to have a stable, non-zero size.
    # Takes at least 0.2 seconds per run, but there's no upper bound if the
    # file grows continuously. If the file doesn't exist, or stably has 0 size,
    # this will take up to 10 seconds by default; this limit can be adjusted by
    # the second input parameter.
    path="$1"
    limit="$2"
    if [ -z "$limit" ]; then
        limit=10
    fi
    count=0
    while [ "$count" -lt "$limit" ]; do
        if [ -s "$path" ]; then
            echo "$path found after $count seconds"
            # now ensure that the file size is stable: it's not still being written
            oldsize=0
            size="$(sizeof "$path")"
            while [ "$oldsize" -ne "$size" ]; do
                sleep 0.2
                oldsize="$size"
                size="$(sizeof "$path")"
            done
            return
        fi
        count=$((count+1))
        sleep 1
    done
    echo "$path not found after $count seconds"
    exit 1
}
wait_for_file /runtime/cumulus_test_parachain_runtime.compact.wasm
wait_for_file /genesis/genesis-state
# this is now straightforward: just send the sudo'd tx to the alice node,
# as soon as the node is ready to receive connections
/wait-for-it.sh 172.28.1.1:9944 \
    --strict \
    --timeout=10 \
    -- \
    polkadot-js-api \
        --ws ws://172.28.1.1:9944 \
        --sudo \
        --seed "//Alice" \
        tx.registrar.registerPara \
            100 \
            '{"scheduling":"Always"}' \
            @/runtime/cumulus_test_parachain_runtime.compact.wasm \
            "$(cat /genesis/genesis-state)"
+10
@@ -0,0 +1,10 @@
#!/usr/bin/env bash
set -e
cd "$(cd "$(dirname "$0")" && git rev-parse --show-toplevel)"
# shellcheck source=dc.sh
source scripts/dc.sh
dc build
dc up -d
+9
@@ -0,0 +1,9 @@
#!/usr/bin/env bash
set -e
cd "$(cd "$(dirname "$0")" && git rev-parse --show-toplevel)"
# shellcheck source=dc.sh
source scripts/dc.sh
dc down --volumes --remove-orphans