
252 Commits

Author SHA1 Message Date
James Wilson 7ff9a316d1 First/Third party by genesis hash, not label. Make limit configurable and default to 1k (#424)
* First/Third party by genesis hash, not label. Make limit configurable

* Fix a test that relies on there not being a node limit

* remove a now-invalid comment

* Cargo fmt

* Fix another naff comment

* Update backend/telemetry_core/src/state/chain.rs

Comment tweak

Co-authored-by: David <dvdplm@gmail.com>
2021-10-13 13:08:53 +01:00
James Wilson 38c5dff0b7 Saturating sub timestamp from now to avoid possible underflow (#420) 2021-10-01 16:55:35 +01:00
James Wilson 7ac88a7e84 Expose total messages sent to aggregator, too (#416)
* Expose total messages sent to aggregator so we can make better graphs with total dropped msgs

* cargo fmt

* use write to hopefully avoid some allocating

* add 'core' namespace to telemetry metrics for better future clarity

* cargo fmt
2021-09-30 12:21:37 +01:00
James Wilson ed6e292d25 Clean up shard channel when shard disconnects (#411) 2021-09-24 10:48:54 +01:00
James Wilson b4b128f9fe Tidy up stale connections. (#406)
* If messageId changes and network ID doesn't, remove 'old' message_id

* Boot nodes/connection when no recent messages received for it

* Separate task needed for soketto recv to avoid cancel-safety issues with new interval

* Wee tidy up

* cargo fmt

* Add some logging around node adding/removing

* Another log info msg

* a bit of tidy up

* bump stale node timeout to 60s
2021-09-21 15:49:42 +01:00
James Wilson 0b0cec0512 Make soak tests work again now that we subscribe by genesis hash (#399)
* Make soak tests work again with genesis hash subscribing

* cargo fmt

* derive hex from hash

* actually compile code

* fmt

Co-authored-by: David Palm <dvdplm@gmail.com>
2021-09-10 11:28:34 +01:00
James Wilson fe19a75414 Un-brittle-ify backend E2E tests and have them run by default again in CI (#397)
* test running tests

* Add delay so that core knows about node before feed subscribes to limit chance of race

* move delay to the right place

* Don't do expensive docker step until we will be pushing the image

* docker test skipped as hoped, so push to 'true'

* just remove docker steps in github CI entirely since they aren't needed by anything (gitlab CI does this stuff now)

* run CI on pull requests too to catch PRs from forks
2021-09-06 11:30:17 +01:00
James Wilson a3ffaf3c44 Flume fix part 2: avoid using flume in a couple of cases, and revert attempted fix #1 (#396)
* A second attempt to avoid the flume memory leak, since the first didn't actually work

* No luck with second flume fix, so revert to futures::mpsc in a few places

* cargo fmt

* Add a comment to cover use of into_stream
2021-09-03 15:55:25 +01:00
James Wilson 2932075783 Avoid using flume::Receiver::into_stream() to avoid memory leaks until the issue is resolved upstream (#394)
* Tweak rolling_total test to also confirm capacity doesn't go nuts

* Use Jemalloc

* Avoid flume's into_stream and use a workaround for now

* cargo fmt

* Improve comments now that there's an issue to point to
2021-09-03 08:40:43 +01:00
Maciej Hirsz a4069e4b3d Subscribe to chains by genesis hash (#395)
* Handle subscription by hash in the frontend

* Forward-ported backend changes

* Fix unit tests

* Remove unused `chains_by_label`

* fmt

* Updated but failing E2E tests

* subscribe by genesis hash in tests

* fmt

* Copy `BlockHash` instead of returning a ref

* Pin chains by genesisHash

Co-authored-by: James Wilson <james@jsdw.me>
2021-09-02 17:54:19 +02:00
James Wilson ec5db0fbbf Bump tokio to 1.10 and add a test to confirm memory usage of rolling_total (#392) 2021-08-31 20:18:46 +02:00
James Wilson 87866b2d42 Improve logging and error reporting around IP and location info (#386)
* Beef up error reporting of IP and location info

* Tidy up error reporting after some manual testing of it

* Don't cache erroneous locations; try again when asked again

* cargo fmt
2021-08-27 16:16:26 +01:00
James Wilson 7a3e30cb01 Don't remove all feeds subscribed to a chain when one disconnects (#383)
* Only remove the feed that disconnected to not break the rest...

* use multimap struct to avoid sync issues between feed and chain

* add a remove test, too

* cargo fmt

* fix name of test

* move multimap to common so we can doctest it and add 'unique' to name

* cargo fmt

* Return old key if value moved to make uniqueness more obvious
2021-08-27 08:05:44 +01:00
Chevdor 19db1a48ef Hardening of the Backend docker image (#379)
* Add script to build the backend
* harden the backend docker image
* fix docker-compose
* fix doc
2021-08-26 14:32:11 +02:00
James Wilson 46b0641dfd Clarify wording 2021-08-13 11:45:03 +01:00
James Wilson 77460ffc27 cargo fmt 2021-08-13 11:35:24 +01:00
James Wilson b842c7fc8b expose dropped message counts and fix some typos/wording 2021-08-13 11:33:53 +01:00
James Wilson 811babca27 Merge branch 'master' into jsdw-sharding-gatekeeper 2021-08-13 11:16:47 +01:00
James Wilson 18627a9f02 No e2e feature flag; just ignore and pattern match on 'e2e' to run 2021-08-12 16:58:35 +01:00
James Wilson 05a3ba3fef Fix/expand a few comments 2021-08-12 16:20:05 +01:00
James Wilson 230987036a cargo fmt 2021-08-12 16:01:35 +01:00
James Wilson 9017f328f0 Add comment explaining prometheus metrics endpoint body 2021-08-12 16:01:33 +01:00
James Wilson 92da674d4d Expose metrics in a format that prometheus understands 2021-08-12 16:01:32 +01:00
James Wilson 4f7b2c8ec5 Confirm that densemap len won't panic if lots of retired items 2021-08-12 16:01:30 +01:00
James Wilson 6db7f484ef Fix compile err with diagnostic msg 2021-08-12 16:01:29 +01:00
James Wilson ab2303ce5c more diagnostic logging 2021-08-12 16:01:27 +01:00
James Wilson 3319709f7b Add periodic interval to core loop and print debug info 2021-08-12 16:01:25 +01:00
James Wilson f72f8c1fd5 test runner: fix soak test for multiple ids per node 2021-08-12 16:01:24 +01:00
James Wilson 20463ce159 test runner: enable tokio features 2021-08-12 16:01:19 +01:00
James Wilson e3fcd4e8c2 Clean up soak test runner and add more config options 2021-08-12 16:01:18 +01:00
James Wilson bd7a21ec39 Flumify everything 2021-08-12 16:01:17 +01:00
James Wilson 11b0b3a3c7 remove final use of futures::mpsc and replace with flume 2021-08-12 16:01:15 +01:00
James Wilson 703a9ddc4e use flume throughout telemetry_core 2021-08-12 16:01:14 +01:00
James Wilson 8268cf2afe print feed 1 msg len 2021-08-12 16:01:12 +01:00
James Wilson 98c9ccd278 fmt, clean warnings, tidy aggregator opts and add queue length limit 2021-08-12 16:01:11 +01:00
James Wilson 968dd2b957 Try to force new thread for msg counter to ensure it has time to print 2021-08-12 16:01:09 +01:00
James Wilson b97aec99a8 monitoring queue len 2021-08-12 16:01:07 +01:00
James Wilson 87c0ee7d0d monitor aggregator length (don't discard msgs yet) 2021-08-12 16:00:56 +01:00
James Wilson d4b5c2b0c8 split e2e tests out and run them separately, not blocking the build or marking it as failed 2021-08-12 13:38:29 +01:00
James Wilson 7aa4ad49b3 bump the msg timeout a little higher 2021-08-12 13:10:17 +01:00
James Wilson 7563909609 Fix typo
Co-authored-by: Niklas Adolfsson <niklasadolfsson1@gmail.com>
2021-08-12 12:59:39 +01:00
James Wilson 770dd04b57 invert logic to make name make sense and fix comment typo 2021-08-12 12:47:29 +01:00
James Wilson f887510beb remove lint warning on cargo test 2021-08-12 12:42:44 +01:00
James Wilson 4480bbe72a Allow errors as well as closes for now to remove some brittleness 2021-08-12 12:40:53 +01:00
James Wilson 80d6ad916e Address David's comments 2021-08-11 17:23:22 +01:00
James Wilson f26b39ac63 Address feedback from Niklas 2021-08-11 16:59:11 +01:00
James Wilson 9f76fabaed give tokio threads a more convenient name for monitoring purposes 2021-08-09 11:36:46 +01:00
James Wilson b22efc804a Fix comment typo 2021-08-09 10:56:01 +01:00
James Wilson 626fe95d89 1 aggregator loop by default for now 2021-08-09 10:13:10 +01:00
James Wilson c469ef8dfe make AggregatorSet close to zero cost when only 1 aggregator asked for 2021-08-09 10:09:17 +01:00