Bug fixes: ungraceful test crash fixes #637

miningexperiments · 2025-02-13T17:47:10Z

Bug fix

When running cargo tests, the test run crashes ungracefully and leaves temp databases and similar files behind. After a few runs these add up to tens of gigabytes, and the tests still crash.
This is mainly due to errors not propagating to the test functions; the code panics early instead.
A tokio-runtime-worker thread was also underflowing the UtxosChangedSubscription counter.

What I've done

Propagated errors from temp dir creation and the create_temp_db macro, and unwrapped the resulting errors in the test functions.
New test functions that call consensus.init() never shut consensus down, and since it is wrapped in an Arc it could possibly keep haunting after the test is over.
The serve function in connection_handler.rs panicked on error; I changed this to log an error instead.
Added an underflow check to UtxosChangedSubscription.
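The idea of returning errors from temp dir creation instead of panicking can be sketched as follows. This is a minimal, std-only illustration and not the code from this PR: `create_temp_dir`, `TempDirGuard` and the directory naming scheme are hypothetical names introduced for the example.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical counter used only to make directory names unique within one process.
static NEXT_ID: AtomicU64 = AtomicU64::new(0);

/// Hypothetical helper: creates a directory under the system temp dir and
/// returns any I/O error to the caller instead of panicking.
fn create_temp_dir(prefix: &str) -> io::Result<PathBuf> {
    let id = NEXT_ID.fetch_add(1, Ordering::Relaxed);
    let path = std::env::temp_dir().join(format!("{prefix}-{}-{id}", std::process::id()));
    fs::create_dir_all(&path)?;
    Ok(path)
}

/// Hypothetical guard: removes the directory when it goes out of scope.
struct TempDirGuard(PathBuf);

impl Drop for TempDirGuard {
    fn drop(&mut self) {
        // Best-effort cleanup; errors are ignored on purpose.
        let _ = fs::remove_dir_all(&self.0);
    }
}

/// A caller (for example a test) can use `?` and decide itself how to fail.
fn use_temp_dir() -> io::Result<bool> {
    let guard = TempDirGuard(create_temp_dir("example-db")?);
    Ok(guard.0.is_dir())
}
```

With this shape the caller sees an `Err` when the directory cannot be created (for example when a file descriptor limit is hit), and the guard's `Drop` runs on normal return and on unwinding panics. It does not run when the process aborts, such as on a stack overflow.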

Additional thoughts

protocol/p2p/src/core/connection_handler.rs: this kept retrying for 15 attempts before shutting down. Perhaps another check should be added so it can close earlier in tests.
- It is common for servers and, e.g., VPNs not to use IPv6, so tests need graceful error propagation (not a panic).
cargo test without --release still fails on the current master branch (before any changes from this PR). This is not tested in ci.yaml.

```
error: 2 targets failed:
    `-p kaspa-testing-integration --lib`
    `-p kaspa-wallet-core --lib`
```

# Optimization
flow.rs ; async sync_missing_relay_past_headers
# Fix
- Save memory; changed jobs to pass an iterator to try_join_all instead of collecting a vector and then passing it.
- Changed error check to return an explicit error, instead of implicit

miningexperiments · 2025-02-14T08:41:39Z

This is fixed in this PR: test daemon_integration_tests::daemon_utxos_propagation_test ... ok

The other failing test is:

 thread 'tx::generator::test::test_generator_inputs_250k_outputs_2_sweep' has overflowed its stack
fatal runtime error: stack overflow

I think the vector with 250k elements may simply be on the stack, so it overflows. You could allocate it on the heap with a fixed size, but then you are fixing the test, and I'm not sure how that reflects on testing the creation of utxo_entries in wallet/core/src/tx/generator/generator.rs. @michaelsutton @coderofstuff @someone235
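A minimal sketch of the heap-allocation idea mentioned above, assuming the overflow really comes from a large fixed-size collection living on the stack. `Entry`, `COUNT` and `build_entries_on_heap` are hypothetical stand-ins and not the wallet's actual types or test code.

```rust
/// Hypothetical entry type standing in for the test's UTXO entries.
#[derive(Clone, Debug, Default)]
struct Entry {
    amount: u64,
    index: u32,
}

/// Number of entries used by the hypothetical test.
const COUNT: usize = 250_000;

/// Builds a fixed-size collection whose storage lives on the heap.
/// A local `[Entry; COUNT]` array would instead occupy stack space.
fn build_entries_on_heap() -> Box<[Entry]> {
    (0..COUNT)
        .map(|i| Entry { amount: 1, index: i as u32 })
        .collect::<Vec<_>>()
        .into_boxed_slice()
}
```

Only the pointer and length of the boxed slice sit on the stack, so the size of the collection no longer depends on the stack limit of the thread running the test.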

coderofstuff

Initial review. I'll probably only be able to get back to this after the next few weeks, but please take a look.

coderofstuff · 2025-02-17T18:34:48Z

notify/src/subscription/single.rs

-            self
-        );
+        // Prevent underflow; [ERROR] thread 'tokio-runtime-worker' panicked at notify/src/subscription/single.rs:388:13: attempt to subtract with overflow
+        let _ = UTXOS_CHANGED_SUBSCRIPTIONS.fetch_update(Ordering::SeqCst, Ordering::SeqCst, |count| {


What if we just did:

UTXOS_CHANGED_SUBSCRIPTIONS.fetch_sub(1, Ordering::SeqCst).saturating_sub(1)

Also, I wonder what the root cause of UTXOS_CHANGED_SUBSCRIPTIONS being 0 is. The above fixes the underflow, but the incorrect usage that allowed the situation still needs to be determined. The expectation here is that UTXOS_CHANGED_SUBSCRIPTIONS should always be above 0 when drop occurs; otherwise, document the situations in which it can be 0.

miningexperiments · 2025-02-26

As far as I know, saturating_sub only applies to the old value returned by fetch_sub, not to the atomic itself, so it does not prevent the atomic variable from underflowing. The suggested fetch_update solution only applies the decrement if the value is greater than zero.

miningexperiments · 2025-02-26

By the way, this happens frequently in testing.
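The difference between the two approaches discussed in this thread can be sketched with a standalone counter. This is an illustration only, assuming an `AtomicUsize`-style counter; the function names are hypothetical and the real `UTXOS_CHANGED_SUBSCRIPTIONS` code may differ.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Decrements with `fetch_sub` and saturates only the returned (old) value.
/// If the counter was already 0, the atomic itself wraps around to `usize::MAX`.
fn decrement_saturating_result(counter: &AtomicUsize) -> usize {
    counter.fetch_sub(1, Ordering::SeqCst).saturating_sub(1)
}

/// Decrements only when the counter is above zero.
/// Returns `Ok(previous)` when the decrement happened and `Err(previous)`
/// (here always `Err(0)`) when the counter was left untouched.
fn decrement_if_positive(counter: &AtomicUsize) -> Result<usize, usize> {
    counter.fetch_update(Ordering::SeqCst, Ordering::SeqCst, |count| count.checked_sub(1))
}
```

Calling `decrement_saturating_result` on a counter holding 0 reports 0 but leaves the atomic at `usize::MAX`, while `decrement_if_positive` returns `Err(0)` and keeps the atomic at 0. The `Err` case is also a natural place to log a warning while the root cause of the extra drop is investigated.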

coderofstuff · 2025-02-17T18:35:44Z

.github/workflows/ci.yaml

@@ -113,6 +113,9 @@ jobs:
            target/
          key: ${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }}

+      - name: Run cargo tests non--release


We moved away from cargo test and switched to nextest. But I do wonder why nextest doesn't capture the errors that cargo test is displaying here.

miningexperiments · 2025-02-26

cargo test still shows failing tests that nextest does not, so what is the recommended solution?

coderofstuff · 2025-02-17T18:43:40Z

protocol/p2p/src/core/connection_handler.rs

            match serve_result {
                Ok(_) => info!("P2P Server stopped: {}", serve_address),
-                Err(err) => panic!("P2P, Server {serve_address} stopped with error: {err:?}"),
+                Err(err) => log::error!("P2P, Server {serve_address} stopped with error: {err:?}"),


Add error to the use statement at line 8 and just use error! here.

coderofstuff · 2025-02-17T18:47:17Z

testing/integration/src/consensus_integration_tests.rs

@@ -1820,7 +1821,7 @@ async fn run_kip10_activation_test() {
    let mut genesis_multiset = MuHash::new();
    consensus.append_imported_pruning_point_utxos(&initial_utxo_collection, &mut genesis_multiset);
    consensus.import_pruning_point_utxo_set(config.genesis.hash, genesis_multiset).unwrap();
-    consensus.init();
+    let wait_handles = consensus.init();


Add comments on why this is necessary

miningexperiments · 2025-02-26

This was done in previous tests as well.
Here:

pub fn init(&self) -> Vec<JoinHandle<()>> { self.consensus.run_processors() }

and

pub fn shutdown(&mut self) { self.core.shutdown(); self.join(); }

where core is wrapped in an Arc, so it is reference counted. Without the shutdown, some background threads might keep running. I thought this might be related to hitting the system file descriptor limit and to /tmp/rusty-kaspa not being cleaned up early.
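The init/shutdown pairing described here can be sketched with plain threads. This is a simplified illustration under stated assumptions, not the TestConsensus implementation: `Service`, its `stop` flag and the worker loop are hypothetical.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical stand-in for a component that runs background workers.
struct Service {
    stop: Arc<AtomicBool>,
}

impl Service {
    fn new() -> Self {
        Self { stop: Arc::new(AtomicBool::new(false)) }
    }

    /// Starts the workers and hands their join handles to the caller.
    /// Each worker holds its own clone of the shared `Arc`.
    fn init(&self, workers: usize) -> Vec<JoinHandle<()>> {
        (0..workers)
            .map(|_| {
                let stop = Arc::clone(&self.stop);
                thread::spawn(move || {
                    while !stop.load(Ordering::SeqCst) {
                        thread::sleep(Duration::from_millis(1));
                    }
                })
            })
            .collect()
    }

    /// Signals the workers to stop and waits until every one has exited.
    fn shutdown(&self, handles: Vec<JoinHandle<()>>) {
        self.stop.store(true, Ordering::SeqCst);
        for handle in handles {
            handle.join().expect("worker thread panicked");
        }
    }
}
```

If the handles returned by `init` are discarded, the workers keep their clones of the `Arc` alive and continue running after the test body returns. Passing the handles to `shutdown` makes the test wait until the workers have exited and released whatever they hold.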

coderofstuff · 2025-02-17T18:47:43Z

testing/integration/src/consensus_integration_tests.rs

@@ -1872,6 +1873,8 @@ async fn run_kip10_activation_test() {
    let status = consensus.add_utxo_valid_block_with_parents((index + 1).into(), vec![index.into()], vec![spending_tx.clone()]).await;
    assert!(matches!(status, Ok(BlockStatus::StatusUTXOValid)));
    assert!(consensus.lkg_virtual_state.load().accepted_tx_ids.contains(&tx_id));
+    consensus.shutdown(wait_handles);


Same here and for the rest of the similar changes: add comments on why this is necessary.


miningexperiments added 30 commits January 24, 2025 19:53

Instant time instead of SystemTime

8aa1073

Changed functions to use std::time::Instant which is monotonic, to avoid Rust panics with SystemTime. Replaced some unwraps with an expect. Removed redundant brackets, and secp256k1::

fn clean_old_pending_outpoints + lint

58ba334

Changed fn clean_old_pending_outpoints to retain keys that are younger than an hour, instead of collecting the ones older than an hour into a vector and then deleting them in a separate for loop. Linted with cargo fmt.

Merge branch 'kaspanet:master' into master

a169b4d

Merge branch 'master' into master

200cda0

Revert "optimization: flow.rs; fn sync_missing_relay_past_headers"

c1a7c04

This reverts commit 0bffb60.

Update lib.rs

c677877

create_temp_db now returns a Result

Update test_consensus.rs

a8341c4

create_test_db returns a Result

Update test_consensus.rs

4e91574

Update relations.rs

328044d

Update tips.rs

6b08428

Update build.rs

c6d3b99

Update validate.rs

e1cf33e

Update inquirer.rs

5a6021e

Update relations.rs

29012b9

Update access.rs

80ec39e

Update set_access.rs

ef1b52e

Update utils.rs ; create_temp_db returns Result

74a370a

fixes cargo test crashes (ungraceful termination)

Update processor.rs ; create_temp_db returns Result

a6d39cf

create_temp_db now returns Result to fix test crashing ungracefully

Update index.rs ; create_temp_db returns Result

a68d477

create_temp_db returns result to prevent test crashing ungracefully, so we need to add error handling

Update single.rs ; UtxosChangedSubscription

9a6be4c

A tokio-runtime-worker thread subtracted below zero, causing an overflow panic and ungraceful termination during cargo test

Update connection_handler.rs ; fn serve

68869e1

serve_result caused tests to crash ungracefully via panic! when an error was received. Now the error is propagated

Update main.rs

69c2e54

create temp db returns Result

Update network.rs

724b3ed

Update consensus_integration_tests.rs

8fe8629

create_temp_db returns Result, add error handling

linting

75bec2c

tmp files created by db/consensus not dropped since Arc used, which l…

00756be

…eads to hitting File Descriptor Limit and failing tests due to panic in get_kaspa_tempdir func in database/src/utils.rs

add error reporting to get_temp_dir creation, since testing sometimes…

3cfdb74

… reaches System File Descriptor limit and crashes testing ungracefully leaving temp files in system

we can unwrap since error is reported earlier, otherwise double error…

40454b6

… reporting

linting

fc848e6

add cargo test non--release to check overflows

f19d651

added warn msg to UtxosChangedSubscription

78f194c

coderofstuff requested changes Feb 17, 2025


miningexperiments and others added 4 commits February 26, 2025 18:48

Merge branch 'kaspanet:master' into testing-fixes

9eeadf3

refactor: add error! instead of log::error

b0fdde1

cargo test to nextest

1770afa

Merge branch 'master' into testing-fixes

162ee59

miningexperiments requested a review from coderofstuff March 20, 2025 08:53

miningexperiments added 3 commits April 1, 2025 10:27

Merge branch 'kaspanet:master' into testing-fixes

b4b7e0f

allocate 250k input vector to heap, so it does not stack overflow, be…

52b65b6

…fore even generator function call

fix: missing run

fb2607a

miningexperiments force-pushed the testing-fixes branch from a0ae6d4 to fb2607a Compare April 1, 2025 14:10
