Skip to content

Conversation

cypriansakwa
Copy link
Contributor

@cypriansakwa cypriansakwa commented Jul 24, 2025

Description

This pull request introduces a robust, circuit-friendly wrapper for substring search using the noir_string_search library. The circuit, substring_search, validates input lengths with assertions and then always calls the underlying library function, using selector logic to guarantee correct output for all cases. This approach ensures the circuit is safe for zero-knowledge proof systems by avoiding control flow branching and handling all edge cases arithmetically.

  • For an empty needle, the circuit returns (true, 0), following standard library behavior.
  • For a needle longer than the haystack, the circuit returns (false, 0).
  • For all other cases, it returns whether the substring was found and its index.

Selector logic is used to mask the result of the library call as appropriate, ensuring all outputs are circuit-safe and preventing assertion failures or panics.

Problem

Addresses the need for a practical example demonstrating integration with the noir_string_search library (see noir-examples#31). This implementation demonstrates best practices for integrating external Noir libraries into real-world applications, particularly in the context of zero-knowledge circuits.

Summary

  • Adds a wrapper circuit that asserts input validity and uses selector logic to determine output.
  • Always calls the underlying library, but uses selectors to mask outputs for edge cases.
  • Returns (true, 0) for empty needles (matching standard substring search behavior), and (false, 0) for needles longer than the haystack.
  • Maintains a public interface of (bool, u32) for clarity and compatibility with the noir_string_search contract.
  • Ensures all test cases, including edge cases, are handled gracefully with circuit-safe logic.

Additional Context

  • Selector logic ensures there are no control-flow branches, which is important for circuit determinism and security.
  • The approach is robust against future changes in the upstream library and can be easily adapted if the library's behavior changes.
  • All code and tests work without runtime assertion errors, panics, or constant return value warnings.

PR Checklist

  • I have tested the changes locally.
  • I have formatted the changes with Prettier and/or cargo fmt on default settings.

@cypriansakwa cypriansakwa marked this pull request as draft July 24, 2025 09:06
@cypriansakwa cypriansakwa marked this pull request as ready for review July 24, 2025 13:33
@cypriansakwa
Copy link
Contributor Author

Hi @critesjosh and @signorecello 👋,

Just a gentle reminder to please take a moment to review this PR when you get a chance.
Your feedback will be really helpful in moving it forward. 🙏

Thanks a lot for your time and support!

@critesjosh
Copy link
Collaborator

Just did a quick pass with Claude. do you agree with this analysis?

Issues Found

  1. Critical Logic Error in main.nr (lines 8-10, 25-37)

The documentation comments and implementation are contradictory and incorrect:

  • Comment says (line 9): "If needle is empty, found=true at index 0"
  • Implementation does (line 34): Empty needle returns found=true
  • Library expects: The library is called regardless of edge cases (line 23), then selector logic is applied afterward

Problem: The code calls haystack_body.substring_match(needle_body) on line 23 before checking edge cases. According to proper usage, the library should handle the search internally.
However, the selector logic afterward (lines 25-37) attempts to override the library's result for empty needles and "too long" cases.

Actual behavior with empty needle: An empty needle conventionally matches at position 0, but this depends on the library's implementation. The code returns found=true when
is_empty=true, which may be semantically incorrect for substring search (empty string is typically considered present at every position, not just index 0).

  1. Incorrect README Documentation (lines 58-63)

The README claims:
The circuit only attempts the underlying library search if:

  • The needle is not empty.
  • The needle length does not exceed the haystack length.

Otherwise, it returns (false, 0).

This is false. The code at main.nr:23 calls substring_match() unconditionally, then applies selector logic to potentially override the result. The library is always invoked.

  1. Missing Edge Case Tests

README line 69 claims tests cover "Needle longer than haystack" but no such test exists in test_inputs.nr. The tests only cover positive matches.

  1. Semantic Issue: Empty Needle Behavior

Line 9 documents that empty needle returns found=true, but this is questionable. In most string search semantics, an empty needle should either:

  • Match at every position (standard semantics)
  • Be considered invalid/error
  • Match at position 0 only (current behavior, but unusual)

The current implementation chooses the third option without justification.

  1. Redundant Field Conversions

Lines 29-35 use Field arithmetic for selector logic (bool_to_field, line 4-6), but this could be simplified with direct boolean logic since Noir supports conditional expressions.

What's Correct

  • Nargo.toml properly specifies the dependency with a version tag
  • Prover.toml has valid test inputs ("hello world" / "world")
  • The positive test cases that exist are correct
  • General structure and imports are appropriate
  • The library types (StringBody256, SubString32) are used correctly

Recommendations

  1. Fix the logic: Either remove the wrapper logic and use the library directly, OR guard the library call to not invoke it for edge cases
  2. Update README: Accurately describe what the code actually does
  3. Add missing tests: Empty needle, needle longer than haystack, needle not found
  4. Clarify semantics: Document why empty needle returns (true, 0) rather than other options
  5. Simplify if possible: Consider if the selector logic is even necessary or if the library handles these cases appropriately

@cypriansakwa
Copy link
Contributor Author

Just did a quick pass with Claude. do you agree with this analysis?

Issues Found

  1. Critical Logic Error in main.nr (lines 8-10, 25-37)

The documentation comments and implementation are contradictory and incorrect:

  • Comment says (line 9): "If needle is empty, found=true at index 0"
  • Implementation does (line 34): Empty needle returns found=true
  • Library expects: The library is called regardless of edge cases (line 23), then selector logic is applied afterward

Problem: The code calls haystack_body.substring_match(needle_body) on line 23 before checking edge cases. According to proper usage, the library should handle the search internally. However, the selector logic afterward (lines 25-37) attempts to override the library's result for empty needles and "too long" cases.

Actual behavior with empty needle: An empty needle conventionally matches at position 0, but this depends on the library's implementation. The code returns found=true when is_empty=true, which may be semantically incorrect for substring search (empty string is typically considered present at every position, not just index 0).

  1. Incorrect README Documentation (lines 58-63)

The README claims: The circuit only attempts the underlying library search if:

  • The needle is not empty.
  • The needle length does not exceed the haystack length.

Otherwise, it returns (false, 0).

This is false. The code at main.nr:23 calls substring_match() unconditionally, then applies selector logic to potentially override the result. The library is always invoked.

  1. Missing Edge Case Tests

README line 69 claims tests cover "Needle longer than haystack" but no such test exists in test_inputs.nr. The tests only cover positive matches.

  1. Semantic Issue: Empty Needle Behavior

Line 9 documents that empty needle returns found=true, but this is questionable. In most string search semantics, an empty needle should either:

  • Match at every position (standard semantics)
  • Be considered invalid/error
  • Match at position 0 only (current behavior, but unusual)

The current implementation chooses the third option without justification.

  1. Redundant Field Conversions

Lines 29-35 use Field arithmetic for selector logic (bool_to_field, line 4-6), but this could be simplified with direct boolean logic since Noir supports conditional expressions.

What's Correct

  • Nargo.toml properly specifies the dependency with a version tag
  • Prover.toml has valid test inputs ("hello world" / "world")
  • The positive test cases that exist are correct
  • General structure and imports are appropriate
  • The library types (StringBody256, SubString32) are used correctly

Recommendations

  1. Fix the logic: Either remove the wrapper logic and use the library directly, OR guard the library call to not invoke it for edge cases
  2. Update README: Accurately describe what the code actually does
  3. Add missing tests: Empty needle, needle longer than haystack, needle not found
  4. Clarify semantics: Document why empty needle returns (true, 0) rather than other options
  5. Simplify if possible: Consider if the selector logic is even necessary or if the library handles these cases appropriately

Thank you for taking the time to do such a detailed review and for running this through Claude 🙏.
I really appreciate the thorough analysis and the clear breakdown of issues — this helps me see where the implementation and documentation diverged.

I agree with your points on the logic error, README mismatch, missing edge case tests, and the semantic ambiguity around empty needles. I’ll:

  • Refactor the logic so edge cases are handled cleanly (either by guarding the library call or by removing redundant overrides).
  • Update the README to accurately reflect the actual behavior.
  • Add proper test coverage for empty needles, longer-needle-than-haystack, and not-found cases.
  • Clarify the semantic choice for empty needle behavior in both code and docs.
  • Simplify the selector logic if Noir’s conditional support makes it cleaner.

I’ll push these corrections in the next update.
Thanks again for the constructive feedback — it definitely makes the project stronger.

@cypriansakwa
Copy link
Contributor Author

cypriansakwa commented Oct 6, 2025

Just did a quick pass with Claude. do you agree with this analysis?

Issues Found

  1. Critical Logic Error in main.nr (lines 8-10, 25-37)

The documentation comments and implementation are contradictory and incorrect:

  • Comment says (line 9): "If needle is empty, found=true at index 0"
  • Implementation does (line 34): Empty needle returns found=true
  • Library expects: The library is called regardless of edge cases (line 23), then selector logic is applied afterward

Problem: The code calls haystack_body.substring_match(needle_body) on line 23 before checking edge cases. According to proper usage, the library should handle the search internally. However, the selector logic afterward (lines 25-37) attempts to override the library's result for empty needles and "too long" cases.
Actual behavior with empty needle: An empty needle conventionally matches at position 0, but this depends on the library's implementation. The code returns found=true when is_empty=true, which may be semantically incorrect for substring search (empty string is typically considered present at every position, not just index 0).

  1. Incorrect README Documentation (lines 58-63)

The README claims: The circuit only attempts the underlying library search if:

  • The needle is not empty.
  • The needle length does not exceed the haystack length.

Otherwise, it returns (false, 0).
This is false. The code at main.nr:23 calls substring_match() unconditionally, then applies selector logic to potentially override the result. The library is always invoked.

  1. Missing Edge Case Tests

README line 69 claims tests cover "Needle longer than haystack" but no such test exists in test_inputs.nr. The tests only cover positive matches.

  1. Semantic Issue: Empty Needle Behavior

Line 9 documents that empty needle returns found=true, but this is questionable. In most string search semantics, an empty needle should either:

  • Match at every position (standard semantics)
  • Be considered invalid/error
  • Match at position 0 only (current behavior, but unusual)

The current implementation chooses the third option without justification.

  1. Redundant Field Conversions

Lines 29-35 use Field arithmetic for selector logic (bool_to_field, line 4-6), but this could be simplified with direct boolean logic since Noir supports conditional expressions.

What's Correct

  • Nargo.toml properly specifies the dependency with a version tag
  • Prover.toml has valid test inputs ("hello world" / "world")
  • The positive test cases that exist are correct
  • General structure and imports are appropriate
  • The library types (StringBody256, SubString32) are used correctly

Recommendations

  1. Fix the logic: Either remove the wrapper logic and use the library directly, OR guard the library call to not invoke it for edge cases
  2. Update README: Accurately describe what the code actually does
  3. Add missing tests: Empty needle, needle longer than haystack, needle not found
  4. Clarify semantics: Document why empty needle returns (true, 0) rather than other options
  5. Simplify if possible: Consider if the selector logic is even necessary or if the library handles these cases appropriately

Thank you for taking the time to do such a detailed review and for running this through Claude 🙏. I really appreciate the thorough analysis and the clear breakdown of issues — this helps me see where the implementation and documentation diverged.

I agree with your points on the logic error, README mismatch, missing edge case tests, and the semantic ambiguity around empty needles. I’ll:

  • Refactor the logic so edge cases are handled cleanly (either by guarding the library call or by removing redundant overrides).
  • Update the README to accurately reflect the actual behavior.
  • Add proper test coverage for empty needles, longer-needle-than-haystack, and not-found cases.
  • Clarify the semantic choice for empty needle behavior in both code and docs.
  • Simplify the selector logic if Noir’s conditional support makes it cleaner.

I’ll push these corrections in the next update. Thanks again for the constructive feedback — it definitely makes the project stronger.

Hi @critesjosh

Thank you very much for your thoughtful review and valuable feedback on this project.
I truly appreciate the time and attention you dedicated to examining both the implementation and documentation.

Following your recommendations, I have carefully revised the project to ensure it aligns fully with Noir best practices and circuit safety standards. Specifically:

  • The main.nr file now includes explicit input validation and safe handling of edge cases (empty or oversized substrings) before invoking the noir_string_search library.
  • The logic now avoids panics or assertion failures, ensuring deterministic and verifiable circuit behavior.
  • The project README has been expanded with clear usage instructions, detailed feature explanations, and comprehensive test examples to enhance clarity and reproducibility.

These updates make the circuit more robust, compliant with your feedback, and better suited for both educational and practical zero-knowledge use cases.

Thank you once again for your insightful feedback and guidance — it has been instrumental in refining this work.
Please let me know if there are any further adjustments or clarifications you would like to see.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants