New trait to support work variables #89

tirix · 2022-05-23T04:45:40Z

This introduces a trait Resolver meant to handle work variables outside of metamath-knife.

The new trait shall be able to resolve work variable names like e.g. &W1 into a virtual symbol, as well as into the corresponding a virtual floating axiom, and vice-versa for printing back that symbol.

An implementation is provided in the form of Nameset, which does not resolve any additional work variables, but only regular symbols straight from the database.

And avoid to double-count typecodes

david-a-wheeler · 2022-05-31T17:55:02Z

I haven't had time to review the code, but the idea makes complete sense. If no one objects for a while, I'd say merge it. I'd like to hear from @digama0 if he has a chance to review it.

digama0 · 2022-05-31T19:48:38Z

src/formula.rs

+    }
+
+    fn label_for_symbol(&self, _: Symbol) -> (Label, TypeCode) {
+        panic!("Symbols in regular formulas shall only be regular symbols");


Why is this a panic? Even in the regular case, var -> varhyp lookups are a thing.

The only place where it is called currently is a case where it has been established the symbol tested is work variable, so an implementation is currently not required.
grammar.rs, line 1276:

let opt_node = if !rslv.is_work_variable(token.symbol) { map.get(&(SymbolType::Constant, token.symbol)) } else { // The next symbol is not a regular symbol, resolve it first let (label, typecode) = rslv.label_for_symbol(token.symbol);

But you are right that we could provide an implementation for the case where the symbol provided is a regular variable. I'll add that.

See commit.

digama0 · 2022-05-31T19:52:02Z

src/formula.rs

            });
        }
        dt.finish()
    }
 }

+type FlattenStack<'a> = Vec<(
+    Either<TokenIter<'a>, Symbol>,


I suggest using an enum here instead of Either. It is not obvious what the two variants mean from the type.

See commit: Enum for formula flattening iterator, and as_string

digama0 · 2022-05-31T19:55:09Z

src/formula.rs

-            self.stack.pop();
-            self.next()
+            (Either::Right(symbol), _) => {
+                self.stack.pop();


This seems wrong. You are popping the stack but you haven't done anything with the sibling_iter yet. Isn't that going to lose items following the work variable?

Actually whenever there is a single symbol on the stack, there are no siblings, so there is no sibling iterator to track.
This is actually better visible with the new enum you suggested above.

digama0 · 2022-05-31T19:56:24Z

src/formula.rs

+        match self.symbol_token(symbol) {
+            Cow::Borrowed(token) => as_str(token).into(),
+            Cow::Owned(token) => String::from_utf8(token)
+                .expect("TokenPtr is supposed to be UTF8")


Can this be modified to use as_str()?

When I try e.g. as_str(&token), Rust complains that I'm trying to return a reference to a local variable.
Indeed, as_str does not return an owned structure, but just a &str which is borrowed from its input token.

In any case there is duplicated code here, and I agree this is a concern. Maybe I could move this code to a new as_string function, right next to as_str ?
Or do you have a better suggestion ?

as_string sounds good.

See as_string commit

digama0 · 2022-05-31T19:57:34Z

src/grammar.rs

@@ -1102,29 +1100,6 @@ impl Grammar {
                {
                    // Empty parser command
                    match k.as_ref(buf) {
-                        b"syntax" => match rest {


Why did this get removed?

Actually I shall have put that in a separated PR.

This piece of code records the typecodes declared with the $j syntax commands.
However this is done both at the initialization phase, and again here during the handling of the special garden_path commands, which happens after a first grammar has been built.

As a result, the typecodes array contained duplicated elements.

digama0 · 2022-05-31T20:01:20Z

src/grammar_tests.rs

@@ -116,7 +116,7 @@ fn test_parse_formula() {
            &mut fmla_vec.clone().into_iter(),
            &[wff, class],
            false,
-            &names,
+            &names as &Nameset,


Suggested change

&names as &Nameset,

&**names,

Alternatively and additionally, Arc<T: Resolver>: Resolver seems like a reasonable impl and would allow you to pass &names directly, although it is probably better at runtime to undo the indirection here rather than in the method calls.

I'll check that - here this is only for test purpose, so we don't need to squeeze out the highest performance.

See Smart deref coercions instead of casts

digama0 · 2022-05-31T20:08:17Z

src/nameck.rs

@@ -63,7 +63,7 @@ use std::u32;
 ///
 /// [INC]: https://github.com/sorear/smetamath-rs/issues/11
 #[derive(Copy, Clone, Debug, PartialOrd, Ord, PartialEq, Eq, Default, Hash)]
-pub struct Atom(u32);
+pub struct Atom(pub u32);


What is the motivation for making this public? This will make garbage atoms user-constructible, and its relation to the other invariants of the library are not clear. I don't see why you would ever want to be able to write Atom(37). Read only access to the number can be done by a Atom::index() function.

This is the main drawback of this approach: an implementation of the Resolver trait needs to build and return its own Symbol for work variables, which are atoms.

Slightly better here might be to only provide a public constructor, this would at least make these atoms practically unmutable, since even with a mutable reference to one, a library caller could not modify the underlying identifier.

If the resolver needs to create Symbols, then it should have access to a name generator which can ensure that the Atoms are globally unique, and it should not be directly responsible for manipulating the number.

See Atom generator, avoid exposing atom's internal

digama0 · 2022-06-05T02:25:15Z

src/nameck.rs

    #[must_use]
-    pub fn last_atom(&self) -> Atom {
-        Atom(self.atom_table.table.len() as u32)
+    pub fn is_extra_atom(&self, atom: Atom) -> bool {


I'm not sure that we should promise that extra atoms always come in a block after regular atoms, or even that there is a way to distinguish them. What if the parser is restarted after some extra atoms are created, in incremental use cases?

So right now I'm kind of eating the cake starting from both sides:
Nameset naturally allocates atoms from 0 up, continuously, while I'm getting the identifiers from MAX down, assuming the passed ID stays small.

What my client currently does to compute the ID, is that it also ranks them and assumes that small indices are used first (&W1, &S1, &C1, &W2, &S2, &C2, &W3, etc.) This allows a direct computation without any lookup.

There are currently around 100'000 atoms (99'577) in set.mm, so there is still plenty of space until 2^32.

Beyond the fact that this method works with direct computations and no lookup, which shall be faster, it most importantly works with an immutable Nameset instance. If Nameset were to reserve internal spots for the "extra" atoms, this means it needs to be mutated, which goes against the current philosophy of "once the DB is parsed, everything is immutable", which allows very fast execution since we only pass around Arcs to the different structures of the DB.

Let's assume we went up to work variable &W99999, and that each time the parser is restarted everything changes and it requires a new set of 100'000 atom indices. Using u32 as currently, the parsing could be restarted 43'000 times until there is a clash.

Do you have another idea?

Beyond the fact that this method works with direct computations and no lookup, which shall be faster, it most importantly works with an immutable Nameset instance. If Nameset were to reserve internal spots for the "extra" atoms, this means it needs to be mutated, which goes against the current philosophy of "once the DB is parsed, everything is immutable", which allows very fast execution since we only pass around Arcs to the different structures of the DB.

I think that a name generator that does not mutate can't possibly work correctly. I think it is fine to use interior mutability here, and the only state a name generator needs to track is the last allocated ID so a AtomicU32 will suffice, so API-wise it can still take a &self.

I'm still a little dubious about this mmj2-styled work variable thing. This really ought to be some kind of enum, work vars are not the same as regular vars in a whole bunch of different ways, and adding phantom $f statements to the database seems like a really weird encoding to me.

tirix added 2 commits May 23, 2022 12:39

New trait to support work variables

811e44b

Cargo fmt

8450e1f

tirix mentioned this pull request May 23, 2022

Work Variables #87

Open

tirix added 4 commits May 24, 2022 01:04

Simplifying out Resolver's error type

03b7fdc

Use Cow when strings need to be built

dd045c2

Handle work variables in a more lightweight way

2444bb2

And avoid to double-count typecodes

Simplify back next_node in parse_formula

9020da2

digama0 reviewed May 31, 2022

View reviewed changes

tirix added 5 commits June 4, 2022 12:14

Implement label_for_symbol for Nameset

d2888a8

Enum for formula flattening iterator, and as_string

0e26a71

Smart deref coercions instead of casts

586e0a3

Atom generator, avoid exposing atom's internal

9008997

fmt and clippy

c397738

digama0 reviewed Jun 5, 2022

View reviewed changes

tirix mentioned this pull request Jun 1, 2024

Work variables tirix/rumm#12

Open

New trait to support work variables #89

Are you sure you want to change the base?

New trait to support work variables #89

Uh oh!

Conversation

tirix commented May 23, 2022

Uh oh!

david-a-wheeler commented May 31, 2022

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tirix Jun 4, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

tirix Jun 4, 2022 •

edited

Loading