-
-
Notifications
You must be signed in to change notification settings - Fork 231
Fix schema for call to hash.HashOf()
in HashLookups
#3038
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Can you give more context about what this means? Was there an issue caused by this? Are there regression tests? What goes wrong when the schema is used here? |
hash.HashOf()
in HashLookups
enginetest/queries/join_queries.go
Outdated
@@ -1161,6 +1161,25 @@ var JoinScriptTests = []ScriptTest{ | |||
}, | |||
}, | |||
}, | |||
{ | |||
// Since hash.HashOf takes in a sql.Schema to convert and hash keys, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is confusing without context. This is a regression test but it doesn't describe what issue was being fixes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is that we were passing in the wrong schema; now, we're passing in the right one.
sql/plan/hash_lookup.go
Outdated
var sch sql.Schema | ||
if tup, isTup := e.(*expression.Tuple); isTup { | ||
for _, expr := range tup.Children() { | ||
sch = append(sch, &sql.Column{Type: expr.Type()}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm worried about the performance implications of creating a schema every time we call GetHashKey. And it's not clear what was wrong with using the HashLookup's precomputed schema: does the value of row
have a different schema based on which side of the join we're computing the hash for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can pass in nil
for schema instead or just store it once we create the key.
HashLookup's "precomputed" schema is the schema of the join not the key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't pass in nil because we need to convert the keys to the same type before computing the hash. Otherwise if the tables being joined have different-but-convertible types for the join column, we might not detect that two rows are equal because they'll have different hashes.
It looks like in the proposed change, we're passing in a nil schema if the expression isn't a tuple. That's probably incorrect. I'm surprised that it's not breaking any tests.
We should precompute the schema of the key column(s) and store it in the HashLookup. We should also add a test where the tables being joined have different types that would produce different hashes, to make sure we handle that correctly.
enginetest/queries/join_queries.go
Outdated
@@ -1162,20 +1162,34 @@ var JoinScriptTests = []ScriptTest{ | |||
}, | |||
}, | |||
{ | |||
// Since hash.HashOf takes in a sql.Schema to convert and hash keys, | |||
// we need to pass in the right keys. | |||
Name: "HashLookups regression test", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we split this into two different tests?
The incorrect schema was used in the
hash.HashOf()
function.n.Schema()
is the schema of the entire JoinNode; we just needed the schema of the key.'Test bump: https://github.com/dolthub/ld/pull/20634