Skip to content

Fix schema for call to hash.HashOf() in HashLookups #3038

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Jun 21, 2025
Merged

Fix schema for call to hash.HashOf() in HashLookups #3038

merged 8 commits into from
Jun 21, 2025

Conversation

jycor
Copy link
Contributor

@jycor jycor commented Jun 18, 2025

The incorrect schema was used in the hash.HashOf() function.
n.Schema() is the schema of the entire JoinNode; we just needed the schema of the key.'

Test bump: https://github.com/dolthub/ld/pull/20634

@nicktobey
Copy link
Contributor

Can you give more context about what this means? Was there an issue caused by this? Are there regression tests? What goes wrong when the schema is used here?

@jycor jycor changed the title don't use schema for hashlookups Fix schema for call to hash.HashOf() in HashLookups Jun 18, 2025
@@ -1161,6 +1161,25 @@ var JoinScriptTests = []ScriptTest{
},
},
},
{
// Since hash.HashOf takes in a sql.Schema to convert and hash keys,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is confusing without context. This is a regression test but it doesn't describe what issue was being fixes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that we were passing in the wrong schema; now, we're passing in the right one.

var sch sql.Schema
if tup, isTup := e.(*expression.Tuple); isTup {
for _, expr := range tup.Children() {
sch = append(sch, &sql.Column{Type: expr.Type()})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worried about the performance implications of creating a schema every time we call GetHashKey. And it's not clear what was wrong with using the HashLookup's precomputed schema: does the value of row have a different schema based on which side of the join we're computing the hash for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can pass in nil for schema instead or just store it once we create the key.

HashLookup's "precomputed" schema is the schema of the join not the key.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't pass in nil because we need to convert the keys to the same type before computing the hash. Otherwise if the tables being joined have different-but-convertible types for the join column, we might not detect that two rows are equal because they'll have different hashes.

It looks like in the proposed change, we're passing in a nil schema if the expression isn't a tuple. That's probably incorrect. I'm surprised that it's not breaking any tests.

We should precompute the schema of the key column(s) and store it in the HashLookup. We should also add a test where the tables being joined have different types that would produce different hashes, to make sure we handle that correctly.

@@ -1162,20 +1162,34 @@ var JoinScriptTests = []ScriptTest{
},
},
{
// Since hash.HashOf takes in a sql.Schema to convert and hash keys,
// we need to pass in the right keys.
Name: "HashLookups regression test",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we split this into two different tests?

@jycor jycor merged commit 0270726 into main Jun 21, 2025
8 checks passed
@jycor jycor deleted the james/hash branch June 21, 2025 00:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants