Skip to content

Inconsistent SQL Query Results on Different Systems #25

@minger-hsxz

Description

@minger-hsxz

I encountered an issue while using your project. The problem arises because the pred results differ between Windows and Linux systems.

Description

Gold SQL:

SELECT YEAR FROM concert GROUP BY YEAR ORDER BY count(*) DESC LIMIT 1;

Pred SQL:

SELECT MAX(Year) FROM concert;

Schema and Data
The table schema and sample data used are as follows:

CREATE TABLE "concert" (
  "concert_ID" int,
  "concert_Name" text,
  "Theme" text,
  "Stadium_ID" text,
  "Year" text,
  PRIMARY KEY ("concert_ID"),
  FOREIGN KEY ("Stadium_ID") REFERENCES "stadium"("Stadium_ID")
);

INSERT INTO "concert" VALUES (1, "Auditions", "Free choice", 1, 2014);
INSERT INTO "concert" VALUES (2, "Super bootcamp", "Free choice 2", 2, 2014);
INSERT INTO "concert" VALUES (3, "Home Visits", "Bleeding Love", 2, 2015);
INSERT INTO "concert" VALUES (4, "Week 1", "Wide Awake", 10, 2014);
INSERT INTO "concert" VALUES (5, "Week 1", "Happy Tonight", 9, 2015);
INSERT INTO "concert" VALUES (6, "Week 2", "Party All Night", 7, 2015);

Observations

When executing the queries on different systems, the results for pred were inconsistent:

On Windows 11:

Gold: [('2014',)]
Pred: [('2015',)]

On Linux:

Gold: [('2015',)]
Pred: [('2015',)]

Analysis

Both 2014 and 2015 have the same count of concerts (3 each). Therefore, either year could be returned as the correct result. However, the discrepancy between systems suggests an underlying difference in implementation that affects the pred query results.

The main issue is that pred is not consistent across different systems, leading to potential false negatives when comparing it to gold. This example highlights a potential flaw in the evaluation method used by the project.

Conclusion

This issue demonstrates that the same SQL query can yield different results on different systems, affecting the reliability of the evaluation process. Further investigation into the underlying cause of this discrepancy and how to handle such cases in the evaluation would be valuable.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions