Skip to content

Bug with evaluating SQLs containing "value" identifiers #21

@BugMaker-Boyan

Description

@BugMaker-Boyan

I download spider dataset from Spider Leaderboard.

When I run the command:
python evaluation.py --db spider/test_database --etype exec --gold spider/test_data/dev_gold.sql --pred spider/test_data/dev_gold.sql

Then returns:
easy medium hard extra all
count 470 857 463 357 2147
===================== EXECUTION ACCURACY =====================
execution 0.996 0.977 0.983 0.994 0.985

However, in above command, the gold file and pred file are the same.

I find the problem in evaluation.py, code link (line# 569):
https://github.com/taoyds/test-suite-sql-eval/blob/e97acc546ecbee8fa27fa8dbf025ef61493a876c/evaluation.py#L569C13-L569C48

When predicted sql contains "value" identifiers (e.g. a column called "value"), p_str = p_str.replace("value", "1") replace string "value" in predicted sql incorrectly.

There are such sqls in Spider-test dataset:

SELECT CONTENTS FROM boxes WHERE Value > 150 warehouse_1
SELECT CONTENTS FROM boxes WHERE Value > 150 warehouse_1
SELECT warehouse , avg(value) FROM boxes GROUP BY warehouse warehouse_1
SELECT warehouse , avg(value) FROM boxes GROUP BY warehouse warehouse_1
SELECT avg(value) , sum(value) FROM boxes warehouse_1
SELECT avg(value) , sum(value) FROM boxes warehouse_1
SELECT avg(capacity) , sum(capacity) FROM warehouses warehouse_1
SELECT avg(capacity) , sum(capacity) FROM warehouses warehouse_1
SELECT avg(value) , max(value) , CONTENTS FROM boxes GROUP BY CONTENTS warehouse_1
SELECT avg(value) , max(value) , CONTENTS FROM boxes GROUP BY CONTENTS warehouse_1
SELECT CONTENTS FROM boxes ORDER BY value DESC LIMIT 1 warehouse_1
SELECT CONTENTS FROM boxes ORDER BY value DESC LIMIT 1 warehouse_1
SELECT avg(value) FROM boxes warehouse_1
SELECT avg(value) FROM boxes warehouse_1
......

When I delete p_str = p_str.replace("value", "1"), the evaluation returns 100% execution score as expected.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions