School of Computing Science
New CS research strengthens data systems with advanced SQL verification tool
VeriEQL, a new tool developed by researchers in SFU’s School of Computing Science and collaborators at the University of Michigan, enhances the semantics equivalence verification of complex SQL queries, paving the way for more reliable and secure data-driven systems.
The tool can formally prove when two SQL queries are equivalent up to a given bound or automatically generate counterexamples when they are not. It also accurately models the integrity constraints found in real databases, such as primary keys, foreign keys, and check conditions, which are essential for maintaining data integrity in modern applications.
Using an innovative Satisfiability-Modulo-Theories (SMT) encoding, VeriEQL can reason about advanced SQL features that commonly appear in industry and academic settings, including ORDER BY, CASE WHEN, NULLs, aggregate functions, and common table expressions. This allows the tool to verify a much broader and more realistic class of queries than previously possible.
A major leap forward for database reliability
In a comprehensive evaluation, the team tested VeriEQL on more than 24,000 pairs of SQL queries drawn from prior research and LeetCode. The results were striking: VeriEQL successfully solved 77% of the benchmarks compared to less than 2% solved by the previous state-of-the-art equivalence checking tools.
During this evaluation, the tool uncovered previously unknown bugs in widely used database engines and query optimizers such as MySQL and Apache Calcite, demonstrating its potential to strengthen the reliability of systems that underpin business analytics, financial tools, healthcare platforms, and countless data-driven applications.
“When database engines are implemented, bugs can occur, and as we show in the paper, we actually found some,” says Yuepeng, assistant professor, SFU Computing Science. “This part of the system is critical because it directly affects query results. Users issue queries because they want accurate information, and they don’t want any mistakes.”
VeriEQL addresses this challenge by helping developers verify that query rewrite and transformation rules used by database optimizers are truly correct. “If two queries are supposed to be equivalent but are not, we can generate counterexamples to show exactly where the problem is,” Yuepeng explains. “That feedback helps developers improve their systems and prevent errors from reaching users.”
Beyond testing: improving confidence in correctness
The researchers also point to the limitations of traditional testing-based approaches, particularly in online coding platforms such as LeetCode, where SQL queries are commonly used in practice problems.
“Most platforms rely on testing with a finite number of test cases, maybe hundreds or thousands,” Yuepeng says. “If a query passes all the tests, it’s considered correct. But that’s not enough. A finite set of tests always has limited coverage, and it’s hard to be truly confident in correctness.”
VeriEQL offers a fundamentally different approach. Instead of relying solely on tests, it can formally prove that a user-submitted SQL query is equivalent to a reference solution up to a given bound, providing much stronger guarantees.
In their evaluation, they found many cases on LeetCode where submissions that were previously marked as correct turned out to be incorrect when checked using their approach. With VeriEQL, we can say with much higher confidence whether a query is truly correct.
Transforming feedback in the classroom
The same capability has significant implications for computer science education. In database courses, instructors and teaching assistants often spend considerable time manually checking SQL assignments or designing test cases that may still miss edge cases.
“Instructors want to know whether a student’s query is equivalent to the standard solution,” Yang, PhD student and lead author, says. “We can do that automatically. If the query is incorrect, we can provide a concrete counterexample, showing a specific database where the student’s query produces a different result.”
This approach not only improves the quality of feedback students receive but also reduces the manual grading workload for instructors and teaching assistants. “It makes grading easier and more reliable, while helping students better understand their mistakes,” He adds.
Real-world impact across industry, education, and research
The implications of this research extend far beyond database theory:
Database system development: VeriEQL can help developers verify that query optimizations and rewrites are correct, reducing the risk of errors in production systems.
Computer science education: Instructors can use VeriEQL to check if the student’s SQL query is equivalent to the standard solution, auto-grade student solutions, and generate counterexamples when SQL queries are incorrect, which improves the quality of feedback and learning outcomes.
Text-to-SQL evaluation: The extension of VeriEQL can verify the equivalence or identify differences between LLM-generated SQL queries and ground truths for evaluating text-to-SQL techniques.
Future research: The team’s new encoding and formal semantics open the door to verifying even more expressive SQL features and broader SQL dialects.
Advancing SFU’s leadership in programming languages and data systems research
The development of VeriEQL highlights SFU’s growing strength in programming languages, data management, and software verification research, areas that are increasingly essential in a world driven by data.
This work was published in the Proceedings of the ACM on Programming Languages (OOPSLA 2024) and won the OOPSLA Distinguished Paper Award. It continues to gain attention in the programming languages and databases communities for its innovative approach and real-world impact.
About the research
Authors:
- Yang He & Yuepeng Wang— Simon Fraser University
- Pinhan Zhao & Xinyu Wang — University of Michigan
Paper: VeriEQL: Bounded Equivalence Verification for Complex SQL Queries with Integrity Constraints.