This repository provides a comprehensive collection of research papers, benchmarks, and open-source projects on large language model-based text-to-SQL (LLM-based Text-to-SQL). It includes all the contents from our survey paper "Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL" and will be continuously updated to incorporate the latest advances and notable contributions from the text-to-SQL community. Stay tuned!
You are very welcome to contribute to this repository by opening an issue or a pull request. If you find any missing resources or come across interesting new research, please don't hesitate to open an issue or submit a PR!
Contact us via email: zijin[dot]hong[at]connect[dot]polyu[dot]hk
Please cite our paper if you find our survey or repository helpful!
- [2025-09-21] Finished building the benchmarks, datasets, and taxonomy for this repository.
- [2025-09-14] Repository launched based on our survey paper to keep track of recent progress in LLM-based text-to-SQL.
- [2025-09-02] Our paper "Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL" has been accepted by IEEE Transactions on Knowledge and Data Engineering (TKDE)!
- [2025-05-01] Our paper "Structure-Guided Large Language Models for Text-to-SQL Generation" has been accepted by the International Conference on Machine Learning (ICML)!
A user asks a question about football leagues. The LLM takes this question together with the schema of the corresponding database as input and generates an SQL query as output. The generated SQL is then executed on the database, retrieving the result "The 5 leagues with the highest matches", which answers the user's question.
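The flow described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation of any method listed in this repository: the toy `league`/`matches` tables, the prompt wording, and the `fake_llm` function (a hard-coded stand-in for a real LLM call) are all assumptions made for the example.

```python
import sqlite3


def get_schema(conn: sqlite3.Connection) -> str:
    """Read the CREATE TABLE statements that describe the database schema."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    return "\n".join(row[0] for row in rows)


def build_prompt(question: str, schema: str) -> str:
    """Combine the schema and the user question into a single LLM prompt."""
    return (
        "Given the database schema below, write one SQLite query "
        "that answers the question.\n\n"
        f"Schema:\n{schema}\n\nQuestion: {question}\nSQL:"
    )


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; returns a hard-coded query."""
    return (
        "SELECT l.name, COUNT(*) AS n_matches "
        "FROM matches AS m JOIN league AS l ON m.league_id = l.id "
        "GROUP BY l.id ORDER BY n_matches DESC LIMIT 5"
    )


def answer(conn: sqlite3.Connection, question: str, llm=fake_llm):
    """Question + schema -> SQL (from the LLM) -> executed result rows."""
    sql = llm(build_prompt(question, get_schema(conn)))
    return sql, conn.execute(sql).fetchall()


# Toy database, assumed purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE league (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE matches (id INTEGER PRIMARY KEY, league_id INTEGER);
    INSERT INTO league VALUES (1, 'League A'), (2, 'League B');
    INSERT INTO matches (league_id) VALUES (1), (1), (2);
    """
)

sql, rows = answer(conn, "Which 5 leagues have the most matches?")
print(rows)  # [('League A', 2), ('League B', 1)]
```

In a real system, `fake_llm` would be replaced by a call to an actual model, and the generated SQL should be validated before it is executed against a production database.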
Before 2023, the focus is on a selection of representative traditional studies. However, from 2023 onward, the emphasis shifts to the rapid advancements driven by LLMs, marking a significant acceleration in the field.
Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL [Paper] [Code]
A Survey on Employing Large Language Models for Text-to-SQL Tasks [Paper]
A Survey of Text-to-SQL in the Era of LLMs: Where are We, and Where are We Going? [Paper]
Natural Language Interfaces for Tabular Data Querying and Visualization: A Survey [Paper]
Large Language Model Enhanced Text-to-SQL Generation: A Survey [Paper]
A Survey on Deep Learning Approaches for Text-to-SQL [Paper]
Natural Language Interfaces for Databases with Deep Learning [Paper]
A Survey on Text-to-SQL Parsing: Concepts, Methods, and Future Directions [Paper]
Recent Advances in Text-to-SQL: A Survey of What We Have and What We Expect [Paper]
In the era of LLMs, two benchmarks and their variants/extensions are widely recognized for evaluating text-to-SQL capabilities. We will continually update the top five methods on each benchmark to showcase the latest advances in the text-to-SQL community. These benchmarks, along with other text-to-SQL dataset papers, are listed in the datasets section below.
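The leaderboards report scores in columns headed EX, which we take here to mean execution accuracy: the share of examples for which the predicted SQL returns the same result as the reference SQL. The sketch below shows one common way to compute such a score, assuming results are compared as unordered sets of rows. The helper names and the toy table are hypothetical, and each benchmark's official evaluation script may differ in its details.

```python
import sqlite3


def run(conn: sqlite3.Connection, sql: str):
    """Execute a query; return its rows as a set, or None if execution fails."""
    try:
        return set(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_accuracy(conn: sqlite3.Connection, pairs) -> float:
    """Percentage of (predicted_sql, gold_sql) pairs whose result sets match."""
    hits = 0
    for predicted_sql, gold_sql in pairs:
        predicted_rows = run(conn, predicted_sql)
        gold_rows = run(conn, gold_sql)
        # A prediction that fails to execute never counts as correct.
        if predicted_rows is not None and predicted_rows == gold_rows:
            hits += 1
    return 100.0 * hits / len(pairs)


# Toy table, assumed purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE t (x INTEGER);
    INSERT INTO t VALUES (1), (2), (3);
    """
)

pairs = [
    ("SELECT x FROM t WHERE x > 1", "SELECT x FROM t WHERE x >= 2"),  # same rows
    ("SELECT x FROM t ORDER BY x DESC", "SELECT x FROM t"),  # same set of rows
    ("SELECT x FROM t", "SELECT x FROM t WHERE x > 1"),  # different rows
    ("SELEC x", "SELECT x FROM t"),  # prediction is invalid SQL
]
score = execution_accuracy(conn, pairs)
print(score)  # 50.0
```

Comparing sets ignores row order and duplicate rows, which is a deliberate simplification in this sketch; a stricter comparison would be needed for questions where ordering matters.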
BIRD - A Big Bench for Large-Scale Database Grounded Text-to-SQL
| Method/Model | Dev EX (%) | Test EX (%) | Paper/Code | Date |
|---|---|---|---|---|
| | 74.32 | 77.53 | [Proprietary] | 2025-07-14 |
| | 75.36 | 77.14 | [Paper] | 2025-03-11 |
| | 74.90 | 76.02 | [Paper] | 2025-04-16 |
| | 74.25 | 75.74 | [Report] [Code] | 2025-09-22 |
| | 74.12 | 75.74 | [Report] | 2025-05-30 |
| | 73.50 | 75.63 | [Report] [Code] | 2025-02-27 |
| | 73.34 | 75.63 | [Paper] [Code] | 2024-12-17 |
Spider1.0 - Semantic Parsing and Text-to-SQL Challenge
| Method/Model | Dev EX (%) | Test EX (%) | Paper/Code | Date |
|---|---|---|---|---|
| | - | 91.2 | [Report] | 2023-11-02 |
| | 82.4 | 86.6 | [Paper] [Code] | 2023-08-20 |
| | 74.2 | 85.3 | [Paper] [Code] | 2023-04-21 |
| | 81.8 | 82.3 | [Paper] [Code] | 2023-06-01 |
| | 84.1 | 79.9 | [Paper] [Code] | 2023-02-27 |
Spider2.0 - Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows
| Method/Model | Snow Score | Lite Score | Paper/Code | Date |
|---|---|---|---|---|
| | - | 44.5 | [Paper] | 2025-08-07 |
| | 37.11 | 37.84 | [Paper] [Code] | 2025-05-22 |
| | - | 33.09 | [Paper] [Code] | 2025-07-10 |
| | - | 33.09 | [Paper] [Code] | 2025-04-27 |
| | - | 28.52 | [Paper] [Code] | 2025-03-16 |
BIRD-CRITIC - Can LLMs Fix User Issues in Real-World Database Applications?
Model | SR (%) | Date |
---|---|---|
ByteBrain-Agent | 43.33 | 2025-06-10 |
GPT-5-High | 34.96 | 2025-09-04 |
grok-4 | 33.68 | 2025-07-18 |
DeepSeek-R1 | 33.51 | 2025-04-20 |
o3-Mini | 33.33 | 2025-04-20 |
BIRD-INTERACT - Re-imagining Text-to-SQL Evaluation via Lens of Dynamic Interactions
Model/Method | Reward | Date |
---|---|---|
Gemini-2.5-Pro | 20.92 | 2025-08-22 |
o3-Mini | 20.27 | 2025-08-22 |
Claude-Sonnet-4 | 18.35 | 2025-08-22 |
Qwen-3-Coder-480B | 17.75 | 2025-08-22 |
DeepSeek-V3 | 15.15 | 2025-08-22 |
We categorize the datasets into Original Datasets and Post-annotated Datasets based on whether they were released with the original dataset (question-SQL pairs) and databases, or were developed by adapting existing datasets and databases with special settings. The Post-annotated Datasets rely on the databases from Spider 1.0. For each original dataset, we list its characteristics, number of examples, and number of databases under the dataset title.
BIRD-CRITIC | SWE-SQL: Illuminating LLM Pathways to Solve User SQL Issues in Real-World Applications [Paper] [Code] [Dataset]
Knowledge-augmented, Long-context; #Example: 600; #DB: 95
Spider2.0 | Spider 2.0: Evaluating Language Models on Real-world Enterprise Text-to-SQL Workflows [Paper] [Code] [Dataset]
Knowledge-augmented, Long-context; #Example: 632; #DB: 213
BULL | FinSQL: Model-Agnostic LLMs-based Text-to-SQL Framework for Financial Analysis [Paper] [Code] [Dataset]
Knowledge-augmented, Long-context; #Example: 4,966; #DB: 3
BIRD | Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs [Paper] [Code] [Dataset]
Cross-domain, Knowledge-augmented; #Example: 12,751; #DB: 95
KaggleDBQA | KaggleDBQA: Realistic Evaluation of Text-to-SQL Parsers [Paper] [Code] [Dataset]
Cross-domain; #Example: 272; #DB: 8
DuSQL | DuSQL: A Large-Scale and Pragmatic Chinese Text-to-SQL Dataset [Paper] [Dataset]
Cross-domain, Cross-lingual; #Example: 23,797; #DB: 200
SQUALL | On the Potential of Lexico-logical Alignments for Semantic Parsing to SQL Queries [Paper] [Code]
Cross-domain, Cross-lingual; #Example: 11,468; #DB: 1,679
CoSQL | CoSQL: A Conversational Text-to-SQL Challenge Towards Cross-Domain Natural Language Interfaces to Databases [Paper] [Code] [Dataset]
Cross-domain, Context-dependent; #Example: 15,598; #DB: 200
Spider | Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task [Paper] [Code] [Dataset]
Cross-domain; #Example: 10,181; #DB: 200
WikiSQL | Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning [Paper] [Code] [Dataset]
Cross-domain; #Example: 80,654; #DB: 26,521
Dr. Spider | Dr.Spider: A Diagnostic Evaluation Benchmark towards Text-to-SQL Robustness [Paper] [Code]
Robustness; Perturbations in DB, query and SQL
ADVETA | Towards Robustness of Text-to-SQL Models Against Natural and Realistic Adversarial Table Perturbation [Paper] [Code] [Dataset]
Robustness; Adversarial table perturbation
Spider-SS&CG | Measuring and Improving Compositional Generalization in Text-to-SQL via Component Alignment [Paper] [Code] [Dataset]
Context-dependent; Splitting example into sub-examples
Spider-DK | Exploring Underexplored Limitations of Cross-Domain Text-to-SQL Generalization [Paper] [Code]
Knowledge-augmented; Adding domain knowledge
Spider-SYN | Towards Robustness of Text-to-SQL Models against Synonym Substitution [Paper] [Code]
Knowledge-augmented; Adding domain knowledge
Spider-Vietnamese | A Pilot Study of Text-to-SQL Semantic Parsing for Vietnamese [Paper] [Code]
Cross-lingual; Vietnamese version of Spider
Spider-Realistic | Structure-Grounded Pretraining for Text-to-SQL [Paper] [Dataset]
Robustness; Removing column names in question
CSpider | A Pilot Study for Chinese SQL Semantic Parsing [Paper] [Code]
Cross-lingual; Chinese version of Spider
SParC | SParC: Cross-Domain Semantic Parsing in Context [Paper] [Code] [Dataset]
Context-dependent; Annotate conversational contents
Recent LLM-based text-to-SQL methods rely primarily on in-context learning and fine-tuning, enabled by the release of both powerful proprietary LLMs and well-architected open-source LLMs. A detailed categorization of text-to-SQL methods can be found in our paper; newly published research papers will be added here continually and aligned with this taxonomy.
LinkAlign: Scalable Schema Linking for Real-World Large-Scale Multi-Database Text-to-SQL [Paper] [Code]
ReFoRCE: A Text-to-SQL Agent with Self-Refinement, Consensus Enforcement, and Column Exploration [Paper] [Code]
CSC-SQL: Corrective Self-Consistency in Text-to-SQL via Reinforcement Learning [Paper] [Code]
SAFE-SQL: Self-Augmented In-Context Learning with Fine-grained Example Selection for Text-to-SQL [Paper]
Gen-SQL: Efficient Text-to-SQL by Bridging Natural Language Question and Database Schema with Pseudo-Schema [Paper] [Code]
In-Context Reinforcement Learning based Retrieval-Augmented Generation for Text-to-SQL [Paper]
Spider 2.0: Evaluating Language Models on Real-world Enterprise Text-to-SQL Workflows [Paper] [Code]
RSL-SQL: Robust Schema Linking in Text-to-SQL Generation [Paper] [Code]
CHASE-SQL: Multi-Path Reasoning and Preference Optimized Candidate Selection in Text-to-SQL [Paper]
E-SQL: Direct Schema Linking via Question Enrichment in Text-to-SQL [Paper] [Code]
The Death of Schema Linking? Text-to-SQL in the Age of Well-Reasoned Language Models [Paper]
The Dawn of Natural Language to SQL: Are We Fully Ready? [Paper]
CHESS: Contextual Harnessing for Efficient SQL Synthesis [Paper] [Code]
MCS-SQL: Leveraging Multiple Prompts and Multiple-Choice Selection For Text-to-SQL Generation [Paper]
Before Generation, Align it! A Novel and Effective Strategy for Mitigating Hallucinations in Text-to-SQL Generation [Paper] [Code]
Dubo-SQL: Diverse Retrieval-Augmented Generation and Fine Tuning for Text-to-SQL [Paper] [Code]
MAGIC: Generating Self-Correction Guideline for In-Context Text-to-SQL [Paper] [Code]
You Only Read Once (YORO): Learning to Internalize Database Knowledge for Text-to-SQL [Paper]
PURPLE: Making a Large Language Model a Better SQL Writer [Paper]
PET-SQL: A Prompt-Enhanced Two-Round Refinement of Text-to-SQL with Cross-consistency [Paper] [Code]
R³: "This is My SQL, Are You With Me?" A Consensus-Based Multi-Agent System for Text-to-SQL Tasks [Paper] [Code]
MetaSQL: A Generate-then-Rank Framework for Natural Language to SQL Translation [Paper] [Code]
Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments [Paper] [Code]
SQL-CRAFT: Text-to-SQL through Interactive Refinement and Enhanced Reasoning [Paper]
Structure-Guided Large Language Models for Text-to-SQL Generation [Paper]
Knowledge-to-SQL: Enhancing SQL Generation with Data Expert LLM [Paper] [Code]
Improving Demonstration Diversity by Human-Free Fusing for Text-to-SQL [Paper] [Code]
Decomposition for Enhancing Attention: Improving LLM-based Text-to-SQL through Workflow Paradigm [Paper] [Code]
MAC-SQL: A Multi-Agent Collaborative Framework for Text-to-SQL [Paper] [Code]
ACT-SQL: In-Context Learning for Text-to-SQL with Automatically-Generated Chain-of-Thought [Paper] [Code]
Selective Demonstrations for Cross-domain Text-to-SQL [Paper] [Code]
Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation [Paper] [Code]
C3: Zero-shot Text-to-SQL with ChatGPT [Paper] [Code]
Retrieval-augmented GPT-3.5-based Text-to-SQL Framework with Sample-aware Prompting and Dynamic Revision Chain [Paper]
SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL [Paper]
Exploring Chain of Thought Style Prompting for Text-to-SQL [Paper]
Enhancing Text-to-SQL Capabilities of Large Language Models: A Study on Prompt Design Strategies [Paper]
StructGPT: A General Framework for Large Language Model to Reason over Structured Data [Paper] [Code]
DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction [Paper] [Code]
Prompting GPT-3.5 for Text-to-SQL with De-semanticization and Skeleton Retrieval [Paper]
Teaching Large Language Models to Self-Debug [Paper]
LEVER: Learning to Verify Language-to-Code Generation with Execution [Paper] [Code]
Coder Reviewer Reranking for Code Generation [Paper]
Natural Language to Code Translation with Execution [Paper] [Code]
SHARE: An SLM-based Hierarchical Action CorREction Assistant for Text-to-SQL [Paper] [Code]
ROUTE: Robust Multitask Tuning and Collaboration for Text-to-SQL [Paper] [Code]
A Preview of XiYan-SQL: A Multi-Generator Ensemble Framework for Text-to-SQL [Paper] [Code]
MSc-SQL: Multi-Sample Critiquing Small Language Models For Text-To-SQL Translation [Paper] [Code]
Knapsack Optimization-based Schema Linking for LLM-based Text-to-SQL Generation [Paper]
DTS-SQL: Decomposed Text-to-SQL with Small Large Language Models [Paper] [Code]
MAC-SQL: A Multi-Agent Collaborative Framework for Text-to-SQL [Paper] [Code]
The Death of Schema Linking? Text-to-SQL in the Age of Well-Reasoned Language Models [Paper]
Dubo-SQL: Diverse Retrieval-Augmented Generation and Fine Tuning for Text-to-SQL [Paper] [Code]
StructLM: Towards Building Generalist Models for Structured Knowledge Grounding [Paper] [Code]
CodeS: Towards Building Open-source Language Models for Text-to-SQL [Paper] [Code]
Symbol-LLM: Towards Foundational Symbol-centric Interface For Large Language Models [Paper] [Code]
Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation [Paper] [Code]
CLLMs: Consistency Large Language Models [Paper] [Code]
@article{hong2025next,
title={Next-generation database interfaces: A survey of llm-based text-to-sql},
author={Hong, Zijin and Yuan, Zheng and Zhang, Qinggang and Chen, Hao and Dong, Junnan and Huang, Feiran and Huang, Xiao},
journal={IEEE Transactions on Knowledge and Data Engineering},
year={2025}
}