Natural Language to SQL: How Text-to-SQL Works

Natural language to SQL (text-to-SQL) translates human questions into database queries. Learn how it works, its limitations, and why semantic approaches are more reliable.

3 min read·

Natural language to SQL (also called text-to-SQL or NL2SQL) is a technology that translates questions asked in natural language into SQL queries that can be executed against databases. When you ask "What were sales last month?", a text-to-SQL system generates something like:

SELECT SUM(amount)
FROM sales
WHERE sale_date >= '2024-01-01'
AND sale_date < '2024-02-01'

This translation is what powers many conversational analytics tools.

How Text-to-SQL Works

Step 1: Parse the Question

The system breaks down the natural language:

  • Subject: What is being asked about? ("sales")
  • Action: What operation? ("sum")
  • Filter: What constraints? ("last month")
  • Output: What format? (single value)

Step 2: Schema Mapping

Map natural language terms to database elements:

  • "Sales" → sales table
  • "Amount" → amount column (guessed from context)
  • "Last month" → date calculation

Step 3: Query Construction

Build SQL that implements the intent:

  • SELECT clause with appropriate aggregation
  • FROM clause with correct tables
  • WHERE clause with filters
  • JOINs if multiple tables needed

Step 4: Validation

Some systems validate the generated query:

  • Is the SQL syntactically correct?
  • Do the referenced tables/columns exist?
  • Are the joins valid?

Step 5: Execution and Response

Execute the SQL and format results for the user.

Why Text-to-SQL Is Hard

Ambiguity

"Show me sales" could mean:

  • Total sales amount
  • Number of sales
  • Sales records
  • Sales by region
  • Sales for a specific period

The AI must pick an interpretation.

Schema Complexity

Real databases have:

  • Hundreds of tables
  • Unclear naming conventions
  • Multiple possible join paths
  • Undocumented business logic

Business Logic

SQL alone doesn't encode business rules:

  • What's "active" vs "churned"?
  • How is revenue recognized?
  • What exclusions apply?

Language Variation

Users ask the same question many ways:

  • "Revenue last quarter"
  • "Q4 sales"
  • "What did we make in the last three months?"
  • "Show income for Oct-Dec"

Accuracy Challenges

Text-to-SQL accuracy is measured on benchmarks like Spider and WikiSQL. State-of-the-art systems achieve:

  • 60-70% on complex multi-table queries
  • 75-85% on simpler single-table queries
  • Lower on real-world business schemas

Even 80% accuracy means 1 in 5 queries is wrong - often without obvious indication.

The Semantic Alternative

Instead of generating SQL from scratch, semantic approaches:

  1. Map questions to semantic concepts: "Revenue" → defined Revenue metric
  2. Query the semantic layer: Use the metric's certified definition
  3. Let the semantic layer generate SQL: Correct SQL is determined by the metric definition

This approach achieves 95%+ accuracy for supported queries because the AI isn't guessing - it's using explicit definitions.

When Text-to-SQL Works

  • Simple schemas with clear naming
  • Single-table queries
  • Basic aggregations and filters
  • Users who can validate results

When Text-to-SQL Struggles

  • Complex business schemas
  • Multi-table joins
  • Business logic and rules
  • High-stakes decisions
  • Users who can't validate

For trustworthy analytics, semantic approaches are generally preferable to raw text-to-SQL.

Questions

Academic benchmarks show 70-85% accuracy on standardized tests. Real-world performance on complex business schemas is often lower (50-70%). Accuracy varies significantly based on query complexity, schema quality, and how well the AI understands the domain.

Related