Metadata Extraction from Azure SQL: Cloud Database Schema Discovery for Semantic Layers
Learn how to extract metadata from Azure SQL Database and Azure Synapse Analytics for building semantic layers, including system catalog queries and cloud-specific considerations.
Metadata extraction from Azure SQL is the process of programmatically reading schema structures, table definitions, column information, and relationships from Azure SQL Database or Azure Synapse Analytics to build intelligent semantic layers. Azure SQL provides standard SQL Server system catalogs alongside cloud-specific views, enabling comprehensive metadata discovery for modern cloud analytics.
This metadata forms the foundation for AI-powered analytics, allowing semantic layers to understand your cloud database structure and generate optimized T-SQL queries.
Azure SQL Metadata Architecture
Azure SQL Database
Azure SQL Database uses SQL Server's system catalog:
System views (sys schema):
- sys.tables, sys.views
- sys.columns, sys.types
- sys.foreign_keys, sys.foreign_key_columns
- sys.indexes, sys.index_columns
- sys.extended_properties
INFORMATION_SCHEMA:
- Standard SQL interface
- Cross-platform compatibility
- Simplified metadata access
Azure Synapse Analytics
Synapse adds data warehouse-specific views:
- sys.pdw_table_distribution_properties
- sys.pdw_column_distribution_properties
- sys.external_tables, sys.external_data_sources
- sys.workload_management_workload_groups
Codd AI Integrations connect to Azure SQL and Synapse to extract this metadata automatically, building semantic layers optimized for your cloud analytics environment.
Metadata Extraction Queries
Schema and Table Discovery
Discover database structure:
-- List all schemas
SELECT schema_id, name as schema_name
FROM sys.schemas
WHERE name NOT IN ('sys', 'INFORMATION_SCHEMA', 'guest');
-- Discover tables with row counts
SELECT
s.name as schema_name,
t.name as table_name,
t.create_date,
t.modify_date,
p.rows as row_count
FROM sys.tables t
JOIN sys.schemas s ON t.schema_id = s.schema_id
JOIN sys.partitions p ON t.object_id = p.object_id
WHERE p.index_id IN (0, 1) -- Heap or clustered index
AND s.name = 'dbo'
ORDER BY t.name;
-- Views in schema
SELECT
s.name as schema_name,
v.name as view_name,
v.create_date
FROM sys.views v
JOIN sys.schemas s ON v.schema_id = s.schema_id
WHERE s.name = 'dbo';
-- Extended properties (descriptions)
SELECT
OBJECT_NAME(ep.major_id) as object_name,
ep.name as property_name,
ep.value as property_value
FROM sys.extended_properties ep
WHERE ep.class = 1 -- Object or column
AND ep.name = 'MS_Description';
Column Metadata
Extract detailed column information:
-- Column definitions
SELECT
s.name as schema_name,
t.name as table_name,
c.name as column_name,
c.column_id,
ty.name as data_type,
c.max_length,
c.precision,
c.scale,
c.is_nullable,
c.is_identity,
c.is_computed,
ep.value as column_description
FROM sys.columns c
JOIN sys.tables t ON c.object_id = t.object_id
JOIN sys.schemas s ON t.schema_id = s.schema_id
JOIN sys.types ty ON c.user_type_id = ty.user_type_id
LEFT JOIN sys.extended_properties ep
ON ep.major_id = c.object_id
AND ep.minor_id = c.column_id
AND ep.name = 'MS_Description'
WHERE s.name = 'dbo'
ORDER BY t.name, c.column_id;
-- Computed column expressions
SELECT
OBJECT_NAME(object_id) as table_name,
name as column_name,
definition as expression
FROM sys.computed_columns
WHERE OBJECT_SCHEMA_NAME(object_id) = 'dbo';
Constraint Discovery
Extract relationship information:
-- Primary keys
SELECT
s.name as schema_name,
t.name as table_name,
kc.name as constraint_name,
c.name as column_name,
ic.key_ordinal
FROM sys.key_constraints kc
JOIN sys.tables t ON kc.parent_object_id = t.object_id
JOIN sys.schemas s ON t.schema_id = s.schema_id
JOIN sys.index_columns ic ON kc.parent_object_id = ic.object_id
AND kc.unique_index_id = ic.index_id
JOIN sys.columns c ON ic.object_id = c.object_id
AND ic.column_id = c.column_id
WHERE kc.type = 'PK'
AND s.name = 'dbo'
ORDER BY t.name, ic.key_ordinal;
-- Foreign keys with references
SELECT
s.name as schema_name,
t.name as table_name,
fk.name as constraint_name,
COL_NAME(fkc.parent_object_id, fkc.parent_column_id) as column_name,
OBJECT_NAME(fk.referenced_object_id) as referenced_table,
COL_NAME(fkc.referenced_object_id, fkc.referenced_column_id) as referenced_column
FROM sys.foreign_keys fk
JOIN sys.tables t ON fk.parent_object_id = t.object_id
JOIN sys.schemas s ON t.schema_id = s.schema_id
JOIN sys.foreign_key_columns fkc ON fk.object_id = fkc.constraint_object_id
WHERE s.name = 'dbo'
ORDER BY t.name, fk.name;
-- Unique constraints
SELECT
s.name as schema_name,
t.name as table_name,
kc.name as constraint_name,
STRING_AGG(c.name, ', ') WITHIN GROUP (ORDER BY ic.key_ordinal) as columns
FROM sys.key_constraints kc
JOIN sys.tables t ON kc.parent_object_id = t.object_id
JOIN sys.schemas s ON t.schema_id = s.schema_id
JOIN sys.index_columns ic ON kc.parent_object_id = ic.object_id
AND kc.unique_index_id = ic.index_id
JOIN sys.columns c ON ic.object_id = c.object_id
AND ic.column_id = c.column_id
WHERE kc.type = 'UQ'
AND s.name = 'dbo'
GROUP BY s.name, t.name, kc.name;
Azure Synapse-Specific Metadata
Extract Synapse distribution information:
-- Distribution type per table
SELECT
s.name as schema_name,
t.name as table_name,
tdp.distribution_policy_desc as distribution_type
FROM sys.tables t
JOIN sys.schemas s ON t.schema_id = s.schema_id
JOIN sys.pdw_table_distribution_properties tdp
ON t.object_id = tdp.object_id
WHERE s.name = 'dbo';
-- Distribution column
SELECT
s.name as schema_name,
t.name as table_name,
c.name as distribution_column
FROM sys.tables t
JOIN sys.schemas s ON t.schema_id = s.schema_id
JOIN sys.pdw_column_distribution_properties cdp
ON t.object_id = cdp.object_id
JOIN sys.columns c
ON cdp.object_id = c.object_id
AND cdp.column_id = c.column_id
WHERE cdp.distribution_ordinal = 1
AND s.name = 'dbo';
-- External tables (PolyBase)
SELECT
s.name as schema_name,
et.name as table_name,
eds.name as data_source_name,
et.location
FROM sys.external_tables et
JOIN sys.schemas s ON et.schema_id = s.schema_id
JOIN sys.external_data_sources eds
ON et.data_source_id = eds.data_source_id;
Building Semantic Models
Table to Entity Mapping
Transform Azure SQL metadata into semantic concepts:
Tables become entities:
- Primary key defines entity identity
- Extended properties provide descriptions
- Distribution strategy informs query optimization
Columns map to attributes:
- nvarchar, varchar become text dimensions
- int, bigint, decimal become measures
- date, datetime2 become time dimensions
- bit becomes boolean dimension
Relationship Mapping
Leverage SQL Server's constraint system:
entity: Orders
source: dbo.Orders
description: "Customer order transactions"
relationships:
- name: customer
type: many_to_one
target: Customers
join:
source_column: CustomerID
target_column: CustomerID
Synapse Optimization
Use distribution metadata for query optimization:
entity: SalesTransactions
source: dbo.FactSales
distribution: HASH(CustomerID)
# Semantic layer can optimize joins
# with tables distributed on same key
Implementation Approach
Step 1: Configure Authentication
Set up Azure SQL connectivity:
import pyodbc
# Using SQL authentication
conn_str = (
"Driver={ODBC Driver 18 for SQL Server};"
"Server=server.database.windows.net;"
"Database=analytics;"
"Uid=metadata_reader;"
"Pwd=secure_password;"
"Encrypt=yes;"
"TrustServerCertificate=no;"
)
# Using Azure AD service principal
conn_str = (
"Driver={ODBC Driver 18 for SQL Server};"
"Server=server.database.windows.net;"
"Database=analytics;"
"Authentication=ActiveDirectoryServicePrincipal;"
f"UID={client_id};"
f"PWD={client_secret};"
)
conn = pyodbc.connect(conn_str)
Step 2: Create Extraction User
Configure appropriate permissions:
-- Create extraction user
CREATE USER metadata_reader FROM LOGIN metadata_reader;
-- Grant read permissions
GRANT VIEW DEFINITION ON SCHEMA::dbo TO metadata_reader;
GRANT SELECT ON SCHEMA::dbo TO metadata_reader;
-- For extended properties
GRANT VIEW ANY DEFINITION TO metadata_reader;
Step 3: Extract Comprehensive Metadata
Build complete schema picture:
def extract_tables(conn, schema='dbo'):
"""Extract table metadata with descriptions."""
query = """
SELECT
t.name as table_name,
p.rows as row_count,
ep.value as description
FROM sys.tables t
JOIN sys.schemas s ON t.schema_id = s.schema_id
JOIN sys.partitions p ON t.object_id = p.object_id
LEFT JOIN sys.extended_properties ep
ON ep.major_id = t.object_id
AND ep.minor_id = 0
AND ep.name = 'MS_Description'
WHERE s.name = ?
AND p.index_id IN (0, 1)
"""
cursor = conn.cursor()
cursor.execute(query, schema)
return cursor.fetchall()
Step 4: Handle Azure-Specific Features
Account for cloud capabilities:
Temporal tables:
SELECT
t.name as table_name,
t.temporal_type_desc,
ht.name as history_table
FROM sys.tables t
LEFT JOIN sys.tables ht
ON t.history_table_id = ht.object_id
WHERE t.temporal_type <> 0;
Column encryption:
SELECT
c.name as column_name,
c.encryption_type_desc
FROM sys.columns c
WHERE c.encryption_type IS NOT NULL;
Step 5: Generate Semantic Model
Transform to semantic definitions:
entity: Customers
source: dbo.Customers
description: "Customer master data with contact information"
primary_key: CustomerID
dimensions:
- name: customer_name
source_column: CustomerName
type: text
- name: region
source_column: Region
type: categorical
- name: signup_date
source_column: CreatedDate
type: time
grain: day
measures:
- name: customer_count
calculation: COUNT(DISTINCT CustomerID)
relationships:
- name: orders
target: Orders
type: one_to_many
join_column: CustomerID
Common Challenges
Multi-Database Environments
Azure SQL databases are isolated:
Solutions:
- Extract from each database separately
- Aggregate metadata centrally
- Use elastic database queries for cross-db views
- Document database relationships
Synapse Serverless Pools
Dynamic schema with external data:
Solutions:
- Query sys.external_tables for registered objects
- Handle schema inference for parquet files
- Document data source configurations
Performance Tier Limitations
Smaller tiers have resource constraints:
Solutions:
- Schedule extraction during low-usage periods
- Limit query complexity
- Cache metadata locally
- Use incremental extraction
The Value of Azure SQL Metadata Extraction
Automated metadata extraction from Azure SQL delivers:
Cloud-native visibility: Understand your Azure database structure.
Power BI integration: Align semantic layer with Microsoft ecosystem.
Synapse optimization: Leverage distribution metadata for performance.
Unified governance: Connect with Azure Purview and Microsoft governance tools.
Organizations that automate Azure SQL metadata extraction build semantic layers that integrate seamlessly with the Microsoft ecosystem - enabling AI-powered analytics across Azure cloud environments.
Questions
Azure SQL Database uses standard SQL Server system views including sys.tables, sys.columns, sys.foreign_keys, and INFORMATION_SCHEMA views. These provide comprehensive metadata about tables, columns, relationships, and constraints identical to on-premises SQL Server.