Metadata Extraction from Teradata: Enterprise Data Warehouse Schema Discovery
Learn how to extract metadata from Teradata for building semantic layers, including Data Dictionary queries, schema discovery, and relationship mapping for enterprise AI-powered analytics.
Metadata extraction from Teradata is the process of programmatically reading schema structures, table definitions, column information, and relationships from your Teradata data warehouse to build intelligent semantic layers. Teradata's comprehensive Data Dictionary provides detailed metadata through DBC views, enabling sophisticated semantic model generation for enterprise analytics at scale.
This metadata forms the foundation for AI-powered analytics, allowing semantic layers to understand complex enterprise schemas and generate optimized Teradata SQL.
Teradata Data Dictionary Architecture
DBC Database Structure
Teradata stores all metadata in the DBC database:
Core metadata views:
- DBC.DatabasesV - database information
- DBC.TablesV - tables, views, macros
- DBC.ColumnsV - column definitions
- DBC.IndicesV - index information
- DBC.All_RI_ParentsV - foreign key relationships
Additional metadata:
- DBC.FunctionsV - user-defined functions
- DBC.RolesV - security roles
- DBC.AccessRightsV - permissions
- DBC.PartitioningConstraintsV - partitioning
Metadata Richness
Teradata stores extensive metadata:
- Database hierarchy and spaces
- Table definitions and storage
- Column types with full attributes
- Primary Index configuration
- Secondary indexes
- Foreign key relationships
- Partition elimination columns
- Comments and documentation
Codd AI Integrations connect to Teradata's Data Dictionary to extract this comprehensive metadata, building semantic layers that understand enterprise-scale data warehouses.
Metadata Extraction Queries
Database and Table Discovery
Discover database structure:
-- List accessible databases
SELECT
DatabaseName,
OwnerName,
CommentString as Description,
PermSpace,
SpoolSpace
FROM DBC.DatabasesV
WHERE DatabaseName NOT IN ('DBC', 'SystemFE', 'SYSLIB')
ORDER BY DatabaseName;
-- Discover tables with details
SELECT
t.DatabaseName,
t.TableName,
t.TableKind, -- 'T' = table, 'V' = view
t.CommentString as TableComment,
t.CreatorName,
t.CreateTimeStamp
FROM DBC.TablesV t
WHERE t.DatabaseName = 'ANALYTICS'
AND t.TableKind IN ('T', 'V', 'O') -- Tables, Views, No-PI tables
ORDER BY t.TableName;
-- Table sizes
SELECT
DatabaseName,
TableName,
SUM(CurrentPerm) as CurrentPerm_Bytes,
SUM(PeakPerm) as PeakPerm_Bytes
FROM DBC.TableSizeV
WHERE DatabaseName = 'ANALYTICS'
GROUP BY DatabaseName, TableName;
-- Row counts from statistics
SELECT
DatabaseName,
TableName,
RowCount
FROM DBC.TableStatsV
WHERE DatabaseName = 'ANALYTICS';
Column Metadata
Extract detailed column information:
-- Column definitions
SELECT
c.DatabaseName,
c.TableName,
c.ColumnName,
c.ColumnId,
c.ColumnType,
c.ColumnFormat,
c.ColumnLength,
c.DecimalTotalDigits,
c.DecimalFractionalDigits,
c.Nullable,
c.DefaultValue,
c.CommentString as ColumnComment
FROM DBC.ColumnsV c
WHERE c.DatabaseName = 'ANALYTICS'
ORDER BY c.TableName, c.ColumnId;
-- Column type mapping
SELECT
ColumnType,
CASE ColumnType
WHEN 'CV' THEN 'VARCHAR'
WHEN 'CF' THEN 'CHAR'
WHEN 'I' THEN 'INTEGER'
WHEN 'I1' THEN 'BYTEINT'
WHEN 'I2' THEN 'SMALLINT'
WHEN 'I8' THEN 'BIGINT'
WHEN 'D' THEN 'DECIMAL'
WHEN 'F' THEN 'FLOAT'
WHEN 'DA' THEN 'DATE'
WHEN 'TS' THEN 'TIMESTAMP'
WHEN 'AT' THEN 'TIME'
ELSE ColumnType
END as TypeName
FROM DBC.ColumnsV
WHERE DatabaseName = 'ANALYTICS';
Index and Primary Index Discovery
Extract index information:
-- Primary Index columns
SELECT
i.DatabaseName,
i.TableName,
i.ColumnName,
i.ColumnPosition,
i.IndexType,
CASE i.IndexType
WHEN 'P' THEN 'Primary Index'
WHEN 'K' THEN 'Primary Key'
WHEN 'S' THEN 'Secondary Index'
WHEN 'U' THEN 'Unique Secondary Index'
ELSE i.IndexType
END as IndexTypeDesc
FROM DBC.IndicesV i
WHERE i.DatabaseName = 'ANALYTICS'
AND i.IndexType IN ('P', 'K')
ORDER BY i.TableName, i.ColumnPosition;
-- All indexes
SELECT
DatabaseName,
TableName,
IndexName,
IndexType,
UniqueFlag
FROM DBC.IndicesV
WHERE DatabaseName = 'ANALYTICS'
GROUP BY 1, 2, 3, 4, 5;
Relationship Discovery
Extract foreign key relationships:
-- Foreign key constraints
SELECT
ChildDB as child_database,
ChildTable as child_table,
ChildKeyColumn as child_column,
ParentDB as parent_database,
ParentTable as parent_table,
ParentKeyColumn as parent_column,
IndexName as constraint_name
FROM DBC.All_RI_ParentsV
WHERE ChildDB = 'ANALYTICS'
ORDER BY ChildTable, IndexName;
-- Full relationship details
SELECT
rp.ChildDB,
rp.ChildTable,
rp.ParentDB,
rp.ParentTable,
LISTAGG(rp.ChildKeyColumn, ', ')
WITHIN GROUP (ORDER BY rp.ColumnId) as ChildColumns,
LISTAGG(rp.ParentKeyColumn, ', ')
WITHIN GROUP (ORDER BY rp.ColumnId) as ParentColumns
FROM DBC.All_RI_ParentsV rp
WHERE rp.ChildDB = 'ANALYTICS'
GROUP BY rp.ChildDB, rp.ChildTable, rp.ParentDB,
rp.ParentTable, rp.IndexName;
Partition Information
Handle partitioned tables:
-- Partitioned tables
SELECT
t.DatabaseName,
t.TableName,
pc.ColumnPartitioningLevel,
pc.PartitioningExpression
FROM DBC.TablesV t
JOIN DBC.PartitioningConstraintsV pc
ON t.DatabaseName = pc.DatabaseName
AND t.TableName = pc.TableName
WHERE t.DatabaseName = 'ANALYTICS';
-- Partition elimination columns
SELECT
DatabaseName,
TableName,
ColumnName as PartitionColumn
FROM DBC.ColumnsV
WHERE PartitioningColumn = 'Y'
AND DatabaseName = 'ANALYTICS';
Building Semantic Models
Table to Entity Mapping
Transform Teradata metadata into semantic concepts:
Tables become entities:
- Primary Index informs join optimization
- Comments provide descriptions
- Partition columns indicate time dimensions
Columns map to attributes:
- CV, CF (VARCHAR, CHAR) become text dimensions
- I, I2, I8, D, F become measures
- DA, TS become time dimensions
- Column format hints at display preferences
Relationship Mapping
Leverage Teradata's referential integrity:
entity: Orders
source: ANALYTICS.ORDERS
description: "Customer order transactions"
primary_index: CUSTOMER_ID
relationships:
- name: customer
type: many_to_one
target: Customers
# From RI constraint
join:
source_column: CUSTOMER_ID
target_column: CUSTOMER_ID
Primary Index Optimization
Use PI information for query optimization:
entity: SalesTransactions
source: ANALYTICS.SALES_FACT
primary_index: STORE_ID
partition_column: SALE_DATE
# Semantic layer can optimize joins
# with tables sharing primary index column
Implementation Approach
Step 1: Configure Connection
Set up Teradata connectivity:
import teradatasql
conn = teradatasql.connect(
host='teradata-server',
user='metadata_reader',
password='secure_password',
database='ANALYTICS'
)
Create dedicated extraction account:
-- Create extraction user
CREATE USER metadata_reader
FROM DBC
AS PASSWORD = secure_password
PERM = 0
SPOOL = 100000000;
-- Grant SELECT on Data Dictionary
GRANT SELECT ON DBC TO metadata_reader;
-- Grant SELECT on target database
GRANT SELECT ON ANALYTICS TO metadata_reader;
Step 2: Extract Comprehensive Metadata
Build complete schema picture:
def extract_tables(conn, database):
"""Extract table metadata with comments."""
query = """
SELECT TableName, TableKind, CommentString,
CreateTimeStamp
FROM DBC.TablesV
WHERE DatabaseName = ?
AND TableKind IN ('T', 'V', 'O')
"""
cursor = conn.cursor()
cursor.execute(query, (database,))
return cursor.fetchall()
def extract_columns(conn, database, table):
"""Extract column metadata."""
query = """
SELECT ColumnName, ColumnId, ColumnType,
ColumnLength, Nullable, CommentString
FROM DBC.ColumnsV
WHERE DatabaseName = ? AND TableName = ?
ORDER BY ColumnId
"""
cursor = conn.cursor()
cursor.execute(query, (database, table))
return cursor.fetchall()
Step 3: Handle Teradata-Specific Features
Account for Teradata capabilities:
Temporal tables:
SELECT
TableName,
TemporalProperty
FROM DBC.TablesV
WHERE DatabaseName = 'ANALYTICS'
AND TemporalProperty IS NOT NULL;
Join Indexes:
SELECT
IndexName,
IndexTableName,
IndexDatabaseName
FROM DBC.IndicesV
WHERE IndexType = 'J'
AND DatabaseName = 'ANALYTICS';
Step 4: Generate Semantic Model
Transform to semantic definitions:
entity: Customers
source: ANALYTICS.CUSTOMER_DIM
description: "Customer dimension with demographics"
primary_index: CUSTOMER_KEY
dimensions:
- name: customer_name
source_column: CUST_NAME
type: text
- name: region
source_column: REGION_CD
type: categorical
- name: signup_date
source_column: SIGNUP_DT
type: time
grain: day
measures:
- name: customer_count
calculation: COUNT(DISTINCT CUSTOMER_KEY)
relationships:
- name: orders
target: OrderFact
type: one_to_many
join_column: CUSTOMER_KEY
Step 5: Maintain Synchronization
Keep semantic layer current:
- Query DBC.EventLog for DDL changes
- Compare object timestamps
- Implement change detection
- Schedule periodic full extractions
Common Challenges
Large Enterprise Schemas
Teradata environments with thousands of tables:
Solutions:
- Focus on analytics-relevant databases
- Use incremental extraction
- Leverage DBC.TableStatsV for active tables
- Prioritize by query frequency
Legacy Naming Conventions
Older schemas with cryptic names:
Solutions:
- Extract comments for context
- Build synonym mappings
- Layer semantic names over technical names
- Document naming conventions
Query Logging Access
DBQLogTbl access may be restricted:
Solutions:
- Work with DBAs for access
- Use sampled query analysis
- Infer relationships from schema
- Leverage existing documentation
The Value of Teradata Metadata Extraction
Automated metadata extraction from Teradata delivers:
Enterprise visibility: Understand massive data warehouses.
Performance optimization: Leverage Primary Index information.
Relationship richness: Use referential integrity constraints.
Legacy modernization: Bring established schemas into modern analytics.
Organizations that automate Teradata metadata extraction build semantic layers that unlock decades of enterprise data - enabling AI-powered analytics that understand and optimize for Teradata's unique architecture.
Questions
Teradata provides metadata through DBC views including DBC.TablesV (tables and views), DBC.ColumnsV (column definitions), DBC.IndicesV (indexes), and DBC.All_RI_ParentsV (foreign key relationships). These views provide comprehensive access to all database objects and their properties.