Metadata Extraction from Teradata: Enterprise Data Warehouse Schema Discovery

Learn how to extract metadata from Teradata for building semantic layers, including Data Dictionary queries, schema discovery, and relationship mapping for enterprise AI-powered analytics.

7 min read·

Metadata extraction from Teradata is the process of programmatically reading schema structures, table definitions, column information, and relationships from your Teradata data warehouse to build intelligent semantic layers. Teradata's comprehensive Data Dictionary provides detailed metadata through DBC views, enabling sophisticated semantic model generation for enterprise analytics at scale.

This metadata forms the foundation for AI-powered analytics, allowing semantic layers to understand complex enterprise schemas and generate optimized Teradata SQL.

Teradata Data Dictionary Architecture

DBC Database Structure

Teradata stores all metadata in the DBC database:

Core metadata views:

  • DBC.DatabasesV - database information
  • DBC.TablesV - tables, views, macros
  • DBC.ColumnsV - column definitions
  • DBC.IndicesV - index information
  • DBC.All_RI_ParentsV - foreign key relationships

Additional metadata:

  • DBC.FunctionsV - user-defined functions
  • DBC.RolesV - security roles
  • DBC.AccessRightsV - permissions
  • DBC.PartitioningConstraintsV - partitioning

Metadata Richness

Teradata stores extensive metadata:

  • Database hierarchy and spaces
  • Table definitions and storage
  • Column types with full attributes
  • Primary Index configuration
  • Secondary indexes
  • Foreign key relationships
  • Partition elimination columns
  • Comments and documentation

Codd AI Integrations connect to Teradata's Data Dictionary to extract this comprehensive metadata, building semantic layers that understand enterprise-scale data warehouses.

Metadata Extraction Queries

Database and Table Discovery

Discover database structure:

-- List accessible databases
SELECT
    DatabaseName,
    OwnerName,
    CommentString as Description,
    PermSpace,
    SpoolSpace
FROM DBC.DatabasesV
WHERE DatabaseName NOT IN ('DBC', 'SystemFE', 'SYSLIB')
ORDER BY DatabaseName;

-- Discover tables with details
SELECT
    t.DatabaseName,
    t.TableName,
    t.TableKind,  -- 'T' = table, 'V' = view
    t.CommentString as TableComment,
    t.CreatorName,
    t.CreateTimeStamp
FROM DBC.TablesV t
WHERE t.DatabaseName = 'ANALYTICS'
    AND t.TableKind IN ('T', 'V', 'O')  -- Tables, Views, No-PI tables
ORDER BY t.TableName;

-- Table sizes
SELECT
    DatabaseName,
    TableName,
    SUM(CurrentPerm) as CurrentPerm_Bytes,
    SUM(PeakPerm) as PeakPerm_Bytes
FROM DBC.TableSizeV
WHERE DatabaseName = 'ANALYTICS'
GROUP BY DatabaseName, TableName;

-- Row counts from statistics
SELECT
    DatabaseName,
    TableName,
    RowCount
FROM DBC.TableStatsV
WHERE DatabaseName = 'ANALYTICS';

Column Metadata

Extract detailed column information:

-- Column definitions
SELECT
    c.DatabaseName,
    c.TableName,
    c.ColumnName,
    c.ColumnId,
    c.ColumnType,
    c.ColumnFormat,
    c.ColumnLength,
    c.DecimalTotalDigits,
    c.DecimalFractionalDigits,
    c.Nullable,
    c.DefaultValue,
    c.CommentString as ColumnComment
FROM DBC.ColumnsV c
WHERE c.DatabaseName = 'ANALYTICS'
ORDER BY c.TableName, c.ColumnId;

-- Column type mapping
SELECT
    ColumnType,
    CASE ColumnType
        WHEN 'CV' THEN 'VARCHAR'
        WHEN 'CF' THEN 'CHAR'
        WHEN 'I' THEN 'INTEGER'
        WHEN 'I1' THEN 'BYTEINT'
        WHEN 'I2' THEN 'SMALLINT'
        WHEN 'I8' THEN 'BIGINT'
        WHEN 'D' THEN 'DECIMAL'
        WHEN 'F' THEN 'FLOAT'
        WHEN 'DA' THEN 'DATE'
        WHEN 'TS' THEN 'TIMESTAMP'
        WHEN 'AT' THEN 'TIME'
        ELSE ColumnType
    END as TypeName
FROM DBC.ColumnsV
WHERE DatabaseName = 'ANALYTICS';

Index and Primary Index Discovery

Extract index information:

-- Primary Index columns
SELECT
    i.DatabaseName,
    i.TableName,
    i.ColumnName,
    i.ColumnPosition,
    i.IndexType,
    CASE i.IndexType
        WHEN 'P' THEN 'Primary Index'
        WHEN 'K' THEN 'Primary Key'
        WHEN 'S' THEN 'Secondary Index'
        WHEN 'U' THEN 'Unique Secondary Index'
        ELSE i.IndexType
    END as IndexTypeDesc
FROM DBC.IndicesV i
WHERE i.DatabaseName = 'ANALYTICS'
    AND i.IndexType IN ('P', 'K')
ORDER BY i.TableName, i.ColumnPosition;

-- All indexes
SELECT
    DatabaseName,
    TableName,
    IndexName,
    IndexType,
    UniqueFlag
FROM DBC.IndicesV
WHERE DatabaseName = 'ANALYTICS'
GROUP BY 1, 2, 3, 4, 5;

Relationship Discovery

Extract foreign key relationships:

-- Foreign key constraints
SELECT
    ChildDB as child_database,
    ChildTable as child_table,
    ChildKeyColumn as child_column,
    ParentDB as parent_database,
    ParentTable as parent_table,
    ParentKeyColumn as parent_column,
    IndexName as constraint_name
FROM DBC.All_RI_ParentsV
WHERE ChildDB = 'ANALYTICS'
ORDER BY ChildTable, IndexName;

-- Full relationship details
SELECT
    rp.ChildDB,
    rp.ChildTable,
    rp.ParentDB,
    rp.ParentTable,
    LISTAGG(rp.ChildKeyColumn, ', ')
        WITHIN GROUP (ORDER BY rp.ColumnId) as ChildColumns,
    LISTAGG(rp.ParentKeyColumn, ', ')
        WITHIN GROUP (ORDER BY rp.ColumnId) as ParentColumns
FROM DBC.All_RI_ParentsV rp
WHERE rp.ChildDB = 'ANALYTICS'
GROUP BY rp.ChildDB, rp.ChildTable, rp.ParentDB,
         rp.ParentTable, rp.IndexName;

Partition Information

Handle partitioned tables:

-- Partitioned tables
SELECT
    t.DatabaseName,
    t.TableName,
    pc.ColumnPartitioningLevel,
    pc.PartitioningExpression
FROM DBC.TablesV t
JOIN DBC.PartitioningConstraintsV pc
    ON t.DatabaseName = pc.DatabaseName
    AND t.TableName = pc.TableName
WHERE t.DatabaseName = 'ANALYTICS';

-- Partition elimination columns
SELECT
    DatabaseName,
    TableName,
    ColumnName as PartitionColumn
FROM DBC.ColumnsV
WHERE PartitioningColumn = 'Y'
    AND DatabaseName = 'ANALYTICS';

Building Semantic Models

Table to Entity Mapping

Transform Teradata metadata into semantic concepts:

Tables become entities:

  • Primary Index informs join optimization
  • Comments provide descriptions
  • Partition columns indicate time dimensions

Columns map to attributes:

  • CV, CF (VARCHAR, CHAR) become text dimensions
  • I, I2, I8, D, F become measures
  • DA, TS become time dimensions
  • Column format hints at display preferences

Relationship Mapping

Leverage Teradata's referential integrity:

entity: Orders
  source: ANALYTICS.ORDERS
  description: "Customer order transactions"
  primary_index: CUSTOMER_ID

  relationships:
    - name: customer
      type: many_to_one
      target: Customers
      # From RI constraint
      join:
        source_column: CUSTOMER_ID
        target_column: CUSTOMER_ID

Primary Index Optimization

Use PI information for query optimization:

entity: SalesTransactions
  source: ANALYTICS.SALES_FACT
  primary_index: STORE_ID
  partition_column: SALE_DATE

  # Semantic layer can optimize joins
  # with tables sharing primary index column

Implementation Approach

Step 1: Configure Connection

Set up Teradata connectivity:

import teradatasql

conn = teradatasql.connect(
    host='teradata-server',
    user='metadata_reader',
    password='secure_password',
    database='ANALYTICS'
)

Create dedicated extraction account:

-- Create extraction user
CREATE USER metadata_reader
    FROM DBC
    AS PASSWORD = secure_password
    PERM = 0
    SPOOL = 100000000;

-- Grant SELECT on Data Dictionary
GRANT SELECT ON DBC TO metadata_reader;

-- Grant SELECT on target database
GRANT SELECT ON ANALYTICS TO metadata_reader;

Step 2: Extract Comprehensive Metadata

Build complete schema picture:

def extract_tables(conn, database):
    """Extract table metadata with comments."""
    query = """
        SELECT TableName, TableKind, CommentString,
               CreateTimeStamp
        FROM DBC.TablesV
        WHERE DatabaseName = ?
            AND TableKind IN ('T', 'V', 'O')
    """
    cursor = conn.cursor()
    cursor.execute(query, (database,))
    return cursor.fetchall()

def extract_columns(conn, database, table):
    """Extract column metadata."""
    query = """
        SELECT ColumnName, ColumnId, ColumnType,
               ColumnLength, Nullable, CommentString
        FROM DBC.ColumnsV
        WHERE DatabaseName = ? AND TableName = ?
        ORDER BY ColumnId
    """
    cursor = conn.cursor()
    cursor.execute(query, (database, table))
    return cursor.fetchall()

Step 3: Handle Teradata-Specific Features

Account for Teradata capabilities:

Temporal tables:

SELECT
    TableName,
    TemporalProperty
FROM DBC.TablesV
WHERE DatabaseName = 'ANALYTICS'
    AND TemporalProperty IS NOT NULL;

Join Indexes:

SELECT
    IndexName,
    IndexTableName,
    IndexDatabaseName
FROM DBC.IndicesV
WHERE IndexType = 'J'
    AND DatabaseName = 'ANALYTICS';

Step 4: Generate Semantic Model

Transform to semantic definitions:

entity: Customers
  source: ANALYTICS.CUSTOMER_DIM
  description: "Customer dimension with demographics"
  primary_index: CUSTOMER_KEY

  dimensions:
    - name: customer_name
      source_column: CUST_NAME
      type: text

    - name: region
      source_column: REGION_CD
      type: categorical

    - name: signup_date
      source_column: SIGNUP_DT
      type: time
      grain: day

  measures:
    - name: customer_count
      calculation: COUNT(DISTINCT CUSTOMER_KEY)

  relationships:
    - name: orders
      target: OrderFact
      type: one_to_many
      join_column: CUSTOMER_KEY

Step 5: Maintain Synchronization

Keep semantic layer current:

  • Query DBC.EventLog for DDL changes
  • Compare object timestamps
  • Implement change detection
  • Schedule periodic full extractions

Common Challenges

Large Enterprise Schemas

Teradata environments with thousands of tables:

Solutions:

  • Focus on analytics-relevant databases
  • Use incremental extraction
  • Leverage DBC.TableStatsV for active tables
  • Prioritize by query frequency

Legacy Naming Conventions

Older schemas with cryptic names:

Solutions:

  • Extract comments for context
  • Build synonym mappings
  • Layer semantic names over technical names
  • Document naming conventions

Query Logging Access

DBQLogTbl access may be restricted:

Solutions:

  • Work with DBAs for access
  • Use sampled query analysis
  • Infer relationships from schema
  • Leverage existing documentation

The Value of Teradata Metadata Extraction

Automated metadata extraction from Teradata delivers:

Enterprise visibility: Understand massive data warehouses.

Performance optimization: Leverage Primary Index information.

Relationship richness: Use referential integrity constraints.

Legacy modernization: Bring established schemas into modern analytics.

Organizations that automate Teradata metadata extraction build semantic layers that unlock decades of enterprise data - enabling AI-powered analytics that understand and optimize for Teradata's unique architecture.

Questions

Teradata provides metadata through DBC views including DBC.TablesV (tables and views), DBC.ColumnsV (column definitions), DBC.IndicesV (indexes), and DBC.All_RI_ParentsV (foreign key relationships). These views provide comprehensive access to all database objects and their properties.

Related