What system tables contain Redshift metadata?

Redshift provides metadata through PG_CATALOG system tables (PostgreSQL-compatible) and SVV_ views. Key sources include svv_tables, svv_columns, svv_table_info for table metadata, and pg_constraint for relationships. The information_schema views also provide standard SQL metadata access.

How do I extract metadata from Redshift Serverless?

Redshift Serverless uses the same system catalogs as provisioned Redshift. Connect using the serverless endpoint, and query svv_ and pg_ system tables identically. Workgroup and namespace information is available through serverless-specific system tables.

Can I extract external table metadata from Redshift Spectrum?

Yes. External tables registered in Redshift are visible in svv_external_tables and svv_external_columns. These views show schema information for data stored in S3 and accessed through Spectrum, enabling unified semantic layer coverage.

How do distribution and sort keys affect semantic layer design?

Distribution and sort keys optimize query performance. Your semantic layer should understand these keys to generate efficient queries. Extract this metadata from svv_table_info and consider aligning semantic model join patterns with distribution key relationships.

Metadata Extraction from Redshift: Building Semantic Layers from Your Data Warehouse

Metadata extraction from Amazon Redshift is the process of programmatically reading schema structures, table definitions, column information, and relationships from your Redshift data warehouse to build intelligent semantic layers. Redshift provides comprehensive system catalogs and views that expose detailed metadata about your data warehouse structure, enabling automated discovery and semantic model generation.

This extracted metadata forms the foundation for AI-powered analytics, allowing semantic layers to understand your data warehouse and generate accurate queries.

Redshift Metadata Architecture

System Catalog Structure

Redshift provides metadata through multiple interfaces:

PostgreSQL system catalogs (pg_catalog):

pg_tables, pg_views
pg_columns
pg_constraint
pg_namespace

Redshift system views (svv_):

svv_tables, svv_columns
svv_table_info
svv_external_tables
svv_all_schemas

Information schema:

information_schema.tables
information_schema.columns
information_schema.table_constraints

Metadata Components

Redshift stores rich metadata:

Database and schema organization
Table definitions and storage properties
Column names, types, and encoding
Distribution and sort key configuration
Primary and foreign key constraints
Table statistics and row counts
Query history and access patterns

Codd AI Integrations connect to Redshift system catalogs to extract this metadata automatically, building semantic layers that understand your data warehouse structure and optimize query generation.

Metadata Extraction Queries

Schema and Table Discovery

Query available schemas and tables:

-- List all schemas
SELECT schema_name, schema_owner
FROM svv_all_schemas
WHERE schema_name NOT IN ('pg_catalog', 'information_schema');

-- Discover tables with details
SELECT
    schema_name,
    table_name,
    table_type,
    table_owner,
    remarks
FROM svv_tables
WHERE schema_name = 'analytics';

-- Table statistics and storage
SELECT
    schema as schema_name,
    "table" as table_name,
    diststyle,
    sortkey1,
    tbl_rows,
    size as size_mb
FROM svv_table_info
WHERE schema = 'analytics';

Column Metadata

Extract detailed column information:

-- Column definitions
SELECT
    table_schema,
    table_name,
    column_name,
    ordinal_position,
    data_type,
    character_maximum_length,
    numeric_precision,
    is_nullable,
    column_default
FROM svv_columns
WHERE table_schema = 'analytics'
ORDER BY table_name, ordinal_position;

-- Column encoding and compression
SELECT
    schemaname,
    tablename,
    column,
    type,
    encoding,
    distkey,
    sortkey
FROM pg_table_def
WHERE schemaname = 'analytics';

Relationship Discovery

Identify table relationships:

-- Primary keys
SELECT
    tc.constraint_name,
    tc.table_schema,
    tc.table_name,
    kcu.column_name
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
    ON tc.constraint_name = kcu.constraint_name
WHERE tc.constraint_type = 'PRIMARY KEY'
    AND tc.table_schema = 'analytics';

-- Foreign keys
SELECT
    tc.constraint_name,
    tc.table_schema,
    tc.table_name,
    kcu.column_name,
    ccu.table_schema AS foreign_table_schema,
    ccu.table_name AS foreign_table_name,
    ccu.column_name AS foreign_column_name
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
    ON tc.constraint_name = kcu.constraint_name
JOIN information_schema.constraint_column_usage ccu
    ON ccu.constraint_name = tc.constraint_name
WHERE tc.constraint_type = 'FOREIGN KEY';

External Table Metadata

Capture Redshift Spectrum tables:

-- External schemas
SELECT
    schemaname,
    databasename,
    esoptions
FROM svv_external_schemas;

-- External tables
SELECT
    schemaname,
    tablename,
    location,
    input_format,
    output_format
FROM svv_external_tables;

-- External columns
SELECT
    schemaname,
    tablename,
    columnname,
    external_type,
    columnnum
FROM svv_external_columns
ORDER BY schemaname, tablename, columnnum;

Query Pattern Analysis

Understand data usage:

-- Recent query patterns
SELECT
    userid,
    query,
    querytxt,
    starttime,
    endtime
FROM stl_query
WHERE starttime > DATEADD(day, -30, GETDATE())
    AND querytxt LIKE '%analytics.%'
ORDER BY starttime DESC
LIMIT 1000;

-- Table access frequency
SELECT
    table_id,
    "table" as table_name,
    pct_mem_used,
    rows,
    unsorted_rows
FROM svv_table_info
ORDER BY rows DESC;

Building Semantic Models

Table to Entity Mapping

Transform Redshift metadata into semantic concepts:

Tables map to entities:

Distribution style informs join optimization
Sort keys indicate common query patterns
Row counts help with cardinality understanding

Columns become attributes:

VARCHAR becomes text dimension
INTEGER, DECIMAL become measures
DATE, TIMESTAMP become time dimensions
Encoding hints at column usage patterns

Relationship Modeling

Leverage constraint metadata:

Primary keys define unique identifiers:

Map to entity primary keys
Enable record-level operations

Foreign keys define relationships:

Map to semantic layer joins
Inform navigation between entities

Inferred relationships fill gaps:

Match column names across tables
Analyze query join patterns
Review distribution key relationships

Distribution-Aware Semantics

Optimize semantic queries using distribution metadata:

KEY distribution:

Join on distribution key for collocated joins
Semantic layer should prefer these join patterns

ALL distribution:

Small tables replicated to all nodes
Efficient for broadcast joins

EVEN distribution:

Data spread across nodes
Consider for large fact tables

Implementation Approach

Step 1: Configure Connection

Set up secure access to Redshift:

import psycopg2

conn = psycopg2.connect(
    host='cluster.region.redshift.amazonaws.com',
    port=5439,
    database='analytics',
    user='extraction_user',
    password='secure_password'
)

Consider using IAM authentication for enhanced security.

Step 2: Extract Schema Metadata

Gather comprehensive schema information:

def extract_tables(conn, schema):
    query = """
        SELECT table_name, table_type, remarks
        FROM svv_tables
        WHERE schema_name = %s
    """
    cursor = conn.cursor()
    cursor.execute(query, (schema,))
    return cursor.fetchall()

def extract_columns(conn, schema, table):
    query = """
        SELECT column_name, data_type, is_nullable,
               ordinal_position
        FROM svv_columns
        WHERE table_schema = %s AND table_name = %s
        ORDER BY ordinal_position
    """
    cursor = conn.cursor()
    cursor.execute(query, (schema, table))
    return cursor.fetchall()

Step 3: Capture Relationships

Extract constraint information:

def extract_relationships(conn, schema):
    query = """
        SELECT
            tc.table_name,
            kcu.column_name,
            ccu.table_name AS referenced_table,
            ccu.column_name AS referenced_column
        FROM information_schema.table_constraints tc
        JOIN information_schema.key_column_usage kcu
            ON tc.constraint_name = kcu.constraint_name
        JOIN information_schema.constraint_column_usage ccu
            ON ccu.constraint_name = tc.constraint_name
        WHERE tc.constraint_type = 'FOREIGN KEY'
            AND tc.table_schema = %s
    """
    cursor = conn.cursor()
    cursor.execute(query, (schema,))
    return cursor.fetchall()

Step 4: Generate Semantic Model

Transform metadata into semantic definitions:

entity: Orders
  source: analytics.orders
  description: "Customer order transactions"
  distribution_key: customer_id
  sort_key: order_date

  dimensions:
    - name: order_date
      source_column: order_date
      type: time

    - name: order_status
      source_column: status
      type: categorical

  measures:
    - name: revenue
      calculation: SUM(order_total)
      format: currency

  relationships:
    - name: customer
      target_entity: Customers
      join_type: many_to_one
      join_columns:
        - source: customer_id
          target: customer_id

Step 5: Schedule Synchronization

Maintain metadata currency:

Run extraction during maintenance windows
Compare against previous extraction
Update semantic models incrementally
Alert on schema changes

Common Challenges

Cross-Database Metadata

Redshift allows cross-database queries:

Solutions:

Extract from each database
Use datashare metadata for shared data
Aggregate into unified semantic layer

External Table Integration

Spectrum tables have different metadata patterns:

Solutions:

Query svv_external_* views
Map external types to internal types
Handle location and format information

Performance Considerations

Large clusters with many objects:

Solutions:

Limit extraction to relevant schemas
Use incremental extraction
Cache metadata locally
Schedule during low-usage periods

The Value of Redshift Metadata Extraction

Automated metadata extraction from Redshift provides:

Comprehensive visibility: Understand all schemas, tables, and relationships.

Performance optimization: Leverage distribution and sort keys.

Relationship discovery: Map foreign keys to semantic joins.

Spectrum integration: Include external data in semantic layer.

Organizations that automate Redshift metadata extraction build semantic layers that generate optimized queries - enabling AI-powered analytics that are both accurate and performant across your data warehouse.

Metadata Extraction from Redshift: Building Semantic Layers from Your Data Warehouse

Redshift Metadata Architecture

System Catalog Structure

Metadata Components

Metadata Extraction Queries

Schema and Table Discovery

Column Metadata

Relationship Discovery

External Table Metadata

Query Pattern Analysis

Building Semantic Models

Table to Entity Mapping

Relationship Modeling

Distribution-Aware Semantics

Implementation Approach

Step 1: Configure Connection

Step 2: Extract Schema Metadata

Step 3: Capture Relationships

Step 4: Generate Semantic Model

Step 5: Schedule Synchronization

Common Challenges

Cross-Database Metadata

External Table Integration

Performance Considerations

The Value of Redshift Metadata Extraction

Questions

Related