What PostgreSQL system catalogs are most useful for metadata extraction?

The pg_catalog schema contains essential tables including pg_class (tables/views), pg_attribute (columns), pg_constraint (constraints), pg_namespace (schemas), and pg_description (comments). The information_schema provides a more SQL-standard interface to similar information.

How do I extract comments and documentation from PostgreSQL?

PostgreSQL stores object comments in pg_description and can be retrieved using obj_description() and col_description() functions. These comments provide valuable business context when building semantic layers and should be extracted along with structural metadata.

Can metadata extraction work across multiple PostgreSQL schemas?

Yes. PostgreSQL's system catalogs provide visibility across all schemas the extraction user can access. Query pg_namespace for available schemas and filter metadata queries by schema name. This enables building unified semantic layers from multi-schema databases.

How do partitioned tables appear in PostgreSQL metadata?

Partitioned tables have a parent table and child partitions. The pg_partitioned_table catalog identifies partition parents, while pg_inherits shows parent-child relationships. Your semantic layer should typically expose the parent table while understanding the partition structure for optimization.

Metadata Extraction from PostgreSQL: Schema Discovery for Semantic Layers

Metadata extraction from PostgreSQL is the process of programmatically reading schema structures, table definitions, column information, and relationships from your PostgreSQL database to build intelligent semantic layers. PostgreSQL provides one of the richest system catalogs available, with comprehensive information about tables, columns, constraints, and documentation that enables sophisticated semantic model generation.

This metadata forms the foundation for AI-powered analytics, allowing semantic layers to understand your database structure and generate accurate, optimized queries.

PostgreSQL Metadata Architecture

System Catalog Structure

PostgreSQL organizes metadata in pg_catalog:

Core catalog tables:

pg_class - tables, views, indexes, sequences
pg_attribute - columns
pg_type - data types
pg_namespace - schemas
pg_constraint - constraints
pg_description - comments
pg_index - index information

Information schema:

Standard SQL interface to metadata
More portable but less detailed
Good for cross-database compatibility

Metadata Richness

PostgreSQL stores extensive metadata:

Schema organization
Table and view definitions
Column types with full precision
Primary, foreign, unique, and check constraints
Indexes and their configurations
Comments at all object levels
Partition information
Inheritance relationships

Codd AI Integrations connect to PostgreSQL system catalogs to extract this rich metadata, building semantic layers that fully understand your database structure.

Metadata Extraction Queries

Schema and Table Discovery

Discover database structure:

-- List all schemas
SELECT schema_name
FROM information_schema.schemata
WHERE schema_name NOT IN ('pg_catalog', 'information_schema', 'pg_toast');

-- Discover tables and views
SELECT
    table_schema,
    table_name,
    table_type,
    obj_description(
        (quote_ident(table_schema) || '.' || quote_ident(table_name))::regclass
    ) as table_comment
FROM information_schema.tables
WHERE table_schema = 'public'
ORDER BY table_name;

-- Table sizes and row estimates
SELECT
    schemaname,
    relname as table_name,
    n_live_tup as row_estimate,
    pg_size_pretty(pg_total_relation_size(relid)) as total_size
FROM pg_stat_user_tables
WHERE schemaname = 'public';

Column Metadata

Extract detailed column information:

-- Column definitions with comments
SELECT
    c.table_schema,
    c.table_name,
    c.column_name,
    c.ordinal_position,
    c.data_type,
    c.character_maximum_length,
    c.numeric_precision,
    c.numeric_scale,
    c.is_nullable,
    c.column_default,
    col_description(
        (c.table_schema || '.' || c.table_name)::regclass,
        c.ordinal_position
    ) as column_comment
FROM information_schema.columns c
WHERE c.table_schema = 'public'
ORDER BY c.table_name, c.ordinal_position;

-- Columns using pg_catalog for more detail
SELECT
    n.nspname as schema_name,
    c.relname as table_name,
    a.attname as column_name,
    a.attnum as position,
    pg_catalog.format_type(a.atttypid, a.atttypmod) as data_type,
    a.attnotnull as not_null,
    pg_get_expr(d.adbin, d.adrelid) as default_value
FROM pg_catalog.pg_attribute a
JOIN pg_catalog.pg_class c ON a.attrelid = c.oid
JOIN pg_catalog.pg_namespace n ON c.relnamespace = n.oid
LEFT JOIN pg_catalog.pg_attrdef d ON a.attrelid = d.adrelid AND a.attnum = d.adnum
WHERE n.nspname = 'public'
    AND a.attnum > 0
    AND NOT a.attisdropped
ORDER BY c.relname, a.attnum;

Constraint Discovery

Extract relationship information:

-- Primary keys
SELECT
    tc.table_schema,
    tc.table_name,
    kcu.column_name,
    tc.constraint_name
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
    ON tc.constraint_name = kcu.constraint_name
    AND tc.table_schema = kcu.table_schema
WHERE tc.constraint_type = 'PRIMARY KEY'
    AND tc.table_schema = 'public';

-- Foreign keys with referenced tables
SELECT
    tc.table_schema,
    tc.table_name,
    kcu.column_name,
    ccu.table_schema AS foreign_table_schema,
    ccu.table_name AS foreign_table_name,
    ccu.column_name AS foreign_column_name,
    tc.constraint_name
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
    ON tc.constraint_name = kcu.constraint_name
    AND tc.table_schema = kcu.table_schema
JOIN information_schema.constraint_column_usage ccu
    ON tc.constraint_name = ccu.constraint_name
WHERE tc.constraint_type = 'FOREIGN KEY'
    AND tc.table_schema = 'public';

-- Unique constraints
SELECT
    tc.table_name,
    kcu.column_name,
    tc.constraint_name
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
    ON tc.constraint_name = kcu.constraint_name
WHERE tc.constraint_type = 'UNIQUE'
    AND tc.table_schema = 'public';

Index Information

Capture index metadata:

-- Index definitions
SELECT
    schemaname,
    tablename,
    indexname,
    indexdef
FROM pg_indexes
WHERE schemaname = 'public';

-- Detailed index columns
SELECT
    i.relname as index_name,
    t.relname as table_name,
    a.attname as column_name,
    ix.indisunique as is_unique,
    ix.indisprimary as is_primary
FROM pg_class t
JOIN pg_index ix ON t.oid = ix.indrelid
JOIN pg_class i ON i.oid = ix.indexrelid
JOIN pg_attribute a ON a.attrelid = t.oid
    AND a.attnum = ANY(ix.indkey)
JOIN pg_namespace n ON n.oid = t.relnamespace
WHERE n.nspname = 'public';

Partition Information

Handle partitioned tables:

-- Partitioned tables
SELECT
    n.nspname as schema_name,
    c.relname as table_name,
    pg_get_partkeydef(c.oid) as partition_key
FROM pg_partitioned_table pt
JOIN pg_class c ON pt.partrelid = c.oid
JOIN pg_namespace n ON c.relnamespace = n.oid;

-- Partition children
SELECT
    parent.relname as parent_table,
    child.relname as partition_name,
    pg_get_expr(child.relpartbound, child.oid) as partition_bounds
FROM pg_inherits
JOIN pg_class parent ON pg_inherits.inhparent = parent.oid
JOIN pg_class child ON pg_inherits.inhrelid = child.oid
JOIN pg_namespace n ON parent.relnamespace = n.oid
WHERE n.nspname = 'public';

Building Semantic Models

Table to Entity Mapping

Transform PostgreSQL metadata into semantic concepts:

Tables become entities:

Primary key defines entity identity
Table comment provides description
Partition parent represents logical entity

Columns map to attributes:

text, varchar become text dimensions
integer, bigint, numeric become measures
date, timestamp become time dimensions
boolean becomes categorical dimension

Relationship Mapping

Leverage PostgreSQL's rich constraint system:

Foreign keys directly map to semantic relationships:

entity: Orders
  relationships:
    - name: customer
      type: many_to_one
      target: Customers
      # Derived from foreign key constraint
      join:
        source_column: customer_id
        target_column: id

Unique constraints indicate dimension tables:

Tables with unique keys are often dimensions
Natural keys suggest business identifiers

Comment Utilization

Leverage PostgreSQL comments:

entity: Customers
  # From table comment
  description: "Active customer accounts with subscription status"

  attributes:
    - name: email
      source_column: email
      # From column comment
      description: "Primary contact email, validated format"

Implementation Guide

Step 1: Configure Connection

Set up secure database access:

import psycopg2

conn = psycopg2.connect(
    host='localhost',
    port=5432,
    database='analytics',
    user='metadata_reader',
    password='secure_password'
)

Create a dedicated user with minimal permissions:

CREATE USER metadata_reader WITH PASSWORD 'secure_password';
GRANT USAGE ON SCHEMA public TO metadata_reader;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO metadata_reader;

Step 2: Extract Comprehensive Metadata

Build complete schema picture:

def extract_schema_metadata(conn, schema='public'):
    """Extract complete schema metadata."""
    metadata = {
        'tables': extract_tables(conn, schema),
        'columns': extract_columns(conn, schema),
        'primary_keys': extract_primary_keys(conn, schema),
        'foreign_keys': extract_foreign_keys(conn, schema),
        'indexes': extract_indexes(conn, schema)
    }
    return metadata

def extract_tables(conn, schema):
    query = """
        SELECT table_name, table_type,
               obj_description((quote_ident(%s) || '.' ||
                   quote_ident(table_name))::regclass) as comment
        FROM information_schema.tables
        WHERE table_schema = %s
    """
    cursor = conn.cursor()
    cursor.execute(query, (schema, schema))
    return cursor.fetchall()

Step 3: Handle PostgreSQL-Specific Features

Account for PostgreSQL capabilities:

Array types:

# Detect array columns
if data_type.endswith('[]'):
    # Handle as multi-value attribute
    pass

JSON/JSONB columns:

# Identify JSON columns for special handling
if data_type in ('json', 'jsonb'):
    # May need path-based attribute extraction
    pass

Step 4: Generate Semantic Model

Transform to semantic definitions:

entity: Customers
  source: public.customers
  description: "Active customer accounts with subscription status"
  primary_key: id

  dimensions:
    - name: customer_name
      source_column: name
      type: text
      description: "Full legal name"

    - name: signup_date
      source_column: created_at
      type: time
      grain: day

  measures:
    - name: customer_count
      calculation: COUNT(DISTINCT id)

  relationships:
    - name: orders
      target: Orders
      type: one_to_many
      join_column: id
      foreign_column: customer_id

Step 5: Maintain Synchronization

Keep semantic layer current:

Monitor pg_stat_activity for DDL changes
Compare schema versions periodically
Update semantic models on change detection
Validate queries after updates

Common Challenges

Schema Proliferation

Many schemas with overlapping tables:

Solutions:

Focus on production schemas
Document schema purposes
Create unified semantic models

Missing Documentation

Limited table and column comments:

Solutions:

Encourage comment creation
Generate AI-suggested descriptions
Layer documentation in semantic model

Complex Data Types

PostgreSQL's rich type system:

Solutions:

Map composite types to structured attributes
Handle arrays appropriately
Document JSON path patterns

The Value of PostgreSQL Metadata Extraction

Automated metadata extraction from PostgreSQL delivers:

Relationship richness: Leverage PostgreSQL's constraint system.

Documentation integration: Use comments for semantic descriptions.

Type intelligence: Map PostgreSQL's rich types accurately.

Complete visibility: Understand all schemas and objects.

Organizations that automate PostgreSQL metadata extraction build semantic layers that fully leverage their database's relational structure - enabling AI-powered analytics that understand and respect data relationships.

Metadata Extraction from PostgreSQL: Schema Discovery for Semantic Layers

PostgreSQL Metadata Architecture

System Catalog Structure

Metadata Richness

Metadata Extraction Queries

Schema and Table Discovery

Column Metadata

Constraint Discovery

Index Information

Partition Information

Building Semantic Models

Table to Entity Mapping

Relationship Mapping

Comment Utilization

Implementation Guide

Step 1: Configure Connection

Step 2: Extract Comprehensive Metadata

Step 3: Handle PostgreSQL-Specific Features

Step 4: Generate Semantic Model

Step 5: Maintain Synchronization

Common Challenges

Schema Proliferation

Missing Documentation

Complex Data Types

The Value of PostgreSQL Metadata Extraction

Questions

Related