What's the difference between technical and business metadata?

Technical metadata describes how data is stored and structured - column types, table relationships, storage formats. Business metadata describes what data means - definitions, ownership, usage context. Technical metadata is typically auto-generated from systems. Business metadata requires human input. Both are essential: technical for system operation, business for human understanding.

Where is technical metadata stored?

Technical metadata lives in multiple places: database system catalogs, data catalog repositories, ETL tool metadata stores, and infrastructure monitoring systems. Modern metadata platforms aggregate these sources into unified catalogs. The challenge is keeping aggregated metadata synchronized with changing source systems.

How does technical metadata support data governance?

Technical metadata provides the foundation for governance - you cannot govern what you cannot see. Column-level security requires knowing what columns exist. Retention policies need storage location information. Lineage for compliance requires transformation metadata. Access control depends on understanding system structure. Technical metadata makes governance actionable.

Can technical metadata be extracted automatically?

Yes, most technical metadata can be automatically extracted from source systems. Databases expose catalogs via information_schema. ETL tools provide API access to job definitions. Cloud platforms offer metadata APIs. Automated extraction is far more reliable than manual documentation. Platforms like Codd AI automate this extraction across diverse sources.

Technical Metadata Types: Understanding the Foundation of Data Systems

Technical metadata is information that describes the structural, physical, and operational characteristics of data - how data is organized, stored, formatted, and processed within systems. It includes schema definitions, storage specifications, access patterns, and processing information that enables systems to function and humans to understand data infrastructure.

While business metadata explains what data means, technical metadata explains how data exists. Together, they provide complete understanding of data assets.

Categories of Technical Metadata

Schema Metadata

Schema metadata describes data structure:

Table and View Definitions

Table names and types (table, view, materialized view)
Column names and ordinal positions
Data types and precision
Nullable versus required constraints
Default values

Relationships

Primary key definitions
Foreign key constraints
Unique constraints
Check constraints
Index definitions

Object Properties

Creation and modification timestamps
Storage parameters (tablespaces, partitioning)
Row counts and size estimates
Compression settings

Schema metadata is typically accessible through database information_schema views or system catalogs.

Storage Metadata

Storage metadata describes physical data organization:

File System Information

File paths and names
File formats (Parquet, ORC, CSV, JSON)
Compression algorithms
Encryption status

Partitioning

Partition schemes (date, region, category)
Partition locations
Partition statistics

Cloud Storage

Bucket and object paths
Storage classes (hot, cold, archive)
Replication configurations
Lifecycle policies

Processing Metadata

Processing metadata describes data transformations:

ETL Job Definitions

Pipeline names and versions
Source and target specifications
Transformation logic references
Scheduling configurations

Execution History

Run timestamps and durations
Success and failure status
Records processed
Error messages and logs

Dependencies

Upstream data sources
Downstream consumers
Job orchestration relationships

Access Metadata

Access metadata describes data usage:

Permissions

User and role grants
Column-level security policies
Row-level security filters
Masking configurations

Query History

User queries executed
Tables and columns accessed
Query frequency patterns
Performance characteristics

Connection Information

Database connection strings
API endpoints
Authentication methods
Network configurations

Technical Metadata by Platform

Relational Databases

Traditional databases provide rich catalog information:

-- PostgreSQL example
SELECT
  table_name,
  column_name,
  data_type,
  is_nullable,
  column_default
FROM information_schema.columns
WHERE table_schema = 'public';

Databases expose metadata through standardized schemas (information_schema) and vendor-specific catalogs (pg_catalog, sys.objects).

Cloud Data Warehouses

Modern warehouses extend traditional catalogs:

Snowflake: ACCOUNT_USAGE schema with query history, storage metrics, access history BigQuery: INFORMATION_SCHEMA with jobs, tables, views, partitions metadata Redshift: System tables with design, performance, and usage metadata

Cloud platforms add operational metadata that traditional databases lack - query history, cost allocation, resource consumption.

Data Lakes

Lakes require different metadata approaches:

File Catalogs: Systems like AWS Glue Catalog or Hive Metastore track file locations and schema Table Formats: Delta Lake, Iceberg, and Hudi maintain metadata about table versions, partitions, and schema evolution Object Tags: Cloud storage metadata tags for classification and lifecycle

Lakes lack the built-in catalogs of databases, requiring explicit metadata infrastructure.

Transformation Platforms

ETL and modeling tools generate processing metadata:

dbt: Manifest files with model definitions, sources, tests, and documentation Airflow: DAG definitions, task instances, execution logs Spark: Job history, stage metrics, shuffle statistics

This processing metadata is essential for understanding how data flows and transforms.

Extracting Technical Metadata

Database Introspection

Query system catalogs directly to extract schema metadata:

-- Extract table and column information
SELECT
  t.table_name,
  c.column_name,
  c.data_type,
  c.character_maximum_length,
  tc.constraint_type
FROM information_schema.tables t
JOIN information_schema.columns c ON t.table_name = c.table_name
LEFT JOIN information_schema.key_column_usage kcu
  ON c.column_name = kcu.column_name AND c.table_name = kcu.table_name
LEFT JOIN information_schema.table_constraints tc
  ON kcu.constraint_name = tc.constraint_name
WHERE t.table_schema = 'analytics';

API Integration

Modern platforms expose metadata through APIs:

REST APIs for cloud services
Python SDKs for programmatic access
GraphQL endpoints for flexible queries
Webhook notifications for change events

Codd Semantic Layer integrates with these APIs to aggregate technical metadata from across your data stack.

Log Analysis

Operational metadata often lives in logs:

Query logs for access patterns
Audit logs for security events
Job logs for processing history
Error logs for quality issues

Log analysis extracts metadata not available through catalogs.

File Inspection

For file-based data, extract metadata from files themselves:

Parquet footer metadata (schema, row groups, statistics)
JSON schema inference from samples
CSV header analysis
File system attributes (size, modification time)

Technical Metadata Use Cases

Impact Analysis

Before modifying data structures, assess downstream effects:

Query schema metadata to find column dependencies
Trace processing metadata to identify affected pipelines
Check access metadata for impacted users and applications
Plan migration based on comprehensive impact understanding

Performance Optimization

Use operational metadata to identify optimization opportunities:

Query history reveals slow-performing queries
Access patterns show frequently used tables deserving optimization
Processing metrics identify bottleneck transformations
Storage metadata guides partitioning decisions

Troubleshooting

Debug data issues using technical metadata:

Schema metadata confirms expected structure
Processing metadata shows transformation execution
Access metadata reveals who changed what
Storage metadata indicates data location for investigation

Automation

Technical metadata enables automated operations:

Schema metadata drives code generation
Processing metadata enables intelligent scheduling
Access metadata automates permission provisioning
Storage metadata informs tiering and archival

Technical Metadata Management

Cataloging

Aggregate technical metadata into searchable catalogs:

Centralize metadata from diverse sources
Enable search and discovery
Provide API access for tooling
Maintain freshness through synchronization

Versioning

Track metadata changes over time:

Schema version history
Configuration changes
Permission modifications
Processing definition evolution

Version history enables rollback, audit, and change analysis.

Quality

Ensure metadata accuracy:

Validate extracted metadata against sources
Detect stale or orphaned metadata
Monitor for synchronization failures
Alert on unexpected changes

Security

Protect metadata appropriately:

Some metadata is sensitive (connection strings, row counts)
Apply access controls to metadata catalogs
Audit metadata access
Encrypt sensitive metadata at rest and in transit

Connecting Technical and Business Metadata

Technical metadata alone lacks meaning. Business metadata alone lacks grounding. Value comes from connecting them:

Column Mappings: Link technical column names to business term definitions Lineage Context: Add business meaning to technical transformation flows Quality Interpretation: Explain what technical metrics mean for business use Access Justification: Document why users need specific technical access

Codd Semantic Layer bridges technical and business metadata, enabling users to understand both structure and meaning through a unified interface.

Building Technical Metadata Capability

Organizations need:

Extraction Automation: Manual metadata documentation fails at scale. Invest in automated extraction from all source systems.

Centralized Catalog: Scattered metadata is hard to use. Aggregate into discoverable, accessible catalogs.

Integration APIs: Metadata must flow to tools that need it - BI platforms, governance systems, automation pipelines.

Freshness Monitoring: Stale metadata is worse than no metadata. Monitor synchronization and alert on staleness.

Technical metadata is infrastructure - invisible when working well, painfully obvious when broken. Organizations that invest in comprehensive technical metadata management enable all downstream metadata-dependent capabilities.