Time Series Data Modeling: Patterns for Temporal Analytics
Time series data - measurements and events ordered by time - requires specialized modeling techniques. Learn the patterns for storing, querying, and analyzing time series data for analytics and AI applications.
Time series data consists of data points indexed in time order - typically measurements or events captured at regular or irregular intervals. Stock prices, server metrics, sensor readings, daily sales, and user activity logs are all time series data. The temporal ordering is not just metadata but the essential structure around which analysis revolves.
Modeling time series data correctly enables powerful temporal analytics - trend detection, seasonality analysis, anomaly identification, and forecasting. Poor time series modeling leads to inefficient queries, incorrect temporal calculations, and AI systems that misinterpret temporal patterns.
Time Series Characteristics
Regular vs. Irregular Intervals
Regular (fixed interval): Measurements at consistent time intervals
- Server metrics every 60 seconds
- Daily closing stock prices
- Hourly weather readings
Irregular (variable interval): Events occurring at variable times
- User login events
- Sales transactions
- Support ticket creation
Regular series enable simpler aggregation and gap detection. Irregular series require more careful handling of time windows.
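In practice, irregular events are usually bucketed into fixed windows before analysis. A minimal sketch, assuming a hypothetical login_events table with one row per login:
-- Bucket irregular login events into a regular hourly series
SELECT
DATE_TRUNC('hour', login_time) as hour,
COUNT(*) as logins
FROM login_events
GROUP BY DATE_TRUNC('hour', login_time)
ORDER BY hour;
Hours with no events simply produce no rows - gap detection and filling are covered later in this guide.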
Single vs. Multiple Series
Univariate: One measurement over time
- Total website traffic per hour
Multivariate: Multiple related measurements over time
- CPU usage, memory usage, and network traffic together
Multiple independent series: Many separate time series in one model
- Metrics for each of 10,000 servers
- Sales for each of 500 products
Density and Cardinality
High cardinality: Many distinct series (one per user, per sensor)
High density: Frequent measurements within each series
High volume: High cardinality * high density = massive data volumes
These characteristics drive storage and query optimization decisions.
Core Modeling Patterns
Wide Table Format
Each timestamp is a row, each series is a column:
CREATE TABLE daily_metrics (
date DATE PRIMARY KEY,
revenue DECIMAL(12,2),
orders INT,
visitors INT,
signups INT
);
| date | revenue | orders | visitors | signups |
|---|---|---|---|---|
| 2024-02-01 | 12500 | 145 | 8900 | 52 |
| 2024-02-02 | 11200 | 132 | 8100 | 48 |
Pros:
- Intuitive structure
- Easy to query multiple metrics together
- Works well for dashboards
Cons:
- Schema changes needed for new metrics
- Doesn't scale to high cardinality
- Sparse series waste storage
Narrow (Long) Table Format
Each measurement is a row:
CREATE TABLE metrics (
timestamp TIMESTAMP,
metric_name VARCHAR(100),
metric_value DECIMAL(20,6),
PRIMARY KEY (timestamp, metric_name)
);
| timestamp | metric_name | metric_value |
|---|---|---|
| 2024-02-01 | revenue | 12500 |
| 2024-02-01 | orders | 145 |
| 2024-02-01 | visitors | 8900 |
Pros:
- Flexible - add new metrics without schema changes
- Handles sparse data efficiently
- Scales to high cardinality
Cons:
- Queries across metrics require pivoting
- More rows = more storage overhead per value
- Cross-metric calculations require self-joins
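For example, computing revenue per order from the narrow table requires pivoting the two metrics back into columns. A sketch using conditional aggregation against the metrics table above:
-- Pivot revenue and orders back into columns for a cross-metric calculation
SELECT
timestamp,
MAX(CASE WHEN metric_name = 'revenue' THEN metric_value END) as revenue,
MAX(CASE WHEN metric_name = 'orders' THEN metric_value END) as orders,
MAX(CASE WHEN metric_name = 'revenue' THEN metric_value END)
/ NULLIF(MAX(CASE WHEN metric_name = 'orders' THEN metric_value END), 0) as revenue_per_order
FROM metrics
GROUP BY timestamp;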
Entity-Metric Format
Combine entity identification with narrow format:
CREATE TABLE entity_metrics (
timestamp TIMESTAMP,
entity_type VARCHAR(50),
entity_id VARCHAR(100),
metric_name VARCHAR(100),
metric_value DECIMAL(20,6),
PRIMARY KEY (timestamp, entity_type, entity_id, metric_name)
);
| timestamp | entity_type | entity_id | metric_name | metric_value |
|---|---|---|---|---|
| 2024-02-01 10:00 | server | web-01 | cpu_usage | 45.2 |
| 2024-02-01 10:00 | server | web-01 | memory_usage | 78.5 |
| 2024-02-01 10:00 | server | web-02 | cpu_usage | 32.1 |
This handles both multiple entities and multiple metrics per entity.
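A common access pattern against this layout is the latest reading per entity and metric. A PostgreSQL-flavored sketch using DISTINCT ON (other databases would use a ROW_NUMBER window instead):
-- Latest reading for every server and metric
SELECT DISTINCT ON (entity_id, metric_name)
entity_id,
metric_name,
timestamp,
metric_value
FROM entity_metrics
WHERE entity_type = 'server'
ORDER BY entity_id, metric_name, timestamp DESC;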
Partitioned Time Series
Partition by time for large-scale data:
CREATE TABLE sensor_readings (
reading_time TIMESTAMP,
sensor_id VARCHAR(50),
temperature DECIMAL(5,2),
humidity DECIMAL(5,2),
pressure DECIMAL(7,2)
)
PARTITION BY RANGE (reading_time);
-- Monthly partitions
CREATE TABLE sensor_readings_2024_01 PARTITION OF sensor_readings
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE sensor_readings_2024_02 PARTITION OF sensor_readings
FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
Benefits:
- Query pruning - only scan relevant time ranges
- Efficient data lifecycle - drop old partitions
- Parallel loading into different partitions
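The lifecycle benefit is particularly cheap to act on: expiring a month of data is a metadata operation rather than a bulk DELETE. A sketch, assuming the monthly partitions above:
-- Expire January data by detaching and dropping its partition
ALTER TABLE sensor_readings DETACH PARTITION sensor_readings_2024_01;
DROP TABLE sensor_readings_2024_01;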
Temporal Query Patterns
Point-in-Time Queries
Get values at specific timestamps:
SELECT metric_name, metric_value
FROM metrics
WHERE timestamp = '2024-02-01 00:00:00';
For irregular series, get closest value:
SELECT metric_value
FROM metrics
WHERE metric_name = 'price'
AND timestamp <= '2024-02-01 12:00:00'
ORDER BY timestamp DESC
LIMIT 1;
Time Range Aggregations
Aggregate over time windows:
-- Daily aggregates from minute-level data
SELECT
DATE_TRUNC('day', timestamp) as day,
AVG(value) as avg_value,
MAX(value) as max_value,
MIN(value) as min_value
FROM metrics
WHERE metric_name = 'cpu_usage'
AND timestamp >= '2024-02-01'
AND timestamp < '2024-02-08'
GROUP BY DATE_TRUNC('day', timestamp)
ORDER BY day;
Time Window Functions
Calculations relative to each point in time:
-- 7-day rolling average
SELECT
date,
value,
AVG(value) OVER (
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) as rolling_7_day_avg
FROM daily_metrics;
-- Period-over-period comparison
SELECT
date,
value,
LAG(value, 7) OVER (ORDER BY date) as value_7_days_ago,
value - LAG(value, 7) OVER (ORDER BY date) as week_over_week_change
FROM daily_metrics;
Gap Detection and Filling
Identify missing data points:
-- Find gaps in hourly data
WITH expected_times AS (
SELECT generate_series(
'2024-02-01 00:00:00'::timestamp,
'2024-02-07 23:00:00'::timestamp,
'1 hour'::interval
) as expected_time
),
actual_times AS (
SELECT DISTINCT DATE_TRUNC('hour', timestamp) as actual_time
FROM metrics
WHERE metric_name = 'requests'
)
SELECT expected_time
FROM expected_times e
LEFT JOIN actual_times a ON e.expected_time = a.actual_time
WHERE a.actual_time IS NULL;
Fill gaps with interpolation or carry-forward:
-- Forward-fill missing values
-- (IGNORE NULLS placement and support vary by dialect: BigQuery and Snowflake
-- accept it, PostgreSQL does not - see the LATERAL alternative below)
SELECT
t.time_point,
COALESCE(
m.metric_value,
LAST_VALUE(m.metric_value IGNORE NULLS) OVER (ORDER BY t.time_point)
) as value_filled
FROM time_spine t
LEFT JOIN metrics m ON t.time_point = m.timestamp AND m.metric_name = 'price';
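Where IGNORE NULLS is unavailable (PostgreSQL, for example), a LATERAL join gives the same carry-forward behavior. A sketch against the same time_spine and metrics tables:
-- Forward-fill without IGNORE NULLS: take the most recent value at or before each time point
SELECT
t.time_point,
m.metric_value as value_filled
FROM time_spine t
LEFT JOIN LATERAL (
SELECT metric_value
FROM metrics
WHERE metric_name = 'price'
AND timestamp <= t.time_point
ORDER BY timestamp DESC
LIMIT 1
) m ON true;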
Time Spine Tables
A time spine (or date dimension) provides a complete set of time points for joins:
CREATE TABLE time_spine (
time_point TIMESTAMP PRIMARY KEY,
date DATE,
hour INT,
day_of_week VARCHAR(10),
month INT,
quarter INT,
year INT,
is_weekend BOOLEAN,
is_holiday BOOLEAN
);
-- Populated with every minute/hour/day in your data range
INSERT INTO time_spine
SELECT
ts,
ts::date,
EXTRACT(HOUR FROM ts),
TRIM(TO_CHAR(ts, 'Day')),
EXTRACT(MONTH FROM ts),
EXTRACT(QUARTER FROM ts),
EXTRACT(YEAR FROM ts),
EXTRACT(ISODOW FROM ts) IN (6, 7), -- is_weekend: Saturday or Sunday
FALSE -- is_holiday: populate from a holiday calendar separately
FROM generate_series(
'2020-01-01'::timestamp,
'2030-12-31'::timestamp,
'1 day'::interval
) ts;
Time spines enable:
- Gap detection and filling
- Consistent time-based aggregation
- Joining irregular events to regular time grids
- Calendar-aware calculations
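For example, a calendar-aware comparison becomes a simple join. A sketch assuming a daily-grain spine and the metrics table above:
-- Weekday vs. weekend average using the spine's calendar attributes
SELECT
ts.is_weekend,
AVG(m.metric_value) as avg_requests
FROM metrics m
JOIN time_spine ts ON DATE_TRUNC('day', m.timestamp) = ts.time_point
WHERE m.metric_name = 'requests'
GROUP BY ts.is_weekend;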
Downsampling and Retention
High-frequency data accumulates rapidly. Manage with tiered retention:
retention_policy:
raw_data: 7_days # Full second-level detail
minute_aggregates: 30_days # 1-minute rollups
hourly_aggregates: 1_year # Hourly summaries
daily_aggregates: 5_years # Daily summaries
Implement with materialized aggregates:
-- Hourly aggregate table
CREATE TABLE metrics_hourly AS
SELECT
DATE_TRUNC('hour', timestamp) as hour,
metric_name,
AVG(metric_value) as avg_value,
MAX(metric_value) as max_value,
MIN(metric_value) as min_value,
COUNT(*) as sample_count
FROM metrics
GROUP BY DATE_TRUNC('hour', timestamp), metric_name;
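The retention side of the policy is then a scheduled cleanup of finer-grained data once its rollups exist. A sketch for the 7-day raw window above (with time partitioning, dropping whole partitions is cheaper than row-level deletes):
-- Remove raw data older than the 7-day retention window
DELETE FROM metrics
WHERE timestamp < NOW() - INTERVAL '7 days';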
AI Analytics Considerations
Time Zone Handling
Time zones cause subtle errors. Be explicit about:
- What timezone raw data is stored in (prefer UTC)
- What timezone business users expect in reports
- How to handle entities across time zones
Document in semantic layer:
column:
name: event_timestamp
storage_timezone: UTC
display_timezone: user_local
business_day_cutoff: "00:00 US/Eastern"
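In SQL, the same discipline means storing UTC and converting only at the reporting boundary. A sketch, assuming a hypothetical events table whose event_timestamp is stored as a UTC timestamptz:
-- Group UTC-stored events into US/Eastern business days for reporting
SELECT
DATE_TRUNC('day', event_timestamp AT TIME ZONE 'US/Eastern') as business_day,
COUNT(*) as events
FROM events
GROUP BY DATE_TRUNC('day', event_timestamp AT TIME ZONE 'US/Eastern')
ORDER BY business_day;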
Seasonality Awareness
AI systems need to understand cyclical patterns:
- Daily patterns (peak hours, overnight lulls)
- Weekly patterns (weekday vs. weekend)
- Monthly patterns (end-of-month spikes)
- Annual patterns (holidays, seasons)
metric:
name: daily_active_users
seasonality:
weekly: significant # Weekend dips expected
annual: moderate # Summer slowdown, holiday spikes
comparison_recommendation: same_day_previous_week
Timestamp Precision
Document expected precision and how AI should handle queries:
table:
name: server_metrics
timestamp_precision: second
expected_interval: 60_seconds
gap_handling: interpolate_linear
This prevents AI from misinterpreting gaps or generating unrealistic precision in results.
Period Boundary Clarity
"Last month" is ambiguous:
- Calendar last month (February)?
- Last 30 days?
- Last 4 weeks?
- Fiscal last month?
time_period_definitions:
last_month:
type: calendar_month
timezone: US/Eastern
last_30_days:
type: rolling
days: 30
inclusive_today: false
AI systems use these definitions rather than guessing.
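Translated into SQL, the two definitions produce clearly different filters. A sketch, assuming a transactions table with date and revenue columns (CURRENT_DATE uses the session timezone, so align it with the declared US/Eastern setting):
-- Calendar "last month"
SELECT SUM(revenue) as last_month_revenue
FROM transactions
WHERE date >= DATE_TRUNC('month', CURRENT_DATE) - INTERVAL '1 month'
AND date < DATE_TRUNC('month', CURRENT_DATE);
-- Rolling "last 30 days", excluding today
SELECT SUM(revenue) as last_30_days_revenue
FROM transactions
WHERE date >= CURRENT_DATE - INTERVAL '30 days'
AND date < CURRENT_DATE;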
Performance Optimization
Indexing Strategies
- Timestamp index: Required for all time-based queries
- Composite indexes: (metric_name, timestamp) to scan a single metric over a time range
- Covering indexes: Include commonly selected columns
- Partial indexes: Index only recent data that's queried frequently
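As concrete PostgreSQL-flavored sketches of these strategies (index names are illustrative, applied to the narrow metrics table above):
-- Composite index: scan one metric over a time range
CREATE INDEX idx_metrics_name_time ON metrics (metric_name, timestamp);
-- Covering index: include the value so the query never touches the table heap
CREATE INDEX idx_metrics_name_time_covering ON metrics (metric_name, timestamp) INCLUDE (metric_value);
-- Partial index: only the hot, recent range
CREATE INDEX idx_metrics_recent ON metrics (timestamp) WHERE timestamp >= '2024-01-01';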
Pre-aggregation
Compute common aggregations ahead of time:
-- Materialized daily summary
CREATE MATERIALIZED VIEW daily_summary AS
SELECT
date,
SUM(revenue) as total_revenue,
COUNT(DISTINCT customer_id) as unique_customers
FROM transactions
GROUP BY date;
-- Refresh daily
REFRESH MATERIALIZED VIEW daily_summary;
Columnar Storage
Time series queries typically scan many rows but few columns. Columnar storage (Parquet, columnar database engines) provides significant performance benefits.
Compression
Time series data compresses well due to:
- Sequential similarity (adjacent values often similar)
- Repeating patterns (same metric names)
- Predictable types (timestamps, numbers)
Use columnar formats and database compression features to reduce storage 10-100x.
Time series modeling requires balancing flexibility with query performance. Choose storage formats based on cardinality and query patterns. Implement downsampling for data lifecycle management. Document temporal semantics clearly in the semantic layer so AI systems understand time-based data correctly.
Questions
How does time series data differ from transactional data?
Time series data is specifically ordered by time, with analysis focusing on temporal patterns - trends, seasonality, correlations over time. Transactional data may have timestamps but is analyzed primarily by business dimensions (customer, product, region). The same data can be both - sales transactions are transactional data but can also be analyzed as a time series.