Time Series Data Modeling: Patterns for Temporal Analytics
Time series data - measurements and events ordered by time - requires specialized modeling techniques. Learn the patterns for storing, querying, and analyzing time series data for analytics and AI applications.
Time series data consists of data points indexed in time order - typically measurements or events captured at regular or irregular intervals. Stock prices, server metrics, sensor readings, daily sales, and user activity logs are all time series data. The temporal ordering is not just metadata but the essential structure around which analysis revolves.
Modeling time series data correctly enables powerful temporal analytics - trend detection, seasonality analysis, anomaly identification, and forecasting. Poor time series modeling leads to inefficient queries, incorrect temporal calculations, and AI systems that misinterpret temporal patterns.
Time Series Characteristics
Regular vs. Irregular Intervals
Regular (fixed interval): Measurements at consistent time intervals
- Server metrics every 60 seconds
- Daily closing stock prices
- Hourly weather readings
Irregular (variable interval): Events occurring at variable times
- User login events
- Sales transactions
- Support ticket creation
Regular series enable simpler aggregation and gap detection. Irregular series require more careful handling of time windows.
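In practice, irregular events are usually bucketed into fixed windows before analysis. A minimal sketch, assuming a hypothetical login_events table with one row per login:
-- Bucket irregular login events into a regular hourly series
SELECT
DATE_TRUNC('hour', login_time) as hour,
COUNT(*) as logins
FROM login_events
GROUP BY DATE_TRUNC('hour', login_time)
ORDER BY hour;
Hours with no events simply produce no rows - gap detection and filling are covered later in this guide.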
Single vs. Multiple Series
Univariate: One measurement over time
- Total website traffic per hour
Multivariate: Multiple related measurements over time
- CPU usage, memory usage, and network traffic together
Multiple independent series: Many separate time series in one model
- Metrics for each of 10,000 servers
- Sales for each of 500 products
Density and Cardinality
High cardinality: Many distinct series (one per user, per sensor)
High density: Frequent measurements within each series
High volume: High cardinality * high density = massive data volumes
These characteristics drive storage and query optimization decisions.
Core Modeling Patterns
Wide Table Format
Each timestamp is a row, each series is a column:
CREATE TABLE daily_metrics (
date DATE PRIMARY KEY,
revenue DECIMAL(12,2),
orders INT,
visitors INT,
signups INT
);
| date | revenue | orders | visitors | signups |
|---|---|---|---|---|
| 2024-02-01 | 12500 | 145 | 8900 | 52 |
| 2024-02-02 | 11200 | 132 | 8100 | 48 |
Pros:
- Intuitive structure
- Easy to query multiple metrics together
- Works well for dashboards
Cons:
- Schema changes needed for new metrics
- Doesn't scale to high cardinality
- Sparse series waste storage
Narrow (Long) Table Format
Each measurement is a row:
CREATE TABLE metrics (
timestamp TIMESTAMP,
metric_name VARCHAR(100),
metric_value DECIMAL(20,6),
PRIMARY KEY (timestamp, metric_name)
);
| timestamp | metric_name | metric_value |
|---|---|---|
| 2024-02-01 | revenue | 12500 |
| 2024-02-01 | orders | 145 |
| 2024-02-01 | visitors | 8900 |
Pros:
- Flexible - add new metrics without schema changes
- Handles sparse data efficiently
- Scales to high cardinality
Cons:
- Queries across metrics require pivoting
- More rows = more storage overhead per value
- Cross-metric calculations require self-joins
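For example, computing revenue per order from the narrow table requires pivoting the two metrics back into columns. A sketch using conditional aggregation against the metrics table above:
-- Pivot revenue and orders back into columns for a cross-metric calculation
SELECT
timestamp,
MAX(CASE WHEN metric_name = 'revenue' THEN metric_value END) as revenue,
MAX(CASE WHEN metric_name = 'orders' THEN metric_value END) as orders,
MAX(CASE WHEN metric_name = 'revenue' THEN metric_value END)
/ NULLIF(MAX(CASE WHEN metric_name = 'orders' THEN metric_value END), 0) as revenue_per_order
FROM metrics
GROUP BY timestamp;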
Entity-Metric Format
Combine entity identification with narrow format:
CREATE TABLE entity_metrics (
timestamp TIMESTAMP,
entity_type VARCHAR(50),
entity_id VARCHAR(100),
metric_name VARCHAR(100),
metric_value DECIMAL(20,6),
PRIMARY KEY (timestamp, entity_type, entity_id, metric_name)
);
| timestamp | entity_type | entity_id | metric_name | metric_value |
|---|---|---|---|---|
| 2024-02-01 10:00 | server | web-01 | cpu_usage | 45.2 |
| 2024-02-01 10:00 | server | web-01 | memory_usage | 78.5 |
| 2024-02-01 10:00 | server | web-02 | cpu_usage | 32.1 |
This handles both multiple entities and multiple metrics per entity.
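A common access pattern against this layout is the latest reading per entity and metric. A PostgreSQL-flavored sketch using DISTINCT ON (other databases would use a ROW_NUMBER window instead):
-- Latest reading for every server and metric
SELECT DISTINCT ON (entity_id, metric_name)
entity_id,
metric_name,
timestamp,
metric_value
FROM entity_metrics
WHERE entity_type = 'server'
ORDER BY entity_id, metric_name, timestamp DESC;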
Partitioned Time Series
Partition by time for large-scale data:
CREATE TABLE sensor_readings (
reading_time TIMESTAMP,
sensor_id VARCHAR(50),
temperature DECIMAL(5,2),
humidity DECIMAL(5,2),
pressure DECIMAL(7,2)
)
PARTITION BY RANGE (reading_time);
-- Monthly partitions
CREATE TABLE sensor_readings_2024_01 PARTITION OF sensor_readings
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE sensor_readings_2024_02 PARTITION OF sensor_readings
FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
Benefits:
- Query pruning - only scan relevant time ranges
- Efficient data lifecycle - drop old partitions
- Parallel loading into different partitions
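The lifecycle benefit is particularly cheap to act on: expiring a month of data is a metadata operation rather than a bulk DELETE. A sketch, assuming the monthly partitions above:
-- Expire January data by detaching and dropping its partition
ALTER TABLE sensor_readings DETACH PARTITION sensor_readings_2024_01;
DROP TABLE sensor_readings_2024_01;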
Temporal Query Patterns
Point-in-Time Queries
Get values at specific timestamps:
SELECT metric_name, metric_value
FROM metrics
WHERE timestamp = '2024-02-01 00:00:00';
For irregular series, get closest value:
SELECT metric_value
FROM metrics
WHERE metric_name = 'price'
AND timestamp <= '2024-02-01 12:00:00'
ORDER BY timestamp DESC
LIMIT 1;
Time Range Aggregations
Aggregate over time windows:
-- Daily aggregates from minute-level data
SELECT
DATE_TRUNC('day', timestamp) as day,
AVG(value) as avg_value,
MAX(value) as max_value,
MIN(value) as min_value
FROM metrics
WHERE metric_name = 'cpu_usage'
AND timestamp >= '2024-02-01'
AND timestamp < '2024-02-08'
GROUP BY DATE_TRUNC('day', timestamp)
ORDER BY day;
Time Window Functions
Calculations relative to each point in time:
-- 7-day rolling average
SELECT
date,
value,
AVG(value) OVER (
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) as rolling_7_day_avg
FROM daily_metrics;
-- Period-over-period comparison
SELECT
date,
value,
LAG(value, 7) OVER (ORDER BY date) as value_7_days_ago,
value - LAG(value, 7) OVER (ORDER BY date) as week_over_week_change
FROM daily_metrics;
Gap Detection and Filling
Identify missing data points:
-- Find gaps in hourly data
WITH expected_times AS (
SELECT generate_series(
'2024-02-01 00:00:00'::timestamp,
'2024-02-07 23:00:00'::timestamp,
'1 hour'::interval
) as expected_time
),
actual_times AS (
SELECT DISTINCT DATE_TRUNC('hour', timestamp) as actual_time
FROM metrics
WHERE metric_name = 'requests'
)
SELECT expected_time
FROM expected_times e
LEFT JOIN actual_times a ON e.expected_time = a.actual_time
WHERE a.actual_time IS NULL;
Fill gaps with interpolation or carry-forward:
-- Forward-fill missing values
-- (IGNORE NULLS placement and support vary by dialect: BigQuery and Snowflake
-- accept it, PostgreSQL does not - see the LATERAL alternative below)
SELECT
t.time_point,
COALESCE(
m.metric_value,
LAST_VALUE(m.metric_value IGNORE NULLS) OVER (ORDER BY t.time_point)
) as value_filled
FROM time_spine t
LEFT JOIN metrics m ON t.time_point = m.timestamp AND m.metric_name = 'price';
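Where IGNORE NULLS is unavailable (PostgreSQL, for example), a LATERAL join gives the same carry-forward behavior. A sketch against the same time_spine and metrics tables:
-- Forward-fill without IGNORE NULLS: take the most recent value at or before each time point
SELECT
t.time_point,
m.metric_value as value_filled
FROM time_spine t
LEFT JOIN LATERAL (
SELECT metric_value
FROM metrics
WHERE metric_name = 'price'
AND timestamp <= t.time_point
ORDER BY timestamp DESC
LIMIT 1
) m ON true;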
Time Spine Tables
A time spine (or date dimension) provides a complete set of time points for joins:
CREATE TABLE time_spine (
time_point TIMESTAMP PRIMARY KEY,
date DATE,
hour INT,
day_of_week VARCHAR(10),
month INT,
quarter INT,
year INT,
is_weekend BOOLEAN,
is_holiday BOOLEAN
);
-- Populated with every minute/hour/day in your data range
INSERT INTO time_spine
SELECT
ts,
ts::date,
EXTRACT(HOUR FROM ts),
TRIM(TO_CHAR(ts, 'Day')),
EXTRACT(MONTH FROM ts),
EXTRACT(QUARTER FROM ts),
EXTRACT(YEAR FROM ts),
EXTRACT(ISODOW FROM ts) IN (6, 7), -- is_weekend: Saturday or Sunday
FALSE -- is_holiday: populate from a holiday calendar separately
FROM generate_series(
'2020-01-01'::timestamp,
'2030-12-31'::timestamp,
'1 day'::interval
) ts;
Time spines enable:
- Gap detection and filling
- Consistent time-based aggregation
- Joining irregular events to regular time grids
- Calendar-aware calculations
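For example, a calendar-aware comparison becomes a simple join. A sketch assuming a daily-grain spine and the metrics table above:
-- Weekday vs. weekend average using the spine's calendar attributes
SELECT
ts.is_weekend,
AVG(m.metric_value) as avg_requests
FROM metrics m
JOIN time_spine ts ON DATE_TRUNC('day', m.timestamp) = ts.time_point
WHERE m.metric_name = 'requests'
GROUP BY ts.is_weekend;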
Downsampling and Retention
High-frequency data accumulates rapidly. Manage with tiered retention:
retention_policy:
raw_data: 7_days # Full second-level detail
minute_aggregates: 30_days # 1-minute rollups
hourly_aggregates: 1_year # Hourly summaries
daily_aggregates: 5_years # Daily summaries
Implement with materialized aggregates:
-- Hourly aggregate table
CREATE TABLE metrics_hourly AS
SELECT
DATE_TRUNC('hour', timestamp) as hour,
metric_name,
AVG(metric_value) as avg_value,
MAX(metric_value) as max_value,
MIN(metric_value) as min_value,
COUNT(*) as sample_count
FROM metrics
GROUP BY DATE_TRUNC('hour', timestamp), metric_name;
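The retention side of the policy is then a scheduled cleanup of finer-grained data once its rollups exist. A sketch for the 7-day raw window above (with time partitioning, dropping whole partitions is cheaper than row-level deletes):
-- Remove raw data older than the 7-day retention window
DELETE FROM metrics
WHERE timestamp < NOW() - INTERVAL '7 days';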
AI Analytics Considerations
Time Zone Handling
Time zones cause subtle errors. Be explicit about:
- What timezone raw data is stored in (prefer UTC)
- What timezone business users expect in reports
- How to handle entities across time zones
Document in semantic layer:
column:
name: event_timestamp
storage_timezone: UTC
display_timezone: user_local
business_day_cutoff: "00:00 US/Eastern"
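In SQL, the same discipline means storing UTC and converting only at the reporting boundary. A sketch, assuming a hypothetical events table whose event_timestamp is stored as a UTC timestamptz:
-- Group UTC-stored events into US/Eastern business days for reporting
SELECT
DATE_TRUNC('day', event_timestamp AT TIME ZONE 'US/Eastern') as business_day,
COUNT(*) as events
FROM events
GROUP BY DATE_TRUNC('day', event_timestamp AT TIME ZONE 'US/Eastern')
ORDER BY business_day;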
Seasonality Awareness
AI systems need to understand cyclical patterns:
- Daily patterns (peak hours, overnight lulls)
- Weekly patterns (weekday vs. weekend)
- Monthly patterns (end-of-month spikes)
- Annual patterns (holidays, seasons)
metric:
name: daily_active_users
seasonality:
weekly: significant # Weekend dips expected
annual: moderate # Summer slowdown, holiday spikes
comparison_recommendation: same_day_previous_week
Timestamp Precision
Document expected precision and how AI should handle queries:
table:
name: server_metrics
timestamp_precision: second
expected_interval: 60_seconds
gap_handling: interpolate_linear
This prevents AI from misinterpreting gaps or generating unrealistic precision in results.
Period Boundary Clarity
"Last month" is ambiguous:
- Calendar last month (February)?
- Last 30 days?
- Last 4 weeks?
- Fiscal last month?
time_period_definitions:
last_month:
type: calendar_month
timezone: US/Eastern
last_30_days:
type: rolling
days: 30
inclusive_today: false
AI systems use these definitions rather than guessing.
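Translated into SQL, the two definitions produce clearly different filters. A sketch, assuming a transactions table with date and revenue columns (CURRENT_DATE uses the session timezone, so align it with the declared US/Eastern setting):
-- Calendar "last month"
SELECT SUM(revenue) as last_month_revenue
FROM transactions
WHERE date >= DATE_TRUNC('month', CURRENT_DATE) - INTERVAL '1 month'
AND date < DATE_TRUNC('month', CURRENT_DATE);
-- Rolling "last 30 days", excluding today
SELECT SUM(revenue) as last_30_days_revenue
FROM transactions
WHERE date >= CURRENT_DATE - INTERVAL '30 days'
AND date < CURRENT_DATE;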
Performance Optimization
Indexing Strategies
- Timestamp index: Required for all time-based queries
- Composite indexes: (metric_name, timestamp) to scan a single metric over a time range
- Covering indexes: Include commonly selected columns
- Partial indexes: Index only recent data that's queried frequently
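As concrete PostgreSQL-flavored sketches of these strategies (index names are illustrative, applied to the narrow metrics table above):
-- Composite index: scan one metric over a time range
CREATE INDEX idx_metrics_name_time ON metrics (metric_name, timestamp);
-- Covering index: include the value so the query never touches the table heap
CREATE INDEX idx_metrics_name_time_covering ON metrics (metric_name, timestamp) INCLUDE (metric_value);
-- Partial index: only the hot, recent range
CREATE INDEX idx_metrics_recent ON metrics (timestamp) WHERE timestamp >= '2024-01-01';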
Pre-aggregation
Compute common aggregations ahead of time:
-- Materialized daily summary
CREATE MATERIALIZED VIEW daily_summary AS
SELECT
date,
SUM(revenue) as total_revenue,
COUNT(DISTINCT customer_id) as unique_customers
FROM transactions
GROUP BY date;
-- Refresh daily
REFRESH MATERIALIZED VIEW daily_summary;
Columnar Storage
Time series queries typically scan many rows but few columns. Columnar storage (Parquet, columnar database engines) provides significant performance benefits.
Compression
Time series data compresses well due to:
- Sequential similarity (adjacent values often similar)
- Repeating patterns (same metric names)
- Predictable types (timestamps, numbers)
Use columnar formats and database compression features to reduce storage 10-100x.
Time series modeling requires balancing flexibility with query performance. Choose storage formats based on cardinality and query patterns. Implement downsampling for data lifecycle management. Document temporal semantics clearly in the semantic layer so AI systems understand time-based data correctly.
Questions
How does time series data differ from transactional data?
Time series data is specifically ordered by time, with analysis focusing on temporal patterns - trends, seasonality, correlations over time. Transactional data may have timestamps but is analyzed primarily by business dimensions (customer, product, region). The same data can be both - sales transactions are transactional data but can also be analyzed as a time series.