Live Connection vs Extract: Choosing the Right Semantic Layer Integration
Learn when to use live connections versus extracts when connecting BI tools to semantic layers, including performance, governance, and freshness considerations.
When connecting BI tools to a semantic layer, organizations must choose between live connections and extracts (imports). This decision impacts performance, data freshness, governance, and user experience. Understanding the tradeoffs enables informed architecture decisions.
This guide compares live connections and extracts for semantic layer integration, helping you choose the right approach for different use cases with platforms like Codd AI.
Understanding the Options
Live Connections
With live connections, every BI query executes against the semantic layer in real time:
User Action → BI Tool → Query → Semantic Layer → Warehouse → Results
Characteristics:
- Data reflects source freshness
- Each query executes on demand
- Governance applied every time
- Performance depends on source systems
Extracts (Imports)
With extracts, the BI tool pulls data from the semantic layer on a schedule and stores it locally:
Scheduled Extract: Semantic Layer → BI Tool Storage
User Action → BI Tool → Query → Local Storage → Results
Characteristics:
- Data freshness limited to extract schedule
- Queries execute against local copy
- Governance applied at extract time
- Performance depends on BI tool storage
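The two access patterns above can be contrasted in a minimal sketch. The dictionary standing in for the semantic layer and the function names are illustrative, not any real API; the point is only that live queries read the source on every call while extract queries read a frozen snapshot:

```python
import time

# Hypothetical in-memory "source" standing in for the semantic layer.
source = {"revenue": 100}

def live_query(metric):
    """Live connection: read the source on every query."""
    return source[metric]

def take_extract(metrics):
    """Extract: copy the data once, on a schedule, with a timestamp."""
    return {"data": {m: source[m] for m in metrics},
            "extracted_at": time.time()}

def extract_query(extract, metric):
    """Extract queries hit the local copy, not the source."""
    return extract["data"][metric]

extract = take_extract(["revenue"])
source["revenue"] = 250  # the source changes after the extract ran

live_query("revenue")              # reflects the change (250)
extract_query(extract, "revenue")  # frozen at extract time (100)
```

The divergence between the two return values is exactly the freshness gap the comparison matrix below summarizes.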
Comparison Matrix
| Factor | Live Connection | Extract |
|---|---|---|
| Data freshness | Real-time (source dependent) | Schedule dependent |
| Query performance | Variable | Consistently fast |
| Governance | Every query | Extract time only |
| Storage | No duplication | Copies data |
| Complex calculations | May be slow | Pre-computed |
| Offline access | Requires connectivity | Works offline |
| Semantic layer load | Continuous | Batch during extract |
When to Use Live Connections
Scenario 1: Operational Dashboards
Dashboards monitoring real-time operations benefit from live connections:
Examples:
- Order fulfillment status
- Customer support queue
- Inventory levels
- Financial transactions
Why live works:
- Users need current data
- Quick, focused queries
- Freshness outweighs performance
- Decisions based on now, not yesterday
Scenario 2: Strict Governance Requirements
When governance must apply to every data access:
Examples:
- Regulatory-sensitive data
- Row-level security requirements
- Audit trail needs
- Dynamic permission changes
Why live works:
- Security checked each query
- No stale permissions in cached data
- Full audit capability
- Real-time access control updates
Scenario 3: Rapidly Changing Metrics
When metric definitions change frequently:
Examples:
- A/B testing metrics
- Promotion performance
- Campaign tracking
- New product launches
Why live works:
- Definition changes apply immediately
- No need to refresh extracts
- Consistent across all queries
- Reduced risk of stale definitions
Scenario 4: Ad-Hoc Exploration
When users explore data interactively:
Examples:
- Data analyst investigation
- Executive questions
- Incident analysis
- Opportunity discovery
Why live works:
- No waiting for extract schedules
- Questions answered immediately
- Access to latest data
- Flexible query patterns
When to Use Extracts
Scenario 1: Complex Analytical Dashboards
Dashboards with heavy calculations benefit from extracts:
Examples:
- Year-over-year comparisons
- Cohort analysis
- Trend calculations
- Statistical aggregations
Why extract works:
- Complex queries pre-computed
- Fast user experience
- Reduced semantic layer load
- Consistent response times
Scenario 2: Large Data Volumes
When querying large datasets:
Examples:
- Multi-year historical analysis
- Transaction-level detail
- High-cardinality dimensions
- Full customer datasets
Why extract works:
- Data compressed and optimized locally
- Faster aggregation performance
- Reduced network transfer
- Better handling of large result sets
Scenario 3: Offline or Disconnected Use
When connectivity is unreliable:
Examples:
- Mobile dashboards
- Field sales presentations
- Conference demonstrations
- Disconnected environments
Why extract works:
- Works without network
- Predictable availability
- No dependency on source systems
- Consistent experience anywhere
Scenario 4: Scheduled Reports
When reports run on fixed schedules:
Examples:
- Daily email reports
- Weekly management decks
- Monthly board packages
- Quarterly business reviews
Why extract works:
- Data consistent with schedule
- Predictable performance
- Reduced load during generation
- Reports reflect point-in-time snapshot
Hybrid Approaches
Pattern 1: Live for Metrics, Extract for Details
Use live connections for key metrics and extracts for supporting detail:
Executive Dashboard:
├── KPI Tiles → Live Connection (always current)
├── Trend Charts → Extract (historical, refreshed daily)
└── Detail Tables → Extract (large volume, refreshed nightly)
Pattern 2: Live Primary, Extract Fallback
Default to live with automatic fallback:
Query Flow:
1. Attempt live query
2. If timeout/error → fall back to extract
3. Display data with freshness indicator
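The fallback flow above can be sketched as a small wrapper. The exception type, callbacks, and timestamp format are hypothetical, assumed for illustration; a real implementation would plug in the BI tool's actual query and cache interfaces:

```python
class QueryTimeout(Exception):
    """Raised when the live query exceeds its time budget."""

def query_with_fallback(run_live, read_extract, timeout_s=5.0):
    """Try the live path first; on timeout or error, serve the
    extract and flag the result so a freshness indicator can be shown."""
    try:
        rows = run_live(timeout_s)
        return {"rows": rows, "source": "live", "fresh": True}
    except (QueryTimeout, ConnectionError):
        snapshot = read_extract()
        return {"rows": snapshot["rows"], "source": "extract",
                "fresh": False, "as_of": snapshot["extracted_at"]}

# Example: the live path fails, so the cached snapshot is served.
def failing_live(timeout_s):
    raise QueryTimeout("semantic layer did not respond")

def cached_extract():
    return {"rows": [("orders", 1200)], "extracted_at": "2024-01-01T06:00Z"}

result = query_with_fallback(failing_live, cached_extract)
# result["source"] == "extract", result["fresh"] is False
```

Returning the `fresh` flag alongside the rows is what lets the dashboard display the freshness indicator mentioned in step 3.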
Pattern 3: Real-Time Summary, Batch Detail
Aggregate metrics live, detail on schedule:
Operational Dashboard:
├── Current Order Count → Live
├── Revenue Today → Live
├── Order List → Extract (refreshed hourly)
└── Customer Details → Extract (refreshed daily)
Pattern 4: Time-Based Strategy
Different approaches based on time window:
Time-Based Logic:
├── Last 24 hours → Live connection
├── Last 7-30 days → Daily extract
└── Historical (30+ days) → Weekly extract
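The routing logic above reduces to a simple threshold function. This is a sketch of the decision rule only, with the path names chosen for illustration:

```python
from datetime import timedelta

def choose_access_path(window: timedelta) -> str:
    """Route a query to live or extract based on how far back it looks."""
    if window <= timedelta(hours=24):
        return "live"
    if window <= timedelta(days=30):
        return "daily_extract"
    return "weekly_extract"

choose_access_path(timedelta(hours=6))   # "live"
choose_access_path(timedelta(days=14))   # "daily_extract"
choose_access_path(timedelta(days=365))  # "weekly_extract"
```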
Implementation Considerations
Semantic Layer Configuration
Configure the semantic layer to support both patterns:
For live connections:
- Optimize query performance
- Implement caching for repeated queries
- Set appropriate timeout values
- Monitor query load
For extracts:
- Define extract-optimized views
- Pre-aggregate common patterns
- Support incremental extracts
- Provide extract monitoring APIs
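Incremental extracts are typically built around a watermark: each refresh pulls only rows changed since the last high-water mark, then advances it. A minimal sketch, assuming a hypothetical `fetch_rows` callback and an `updated_at` column on the source:

```python
def incremental_extract(fetch_rows, state):
    """Pull only rows changed since the stored watermark, then advance it."""
    new_rows = fetch_rows(since=state["watermark"])
    if new_rows:
        state["rows"].extend(new_rows)
        state["watermark"] = max(r["updated_at"] for r in new_rows)
    return state

# Hypothetical source table with an updated_at column.
all_rows = [{"id": 1, "updated_at": 10}, {"id": 2, "updated_at": 20},
            {"id": 3, "updated_at": 30}]

def fetch_rows(since):
    return [r for r in all_rows if r["updated_at"] > since]

state = {"rows": [], "watermark": 0}
state = incremental_extract(fetch_rows, state)   # first run pulls all three rows
all_rows.append({"id": 4, "updated_at": 40})
state = incremental_extract(fetch_rows, state)   # second run pulls only row 4
len(state["rows"])  # 4
```

The benefit is that a steady-state refresh transfers only the delta, which keeps extract windows short even as history grows.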
BI Tool Configuration
Configure BI tools appropriately:
Tableau:
- Live connections via published data sources
- Extracts with scheduled refresh
- Hybrid via multiple data sources
Power BI:
- DirectQuery for live
- Import mode for extracts
- Composite models for hybrid
Looker:
- PDTs for extract-like behavior
- Direct queries for live
- Caching configuration
Performance Optimization
Optimize each pattern differently:
Live connection optimization:
- Semantic layer query caching
- Connection pooling
- Query timeouts
- Index optimization
Extract optimization:
- Incremental refresh
- Partitioned extracts
- Compressed storage
- Parallel processing
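The first live-connection optimization, query caching, can be sketched as a small TTL cache keyed by query text. This is an illustrative in-process cache, not any specific semantic layer's implementation:

```python
import time

class TTLQueryCache:
    """Cache semantic-layer results for repeated identical queries."""
    def __init__(self, ttl_s=60.0):
        self.ttl_s = ttl_s
        self._store = {}  # sql -> (result, cached_at)

    def get_or_run(self, sql, run_query):
        now = time.monotonic()
        hit = self._store.get(sql)
        if hit and now - hit[1] < self.ttl_s:
            return hit[0]                 # fresh cached result
        result = run_query(sql)           # miss or expired: hit the layer
        self._store[sql] = (result, now)
        return result

calls = []
def run_query(sql):
    calls.append(sql)
    return [("total_orders", 42)]

cache = TTLQueryCache(ttl_s=60)
cache.get_or_run("SELECT total_orders", run_query)  # executes against the layer
cache.get_or_run("SELECT total_orders", run_query)  # served from cache
len(calls)  # 1
```

The TTL is the knob that trades freshness against load: a dashboard ten users open every minute generates one warehouse query per TTL window instead of ten.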
Governance Implications
Live Connection Governance
Every query passes through governance:
Advantages:
- Real-time access control
- Complete audit trail
- Immediate permission changes
- Consistent policy enforcement
Considerations:
- Higher semantic layer load
- Query-time latency for security checks
- Dependency on semantic layer availability
Extract Governance
Governance applied at extract time:
Advantages:
- Governance overhead occurs in batch
- No runtime security delays
- Predictable performance
Considerations:
- Permission changes not immediate
- Extract may contain stale data
- Need to manage extract-level security
- Potential for ungoverned copies
Hybrid Governance
Combine approaches based on sensitivity:
Data Classification:
├── Highly Sensitive → Live only, full audit
├── Business Critical → Live preferred, extract acceptable
├── General Analytics → Extract default, live available
└── Public Data → Extract, minimal governance
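The classification tree above can be enforced as a policy lookup. The table and function below are a hypothetical sketch mirroring that tree, not a real platform API:

```python
# Hypothetical policy table mirroring the classification tree above.
# The first mode listed is the default for that classification.
POLICY = {
    "highly_sensitive":  {"modes": ["live"],            "audit": "full"},
    "business_critical": {"modes": ["live", "extract"], "audit": "standard"},
    "general_analytics": {"modes": ["extract", "live"], "audit": "standard"},
    "public":            {"modes": ["extract"],         "audit": "minimal"},
}

def allowed_mode(classification, requested):
    """Return the requested mode if policy permits it, else the default."""
    policy = POLICY[classification]
    return requested if requested in policy["modes"] else policy["modes"][0]

allowed_mode("highly_sensitive", "extract")  # "live" -> extract refused
allowed_mode("general_analytics", "live")    # "live" -> allowed
```

Centralizing the mapping this way keeps the sensitivity-to-mode decision in one governed place rather than scattered across individual dashboards.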
Codd AI Platform
Codd AI supports both connection patterns:
- Optimized live query execution
- Extract API for scheduled pulls
- Intelligent caching between patterns
- Consistent governance across modes
Organizations can implement hybrid architectures with unified governance regardless of connection pattern.
Decision Framework
Use this framework to choose:
Step 1: Assess Data Requirements
| Question | Live if... | Extract if... |
|---|---|---|
| How fresh must data be? | Minutes/hours | Days acceptable |
| How complex are queries? | Simple aggregations | Multi-table joins |
| How large is the data? | Smaller datasets | Large historical |
| How often do users query? | Continuous | Periodic |
Step 2: Assess Technical Constraints
| Question | Live if... | Extract if... |
|---|---|---|
| Network reliability? | Stable connectivity | Unreliable/offline |
| Semantic layer capacity? | Can handle load | Limited capacity |
| BI tool capabilities? | Supports live well | Better extract support |
| Response time needs? | Seconds acceptable | Sub-second required |
Step 3: Assess Governance Requirements
| Question | Live if... | Extract if... |
|---|---|---|
| Security sensitivity? | High, audit required | Lower sensitivity |
| Permission volatility? | Frequent changes | Stable permissions |
| Compliance requirements? | Real-time enforcement | Batch acceptable |
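The three assessment steps can be rolled up into a simple tally: each answered question votes for live or extract, and a tie suggests a hybrid design. A minimal sketch with illustrative question keys:

```python
def recommend_connection(answers):
    """Tally 'live' vs 'extract' votes from the assessment questions.
    `answers` maps a question key to 'live' or 'extract'."""
    live = sum(1 for v in answers.values() if v == "live")
    extract = sum(1 for v in answers.values() if v == "extract")
    if live == extract:
        return "hybrid"
    return "live" if live > extract else "extract"

answers = {
    "freshness": "live", "query_complexity": "extract",
    "data_size": "extract", "network": "live",
    "capacity": "live", "sensitivity": "live",
}
recommend_connection(answers)  # "live"
```

In practice some questions carry more weight than others (a hard compliance requirement can override everything else), so treat an even tally as a prompt to design a hybrid rather than a tie-breaker.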
Best Practices Summary
- Default to live for operational dashboards and governed data
- Use extracts for historical analysis and complex calculations
- Implement hybrid for dashboards with mixed requirements
- Monitor performance and adjust patterns based on actual usage
- Document decisions so users understand data freshness
- Display freshness indicators on dashboards using extracts
- Review periodically as data volumes and requirements evolve
The choice between live connections and extracts is not binary: successful semantic layer integrations often use both patterns strategically, matching data access methods to specific use case requirements.
Questions
Does a live connection guarantee real-time data?
A live connection means queries execute against the source in real time, but the freshness of the underlying data depends on the source system. If the semantic layer queries a data warehouse that is updated hourly, live connections reflect that hourly freshness.