Semantic Layer - Database-Driven Architecture

🎯 Overview

DataTruth’s semantic layer is fully database-driven - all metrics, dimensions, and business logic are stored in PostgreSQL and managed through the web UI.

🗄️ Database Tables

1. calculated_metrics

Custom metrics created by users via the UI:

What: Business metrics like “Profit”, “Revenue”, “Profit Margin”
Contains: Metric name, formula, aggregation type, synonyms, filters
Example: profit = SUM(amount - cost) where status = ‘completed’

2. field_mappings

AI-generated friendly names for database fields:

What: Maps technical column names to business terms
Contains: Display name, description, synonyms, data type, format
Example: txn_amt → “Transaction Amount” (currency format)

3. semantic_layer_cache

Cached semantic configurations per connection:

What: Performance cache for frequently used configurations
Contains: Entities, relationships, metrics per connection
Purpose: Fast query planning and SQL generation

🎨 User Interface

Users manage the semantic layer through the SemanticLayer UI component:

Navigate: HomePage → Semantic Layer tab
Create Metrics: Define custom calculated metrics
Set Formulas: SQL formulas with aggregations
Add Synonyms: Alternative names for better search
Apply Filters: Default filters for metrics
Format Output: Currency, percentage, number formats

✅ Benefits of Database-Driven Approach

1. Dynamic & Real-time

Changes take effect immediately
No server restart required
No deployment needed

2. Multi-tenant Ready

Different metrics per connection
Per-user/per-team customization
Isolated configurations

3. No Code Required

Everything via web UI
No YAML editing
No file management

4. Version Controlled via Database

Audit trail of all changes
Who created/modified what
Rollback capabilities via backups

5. Scalable

Handles thousands of metrics
Efficient database queries
Cached for performance

🚫 What Was Removed

Previously, DataTruth used YAML files (config/semantic-layer/*.yaml) for:

Default metrics (revenue, profit)
Default dimensions (agent, client, company)
Join definitions
Synonyms

These are now removed - Everything is user-created in the database!

📊 How It Works

User creates metric in UI
        ↓
Saved to PostgreSQL (calculated_metrics table)
        ↓
API reads from database
        ↓
SQL Generator uses metric formula
        ↓
Query executed with custom metrics

🔧 For Developers

Load Metrics from Database:

from src.database.connection import get_internal_db_connection

# Get custom metrics for a connection
with get_internal_db_connection() as conn:
    metrics = conn.execute(
        "SELECT * FROM calculated_metrics WHERE connection_id = %s",
        (connection_id,)
    ).fetchall()

Semantic Layer Loader:

from src.semantic.loader import get_semantic_layer

# Returns empty structure (legacy compatibility)
semantic_layer = get_semantic_layer()

# All real data comes from database tables

🎓 Best Practices

Create Meaningful Names: Use business terms, not technical jargon
Add Synonyms: Include common variations (revenue = sales = income)
Document Formulas: Add clear descriptions
Use Filters Wisely: Default filters should be common use cases
Format Consistently: Use currency/percentage formats appropriately

For SaaS users: Simply use the Semantic Layer UI - no technical knowledge required!