User Activity Tracking & Personalized Suggestions
Overview
This system tracks user interactions (queries, chats, suggestion clicks, feedback) to provide personalized query suggestions based on learned patterns, user role, and preferences.
Features
🎯 Personalized Query Suggestions
- Learning from history: Analyzes user’s past queries to identify patterns
- Role-based suggestions: Leverages common patterns from users with same role
- Preference-aware: Respects user’s preferred/excluded metrics and dimensions
- Smart caching: Caches suggestions for 1 hour to reduce LLM API calls
- Hybrid approach: Combines learned patterns with LLM intelligence
📊 Activity Tracking
- Query logging: Records all queries with results and execution metadata
- Suggestion tracking: Logs which suggestions users click
- Feedback collection: Captures user ratings and comments
- Contextual metadata: Stores user role, goals, department at time of activity
🧠 Pattern Learning
- User-specific patterns: Learns individual query templates
- Role-based patterns: Identifies common queries per role (analyst, trader, executive, etc.)
- Template extraction: Generalizes queries (e.g., “top 10 stocks” → “top {N} {entity}”)
- Success tracking: Monitors query success rate and response times
- Auto-updating: Patterns update automatically as users make queries
⚙️ User Preferences
- Preferred query types: comparison, trend, ranking, aggregation
- Preferred metrics: Prioritize specific metrics in suggestions
- Excluded metrics: Remove unwanted metrics from suggestions
- Preferred dimensions: Prioritize specific dimensions
- Advanced queries toggle: Show/hide complex multi-step queries
- Suggestion count: Configure how many suggestions to show (1-12)
Database Schema
Tables Created (Migration 004)
1. user_activity
Stores all user interactions for learning.
| Column | Type | Description |
|---|---|---|
| id | SERIAL | Primary key |
| user_id | VARCHAR(100) | References users table |
| activity_type | VARCHAR(50) | ‘query’, ‘chat’, ‘suggestion_click’, ‘feedback’ |
| query_text | TEXT | User’s query or message |
| response_data | JSONB | Full response with SQL, results, metadata |
| suggestion_clicked | TEXT | Which suggestion was clicked |
| feedback_rating | INTEGER | Rating 1-5 if feedback |
| metadata | JSONB | Context: role, preferences, goals, etc. |
| created_at | TIMESTAMP | When activity occurred |
Indexes:
idx_user_activity_user_idon user_ididx_user_activity_typeon activity_typeidx_user_activity_created_aton created_at DESCidx_user_activity_user_typeon (user_id, activity_type)
2. query_patterns
Learned query templates aggregated by user or role.
| Column | Type | Description |
|---|---|---|
| id | SERIAL | Primary key |
| pattern_type | VARCHAR(50) | ‘role_based’, ‘user_specific’, or ‘global’ |
| target_id | VARCHAR(100) | user_id or role name |
| query_template | TEXT | Generalized query template |
| frequency | INTEGER | How often this pattern appears |
| success_rate | FLOAT | Percentage of successful queries (0-1) |
| avg_response_time | FLOAT | Average response time in seconds |
| last_used | TIMESTAMP | Last time pattern was used |
| metadata | JSONB | Pattern details (metrics, dimensions used) |
| created_at | TIMESTAMP | When pattern was created |
| updated_at | TIMESTAMP | Last update time |
Indexes:
idx_query_patterns_typeon pattern_typeidx_query_patterns_targeton target_ididx_query_patterns_frequencyon frequency DESCidx_query_patterns_successon success_rate DESC
3. suggestion_cache
Caches generated suggestions to reduce LLM costs.
| Column | Type | Description |
|---|---|---|
| id | SERIAL | Primary key |
| user_id | VARCHAR(100) | References users table |
| context_hash | VARCHAR(64) | MD5 hash of (user_id + role + partial_query) |
| suggestions | JSONB | Array of suggestion objects |
| generated_at | TIMESTAMP | When suggestions were generated |
| expires_at | TIMESTAMP | Expiration time (default +1 hour) |
| hit_count | INTEGER | How many times cache was used |
Indexes:
idx_suggestion_cache_user_idon user_ididx_suggestion_cache_hashon context_hashidx_suggestion_cache_expireson expires_at
4. user_suggestion_preferences
User-specific preferences for suggestions.
| Column | Type | Description |
|---|---|---|
| user_id | VARCHAR(100) | Primary key, references users |
| preferred_query_types | JSONB | Array of preferred types |
| excluded_metrics | JSONB | Metrics to exclude |
| preferred_metrics | JSONB | Metrics to prioritize |
| preferred_dimensions | JSONB | Dimensions to prioritize |
| show_advanced_queries | BOOLEAN | Show complex queries |
| max_suggestions | INTEGER | How many suggestions (default 6) |
| updated_at | TIMESTAMP | Last update time |
API Endpoints
Personalized Suggestions
POST /api/v1/connections/{connection_id}/suggestions/personalized
Get personalized query suggestions based on user history.
Parameters:
connection_id(path): Database connection IDpartial_query(query, optional): User’s current inputmax_suggestions(query, default=6): Maximum suggestions
Response:
{
"connection_id": "my-database",
"partial_query": "show me profit",
"suggestions": [
{
"text": "Show me quarterly profits from last 5 years",
"type": "complete",
"description": "Time-series profit analysis",
"icon": "📈"
},
{
"text": "Top 10 products by profit margin",
"type": "ranking",
"description": "Highest profit margins",
"icon": "🏆"
}
],
"source": "llm_with_patterns",
"user_patterns": [
"show me {metric} for {timeframe}",
"top {N} {entity} by {metric}"
],
"role_patterns": [
"compare {metric} across {dimension}"
],
"count": 6
}
Sources:
cached: Returned from cache (fastest, no LLM cost)llm_with_patterns: Generated with LLM using learned patternsllm_fallback: Basic LLM without patterns (error recovery)
Activity Logging
POST /api/v1/activity/log
Log user activity for learning and personalization.
Request Body:
{
"activity_type": "query",
"query_text": "Show me quarterly profits from last 5 years",
"response_data": {
"data": [...],
"metadata": {
"generated_sql": "SELECT ...",
"execution_time_ms": 245,
"row_count": 20
}
},
"metadata": {
"connection_id": "my-database"
}
}
Activity Types:
query: User executed a querychat: User sent a chat messagesuggestion_click: User clicked a suggestionfeedback: User provided rating/feedback
Response:
{
"success": true,
"activity_id": 12345,
"activity_type": "query"
}
GET /api/v1/activity/history
Get user’s activity history.
Parameters:
activity_type(query, optional): Filter by typelimit(query, default=100): Max activities to return
Response:
{
"user_id": "analyst001",
"activities": [
{
"id": 12345,
"user_id": "analyst001",
"activity_type": "query",
"query_text": "Show me quarterly profits",
"created_at": "2025-12-29T10:30:00Z"
}
],
"count": 15
}
Pattern Management
GET /api/v1/activity/patterns
Get learned query patterns for current user.
Parameters:
limit(query, default=10): Max patterns per type
Response:
{
"user_id": "analyst001",
"user_patterns": [
{
"id": 1,
"pattern_type": "user_specific",
"target_id": "analyst001",
"query_template": "show me {metric} for {timeframe}",
"frequency": 25,
"success_rate": 0.96,
"avg_response_time": 0.245
}
],
"role_patterns": [
{
"id": 2,
"pattern_type": "role_based",
"target_id": "analyst",
"query_template": "compare {metric} across {dimension}",
"frequency": 150,
"success_rate": 0.92
}
],
"user_pattern_count": 5,
"role_pattern_count": 8
}
Preference Management
GET /api/v1/activity/preferences
Get user’s suggestion preferences.
Response:
{
"user_id": "analyst001",
"preferred_query_types": ["comparison", "trend", "ranking"],
"excluded_metrics": ["internal_id", "debug_field"],
"preferred_metrics": ["profit", "revenue", "margin"],
"preferred_dimensions": ["quarter", "product", "region"],
"show_advanced_queries": false,
"max_suggestions": 6,
"updated_at": "2025-12-29T10:00:00Z"
}
PUT /api/v1/activity/preferences
Update user’s suggestion preferences.
Request Body:
{
"preferred_query_types": ["comparison", "trend"],
"excluded_metrics": ["internal_id"],
"max_suggestions": 8
}
Response:
{
"success": true,
"preferences": {
"user_id": "analyst001",
"preferred_query_types": ["comparison", "trend"],
...
}
}
Implementation Guide
1. Run Database Migration
# Run migration to create tables
psql -U <user> -d datatruth_internal -f migrations/004_add_user_activity_tracking.sql
# Or via Docker
docker exec -i <postgres-container> psql -U datatruth -d datatruth_internal < migrations/004_add_user_activity_tracking.sql
2. Add API Routes
The routes are defined in /src/activity/routes_to_add.py.
To integrate:
- Copy the route definitions from
routes_to_add.py - Add them to
/src/api/routes.pyafter the existing suggestions endpoint (around line 2985) - Ensure imports at the top of routes.py include:
from src.activity import get_activity_logger, get_pattern_analyzer, ActivityType
3. Update Query Endpoint to Log Activity
Modify the /query endpoint in routes.py to automatically log queries:
@router.post("/query")
async def execute_natural_language_query(...):
# ... existing code ...
# After successful query execution, log it
try:
from src.activity import get_activity_logger
from src.user import get_user_manager
logger_instance = get_activity_logger()
user_manager = get_user_manager()
user_profile = user_manager.get_user(current_user.get("user_id", current_user["username"]))
logger_instance.log_query(
user_id=current_user.get("user_id", current_user["username"]),
query_text=natural_query,
response_data={
"data": result.rows[:10], # Sample rows
"metadata": response["metadata"]
},
user_role=user_profile.role.value if user_profile else None,
user_goals=user_profile.goals if user_profile else None
)
except Exception as e:
logger.warning(f"Failed to log query activity: {e}")
return response
4. Update Frontend to Use Personalized Suggestions
In your chat/query input component:
// Use personalized endpoint instead of regular suggestions
const fetchSuggestions = async (partialQuery: string) => {
try {
const response = await fetch(
`http://localhost:8000/api/v1/connections/${connectionId}/suggestions/personalized?partial_query=${encodeURIComponent(partialQuery)}&max_suggestions=6`,
{
method: 'POST',
headers: {
'Authorization': `Bearer ${token}`,
'Content-Type': 'application/json'
}
}
);
const data = await response.json();
setSuggestions(data.suggestions);
// Optionally show source and patterns
if (data.source === 'cached') {
console.log('💾 Using cached suggestions');
} else if (data.user_patterns.length > 0) {
console.log('🧠 Personalized with patterns:', data.user_patterns);
}
} catch (error) {
console.error('Failed to fetch suggestions:', error);
}
};
Log suggestion clicks:
const handleSuggestionClick = async (suggestion: string) => {
setQuery(suggestion);
// Log the click
try {
await fetch('http://localhost:8000/api/v1/activity/log', {
method: 'POST',
headers: {
'Authorization': `Bearer ${token}`,
'Content-Type': 'application/json'
},
body: JSON.stringify({
activity_type: 'suggestion_click',
suggestion_clicked: suggestion
})
});
} catch (error) {
console.error('Failed to log suggestion click:', error);
}
};
Log query execution:
const handleQuerySubmit = async () => {
const response = await executeQuery(query);
// Log the query
try {
await fetch('http://localhost:8000/api/v1/activity/log', {
method: 'POST',
headers: {
'Authorization': `Bearer ${token}`,
'Content-Type': 'application/json'
},
body: JSON.stringify({
activity_type: 'query',
query_text: query,
response_data: {
data: response.data,
metadata: response.metadata
}
})
});
} catch (error) {
console.error('Failed to log query:', error);
}
};
5. Add Preferences UI (Optional)
Create a preferences panel where users can customize suggestions:
const PreferencesPanel = () => {
const [prefs, setPrefs] = useState<any>(null);
useEffect(() => {
fetchPreferences();
}, []);
const fetchPreferences = async () => {
const response = await fetch('http://localhost:8000/api/v1/activity/preferences', {
headers: { 'Authorization': `Bearer ${token}` }
});
const data = await response.json();
setPrefs(data);
};
const updatePreferences = async (updates: any) => {
await fetch('http://localhost:8000/api/v1/activity/preferences', {
method: 'PUT',
headers: {
'Authorization': `Bearer ${token}`,
'Content-Type': 'application/json'
},
body: JSON.stringify(updates)
});
fetchPreferences();
};
return (
<div>
<h3>Suggestion Preferences</h3>
{/* Add form controls for preferences */}
</div>
);
};
How It Works
Pattern Learning Flow
- User makes a query: “Show me quarterly profits from last 5 years”
- Activity logged: Stored in
user_activitywith role, goals, response metadata - Pattern extracted: Generalized to “Show me {metric} for {timeframe}”
- Pattern stored/updated: In
query_patternswith frequency and success rate - Next time: Suggestion engine uses this pattern for similar queries
Suggestion Generation Flow
- User types partial query: “show me prof…”
- Check cache: Hash (user_id + role + query) and lookup in
suggestion_cache- If found and not expired: Return cached suggestions ✅
- Get learned patterns:
- Query
query_patternsfor user_specific patterns - Query
query_patternsfor role_based patterns
- Query
- Get preferences: Load from
user_suggestion_preferences - Filter metrics/dimensions: Apply preferred/excluded lists
- Generate with LLM: Send to GPT-4o-mini with:
- Available metrics/dimensions
- User’s learned patterns
- Role’s common patterns
- Partial query
- Cache results: Store in
suggestion_cachewith 1-hour expiry - Return personalized suggestions: With source indicator
Template Generalization
Queries are generalized to identify patterns:
| Original Query | Generalized Template |
|---|---|
| “Top 10 stocks by volume” | “Top {N} {entity} by {metric}” |
| “Show me profit for Q1 2024” | “Show me {metric} for {quarter} {year}” |
| “Compare revenue across regions” | “Compare {metric} across {dimension}” |
| “Average price by product category” | “Average {metric} by {dimension}” |
Numbers, dates, quarters, months are replaced with placeholders.
Performance Optimizations
Caching Strategy
- 1-hour cache for identical contexts (user + role + query)
- Hit count tracking to identify popular suggestions
- Auto-cleanup of expired entries on insert
Cost Reduction
- Cache hit rate target: 40-60% (reduces LLM calls by half)
- Learned patterns first: Use templates before LLM when possible
- GPT-4o-mini: Most affordable model (~$0.15 per 1M tokens)
- Token limits: Max 400 tokens per suggestion request
Background Processing
- Pattern analysis triggered on query log (synchronous for now)
- Future enhancement: Use Celery/RQ for async pattern updates
- Batch processing: Analyze patterns for all users nightly
Monitoring & Analytics
Key Metrics to Track
-- Cache hit rate
SELECT
COUNT(*) FILTER (WHERE hit_count > 0) * 100.0 / COUNT(*) as cache_hit_rate_percent
FROM suggestion_cache
WHERE generated_at >= NOW() - INTERVAL '24 hours';
-- Most common patterns by role
SELECT
target_id as role,
query_template,
frequency,
success_rate
FROM query_patterns
WHERE pattern_type = 'role_based'
ORDER BY frequency DESC
LIMIT 10;
-- User activity breakdown
SELECT
activity_type,
COUNT(*) as count,
COUNT(DISTINCT user_id) as unique_users
FROM user_activity
WHERE created_at >= NOW() - INTERVAL '7 days'
GROUP BY activity_type;
-- Average suggestions per user
SELECT
user_id,
COUNT(*) as total_queries,
COUNT(suggestion_clicked) as clicked_suggestions,
COUNT(suggestion_clicked) * 100.0 / NULLIF(COUNT(*), 0) as click_through_rate
FROM user_activity
WHERE activity_type IN ('query', 'suggestion_click')
AND created_at >= NOW() - INTERVAL '30 days'
GROUP BY user_id
ORDER BY click_through_rate DESC;
Future Enhancements
Planned Features
- Collaborative filtering: “Users like you also queried…”
- Temporal patterns: Suggest queries based on time of day/week
- Context awareness: Consider previous queries in session
- A/B testing: Test different suggestion strategies
- Feedback loop: Use ratings to refine patterns
- Export patterns: Download learned patterns for analysis
- Pattern similarity: Find users with similar query patterns
- Anomaly detection: Flag unusual query patterns
- Multi-language support: Localize suggestions
Integration Opportunities
- Insights system: Suggest insights based on query patterns
- Data quality: Identify frequently failed query patterns
- Performance optimization: Auto-index columns from common patterns
- Training datasets: Use patterns to generate synthetic queries
Troubleshooting
Patterns not updating
- Check if activity logging is working:
SELECT COUNT(*) FROM user_activity; - Verify pattern analyzer runs: Check logs for “Updated patterns for X users”
- Manually trigger: Call
/activity/patternsendpoint
Suggestions not personalized
- Verify user has activity history:
SELECT COUNT(*) FROM user_activity WHERE user_id = '...'; - Check if patterns exist:
SELECT * FROM query_patterns WHERE target_id = '...'; - Look for cache hits:
SELECT * FROM suggestion_cache WHERE user_id = '...';
Cache not working
- Check for context hash collisions:
SELECT context_hash, COUNT(*) FROM suggestion_cache GROUP BY context_hash HAVING COUNT(*) > 1; - Verify expiration:
SELECT COUNT(*) FROM suggestion_cache WHERE expires_at < NOW(); - Clear cache:
DELETE FROM suggestion_cache;
High LLM costs
- Check cache hit rate (target: >40%)
- Increase cache expiration time
- Reduce max_suggestions
- Use autocomplete for short queries
Summary
This system provides intelligent, personalized query suggestions by:
- Tracking all user interactions in structured format
- Learning query patterns at individual and role levels
- Caching suggestions to reduce costs and latency
- Respecting user preferences for metrics/dimensions
- Combining learned patterns with LLM intelligence
- Adapting continuously as users interact with the system
The result is a smarter, more efficient query experience that learns from each user’s unique workflow while leveraging collective intelligence from similar roles.