Security

DataTruth is designed to be trusted with production databases. Here’s how it keeps your data safe.

Read-Only by Design
SQL Validation & Sandboxing
Role-Based Access Control
Credential Security
Audit Trail
Network Security
Data Residency
Compliance Readiness

Read-Only by Design

DataTruth never writes to your database. All connections are established with read-only credentials. Even if an AI-generated query attempted a DROP, DELETE, or UPDATE, the database user would have no permission to execute it.

This is enforced at the infrastructure level — not just by application code.
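As an illustration of what database-level enforcement looks like, here is a minimal sketch using SQLite's read-only URI mode from Python's standard library. SQLite, the `orders` table, and the helper names `open_read_only` and `demo` are assumptions chosen so the example is self-contained; they are not DataTruth's actual connection code, and on a server database the equivalent would be a user granted only SELECT privileges.

```python
import os
import sqlite3
import tempfile


def open_read_only(path: str) -> sqlite3.Connection:
    """Open a SQLite database in read-only mode via a URI filename."""
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)


def demo() -> str:
    """Return the error raised when a read-only connection tries to write."""
    # Build a throwaway database with one table (illustrative data only).
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    writer = sqlite3.connect(path)
    writer.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
    writer.execute("INSERT INTO orders (total) VALUES (19.99)")
    writer.commit()
    writer.close()

    reader = open_read_only(path)
    try:
        # Reads succeed as usual.
        assert reader.execute("SELECT COUNT(*) FROM orders").fetchone()[0] == 1
        # The write is rejected by the database engine, not by application code.
        reader.execute("DELETE FROM orders")
    except sqlite3.OperationalError as exc:
        return str(exc)
    finally:
        reader.close()
    return "write unexpectedly succeeded"


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is that the rejection comes from the database itself, so it still holds if a validation layer in the application is bypassed.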

SQL Validation & Sandboxing

Every query generated by the AI goes through a multi-layer validation pipeline before execution:

Intent classification — the query intent is checked against allowed operations
SQL parsing — the generated SQL is parsed to detect dangerous patterns (DML, DDL, multiple statements, etc.)
Schema validation — columns and tables are verified against the connected schema
Sandboxed execution — queries run with strict timeouts and row limits

If any check fails, the query is blocked and the user receives a safe error message.
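The parsing and schema checks above can be sketched roughly as follows. This is a deliberately naive, regex-based illustration, not DataTruth's validator: the function name `validate_sql`, the keyword list, and the error strings are all assumptions, and a production pipeline would use a real SQL parser rather than pattern matching.

```python
import re

# Hypothetical deny-list of DML/DDL keywords for this sketch.
FORBIDDEN = {
    "insert", "update", "delete", "drop", "alter",
    "create", "truncate", "grant", "revoke", "merge",
}


def validate_sql(sql: str, known_tables: set) -> tuple:
    """Return (allowed, reason) for a single generated SQL statement."""
    stripped = sql.strip().rstrip(";").strip()
    if not stripped:
        return False, "empty query"

    # Reject multiple statements. Naive: a ';' inside a string literal
    # is also rejected, which errs on the side of blocking.
    if ";" in stripped:
        return False, "multiple statements are not allowed"

    tokens = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", stripped.lower())
    if not tokens or tokens[0] != "select":
        return False, "only SELECT statements are allowed"

    # Reject DML/DDL keywords anywhere in the statement.
    bad = FORBIDDEN.intersection(tokens)
    if bad:
        return False, f"forbidden keyword: {sorted(bad)[0]}"

    # Schema check: every table after FROM/JOIN must be known.
    referenced = re.findall(
        r"\b(?:from|join)\s+([A-Za-z_][A-Za-z0-9_]*)", stripped, flags=re.IGNORECASE
    )
    unknown = {name.lower() for name in referenced} - known_tables
    if unknown:
        return False, f"unknown table: {sorted(unknown)[0]}"

    return True, "ok"


if __name__ == "__main__":
    print(validate_sql("SELECT id FROM orders", {"orders"}))
    print(validate_sql("SELECT 1; DELETE FROM orders", {"orders"}))
```

The sketch covers only the parsing and schema stages. Intent classification and sandboxed execution (timeouts, row limits) are separate layers and are not shown.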

Role-Based Access Control

DataTruth enforces access control at multiple levels:

Level	How It Works
Authentication	JWT tokens with configurable expiry; sessions invalidated on logout
Role permissions	Each role has a defined set of allowed actions and visible features
Data-level access	Admins can restrict which databases and schemas each role can query
API access	API tokens are scoped to specific operations
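To make the role and data-level rows concrete, here is a minimal sketch of a permission check that combines an allowed action with a per-role database allow-list. The `Role` class, the `query:run` action name, and the example roles are hypothetical; DataTruth's real permission model and naming may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    """Hypothetical role: a set of allowed actions plus queryable databases."""
    name: str
    actions: frozenset
    databases: frozenset


def can_query(role: Role, database: str) -> bool:
    """A query is allowed only if the role may run queries AND may see the database."""
    return "query:run" in role.actions and database in role.databases


# Illustrative roles, not DataTruth defaults.
ANALYST = Role(
    name="analyst",
    actions=frozenset({"query:run"}),
    databases=frozenset({"sales"}),
)
VIEWER = Role(
    name="viewer",
    actions=frozenset({"dashboard:view"}),
    databases=frozenset({"sales"}),
)

if __name__ == "__main__":
    print(can_query(ANALYST, "sales"))    # role permission and data-level access both pass
    print(can_query(ANALYST, "payroll"))  # blocked by data-level access
    print(can_query(VIEWER, "sales"))     # blocked by role permissions
```

Checking both conditions in one place keeps the two levels independent: widening a role's actions does not silently widen the databases it can reach.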

Credential Security

Database credentials are never stored in plain text:

Encrypted at rest using AES-256
Never exposed in logs, API responses, or the UI after initial entry
Accessible only by the DataTruth backend process
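One common way to keep credentials out of logs is to mask the password in a connection string before it is written anywhere. The sketch below shows the idea with Python's standard library; the function name `redact_dsn` and the example connection string are assumptions for illustration and do not describe DataTruth's internal logging code.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_dsn(dsn: str) -> str:
    """Return the connection string with any password replaced by '***'."""
    parts = urlsplit(dsn)
    if parts.password is None:
        return dsn  # nothing to hide

    # Rebuild the network location without the original password.
    # Note: this simple sketch does not handle bracketed IPv6 hosts.
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    netloc = f"{parts.username or ''}:***@{host}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    # Hypothetical connection string; safe to log after redaction.
    print(redact_dsn("postgresql://reader:s3cret@db.internal:5432/sales"))
```

Redaction of this kind complements encryption at rest: the stored secret is encrypted, and any string derived from it is masked before it reaches a log line or API response.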

Audit Trail

DataTruth logs every action, including:

User logins and logouts
Every natural language query and the SQL it generated
Admin actions (user creation, role changes, connection edits)
Data quality scan results

Audit logs are immutable and can be exported for compliance reviews.
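This page does not say how immutability is implemented. One widely used technique for making a log tamper-evident is hash chaining, where each entry stores the hash of the previous one. The sketch below illustrates that technique only; the entry fields and the names `append_entry` and `verify_chain` are hypothetical and not taken from DataTruth.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(actor: str, action: str, prev_hash: str) -> str:
    """Hash the entry body in a stable (sorted-key) JSON encoding."""
    body = {"actor": actor, "action": action, "prev_hash": prev_hash}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(log: list, actor: str, action: str) -> dict:
    """Append an entry whose hash covers its content and the previous hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {
        "actor": actor,
        "action": action,
        "prev_hash": prev_hash,
        "hash": _digest(actor, action, prev_hash),
    }
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        expected = _digest(entry["actor"], entry["action"], entry["prev_hash"])
        if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True


if __name__ == "__main__":
    log = []
    append_entry(log, "alice", "login")
    append_entry(log, "alice", "query: total sales by region")
    print(verify_chain(log))
    log[0]["action"] = "logout"  # simulate tampering
    print(verify_chain(log))
```

Hash chaining detects modification after the fact; preventing modification in the first place still depends on storage permissions or write-once storage.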

Network Security

All traffic between the browser and DataTruth is served over HTTPS (TLS 1.2+)
An Nginx reverse proxy handles TLS termination
Internal service communication stays within the Docker network and is not exposed externally
Rate limiting is applied to all API endpoints to prevent abuse
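The rate-limiting algorithm and where it runs (proxy or application) are not specified here. As a general illustration, a token bucket is a common way to limit request rates while allowing short bursts. The class name `TokenBucket` and its parameters are assumptions for this sketch, not DataTruth settings.

```python
import time
from typing import Optional


class TokenBucket:
    """Allow up to `capacity` requests in a burst, refilled at a steady rate."""

    def __init__(self, capacity: int, refill_per_second: float, now: Optional[float] = None):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)  # start with a full bucket
        self.updated_at = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        """Return True and consume a token if the request may proceed."""
        current = time.monotonic() if now is None else now
        elapsed = max(0.0, current - self.updated_at)
        # Refill for the elapsed time, never exceeding capacity.
        self.tokens = min(float(self.capacity), self.tokens + elapsed * self.refill_per_second)
        self.updated_at = current
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


if __name__ == "__main__":
    # Explicit timestamps make the behaviour deterministic for the demo.
    bucket = TokenBucket(capacity=2, refill_per_second=1.0, now=0.0)
    print([bucket.allow(now=0.0) for _ in range(3)])  # burst of 2, then blocked
    print(bucket.allow(now=1.0))  # one token refilled after one second
```

In practice one bucket would be kept per client or per API token, so that a single noisy caller cannot exhaust the limit for everyone.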

Data Residency

DataTruth is designed for self-hosted deployment, so the contents of your database stay within your infrastructure. When a hosted AI model (GPT-4) is used, it receives only the SQL query and a subset of your schema metadata, never the actual row data from your database.

For fully air-gapped deployments, DataTruth supports locally hosted LLMs (e.g. via Ollama), so no data leaves your network at all.

Compliance Readiness

DataTruth’s architecture supports compliance with:

SOC 2 Type II — audit trail, access control, encryption
GDPR — data stays in your infrastructure, user data deletion supported
HIPAA — available with appropriate deployment configuration

For security issues or responsible disclosure, please email security@datatruth.ai rather than opening a public GitHub issue.