Security
DataTruth is designed to be trusted with production databases. Here’s how it keeps your data safe.
Table of Contents
- Read-Only by Design
- SQL Validation & Sandboxing
- Role-Based Access Control
- Credential Security
- Audit Trail
- Network Security
- Data Residency
- Compliance Readiness
Read-Only by Design
DataTruth never writes to your database. All connections are established with read-only credentials. Even if an AI-generated query attempted a DROP, DELETE, or UPDATE, the database user has no permission to execute it.
This is enforced at the infrastructure level, by the database's own permission system, not just by application code.
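You can see the idea of having the database engine, rather than the application, refuse writes in miniature with SQLite from Python's standard library. This is only an analogy, not DataTruth's code. It assumes a throwaway local SQLite file (the hypothetical `demo.db`). Against a production server, the equivalent control would typically be a database role granted nothing beyond read privileges.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def open_read_only(path: str) -> sqlite3.Connection:
    # "mode=ro" makes SQLite itself reject writes on this connection,
    # whatever SQL the application sends.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def try_write(conn: sqlite3.Connection, sql: str) -> str:
    """Attempt a statement and report whether the engine allowed it."""
    try:
        conn.execute(sql)
        return "executed"
    except sqlite3.OperationalError as exc:
        return f"blocked: {exc}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")  # hypothetical demo database
        # Create and populate the file with an ordinary, writable connection.
        setup = sqlite3.connect(path)
        setup.execute("CREATE TABLE orders (id INTEGER, total REAL)")
        setup.execute("INSERT INTO orders VALUES (1, 9.99)")
        setup.commit()
        setup.close()

        ro = open_read_only(path)
        print(ro.execute("SELECT COUNT(*) FROM orders").fetchone())  # reads still work
        print(try_write(ro, "DELETE FROM orders"))  # rejected by SQLite
        print(try_write(ro, "DROP TABLE orders"))   # rejected by SQLite
        ro.close()
```

Running it prints `(1,)` followed by two lines starting with `blocked:`. Neither write is stopped by inspecting the SQL text. The engine refuses them because the connection has no write access.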
SQL Validation & Sandboxing
Every query generated by the AI goes through a multi-layer validation pipeline before execution:
- Intent classification — the query intent is checked against allowed operations
- SQL parsing — the generated SQL is parsed to detect dangerous patterns (DML, DDL, multiple statements, etc.)
- Schema validation — columns and tables are verified against the connected schema
- Sandboxed execution — queries run with strict timeouts and row limits
If any check fails, the query is blocked and the user receives a safe error message.
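Below is a heavily simplified sketch of the parsing, schema, and row-limit layers. It is not DataTruth's validator, and it makes several simplifying assumptions:

- A hypothetical `schema` dictionary maps table names to column sets.
- Regular expressions stand in for a proper SQL parser.
- Only tables are checked, not columns.
- Intent classification and execution timeouts are left out.

Regex checks can be fooled, and they can also reject harmless string literals. This is one reason the read-only credentials remain the final safeguard.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

MAX_ROWS = 1000  # hypothetical row cap
# Keywords that indicate DML/DDL or privilege changes (illustrative, not exhaustive).
FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|MERGE|DROP|ALTER|CREATE|TRUNCATE|GRANT|REVOKE|COPY|INTO)\b",
    re.IGNORECASE,
)
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][A-Za-z0-9_]*)", re.IGNORECASE)


@dataclass
class Verdict:
    ok: bool
    sql: str = ""
    reason: str = ""


def validate(sql: str, schema: dict[str, set[str]]) -> Verdict:
    """Run the layered checks in order; the first failure blocks the query."""
    stripped = sql.strip().rstrip(";").strip()
    # Layer 1: exactly one statement (naive: any remaining ';' is rejected).
    if ";" in stripped:
        return Verdict(False, reason="multiple statements")
    # Layer 2: only plain SELECT statements, with no DML/DDL keywords anywhere.
    if not re.match(r"select\b", stripped, re.IGNORECASE):
        return Verdict(False, reason="not a SELECT")
    if FORBIDDEN.search(stripped):
        return Verdict(False, reason="forbidden keyword")
    # Layer 3: every referenced table must exist in the connected schema.
    for table in TABLE_REF.findall(stripped):
        if table.lower() not in schema:
            return Verdict(False, reason=f"unknown table: {table}")
    # Layer 4: cap the result size by wrapping the query in an outer LIMIT.
    return Verdict(True, sql=f"SELECT * FROM ({stripped}) AS q LIMIT {MAX_ROWS}")
```

The checks run from cheapest to most specific. A query is returned, rewritten with a row cap, only if every layer passes.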
Role-Based Access Control
DataTruth enforces access control at multiple levels:
| Level | How It Works |
|---|---|
| Authentication | JWT tokens with configurable expiry; sessions invalidated on logout |
| Role permissions | Each role has a defined set of allowed actions and visible features |
| Data-level access | Admins can restrict which databases and schemas each role can query |
| API access | API tokens are scoped to specific operations |
Credential Security
Database credentials are never stored in plain text:
- Encrypted at rest using AES-256
- Never exposed in logs, API responses, or the UI after initial entry
- Accessible only by the DataTruth backend process
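A guarantee like "never exposed in logs" can be backed by scrubbing log records before they reach any handler. The sketch below is one way to do that with Python's standard `logging` module. The two patterns it redacts, URL-embedded passwords and `password=` style pairs, are assumptions about what credentials look like, not DataTruth's actual rules. The logger name `datatruth.demo` is hypothetical.

```python
import logging
import re

# Password part of URLs such as postgresql://user:secret@host/db.
_URL_PASSWORD = re.compile(r"(://[^:/@\s]+:)([^@\s]+)(@)")
# key=value or key: value pairs such as password=secret.
_KV_PASSWORD = re.compile(r"(?i)\b(password|passwd|pwd)(\s*[=:]\s*)(\S+)")


def redact(text: str) -> str:
    text = _URL_PASSWORD.sub(r"\1***\3", text)
    return _KV_PASSWORD.sub(r"\1\2***", text)


class RedactingFilter(logging.Filter):
    """Rewrites each record's message so credentials never reach the output."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())  # format first, then scrub
        record.args = None  # message is already fully formatted
        return True


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.addFilter(RedactingFilter())
    log = logging.getLogger("datatruth.demo")  # hypothetical logger name
    log.addHandler(handler)
    log.setLevel(logging.INFO)
    log.info("connecting to %s", "postgresql://app:s3cret@db:5432/prod")
```

The filter is attached to each handler rather than to a single logger. This way it also covers records that propagate up from child loggers.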
Audit Trail
Every action in DataTruth is logged:
- User logins and logouts
- Every natural language query and the SQL it generated
- Admin actions (user creation, role changes, connection edits)
- Data quality scan results
Audit logs are immutable and can be exported for compliance reviews.
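The storage mechanism behind DataTruth's audit logs isn't described here. One common way to make an append-only log tamper-evident is a hash chain, in which each entry records the SHA-256 hash of its predecessor. The sketch below assumes that technique and a hypothetical JSON Lines export format. Note that a hash chain detects edits after the fact rather than preventing them, so it is often paired with write-once storage.

```python
from __future__ import annotations

import hashlib
import json
import time


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same entry always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class AuditLog:
    """Append-only log where each entry commits to the hash of the previous one."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def append(self, actor: str, action: str, detail: dict, ts: float | None = None) -> dict:
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        body = {"ts": time.time() if ts is None else ts, "actor": actor,
                "action": action, "detail": detail, "prev": prev}
        entry = {**body, "hash": _digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash; any edited, removed, or reordered entry breaks the chain."""
        prev = self.GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True

    def export_jsonl(self) -> str:
        """One JSON object per line, suitable for handing to a compliance reviewer."""
        return "\n".join(json.dumps(e, sort_keys=True) for e in self.entries)
```

A reviewer who receives the export can rerun the same verification independently. Changing a single logged SQL string invalidates that entry's hash and every link after it.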
Network Security
- All traffic between the browser and DataTruth is encrypted with HTTPS (TLS 1.2+)
- An Nginx reverse proxy handles TLS termination
- Internal service-to-service communication stays within the Docker network and is not exposed externally
- Rate limiting on all API endpoints to prevent abuse
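This section doesn't say where rate limiting happens, and with Nginx already in front it may well be done at the proxy. As an application-level illustration only, the sketch below shows a token-bucket limiter keyed per client. The `rate` and `capacity` values are hypothetical and supplied by the caller.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    rate: float      # tokens refilled per second
    capacity: float  # largest allowed burst
    tokens: float = field(init=False)
    updated: float | None = field(init=False, default=None)

    def __post_init__(self) -> None:
        self.tokens = self.capacity  # start full

    def allow(self, now: float | None = None) -> bool:
        now = time.monotonic() if now is None else now
        if self.updated is not None:
            # Refill in proportion to elapsed time, never above capacity.
            elapsed = max(0.0, now - self.updated)
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class RateLimiter:
    """Keeps one bucket per client key, e.g. an API token or client IP."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.buckets: dict[str, TokenBucket] = {}

    def allow(self, key: str, now: float | None = None) -> bool:
        if key not in self.buckets:
            self.buckets[key] = TokenBucket(self.rate, self.capacity)
        return self.buckets[key].allow(now)
```

A limiter with `rate=1.0` and `capacity=2` allows a burst of two requests from a client, then roughly one per second after that. Other clients are unaffected because each key has its own bucket.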
Data Residency
DataTruth is designed for self-hosted deployment, so your database contents never leave your infrastructure. The AI model (GPT-4) receives only the SQL query and a subset of your schema metadata, never the actual row data from your database.
For fully air-gapped deployments, DataTruth supports locally hosted LLMs (e.g. served with Ollama), so no data leaves your network at all.
Compliance Readiness
DataTruth’s architecture supports compliance with:
- SOC 2 Type II — audit trail, access control, encryption
- GDPR — data stays in your infrastructure, user data deletion supported
- HIPAA — available with appropriate deployment configuration
For security issues or responsible disclosure, please email security@datatruth.ai rather than opening a public GitHub issue.