Security

DataTruth is designed to be trusted with production databases. Here’s how it keeps your data safe.


Table of Contents

  1. Read-Only by Design
  2. SQL Validation & Sandboxing
  3. Role-Based Access Control
  4. Credential Security
  5. Audit Trail
  6. Network Security
  7. Data Residency
  8. Compliance Readiness

Read-Only by Design

DataTruth never writes to your database. All connections are established with read-only credentials. Even if an AI-generated query attempted a DROP, DELETE, or UPDATE, the database user has no permission to execute it.

This is enforced at the infrastructure level — not just by application code.
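The effect of read-only credentials can be shown in miniature. The sketch below uses Python's built-in sqlite3 module as a stand-in for a production database. This is an assumption made only so the example runs anywhere, and it is not DataTruth's connection code. The file is opened in SQLite's read-only URI mode, so the database engine itself rejects a DELETE even though the application code does no filtering at all.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def make_demo_db(path: str) -> None:
    """Create a small demo table using an ordinary read-write connection."""
    conn = sqlite3.connect(path)
    try:
        conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
        conn.executemany("INSERT INTO orders (total) VALUES (?)", [(10.0,), (25.5,)])
        conn.commit()
    finally:
        conn.close()


def open_read_only(path: str) -> sqlite3.Connection:
    """Open the file via a read-only URI: SQLite itself refuses every write."""
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def run(conn: sqlite3.Connection, sql: str) -> str:
    """Execute a statement and report whether the database allowed it."""
    try:
        return f"ok: {conn.execute(sql).fetchall()}"
    except sqlite3.DatabaseError as exc:
        return f"blocked by database: {exc}"


def demo() -> list:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        make_demo_db(path)
        conn = open_read_only(path)
        try:
            # No application-level filtering on purpose: the read-only
            # connection alone is what stops the DELETE.
            return [
                run(conn, "SELECT COUNT(*) FROM orders"),  # ok: [(2,)]
                run(conn, "DELETE FROM orders"),  # blocked by database: attempt to write a readonly database
            ]
        finally:
            conn.close()


if __name__ == "__main__":
    for line in demo():
        print(line)
```

On PostgreSQL or MySQL, the equivalent guarantee comes from a dedicated database user that has been granted only SELECT privileges on the tables DataTruth should see.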


SQL Validation & Sandboxing

Every query generated by the AI goes through a multi-layer validation pipeline before execution:

  1. Intent classification — the query intent is checked against allowed operations
  2. SQL parsing — the generated SQL is parsed to detect dangerous patterns (DML, DDL, multiple statements, etc.)
  3. Schema validation — columns and tables are verified against the connected schema
  4. Sandboxed execution — queries run with strict timeouts and row limits

If any check fails, the query is blocked and the user receives a safe error message.
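Steps 2 and 3 can be illustrated with a deliberately simplified validator. This is a sketch under stated assumptions, not DataTruth's pipeline. It uses keyword matching and regular expressions instead of a real SQL parser, and the names validate_sql, check, ALLOWED_TABLES, and ValidationError are hypothetical. It errs on the side of blocking, so it rejects some safe queries: a CTE name, a quoted identifier, or EXTRACT(YEAR FROM col) is reported as an unknown table. A production implementation should use a dialect-aware parser.

```python
import re

# Hypothetical snapshot of the tables visible on the connected schema.
ALLOWED_TABLES = {"orders", "customers"}

# Keywords that modify data or schema (DML/DDL) or change server state.
FORBIDDEN = {
    "insert", "update", "delete", "merge", "drop", "alter", "create",
    "truncate", "grant", "revoke", "copy", "attach", "pragma",
}

# String literals and comments, matched left to right so a quote inside a
# comment (or "--" inside a string) is handled correctly.
_LITERAL_OR_COMMENT = re.compile(r"'(?:[^']|'')*'|--[^\n]*|/\*.*?\*/", re.S)


class ValidationError(Exception):
    """Raised when a generated query fails a safety check."""


def _neutralize(sql: str) -> str:
    # Blank out literals and comments so text inside them is never read
    # as keywords, semicolons, or table names.
    return _LITERAL_OR_COMMENT.sub(
        lambda m: "''" if m.group(0).startswith("'") else " ", sql)


def validate_sql(sql: str) -> str:
    cleaned = _neutralize(sql).strip().rstrip(";").strip()
    lowered = cleaned.lower()
    # Multiple statements: any semicolon left after trimming the trailing one.
    if ";" in lowered:
        raise ValidationError("multiple statements are not allowed")
    words = re.findall(r"[a-z_][a-z0-9_]*", lowered)
    if not words or words[0] not in {"select", "with"}:
        raise ValidationError("only SELECT queries are allowed")
    bad = FORBIDDEN.intersection(words)
    if bad:
        raise ValidationError(f"forbidden keyword(s): {sorted(bad)}")
    # Schema validation: identifiers after FROM/JOIN must be known tables.
    referenced = re.findall(r"\b(?:from|join)\s+([a-z_][a-z0-9_.]*)", lowered)
    unknown = {name.split(".")[-1] for name in referenced} - ALLOWED_TABLES
    if unknown:
        raise ValidationError(f"unknown table(s): {sorted(unknown)}")
    return cleaned


def check(sql: str) -> str:
    """Return a short, user-safe verdict instead of raising."""
    try:
        validate_sql(sql)
        return "allowed"
    except ValidationError as exc:
        return f"blocked: {exc}"


if __name__ == "__main__":
    for q in [
        "SELECT c.name FROM customers c JOIN orders o ON o.customer_id = c.id",
        "SELECT * FROM orders; DROP TABLE orders",
        "DELETE FROM orders",
        "WITH x AS (DELETE FROM orders RETURNING *) SELECT * FROM x",
        "SELECT * FROM payroll",
    ]:
        print(check(q))
```

Because literals are neutralized first, a query such as SELECT 'a; DROP TABLE x' AS s FROM orders is allowed: the text inside the string is data, not a second statement.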


Role-Based Access Control

DataTruth enforces access control at multiple levels:

Level               How It Works
Authentication      JWT tokens with configurable expiry; sessions invalidated on logout
Role permissions    Each role has a defined set of allowed actions and visible features
Data-level access   Admins can restrict which databases and schemas each role can query
API access          API tokens are scoped to specific operations
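Authentication establishes who the user is. The other rows decide what that user may do. The sketch below combines the role-permission and data-level rows into a single deny-by-default check. Role, action, and connection names (admin, analyst, sales_dw, and so on) are hypothetical, and this page does not describe how DataTruth actually stores roles.

```python
from dataclasses import dataclass

WILDCARD = "*"


@dataclass(frozen=True)
class Role:
    name: str
    actions: frozenset                          # e.g. "query", "manage_users"
    databases: frozenset                        # connections this role may query
    schemas: frozenset = frozenset({WILDCARD})  # schemas within those connections


# Hypothetical roles; real definitions would come from the admin settings.
ROLES = {
    "admin": Role("admin", frozenset({"query", "manage_users", "edit_connections"}),
                  frozenset({WILDCARD})),
    "analyst": Role("analyst", frozenset({"query"}), frozenset({"sales_dw"}),
                    frozenset({"public", "reporting"})),
    "viewer": Role("viewer", frozenset({"view_dashboards"}), frozenset()),
}


def _permits(allowed: frozenset, value: str) -> bool:
    return WILDCARD in allowed or value in allowed


def can_query(role_name: str, database: str, schema: str) -> bool:
    """Deny by default: the role must exist, hold 'query', and cover db and schema."""
    role = ROLES.get(role_name)
    if role is None:
        return False
    return ("query" in role.actions
            and _permits(role.databases, database)
            and _permits(role.schemas, schema))


if __name__ == "__main__":
    print(can_query("analyst", "sales_dw", "reporting"))
    print(can_query("analyst", "hr_db", "public"))
```

Checking every condition and refusing unknown roles means a misconfigured or deleted role fails closed rather than open.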

Credential Security

Database credentials are never stored in plain text:

  • Encrypted at rest using AES-256
  • Never exposed in logs, API responses, or the UI after initial entry
  • Accessible only by the DataTruth backend process
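Keeping credentials out of logs usually means masking them before a record is written. The sketch below shows one assumed approach, not DataTruth's code: a standard-library logging filter that replaces the password in URL-style connection strings (scheme://user:password@host/...) with ***. The names redact and RedactCredentialsFilter and the logger name are hypothetical.

```python
import io
import logging
import re

# scheme://user:  followed by a password that ends at the last "@" in the token.
_DSN_PASSWORD = re.compile(r"(?P<prefix>\b[a-z][a-z0-9+.-]*://[^:/@\s]+:)\S+@", re.I)


def redact(text: str) -> str:
    """Mask the password part of any URL-style connection string in text."""
    return _DSN_PASSWORD.sub(r"\g<prefix>***@", text)


class RedactCredentialsFilter(logging.Filter):
    """Logging filter that masks connection-string passwords in each message."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())  # format first, then mask
        record.args = ()                          # message is already formatted
        return True


def demo() -> str:
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.addFilter(RedactCredentialsFilter())
    log = logging.getLogger("datatruth.demo")  # hypothetical logger name
    log.addHandler(handler)
    log.setLevel(logging.INFO)
    log.propagate = False
    log.info("connecting to %s", "postgresql://analyst:s3cret@db.internal:5432/sales")
    log.removeHandler(handler)
    return stream.getvalue()


if __name__ == "__main__":
    print(demo(), end="")
```

The pattern is deliberately broad: it may also mask non-credential URLs that contain an @ after a colon, which errs on the side of over-redaction.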

Audit Trail

Every action in DataTruth is logged:

  • User logins and logouts
  • Every natural language query and the SQL it generated
  • Admin actions (user creation, role changes, connection edits)
  • Data quality scan results

Audit logs are immutable and can be exported for compliance reviews.
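This page does not say how immutability is enforced. One widely used technique, shown here as an assumption rather than DataTruth's implementation, is a hash chain. Each entry stores the SHA-256 hash of the previous entry, so editing or deleting any past record breaks verification for every record after it. The entry fields and action names are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


class AuditLog:
    """Append-only audit log where each entry is chained to the one before it."""

    def __init__(self) -> None:
        self.entries = []

    @staticmethod
    def _digest(body: dict) -> str:
        # Canonical JSON (sorted keys) so identical content always hashes the same.
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def append(self, actor: str, action: str, detail: dict) -> dict:
        body = {
            "ts": datetime.now(timezone.utc).isoformat(),
            "actor": actor,
            "action": action,
            "detail": detail,
            "prev": self.entries[-1]["hash"] if self.entries else GENESIS,
        }
        entry = {**body, "hash": self._digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash; any edited or removed entry breaks the chain."""
        prev = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev"] != prev or self._digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True

    def export_jsonl(self) -> str:
        """One JSON object per line, suitable for handing to reviewers."""
        return "\n".join(json.dumps(e, sort_keys=True) for e in self.entries)


if __name__ == "__main__":
    log = AuditLog()
    log.append("alice", "login", {})
    log.append("alice", "query", {"question": "top customers", "sql": "SELECT ..."})
    print(log.verify())
```

A hash chain makes tampering detectable, not impossible. In practice it is paired with write-once storage or with periodically recording the latest hash somewhere the application cannot modify.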


Network Security

  • All traffic between the browser and DataTruth is encrypted with HTTPS (TLS 1.2+)
  • An Nginx reverse proxy handles TLS termination
  • Internal services communicate only within the Docker network and are not exposed externally
  • All API endpoints are rate-limited to prevent abuse
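The page does not name the rate-limiting algorithm. A token bucket is a common choice, and the sketch below is a minimal in-process version keyed by client. The class names and the capacity and refill values are hypothetical. In the deployment described above, the same job could equally be done in Nginx (for example with its limit_req module) instead of in application code.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allows short bursts up to capacity, then a steady refill_per_sec rate."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.tokens = capacity  # start full so a new client can burst immediately
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.refill_per_sec)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class RateLimiter:
    """Keeps one bucket per client key, such as an API token or IP address."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, key: str) -> bool:
        bucket = self.buckets.get(key)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.refill_per_sec, self.clock)
            self.buckets[key] = bucket
        return bucket.allow()


if __name__ == "__main__":
    limiter = RateLimiter(capacity=3, refill_per_sec=1.0)
    print([limiter.allow("token-abc") for _ in range(5)])
```

The injectable clock keeps the limiter testable. Because state lives in one process's memory, a multi-worker deployment would need a shared store or proxy-level limiting to enforce a single global limit.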

Data Residency

DataTruth is designed for self-hosted deployment, so your database and its row data stay on your infrastructure. The AI model (GPT-4) receives only your natural-language question and a subset of your schema metadata, never the actual row data from your database.

For fully air-gapped deployments, DataTruth supports local LLMs (e.g. served through Ollama), so no data leaves your network at all.
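The metadata-only boundary can be made concrete. In the sketch below, the text sent to the model is built from the user's request plus table names, column names, and declared types, and no code path reads table rows. It uses sqlite3 and its pragma_table_info table-valued function as a stand-in database. The function names and prompt wording are hypothetical, and how DataTruth chooses which subset of the schema to include is not specified on this page.

```python
import sqlite3


def schema_summary(conn: sqlite3.Connection, include: set) -> str:
    """Describe the selected tables using metadata only; no row data is read."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    lines = []
    for table in tables:
        if table not in include:
            continue  # only the configured subset of the schema is shared
        cols = conn.execute(
            "SELECT name, type FROM pragma_table_info(?) ORDER BY cid", (table,)
        ).fetchall()
        lines.append(f"{table}(" + ", ".join(f"{n} {t}" for n, t in cols) + ")")
    return "\n".join(lines)


def build_prompt(question: str, schema: str) -> str:
    # Only the user's request and schema metadata are placed in the prompt.
    return (f"Schema:\n{schema}\n\nQuestion: {question}\n"
            "Answer with a single SQL SELECT statement.")


def demo() -> str:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
        CREATE TABLE payroll (employee TEXT, salary REAL);
        INSERT INTO customers (name, email) VALUES ('Ada', 'ada@example.com');
    """)
    schema = schema_summary(conn, include={"customers", "orders"})
    conn.close()
    return build_prompt("Who are our top 5 customers by revenue?", schema)


if __name__ == "__main__":
    print(demo())
```

The inserted customer row exists in the database but never appears in the prompt, and the excluded payroll table is not mentioned at all.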


Compliance Readiness

DataTruth’s architecture supports compliance with:

  • SOC 2 Type II — audit trail, access control, encryption
  • GDPR — data stays in your infrastructure, user data deletion supported
  • HIPAA — available with appropriate deployment configuration

For security issues or responsible disclosure, please email security@datatruth.ai rather than opening a public GitHub issue.



DataTruth © 2025. Built with ♥ using FastAPI, React, and OpenAI.