How to Build a Safe, Harness-Agnostic Text-to-SQL Engine for AI Agents
As software developers, we spend a massive amount of time querying databases. In my day-to-day work, I frequently needed to verify schema layouts and run diagnostics on our development database hosted in Azure.
This process was tedious: navigating to the Azure Portal, signing in, opening the query tool, writing the query, and analyzing the results. I wanted a way to query our database in natural language directly from my terminal or IDE.
To solve this, I built Harness-Agnostic Text-to-SQL, a Python library that translates natural language queries into safe SQL statements and executes them. It was designed from the ground up to solve two main challenges:
- Developer Friction: Eliminating the need to go to the Azure Portal to query databases.
- AI Agent Tooling: Providing a highly secure, plug-and-play tool for autonomous AI agents (like those in the Antigravity IDE) to query schemas and retrieve records safely.
Here is a deep dive into the design and implementation of the library.
1. The Core Architecture: Model & Database Agnostic
Many text-to-sql libraries force you into tight coupling with specific LLM providers (e.g., hardcoding OpenAI's client) or specific databases. This library avoids that by using two core abstractions: Callbacks and Adapters.
┌─────────────────────────────────────────────────────────┐
│ TextToSQL │
└───────────────────────────┬─────────────────────────────┘
│
┌───────────────┴───────────────┐
▼ ▼
┌────────────────────┐ ┌────────────────────┐
│ LLM Callback │ │ Database Adapter │
│ (User Custom) │ │ (SQLite/SQLAlchemy)│
└────────────────────┘ └────────────────────┘
- Gemini, OpenAI, - SQLiteAdapter
Claude, Local LLM - SQLAlchemyAdapter
Abstracting the Database: The Adapter Pattern
Database operations are handled by adapters implementing a standard interface:
from abc import ABC, abstractmethod
class DatabaseAdapter(ABC):
@abstractmethod
def get_schema(self) -> str:
"""
Extracts database schema (tables, columns, types).
Returns a formatted schema representation suitable for the prompt.
"""
pass
@abstractmethod
def execute_query(self, sql: str) -> dict:
"""
Executes the SQL query against the database and returns a dictionary with execution results.
"""
pass
Out of the box, the package includes:
SQLiteAdapter: For local dev databases and testing.SQLAlchemyAdapter: For any database supported by SQLAlchemy (PostgreSQL, MS SQL, Oracle, etc.), featuring dynamic schema reflection.
2. Automatic Entra ID Token Connections to Azure SQL Server
Enterprise databases hosted on Azure (like our development database under the awo schema on the server fastcrib-dev-sql-001.database.windows.net) rely on secure authentication via Microsoft Entra ID (formerly Azure Active Directory).
Historically, setting up a token-based pyodbc connection in SQLAlchemy required developers to manually retrieve the token using DefaultAzureCredential, pack it with struct.pack, and intercept the connection creation handshake via connection pool attributes.
To eliminate this developer friction, Harness-Agnostic Text-to-SQL now handles Microsoft Entra ID token acquisition and PyODBC connection handshake configuration automatically under the hood. You only need to include Authentication=ActiveDirectoryDefault in your database connection string, and the adapter does the rest:
from text_to_sql import TextToSQL
# Just pass the connection string with Authentication=ActiveDirectoryDefault!
conn_str = (
"Server=fastcrib-dev-sql-001.database.windows.net;"
"Database=fastcrib-dev-sql-001;"
"Authentication=ActiveDirectoryDefault;"
"Encrypt=yes;"
"TrustServerCertificate=no;"
)
# The library detects ActiveDirectoryDefault, resolves the Entra ID token,
# and injects it into the SQLAlchemy PyODBC attributes automatically.
translator = TextToSQL(
db_uri=conn_str,
schema="awo",
llm_callback=my_llm_callback
)
Headless & Agent Hardening
In headless environments (like CLIs running in background worker processes or autonomous agent executions in the Antigravity IDE), a generic token request can block indefinitely if it attempts to launch a web browser for interactive user login.
To harden this process, the adapter configures the token credentials internally as:
credential = DefaultAzureCredential(exclude_interactive_browser_credential=True)
This ensures that it fails fast with a clear authentication exception rather than hanging the CLI or agent thread.
3. The Strict Safety Validator Layer
Allowing an LLM or an AI agent to execute generated queries directly on a database is a massive security hazard. AI models can hallucinate or be tricked (via prompt injection) into writing commands that alter or delete data.
To prevent this, the library routes all execution paths through a SafetyValidator.
Safeguard 1: Read-Only Verification
By default, the validator operates in read_only=True mode, blocking any modification statements:
BLOCKED_KEYWORDS = {
"insert", "update", "delete", "drop", "alter",
"truncate", "replace", "grant", "revoke", "schema", "rename"
}
Safeguard 2: Word Boundary Enforcement
To avoid false positives on valid identifiers (e.g. a column named updated_at or a table called system_grant), the validator checks matching terms strictly on word boundaries:
pattern = r"\b(" + "|".join(BLOCKED_KEYWORDS) + r")\b"
if re.search(pattern, clean_query, re.IGNORECASE):
raise ValueError("Safety validation failed: Modification query detected.")
Safeguard 3: SQL Comment Scrubbing
Attackers can try to hide malicious statements inside SQL comments (e.g., -- delete from users or /* drop table */). The validator first scrubs comments (both single-line -- and multi-line /* */) from the query before running checks:
def scrub_comments(sql: str) -> str:
# Remove single line comments
sql = re.sub(r"--.*", "", sql)
# Remove multi-line comments
sql = re.sub(r"/\*[\s\S]*?\*/", "", sql)
return sql
4. Making it Agentic: Integrating as an IDE Plugin Skill
One of the most exciting aspects of this library is how easily it integrates into agentic workflows. In IDE platforms like Antigravity, we can expose the Text-to-SQL tool directly to our assistant agents.
The repository includes a pre-packaged Plugin configuration:
The CLI Executable
In setup.py, we expose a console entry point:
entry_points={
"console_scripts": [
"text-to-sql-query=text_to_sql.cli:main",
],
}
This registers a global CLI executable text-to-sql-query that the agent can trigger directly.
Production CLI Conveniences
To simplify real-world usage, the CLI executable includes several critical enhancements:
- Automatic
.envFile Loading: The CLI automatically scans the current working directory for a.envfile and loads environment configurations (such asDATABASE_URL). Developers don't need to pass--db-uriexplicitly on every call. - Robust JSON Serialization: Standard JSON libraries throw serialization exceptions when database records contain decimal points (
decimal.Decimal), timestamps (datetime), orUUIDidentifiers. The CLI now includes a specializedDatabaseJSONEncoderthat automatically formats these types safely (e.g. convertingDecimaltofloatanddatetimeto ISO strings) during CLI JSON formatting.
The Skill definition (SKILL.md)
We document instructions for the agent explaining when and how to call the CLI tool:
# Skill: Text-to-SQL Database Agent
Use this skill when you need to query database schemas or fetch raw records.
## Usage
### Get Database Schema
```bash
text-to-sql-query get-schema
Execute safe query
text-to-sql-query execute "SELECT name FROM awo.Asset LIMIT 5;"
By adding this configuration, our local AI assistant can automatically discover the database layout, run safe SELECT queries, retrieve real records, and answer technical questions for us—all without us ever having to open a query console.
---
## 5. Getting Started
You can install the stable release from PyPI:
```bash
pip install harness-agnostic-text-to-sql
For custom callbacks or setting up adapters for your database, check out the source code and examples on the GitHub repository.