How to Build a Safe, Model-Agnostic Text-to-SQL Engine for AI Agents

As software developers, we spend a massive amount of time querying databases. In my day-to-day work, I frequently needed to verify schema layouts and run diagnostics on our development database hosted in Azure.

This process was tedious: navigating to the Azure Portal, signing in, opening the query tool, writing the query, and analyzing the results. I wanted a way to query our database in natural language directly from my terminal or IDE.

To solve this, I built Model-Agnostic Text-to-SQL, a Python library that translates natural language queries into safe SQL statements and executes them. It was designed from the ground up to solve two main challenges:

Developer Friction: Eliminating the need to go to the Azure Portal to query databases.
AI Agent Tooling: Providing a highly secure, plug-and-play tool for autonomous AI agents (like those in the Antigravity IDE) to query schemas and retrieve records safely.

Here is a deep dive into the design and implementation of the library.

1. The Core Architecture: Model & Database Agnostic

Many text-to-sql libraries force you into tight coupling with specific LLM providers (e.g., hardcoding OpenAI's client) or specific databases. This library avoids that by using two core abstractions: Callbacks and Adapters.

  ┌─────────────────────────────────────────────────────────┐
  │                        TextToSQL                        │
  └───────────────────────────┬─────────────────────────────┘
                              │
              ┌───────────────┴───────────────┐
              ▼                               ▼
   ┌────────────────────┐          ┌────────────────────┐
   │    LLM Callback    │          │  Database Adapter  │
   │    (User Custom)   │          │ (SQLite/SQLAlchemy)│
   └────────────────────┘          └────────────────────┘
     - Gemini, OpenAI,               - SQLiteAdapter
       Claude, Local LLM             - SQLAlchemyAdapter

Decoupling the LLM: The Callback Pattern

Instead of wrapping SDKs, the library expects a simple callable function llm_callback(prompt: str) -> str that accepts the formulated prompt (containing schema info and query instructions) and returns a raw SQL string. This makes it trivial to use Gemini, OpenAI, Claude, or a local llama model.

Abstracting the Database: The Adapter Pattern

Database operations are handled by adapters implementing a standard interface:

from abc import ABC, abstractmethod
from typing import Any, Dict, List

class BaseAdapter(ABC):
    @abstractmethod
    def get_schema_info(self) -> str:
        """Returns a string representation of the DB schema."""
        pass

    @abstractmethod
    def execute(self, query: str) -> Dict[str, Any]:
        """Executes the query and returns success status, columns, and rows."""
        pass

Out of the box, the package includes:

SQLiteAdapter: For local dev databases and testing.
SQLAlchemyAdapter: For any database supported by SQLAlchemy (PostgreSQL, MS SQL, Oracle, etc.), featuring dynamic schema reflection.

2. Dynamic Entra ID Token Connections to Azure SQL Server

Enterprise databases hosted on Azure (like our development database under the awo schema on the server fastcrib-dev-sql-001.database.windows.net) rely on secure authentication via Microsoft Entra ID (formerly Azure Active Directory).

To connect the SQLAlchemyAdapter to Azure using active directory token-based pyodbc connection, the engine must inject a temporary Access Token on every connection handshake. Here is how we configure it:

import struct
from azure.identity import DefaultAzureCredential
from sqlalchemy import create_engine
from text_to_sql import TextToSQL, SQLAlchemyAdapter

# 1. Retrieve the Entra ID active token
credential = DefaultAzureCredential()
token_obj = credential.get_token("https://database.windows.net/.default")
token_bytes = token_obj.token.encode("utf-16-le")

# 2. Setup pyodbc connection attributes
driver = "ODBC Driver 17 for SQL Server"
server = "fastcrib-dev-sql-001.database.windows.net,1433"
database = "fastcrib-dev-sql-001"
conn_str = f"DRIVER={{{driver}}};SERVER={server};DATABASE={database};Encrypt=yes;TrustServerCertificate=no;Connection Timeout=30;"

# pyodbc constant to set access token attribute before connection
SQL_COPT_SS_ACCESS_TOKEN = 1256
packed_token = struct.pack("=i", len(token_bytes)) + token_bytes

# 3. Create the SQLAlchemy Engine
engine = create_engine(
    f"mssql+pyodbc:///?odbc_connect={conn_str}",
    connect_args={"attrs_before": {SQL_COPT_SS_ACCESS_TOKEN: packed_token}}
)

# 4. Initialize Adapter focusing on schema 'awo'
adapter = SQLAlchemyAdapter(engine, schema="awo")

This configuration enables secure connections from our local developer environment directly to the dev Azure database using our local CLI environment, avoiding hardcoded database passwords entirely.

3. The Strict Safety Validator Layer

Allowing an LLM or an AI agent to execute generated queries directly on a database is a massive security hazard. AI models can hallucinate or be tricked (via prompt injection) into writing commands that alter or delete data.

To prevent this, the library routes all execution paths through a SafetyValidator.

Safeguard 1: Read-Only Verification

By default, the validator operates in read_only=True mode, blocking any modification statements:

BLOCKED_KEYWORDS = {
    "insert", "update", "delete", "drop", "alter", 
    "truncate", "replace", "grant", "revoke", "schema", "rename"
}

Safeguard 2: Word Boundary Enforcement

To avoid false positives on valid identifiers (e.g. a column named updated_at or a table called system_grant), the validator checks matching terms strictly on word boundaries:

pattern = r"\b(" + "|".join(BLOCKED_KEYWORDS) + r")\b"
if re.search(pattern, clean_query, re.IGNORECASE):
    raise ValueError("Safety validation failed: Modification query detected.")

Safeguard 3: SQL Comment Scrubbing

Attackers can try to hide malicious statements inside SQL comments (e.g., -- delete from users or /* drop table */). The validator first scrubs comments (both single-line -- and multi-line /* */) from the query before running checks:

def scrub_comments(sql: str) -> str:
    # Remove single line comments
    sql = re.sub(r"--.*", "", sql)
    # Remove multi-line comments
    sql = re.sub(r"/\*[\s\S]*?\*/", "", sql)
    return sql

4. Making it Agentic: Integrating as an IDE Plugin Skill

One of the most exciting aspects of this library is how easily it integrates into agentic workflows. In IDE platforms like Antigravity, we can expose the Text-to-SQL tool directly to our assistant agents.

The repository includes a pre-packaged Plugin configuration:

The CLI Executable

In setup.py, we expose a console entry point:

entry_points={
    "console_scripts": [
        "text-to-sql-query=text_to_sql.cli:main",
    ],
}

This registers a global CLI executable text-to-sql-query that the agent can trigger directly.

The Skill definition (`SKILL.md`)

We document instructions for the agent explaining when and how to call the CLI tool:

# Skill: Text-to-SQL Database Agent

Use this skill when you need to query database schemas or fetch raw records.

## Usage

### Get Database Schema
```bash
text-to-sql-query get-schema

Execute safe query

text-to-sql-query execute "SELECT name FROM awo.Asset LIMIT 5;"


By adding this configuration, our local AI assistant can automatically discover the database layout, run safe SELECT queries, retrieve real records, and answer technical questions for us—all without us ever having to open a query console.

---

## 5. Getting Started

You can install the stable release from PyPI:
```bash
pip install model-agnostic-text-to-sql

For custom callbacks or setting up adapters for your database, check out the source code and examples on the GitHub repository.