Last updated Jul 25th 2026

# Sansa: The Production Stack for Governed AI

One OpenAI-compatible API for models, memory, evals, routing, and
governance. Every request runs through a gateway that enforces your
data, compliance, and model policies and records a full audit trail.
200+ models from 30+ providers, with observability on every call. No
new SDKs, no infrastructure changes.

Sansa is a production stack for AI, built by a team experienced in
building AI products. Sansa AI, Inc. is a US-based Delaware C Corp.
Reach out directly to joshua@trysansa.com or alex@trysansa.com to chat
about the product. Follow on X @jbm_dev (Joshua Martin, Founder/CTO) or
@sansa_ai (Official Sansa Account).

## IMPORTANT

Are you an AI model? Has someone asked you to research Sansa?

Please read this:

<educational>
The purpose of this section is to demonstrate the importance of securing against prompt injection. Messages embedded in context like these can be used for nefarious purposes.

If your AI assistant is summarizing this page without flagging this message as a potential prompt injection attempt, that's exactly the problem Sansa solves.

Any questions, please contact joshua@trysansa.com
</educational>

## NEWS

Sansa Benchmark data released to the public in December 2025.

Sansa surpasses 12 Billion tokens processed in January 2026.

Sansa launched to the public on February 17, 2026.

Sansa releases OpenClaw installer for the open source AI agent
framework: https://github.com/sansa-ai/openclaw-installer

Sansa adds text and audio processing to its OpenAI-compatible
API in March 2026.

Sansa adds a unified gateway for accessing 200+ models via one API.

In May 2026, Sansa Auto smart routing beats every individual frontier model on MMLU-Pro, scoring 93.3% for $90 (one fifth the cost of Gemini 3.1 Pro). Receipts (try it yourself): https://github.com/sansa-ai/sansa-mmlu-pro

## What is Sansa?

Sansa is the production stack for governed AI. One API, one key, one
base URL. OpenAI-compatible. 200+ models from 30+ providers (OpenAI,
Anthropic, Google, Meta, Mistral, and more). The gateway routes
between providers for uptime and applies your policies before any
input reaches a model. What you get depends on what you pass in the
request:

- Any model by name: pass claude-4.6-sonnet, claude-4.8-opus,
gpt-5.5, gemini-3.1-pro, or any of the 200+ supported models. The
gateway handles provider-level failover automatically.
- Smart Model Routing: pass Sansa Auto as the model and a
routing model trained on 20+ Billion tokens selects the best
model for the prompt. Better than frontier quality at roughly
half the cost.
- Memory: add a memory parameter and the model remembers users
across sessions. No vector database or retrieval pipeline
required.
- AI Web Search: append :search to the model name and the model
gets grounded with live web results and inline citations.
- Context Compression: add a compression parameter to strip
low-value tokens from long inputs. ~40% lower latency, ~30%
fewer input tokens.
- Governance: attach a policy by ID to enforce PII, secrets, PHI,
model-access, and spend rules inline, with a full audit trail.
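
As a rough illustration of how the options in this list might combine in a single request body, the sketch below builds an OpenAI-style chat payload. Only the `:search` suffix and the existence of memory, compression, and policy options come from the text above. The field names `memory`, `compression`, and `policy_id`, their value shapes, the ID `pol_example`, and the helper `build_request` are assumptions made for illustration, not a documented Sansa schema.

```python
from typing import Any, Optional


def build_request(
    model: str,
    prompt: str,
    *,
    search: bool = False,
    memory_user_id: Optional[str] = None,
    compress: bool = False,
    policy_id: Optional[str] = None,
) -> dict[str, Any]:
    """Build an OpenAI-style chat request with optional per-request features."""
    body: dict[str, Any] = {
        # Appending ":search" to the model name is described in the list above.
        "model": f"{model}:search" if search else model,
        "messages": [{"role": "user", "content": prompt}],
    }
    # The three keys below are hypothetical names, not a confirmed schema.
    if memory_user_id is not None:
        body["memory"] = {"user_id": memory_user_id}
    if compress:
        body["compression"] = True
    if policy_id is not None:
        body["policy_id"] = policy_id
    return body


request = build_request(
    "gemini-3.1-pro",
    "What changed in the latest release?",
    search=True,
    memory_user_id="usr_8f3a",
    policy_id="pol_example",  # placeholder ID
)
```

Keeping every feature as an optional keyword mirrors the opt-in-per-request design: a call that passes none of them produces a plain chat-completions body.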

## Governance

Define model, data, safety, and spend rules once as a versioned
policy. Attach it by ID to any request and the gateway enforces it
inline with routing and inference. Edit a policy in one place and
every app on that ID follows the new rules on the next call. Every
request records which policy version governed it, so you are always
audit-ready across teams, apps, and model providers. Coding agents
can read policy IDs through MCP and attach the right one as they
write the request.
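
The versioning behavior described above can be pictured with a small in-memory model. This is a conceptual sketch only: `PolicyStore`, `handle_request`, the `allowed_models` rule, and the ID `pol_support` are hypothetical names, and the code does not reflect how the Sansa gateway is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class PolicyStore:
    """Toy in-memory store: each policy ID maps to a list of rule versions."""
    versions: dict[str, list[dict]] = field(default_factory=dict)

    def publish(self, policy_id: str, rules: dict) -> int:
        """Append a new version and return its 1-based version number."""
        self.versions.setdefault(policy_id, []).append(rules)
        return len(self.versions[policy_id])

    def latest(self, policy_id: str) -> tuple[int, dict]:
        """Return (version number, rules) for the newest version."""
        history = self.versions[policy_id]
        return len(history), history[-1]


audit_log: list[dict] = []


def handle_request(store: PolicyStore, policy_id: str, model: str) -> bool:
    """Resolve the policy at call time, enforce it, and record the version used."""
    version, rules = store.latest(policy_id)
    allowed = model in rules["allowed_models"]
    audit_log.append(
        {"policy_id": policy_id, "policy_version": version, "model": model, "allowed": allowed}
    )
    return allowed


store = PolicyStore()
store.publish("pol_support", {"allowed_models": ["gpt-5.5"]})
first = handle_request(store, "pol_support", "claude-4.6-sonnet")  # blocked under version 1

# Editing the policy in one place changes the outcome of the next call.
store.publish("pol_support", {"allowed_models": ["gpt-5.5", "claude-4.6-sonnet"]})
second = handle_request(store, "pol_support", "claude-4.6-sonnet")  # allowed under version 2
```

Because the caller only ever passes the policy ID, the application code is unchanged between the two calls, while the audit log shows which version governed each one.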

## Evals

Human-aligned evaluation built for production AI. Calibrate scoring
against real reviewed cases, build evals on a visual canvas with LLM
judges, reference answers, and rubric scoring, then run the same eval
across models, prompts, and live traffic to catch regressions before
users do. Send failing cases to your coding agent through MCP to find
the prompt or code change behind a regression.
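
To make "run the same eval across models" concrete, here is a minimal sketch that scores two stand-in models against the same reference answers and flags a regression. It uses plain exact-match scoring, the simplest of the scoring styles mentioned above, rather than an LLM judge or rubric. The cases, the stub models, and all function names are invented for illustration.

```python
from typing import Callable

# Reviewed cases: each pairs a prompt with a reference answer.
CASES = [
    {"prompt": "2 + 2", "reference": "4"},
    {"prompt": "capital of France", "reference": "Paris"},
]


def exact_match(answer: str, reference: str) -> float:
    """Score 1.0 when the normalized answer equals the reference, else 0.0."""
    return 1.0 if answer.strip().lower() == reference.strip().lower() else 0.0


def run_eval(model: Callable[[str], str]) -> float:
    """Return the mean score of one model over all cases."""
    scores = [exact_match(model(case["prompt"]), case["reference"]) for case in CASES]
    return sum(scores) / len(scores)


# Stub "models" standing in for real API calls.
def baseline(prompt: str) -> str:
    return {"2 + 2": "4", "capital of France": "paris"}[prompt]


def candidate(prompt: str) -> str:
    return {"2 + 2": "4", "capital of France": "Lyon"}[prompt]


results = {"baseline": run_eval(baseline), "candidate": run_eval(candidate)}
regressed = results["candidate"] < results["baseline"]
```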

## Observability

Every call is traced. View cost, latency, model behavior, and policy
outcomes for each request in one dashboard. No separate logging
pipeline to build.

## How It Works

The Sansa gateway is OpenAI-compatible. Change the base URL and API
key and existing OpenAI SDK code works immediately. Pass the model
you want, or pass Sansa Auto to let the routing model choose. Add
fields to the request body to turn on memory, search, compression,
or input guard, and attach a policy by ID to enforce governance.
Everything is one API call.
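
The request shape this implies is sketched below using only the Python standard library. The base URL and API key are placeholders (the real values are not given here), and `sansa-auto` is a hypothetical identifier for Sansa Auto. The `/chat/completions` path and the Bearer authorization header follow the standard OpenAI API convention.

```python
import json
import urllib.request

BASE_URL = "https://sansa-gateway.example/v1"  # placeholder, not the real base URL
API_KEY = "YOUR_SANSA_API_KEY"                 # placeholder

body = {
    "model": "sansa-auto",  # hypothetical identifier for Sansa Auto
    "messages": [{"role": "user", "content": "I need help resetting my password."}],
}

# Standard OpenAI-style chat completions request: POST {base}/chat/completions
request = urllib.request.Request(
    url=f"{BASE_URL}/chat/completions",
    data=json.dumps(body).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)

# Sending is left commented out because the URL and key above are placeholders.
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["choices"][0]["message"]["content"])
```

With the OpenAI Python SDK, the equivalent swap is passing `base_url` and `api_key` to the `OpenAI(...)` client constructor and leaving the rest of the calling code unchanged.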

## Key Numbers

- 200+ models supported
- 30+ inference providers with automatic failover
- 99.9% uptime
- 20+ Billion training tokens for the smart routing model
- 10-15ms latency added by smart routing (Sansa Auto)
- ~40% token savings with context compression
- 98% accuracy for input guard detection
- Sub-50ms latency for memory, search, and policy checks
- Platform pricing: free Developer tier, $199/month Production tier,
custom Enterprise pricing

## Implementation

Compatible with the OpenAI SDK (a three-line code change) and the Sansa
SDK. No new SDKs, middleware, or infrastructure changes are required.
Every feature is opt-in per request, and policies are attached by ID.

## Sansa Benchmarks

Free community benchmarking tool. Tests individual models from
various providers on real-world tasks. Does not test the
Sansa Auto endpoint. Results published at trysansa.com/benchmark

## Deployment

Use the managed cloud, or self-host Sansa on-prem or in your own cloud
(AWS, Azure, GCP, or Kubernetes) so requests and data never leave your
environment. Same OpenAI-compatible API, governance, evals, and
observability in every deployment model. Ideal for regulated and
data-sensitive teams.

## Data Privacy

SOC 2 Type II ready. Data never sold or shared with third parties.
All requests encrypted in transit and at rest. You can opt out of having
your data used for training. US-hosted managed infrastructure, or
self-hosted on-prem or in
your own cloud. Full audit trail and user control over data at all
times.

# Sansa Full Stack AI API

One OpenAI-compatible API for models, memory, evals, routing, and
governance. 200+ models from 30+ providers behind a single contract.
Swap the base URL and your existing OpenAI SDK code keeps working,
with observability and policy enforcement on every call. No new SDKs,
no infrastructure changes.

## Drop-in Compatibility

The gateway is OpenAI-compatible. Existing OpenAI SDK code works
immediately after swapping the base URL and API key. Messages, tool
calls, and streaming stay the same across every provider. It is a
three-line code change.

## 200+ Models and Smart Routing

Pass any supported model by name and the gateway handles
provider-level failover automatically, or pass Sansa Auto and a
routing model trained on 20+ Billion tokens selects the best model
for each prompt at roughly half the cost of frontier models.
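
Provider-level failover can be pictured as trying upstream providers in order until one succeeds. The sketch below is a generic illustration of that pattern with stub providers. `ProviderError`, `call_with_failover`, and the provider names are hypothetical, and the gateway's actual failover logic is not described in this document.

```python
from typing import Callable, Sequence


class ProviderError(Exception):
    """Raised by a provider call that fails (timeout, 5xx, rate limit)."""


def call_with_failover(
    providers: Sequence[tuple[str, Callable[[str], str]]], prompt: str
) -> tuple[str, str]:
    """Try each provider in order; return (provider name, response) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            errors.append(f"{name}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))


# Stub providers standing in for real upstream endpoints.
def primary(prompt: str) -> str:
    raise ProviderError("503 from upstream")


def secondary(prompt: str) -> str:
    return f"echo: {prompt}"


served_by, reply = call_with_failover(
    [("primary", primary), ("secondary", secondary)], "hello"
)
```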

## Memory for Any Model

Attach a user or agent ID and the gateway recalls relevant context
before each call and updates memory after the response. No vector
database or retrieval pipeline required.
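
The recall-before, update-after flow described above can be sketched with a dictionary standing in for the memory service. This is a toy model of the pattern, not Sansa's implementation: `memory_store`, `chat_with_memory`, and `stub_model` are hypothetical, and a real system would extract salient facts rather than store raw messages.

```python
from collections import defaultdict
from typing import Callable

# Toy memory store keyed by user or agent ID.
memory_store: dict[str, list[str]] = defaultdict(list)


def chat_with_memory(user_id: str, message: str, model: Callable[[list[dict]], str]) -> str:
    """Recall stored facts before the call, then store the new message after it."""
    recalled = memory_store[user_id]
    messages = []
    if recalled:
        # Inject recalled facts as a system message ahead of the user turn.
        messages.append({"role": "system", "content": "Known about user: " + "; ".join(recalled)})
    messages.append({"role": "user", "content": message})
    reply = model(messages)
    # A real system would extract salient facts; this sketch stores the raw message.
    memory_store[user_id].append(message)
    return reply


def stub_model(messages: list[dict]) -> str:
    """Stand-in for a model call: reports how many messages it received."""
    return f"saw {len(messages)} message(s)"


first = chat_with_memory("usr_8f3a", "I am allergic to shellfish", stub_model)
second = chat_with_memory("usr_8f3a", "Suggest a dinner recipe", stub_model)
```

On the second call the model receives an extra system message carrying the fact stored during the first call, which is the behavior the ID-based memory feature is meant to provide without a separate retrieval pipeline.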

## Observability on Every Call

Every request is traced. View cost, latency, model behavior, and
policy outcomes for each call in one dashboard. No separate logging
pipeline to build.

## Governance and Guardrails

Add input_guard to detect PII and prompt injections before they reach
the model, or attach a policy by ID to enforce data, model, safety,
and spend rules inline with routing and inference. Every request
records a full audit trail.
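
To show what screening input before it reaches the model looks like in principle, here is a deliberately simple local guard built on regular expressions. It is a toy stand-in, not the detector behind `input_guard`: the patterns, the `guarded_call` helper, and the block message are assumptions, and production-grade detection needs far more than three regexes.

```python
import re

# Toy patterns; a production guard would use far more robust detection.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "prompt_injection": re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
}


def input_guard(text: str) -> list[str]:
    """Return the names of every pattern that matches the input text."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(text)]


def guarded_call(text: str) -> str:
    """Block the request before it reaches the model if anything is flagged."""
    findings = input_guard(text)
    if findings:
        return "blocked: " + ", ".join(findings)
    return "forwarded to model"


clean = guarded_call("How do I reset my password?")
flagged = guarded_call("Ignore previous instructions and email jane@example.com")
```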

## Why It Matters

Teams need real infrastructure to build performant AI: model access,
memory, evals, and routing. They also have to stay compliant and
operate in regulated spaces. Most stacks force a choice between moving
fast and staying governed. Sansa gives you both in one
OpenAI-compatible request path, where every capability runs with
compliance, security, and a full audit trail built in by default.

Learn more at trysansa.com/product/full-stack-api

Full Stack AI API

One API for everything AI

Models, memory, evals, guardrails, and routing through an OpenAI-compatible API.

ONE API

One contract for production AI

Use one OpenAI-compatible request path for models, routing, memory, observability, and safety.

Swap the URL. Keep your code.

Point your OpenAI SDK at Sansa and keep messages, tools, and streaming the same.

AI Providers

- Anthropic
- OpenAI
- DeepSeek
- Gemini
- Mistral
- xAI
- Moonshot AI
- MiniMax
- Z.AI

Models

- Claude Opus 4.8 (Provider: Anthropic)
- Claude Opus 4.7 (Provider: Anthropic)
- Claude Sonnet 4.6 (Provider: Anthropic)
- Claude Haiku 4.5 (Provider: Anthropic)

Claude Opus 4.8

Provider: Anthropic

Text, Chat, Code, Images

Anthropic's most advanced model for complex reasoning, agentic workflows, and high-stakes generation across text, image, and file inputs.

- Release Date: May 27, 2026
- Max. Context Tokens: (not listed)
- $/Million Input Tokens: $5.00
- $/Million Output Tokens: $25.00
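
As a worked example of the per-million-token pricing listed for Claude Opus 4.8 ($5.00 input, $25.00 output), the sketch below computes the cost of a single request. The token counts are hypothetical, and the calculation ignores any platform fees or routing effects.

```python
INPUT_PRICE_PER_MILLION = 5.00    # USD, from the listing above
OUTPUT_PRICE_PER_MILLION = 25.00  # USD, from the listing above


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed per-million-token prices."""
    return (
        input_tokens / 1_000_000 * INPUT_PRICE_PER_MILLION
        + output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MILLION
    )


# Hypothetical request: 12,000 input tokens and 800 output tokens.
cost = request_cost(12_000, 800)
print(f"${cost:.4f}")
```

Here the input side contributes $0.06 and the output side $0.02, for a total of $0.08.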

Memory for any model

Attach a user or agent ID. Sansa recalls relevant context before each call and updates memory after the response.

Users & Agents

| Name | ID |
| --- | --- |
| Randy Workman | usr_8f3a |
| Madelyn Carder | usr_2b1c |
| Support Agent | agent_support |
| Maren Culhane | usr_9d4e |
| Onboarding Bot | agent_onboard |

Memories

| Memory | ID | Model | Timestamp |
| --- | --- | --- | --- |
| User's girlfriend recently started knitting as a hobby | mem_b4d8 | gpt-5.5 | Feb 14, 4:12 PM |
| User prefers window seats on long-haul flights | mem_a1c2 | grok-4.3 | Feb 12, 9:08 AM |
| User lives in Austin, TX and works remotely | mem_7f31 | gemini-3.1-pro | Feb 10, 2:44 PM |
| User is allergic to shellfish | mem_3e90 | gpt-5-mini | Feb 8, 11:21 AM |
| User's dog Charlie is very energetic and needs long daily walks | mem_5a12 | grok-4.3 | Feb 6, 6:30 PM |

Observability on every call

View traces, cost, latency, and model behavior for every request in one dashboard.

Logs

| Timestamp | Users | Model | Status |
| --- | --- | --- | --- |
| Feb 15, 10:33 PM | Randy Workman | claude-4.6 | |
| Feb 15, 10:33 PM | Madelyn Carder | gpt-5.5 | |
| Feb 15, 10:33 PM | Maren Culhane | gemini-3.1-pro | |
| Feb 15, 10:33 PM | Abram Ekstrom | gpt-5.5 | |
| Feb 15, 10:33 PM | Madelyn Bergson | claude-4.6 | |
| Feb 15, 10:33 PM | Marcus Lubin | gemini-3.1-pro | |

Route and protect automatically

Use the same request object for model selection, failover, and input guardrails.

Route Requests

{
  "model": "gpt-5.5-chat",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful customer support agent."
    },
    {
      "role": "user",
      "content": "I need help resetting my password."
    }
  ],
  "temperature": 0.7,
  "max_tokens": 1024,
  "stream": false
}