Everything you need to know about SWIRL, from quick start to OEM embedding.
SWIRL 5 sharpens SWIRL into the private knowledge layer for enterprise AI. New in this release: a home dashboard with workspaces and recent files from every connected source; canonical answers your team pins and approves; a three-pass relevancy pipeline that runs locally; a first-class MCP server so any agent can call SWIRL; a hallucination warning when an answer isn't grounded in its sources; and a business console with AI-Yield analytics, semantic caching, and deduplication. All of it without copying, indexing, or moving your data.
Relevance runs in three passes, and both models run locally so nothing is sent over the wire: (1) keyword and BM25 across every source, with quoted phrases and exact terms honored first; (2) embedding re-ranking with E5-Large-V2, using title-aware chunking and hybrid keyword+vector fusion (RRF); (3) an MS-MARCO cross-encoder that reads the query and document together to score real relevance, not vector distance. The passage that actually answers the question rises to the top and is fed to your chosen LLM.
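The keyword+vector fusion in pass 2 can be illustrated with reciprocal rank fusion (RRF). This is a minimal sketch of the general technique, not SWIRL's implementation: the function name `rrf_fuse`, the document IDs, and the constant `k=60` (a commonly used default) are assumptions made for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document IDs with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a common default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


keyword_ranking = ["doc_a", "doc_b", "doc_c"]  # hypothetical BM25 order
vector_ranking = ["doc_b", "doc_d", "doc_a"]   # hypothetical embedding order
fused = rrf_fuse([keyword_ranking, vector_ranking])
print(fused)
```

A document that ranks well in both lists (here `doc_b`) outranks one that tops only a single list, which is the point of fusing the two signals before the cross-encoder pass.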
No. SWIRL uses hybrid retrieval - keyword search to pull candidates, then re-ranking with embeddings and a cross-encoder - which the XetHub team benchmarked as higher-recall than vector-only, concluding "no vector database necessary." That means no second copy of your data, no embedding pipeline to keep in sync, and no index to secure.
SWIRL 5 lets a team pin the canonical result for a query - the approved policy, the governing clause, the right precedent. Every later search, by a person or by an agent calling SWIRL over MCP, returns the endorsed answer instead of the model's best guess. The technical layer finds the candidates; your business decides which one is authoritative. An AI-Yield metric tracks how often searches return an endorsed answer.
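One way to picture pinning and the AI-Yield metric is sketched below. Everything here is an assumption for illustration: the class name `PinnedAnswers`, matching pins by a whitespace- and case-normalized query string, and computing AI-Yield as endorsed searches divided by total searches. How SWIRL matches queries and calculates the metric is not described here.

```python
class PinnedAnswers:
    """Hypothetical store of team-endorsed results, keyed by normalized query."""

    def __init__(self):
        self._pins = {}
        self.endorsed = 0
        self.total = 0

    @staticmethod
    def _normalize(query):
        # Assumed matching rule: lowercase and collapse runs of whitespace.
        return " ".join(query.lower().split())

    def pin(self, query, result):
        self._pins[self._normalize(query)] = result

    def resolve(self, query, best_guess):
        """Return the pinned result if one exists, else the retrieval best guess."""
        self.total += 1
        pinned = self._pins.get(self._normalize(query))
        if pinned is not None:
            self.endorsed += 1
            return pinned
        return best_guess

    def ai_yield(self):
        # Assumed definition: share of searches answered by an endorsed result.
        return self.endorsed / self.total if self.total else 0.0


pins = PinnedAnswers()
pins.pin("Travel policy", "policy-2024.pdf")
first = pins.resolve("  travel   POLICY ", best_guess="old-draft.docx")
second = pins.resolve("expense limits", best_guess="wiki-page")
```

In this sketch the first search returns the pinned document and the second falls back to the best guess, so `pins.ai_yield()` is 0.5.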
Every generated answer is checked against the sources it cited. When a claim isn't supported by what was actually retrieved, SWIRL flags it - so your team knows when to verify, and nothing ungrounded slips through unnoticed. It's built for putting AI in front of people who act on the output, such as lawyers, analysts, and clinicians, where "usually right" is a liability.
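To make the idea concrete, here is a deliberately simple grounding check. It is not SWIRL's method: it substitutes a plain word-overlap heuristic, and the function name `ungrounded_sentences` and the `threshold` value are assumptions. A production check would likely use a stronger model-based comparison.

```python
import re


def _tokens(text):
    """Lowercased alphanumeric tokens of a text, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def ungrounded_sentences(answer, sources, threshold=0.6):
    """Return answer sentences whose words are poorly covered by every source.

    Coverage is the fraction of a sentence's tokens found in a single source
    passage. The threshold is an arbitrary illustrative value.
    """
    source_tokens = [_tokens(s) for s in sources]
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        words = _tokens(sentence)
        if not words:
            continue
        best = max((len(words & st) / len(words) for st in source_tokens), default=0.0)
        if best < threshold:
            flagged.append(sentence)
    return flagged


sources = ["The notice period is 30 days for all contracts."]
answer = "The notice period is 30 days. Termination fees are waived in Europe."
flagged = ungrounded_sentences(answer, sources)
```

Here the first sentence is fully covered by the retrieved passage, while the second has no support and is the one flagged for review.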
SWIRL 5 opens on a dashboard that organizes knowledge into workspaces and topics. Recent files from every connected OAuth2 source - OneDrive, Box, iManage, ServiceNow, and more - surface right there, and a user can add any of them to a workspace with one click. Adding a new source is configuration, not code.
Yes. SWIRL 5 is headless and API-first, exposing a REST API and a first-class MCP server. Any AI - Claude, Copilot, ChatGPT, or your own agents - calls SWIRL as its governed knowledge layer and gets ranked, permissioned, organization-approved answers, with no copy of your data leaving your tenant. The Galaxy UI is a demo and admin surface, not a requirement.
As of SWIRL 5, the standalone AI Search Assistant is deprecated in favor of headless, agentic retrieval. Rather than ship our own chatbot, SWIRL 5 is API-first with a first-class MCP server, so you bring the LLM or agent you already use - Claude, Copilot, ChatGPT, or your own - and SWIRL is the governed knowledge layer beneath it. The Galaxy UI remains as a demo and admin surface. In short, SWIRL makes your chosen LLM better instead of competing with it.
The Business Console lets the business boost or block sources and spotlight the right answer, with no code. Under the hood, a semantic cache resolves repeated queries in milliseconds instead of seconds, and a deduplication pipeline merges near-identical results from different sources before they reach the user. AI-Yield analytics measure endorsed answers versus best-guess retrieval so you can track and improve quality.
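The deduplication step can be sketched with a classic near-duplicate test: Jaccard similarity over word shingles. This is an illustrative stand-in, not SWIRL's pipeline. The function names, the `body` and `source` keys, the shingle size, and the 0.8 threshold are all assumptions, and this version keeps the first of each near-duplicate group rather than merging fields.

```python
import re


def _shingles(text, size=3):
    """Set of overlapping word n-grams (shingles) for a text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def deduplicate(results, threshold=0.8):
    """Keep the first result of every group of near-identical bodies."""
    kept = []
    for result in results:
        shingles = _shingles(result["body"])
        if any(jaccard(shingles, _shingles(k["body"])) >= threshold for k in kept):
            continue  # near-identical to something already kept
        kept.append(result)
    return kept


results = [
    {"source": "sharepoint", "body": "Quarterly revenue grew 12 percent year over year in the third quarter"},
    {"source": "box", "body": "Quarterly revenue grew 12 percent year over year in the third quarter."},
    {"source": "intranet", "body": "The office will be closed on Friday for maintenance"},
]
unique = deduplicate(results)
```

The two copies of the revenue statement differ only in punctuation, so only the first survives alongside the unrelated intranet notice.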
SWIRL is the private knowledge layer for enterprise AI. It runs federated AI search and RAG across 150+ enterprise platforms in real time - no copying, no index, no vector database - and serves ranked, permissioned, organization-approved answers to your people and your AI agents. In short: it makes your chosen LLM better, without moving a byte of your data.
Federated search means SWIRL queries multiple data sources in parallel - without copying, indexing, or migrating any data. Your sources stay where they are. SWIRL reaches them at query time, collects results, re-ranks them, and returns a unified response. Zero ETL. Zero data movement.
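The fan-out pattern described above can be sketched with Python's `asyncio`. This is a toy model under stated assumptions, not SWIRL's code (SWIRL uses Celery and Redis for asynchronous retrieval): `query_source` simulates a network call with a delay, and the source names, scores, and timeout are hypothetical.

```python
import asyncio


async def query_source(name, query, delay):
    """Stand-in for a live call to one source; the delay simulates latency."""
    await asyncio.sleep(delay)
    return [{"source": name, "title": f"{name} result for {query!r}", "score": 1.0 - delay}]


async def federated_search(query, sources, timeout=1.0):
    """Query every source in parallel, then merge and re-rank in memory."""
    tasks = [
        asyncio.wait_for(query_source(name, query, delay), timeout)
        for name, delay in sources.items()
    ]
    outcomes = await asyncio.gather(*tasks, return_exceptions=True)
    merged = []
    for outcome in outcomes:
        if isinstance(outcome, Exception):
            continue  # a slow or failing source does not block the others
        merged.extend(outcome)
    return sorted(merged, key=lambda r: r["score"], reverse=True)


results = asyncio.run(
    federated_search(
        "contract renewal",
        {"wiki": 0.01, "crm": 0.02, "slow_archive": 5.0},
        timeout=0.2,
    )
)
```

Nothing is stored between calls: results exist only in memory for the duration of the request, and the source that exceeds the timeout is simply left out of the unified response.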
SWIRL is open source and hosted on GitHub. Visit github.com/swirlai/swirl-search to download the latest packages, read documentation, log issues, and join the community.
Full documentation - installation, configuration, SearchProviders, connectors, and API reference - is at docs.swirlaiconnect.com.
SWIRL supports 100+ LLMs including OpenAI, Azure OpenAI, Anthropic Claude, Google Gemini, Mistral, and on-premises models via Ollama and compatible frameworks. You can switch models without changing anything else in the stack.
SearchProviders are the core configuration element of SWIRL. Each one defines how SWIRL connects to a data source - credentials, query format, result mapping, and re-ranking weight. They're JSON dictionaries, and SWIRL ships with 150+ pre-built ones. Adding a new source typically takes minutes, not days. See the SearchProvider Guide.
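As a rough picture of what such a record holds, the sketch below builds a provider definition as a Python dictionary and round-trips it through JSON. The key names and values are hypothetical placeholders chosen to mirror the elements named above (credentials, query format, result mapping, re-ranking weight); consult the SearchProvider Guide for the actual schema.

```python
import json

# Hypothetical key names, for illustration only.
REQUIRED_KEYS = ("name", "url", "credentials", "query_template", "result_mappings", "weight")

provider = {
    "name": "Example Wiki",
    "url": "https://wiki.example.com/api/search",
    "credentials": "bearer=<token-from-secret-store>",   # placeholder, never a real secret
    "query_template": "{url}?q={query_string}",          # how the query is formatted
    "result_mappings": "title=name,body=excerpt,url=link",  # source fields -> common fields
    "weight": 1.0,                                        # relative re-ranking weight
}


def missing_keys(record):
    """List the required keys that a provider record does not define."""
    return [key for key in REQUIRED_KEYS if key not in record]


serialized = json.dumps(provider, indent=2)
restored = json.loads(serialized)
print(missing_keys(restored))
```

Because the definition is plain data, adding a source means writing and validating a record like this rather than writing connector code.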
Community Edition is open source under Apache 2.0 and ideal for searching repositories that don't require authentication. Enterprise Edition adds the SWIRL 5 capabilities: canonical answers (Pinned Results), the three-pass relevancy pipeline, a first-class MCP server, the hallucination warning, a business console with AI-Yield analytics, semantic caching and deduplication, SSO/OIDC with auto-provisioning, the authenticated PageFetcher with RAG across 1,500+ file formats, PII redaction, and multi-provider LLM support. See the SWIRL Overview.
SWIRL runs inside your own environment - on-prem, private VPC, or your cloud tenant - via Docker Compose or Kubernetes. SWIRL Enterprise is also available pre-installed through the Azure Marketplace. See the Installation Guide and Kubernetes Guide.
SWIRL is built on Python and Django, with Celery and Redis handling asynchronous retrieval and RAG, and PostgreSQL recommended for production. It exposes a fully documented REST/Swagger API and ships with the Galaxy UI. See the Developer Guide.
Retrieval-Augmented Generation (RAG) grounds an LLM in real source content. SWIRL fetches results in real time, re-ranks them with embeddings, extracts the most relevant passages, and sends them to your configured LLM with citations. No data is indexed or copied. See the RAG Guide.
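The final step, handing the best passages to the LLM with citations, might look like the sketch below. It is an assumption-laden illustration rather than SWIRL's prompt format: the function name `build_rag_prompt`, the passage keys (`title`, `url`, `text`, `score`), and the instruction wording are all hypothetical.

```python
def build_rag_prompt(question, passages, max_passages=3):
    """Assemble a prompt from the highest-scoring passages, numbered for citation.

    Returns the prompt text and the list of cited URLs in citation order.
    """
    top = sorted(passages, key=lambda p: p["score"], reverse=True)[:max_passages]
    lines = [
        "Answer the question using only the numbered sources below.",
        "Cite sources as [n].",
        "",
    ]
    for n, passage in enumerate(top, start=1):
        lines.append(f"[{n}] {passage['title']} ({passage['url']}): {passage['text']}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines), [p["url"] for p in top]


passages = [
    {"title": "MSA", "url": "https://example.com/msa", "text": "Renewal is automatic.", "score": 0.9},
    {"title": "FAQ", "url": "https://example.com/faq", "text": "Contact sales.", "score": 0.4},
    {"title": "SOW", "url": "https://example.com/sow", "text": "Term is 12 months.", "score": 0.7},
]
prompt, citations = build_rag_prompt("How long is the term?", passages, max_passages=2)
```

Only the two highest-scoring passages reach the prompt, and the returned URL list lets the caller map each `[n]` in the model's answer back to its source.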
Yes. SWIRL Enterprise includes built-in MCP tools: SWIRL_search, SWIRL_search_and_rag, SWIRL_get_sources, and SWIRL_read_document. A standalone MCP proxy is also available for Community Edition and external MCP clients. See the MCP Guide.
Yes. AIProviders are customer-configured. You can point SWIRL at OpenAI, Azure OpenAI, Anthropic, Cohere, any LiteLLM-supported endpoint, or an on-prem inference server (Ollama, vLLM, etc.). Roles like chat, query rewriting, RAG, and embeddings can each use a different model. See Connecting to GAI/LLMs.
SWIRL ships with 150+ pre-built SearchProviders for enterprise apps, cloud platforms, databases, and search engines. Each is a JSON record that defines the endpoint, credentials, query format, and result mapping. Adding a new source typically takes minutes. See the full list at swirlaiconnect.com/connectors.
SWIRL Enterprise can extract and apply RAG to 1,500+ file formats, including PDFs, Office documents, structured tables and charts, and text inside images. Content is fetched at query time via the authenticated PageFetcher and held only for the duration of the request.
No. SWIRL is a federated metasearch engine. Queries are dispatched to source systems at request time and results are re-ranked in memory. Full text retrieved for RAG is held only for the duration of the request. SWIRL's own database stores only metadata - users, groups, SearchProvider definitions, and per-user result pointers. See the Security Guide.
SWIRL runs entirely inside your trust boundary - on-prem, private VPC, or your own cloud tenant. SWIRL Corporation has no access to your data, user identities, or query logs unless you explicitly share them for support purposes.
SWIRL Enterprise supports SSO via OAuth2 / OpenID Connect and has been deployed with PingFederate, Microsoft Entra ID, Okta, Auth0, and Google. Per-source OAuth2 handles Microsoft 365, Google Workspace, and Box. Token and HTTP Basic authentication are available for API clients. Brute-force lockout via django-axes is enabled by default.
Two layers. (1) SWIRL Groups can restrict which users see and execute a given SearchProvider. (2) The source system's own ACLs are preserved - SWIRL passes the user's identity through, so users only see results they're already authorized to access at the source.
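The two layers can be modeled in a few lines. This is a conceptual sketch only: the dictionary keys, the `make_source` helper, and the in-memory ACLs are hypothetical stand-ins for SWIRL Groups and for the permission checks a real source system performs itself.

```python
def visible_providers(user_groups, providers):
    """Layer 1: keep providers whose allowed groups overlap the user's groups."""
    return [p for p in providers if user_groups & p["allowed_groups"]]


def search_as_user(user, providers, query):
    """Run the query only against permitted providers, passing the identity on."""
    results = []
    for provider in visible_providers(user["groups"], providers):
        # Layer 2: the source receives the user's identity and applies its own ACLs.
        results.extend(provider["search"](query, user["id"]))
    return results


def make_source(documents):
    """Build a stand-in source that enforces a per-document ACL (hypothetical)."""
    def search(query, user_id):
        return [
            d["title"] for d in documents
            if query.lower() in d["title"].lower() and user_id in d["acl"]
        ]
    return search


providers = [
    {"name": "hr", "allowed_groups": {"hr"},
     "search": make_source([{"title": "Salary bands", "acl": {"dana"}}])},
    {"name": "wiki", "allowed_groups": {"hr", "staff"},
     "search": make_source([
         {"title": "Salary review process", "acl": {"dana", "lee"}},
         {"title": "Salary exceptions", "acl": {"dana"}},
     ])},
]

dana = {"id": "dana", "groups": {"hr"}}
lee = {"id": "lee", "groups": {"staff"}}
```

In this model `lee` never reaches the HR provider (layer 1) and sees only the wiki document whose ACL includes them (layer 2), while `dana` sees all three results.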
Infrastructure secrets (Django SECRET_KEY, database, Redis, OIDC client secret, license) are read from environment variables - never committed to code or baked into images. User-context sources use per-user OAuth tokens; service-account credentials live on the SearchProvider record. In Kubernetes, store them in a Secret, not a ConfigMap.
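A fail-fast pattern for reading such secrets at startup is sketched below. The environment variable names and the `load_secrets` helper are assumptions for illustration; the actual variable names are defined by your SWIRL deployment configuration.

```python
import os

# Hypothetical variable names; check your deployment for the real ones.
REQUIRED = ("SECRET_KEY", "DATABASE_URL", "REDIS_URL")


def load_secrets(environ=os.environ):
    """Read required secrets from the environment, failing fast if any is unset."""
    missing = [name for name in REQUIRED if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED}


# Usage with an explicit mapping, so the example runs without a real environment.
example_env = {
    "SECRET_KEY": "not-a-real-key",
    "DATABASE_URL": "postgres://db.example.internal/swirl",
    "REDIS_URL": "redis://redis.example.internal:6379/0",
}
secrets = load_secrets(example_env)
```

Failing at startup when a variable is missing makes a misconfigured Secret visible immediately instead of surfacing later as an obscure connection error.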
Yes. SWIRL Enterprise integrates Microsoft Presidio to detect and redact names, emails, phone numbers, credit card numbers, and national IDs before results are returned to the user or sent to an LLM. Redaction is opt-in per SearchProvider, so you can enforce it on high-sensitivity connectors only.
The Security Guide documents the controls SWIRL provides and the controls the customer owns (encryption at rest, IdP/MFA configuration, SIEM forwarding, retention, etc.). Customers operating under regulated frameworks configure SWIRL to meet their specific obligations; SWIRL's enterprise team can provide configuration guidance and, under NDA, additional artifacts to support audits.
Yes - customers may pen-test their own SWIRL deployments at any time without prior approval. Report suspected vulnerabilities to support@swirlaiconnect.com with SECURITY in the subject line. SWIRL's Security Team acknowledges within 72 hours.
Yes. The Source Code license tier gives you full branding rights - your product name, your UI, your logo. Customers see your product only. This requires the Source Code licensing agreement; contact sales to discuss terms.
SWIRL exposes a REST API and a first-class MCP server. Your platform calls SWIRL like any other service - passing queries, receiving ranked results. You render the results in your own UI. No SWIRL branding required. Typical integration projects take days, not months.
Yes, and it's a core feature. SWIRL can run entirely inside your customer's network - Docker, Kubernetes, or bare metal. No data ever leaves their environment. This is critical for regulated industries, classified environments, and security-conscious enterprise buyers.
The canonical version finder lets organizations pin the official answer for any query. When your customers embed SWIRL, their teams can endorse specific results as authoritative - approved clauses, governing documents, correct answers. Every subsequent AI call returns the endorsed result, not a model guess. You deliver that organizational intelligence layer as part of your product.
The Source Code tier is a commercial license for OEM partners who want to embed SWIRL in their own product, modify the source code, and ship it under their own brand. Pricing is negotiated based on deployment scale and use case. Contact us to start the conversation.