Shadow AI: Detection and Governance Playbook for Mid-Market
Your employees are already using ChatGPT, Claude, Gemini, and a dozen browser-extension AI tools with corporate data — whether or not IT has sanctioned it. This playbook covers how to detect shadow AI, set a usable policy, deploy technical controls, and respond when confidential data has already left the building.
QUICK ANSWER
The 50-Word Answer
Shadow AI is employee use of AI tools without IT or security approval, typically public ChatGPT, Claude, Gemini, and Perplexity. Detection combines DNS/proxy logs, endpoint telemetry, and CASB signals. Governance requires three elements working together: a sanctioned enterprise LLM alternative, an acceptable use policy employees can actually follow, and DLP controls that enforce both.
DEFINITION
What Is Shadow AI
Shadow AI is the emerging-technology sibling of shadow IT: employee-driven adoption of tools that outpaces the governance, security, and procurement processes designed to sanction them. In 2022 and 2023 the surface was primarily public ChatGPT. In 2024 it expanded to include Claude, Gemini, Perplexity, and Copilot accessed outside tenant context. By 2026 the surface includes browser-extension AI assistants, AI-enabled features embedded in consumer SaaS tools (Notion AI, Grammarly, Loom AI summaries), and code assistants accessed through personal GitHub accounts. Any time an employee sends work-relevant content to an AI system outside IT's visibility and control, you have shadow AI.
The risk profile differs meaningfully from classic shadow IT. A shadow spreadsheet on an unsanctioned SaaS tool stores data in that vendor's database — bad, but bounded. A shadow AI interaction can pass confidential data into the training corpus of a frontier LLM, from which it may later emerge in the outputs other customers see. This training-data contamination risk has no clean equivalent in traditional shadow IT and is the reason regulators, boards, and general counsel offices treat shadow AI as a distinct category of exposure worth dedicated attention.
A useful mental model: shadow AI is to data loss prevention what BYOD was to endpoint management in 2014. The controls, policies, and sanctioned alternatives exist but have not yet reached broad operational maturity. Organizations that took BYOD seriously in 2014 came out of the decade with strong MDM and DLP programs. Organizations that take shadow AI seriously in 2026 will come out of the decade with defensible AI governance. The ones that punt will spend 2028 and 2029 cleaning up.
DATA LEAKAGE
The Data Leakage Risk (Samsung, Amazon, OpenAI)
Three incidents define the shadow AI data leakage risk landscape, and every executive and board member making decisions about AI governance should know them cold.
Samsung Confidential Source Code (2023)
Samsung semiconductor engineers pasted proprietary source code into public ChatGPT on three separate occasions within weeks of each other — once to debug code, once to optimize test sequences, once to summarize meeting notes containing product strategy. OpenAI's consumer ChatGPT terms of service at the time permitted training-data retention from user inputs. Samsung could not retrieve the data, could not prevent it from entering model training, and could not know whether it would emerge in other users' outputs. The company banned generative AI use internally in May 2023. The incident became the canonical shadow AI case study.
Amazon Internal Q&A Leak (2024)
Amazon employees were observed pasting internal Q&A content, interview notes, and performance discussions into ChatGPT. An internal Amazon Slack channel surfaced the practice, and Amazon's legal team issued guidance explicitly restricting generative AI use for confidential material. The episode received less press than Samsung but has the same structural pattern: capable, well-intentioned employees reaching for a useful tool, without technical controls to prevent the data from leaving.
OpenAI ChatGPT Conversation-History Exposure (2023)
In March 2023 a Redis client library bug in ChatGPT exposed conversation titles and in some cases payment information to other users. OpenAI patched within hours, but the incident demonstrated a different shadow AI failure mode: even when the user trusts the AI vendor with their data, the vendor's own operational incidents can expose it. Any organization relying on consumer AI for sensitive work accepts the vendor's incident response as part of its own risk surface.
WHAT EMPLOYEES USE
Common Shadow AI Tools
| Tool | Primary Use Case | Shadow AI Risk | Enterprise Alternative |
|---|---|---|---|
| ChatGPT (consumer) | General writing, coding help, research | Default training retention on free tier | ChatGPT Enterprise, Microsoft 365 Copilot |
| Claude (consumer) | Long-document analysis, coding | Training use depends on the user's privacy settings; data still leaves the tenant | Claude for Work, Enterprise |
| Google Gemini (consumer) | Research, email drafting, summarization | Google may use for personalization | Gemini Enterprise, Google Workspace AI |
| Microsoft Copilot (non-tenant) | Browser-based assistant | Data flows through Microsoft consumer endpoints | Microsoft 365 Copilot (tenant-bound) |
| Perplexity | Research, web search augmentation | Queries leave tenant; answers may cite untrusted sources | Perplexity Enterprise or sanctioned RAG |
| Browser AI extensions | Summaries, rewriting, translation | Often unclear data handling; extension-level access to all tabs | Sanctioned extensions only; block others |
| Grammarly, Notion AI, etc. | Embedded AI in SaaS tools | Data flows to embedded AI vendor | Vet individually; enterprise tier where available |
DETECTION
How to Actually Detect Shadow AI
DNS Logging and Analysis
On a managed network, nearly every shadow AI session starts with a DNS query. Enable DNS logging on corporate resolvers (Cisco Umbrella, Microsoft DNS, internal BIND) and regularly query for the top 30 AI domains, including chat.openai.com, chatgpt.com, claude.ai, gemini.google.com, bard.google.com, copilot.microsoft.com (outside tenant context), perplexity.ai, pi.ai, character.ai, and a long tail of consumer AI apps. Correlate resolved queries with user identity through AD/Entra logs. Browsers configured for DNS over HTTPS can bypass corporate resolvers, so treat DNS as a first pass rather than complete coverage. It is still the fastest initial assessment: most mid-market organizations can produce a first shadow AI report inside a week with no new tooling.
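As a rough illustration of that first pass, the sketch below counts lookups of the domains named above per client IP. The log format (timestamp, client IP, query name, separated by whitespace) is an assumption made for the example; real resolver exports differ and would need their own parser. Mapping client IPs to users through AD/Entra logs is left out.

```python
from collections import Counter

# Subset of the consumer AI domains named in the text; extend to your own list.
AI_DOMAINS = {
    "chat.openai.com", "chatgpt.com", "claude.ai", "gemini.google.com",
    "bard.google.com", "copilot.microsoft.com", "perplexity.ai",
    "pi.ai", "character.ai",
}


def matches_ai_domain(qname: str) -> bool:
    """True if qname equals a watched domain or is a subdomain of one."""
    name = qname.rstrip(".").lower()
    return any(name == d or name.endswith("." + d) for d in AI_DOMAINS)


def count_ai_queries(log_lines):
    """Count AI-domain lookups per client IP.

    Assumes a hypothetical line format: <timestamp> <client_ip> <query_name>
    """
    counts = Counter()
    for line in log_lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip malformed lines
        client_ip, qname = parts[1], parts[2]
        if matches_ai_domain(qname):
            counts[client_ip] += 1
    return counts


sample = [
    "2026-01-12T09:00:01Z 10.0.4.17 chatgpt.com.",
    "2026-01-12T09:00:05Z 10.0.4.17 www.perplexity.ai",
    "2026-01-12T09:01:10Z 10.0.9.3 intranet.example.com",
    "2026-01-12T09:02:44Z 10.0.9.3 claude.ai",
]
for ip, hits in count_ai_queries(sample).most_common():
    print(ip, hits)
```

Matching on the exact name or a dot-prefixed suffix avoids false hits such as a hostname that merely ends in the same letters as a watched domain.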
Web Proxy and CASB Logs
Secure web gateways (Zscaler, Netskope, Cisco Umbrella SWG, Palo Alto Prisma Access) and CASB platforms classify AI traffic and attribute it to users. They are more precise than DNS alone because they see the full URL, session duration, and upload volume. That matters: a ten-second ChatGPT query is different from a ninety-minute session with a four-megabyte document upload. CASB also enables policy-based blocking, warning banners, and selective allowlisting.
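To make the duration-and-volume distinction concrete, here is a minimal triage sketch. The `ProxySession` record and both thresholds are hypothetical; a real gateway export will have different field names, and the cutoffs should come from your own baseline rather than from this example.

```python
from dataclasses import dataclass


@dataclass
class ProxySession:
    """Hypothetical per-session record derived from gateway or CASB logs."""
    user: str
    host: str
    duration_s: int
    bytes_uploaded: int


# Assumed thresholds for illustration only.
LONG_SESSION_S = 30 * 60
LARGE_UPLOAD_BYTES = 1_000_000


def triage(session: ProxySession) -> str:
    """Rank a session by how much data could plausibly have left."""
    long_session = session.duration_s >= LONG_SESSION_S
    large_upload = session.bytes_uploaded >= LARGE_UPLOAD_BYTES
    if long_session and large_upload:
        return "high"
    if long_session or large_upload:
        return "medium"
    return "low"


sessions = [
    ProxySession("alice", "chatgpt.com", 10, 2_000),
    ProxySession("bob", "chatgpt.com", 90 * 60, 4_000_000),
]
for s in sessions:
    print(s.user, triage(s))
```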
Endpoint Telemetry
EDR platforms (CrowdStrike, SentinelOne, Microsoft Defender) see browser processes, installed extensions, and local AI application traffic. The browser-extension surface is impossible to cover with network-layer tooling alone — a user installing an AI-powered summarizer extension exposes every tab they visit, and only endpoint telemetry reveals it. Expand detection rules to flag new AI-category browser extensions and new desktop apps from known AI vendors.
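One way to approximate such a rule is to inspect each installed extension's manifest.json as collected by an endpoint agent. The sketch below flags extensions that both request access to all sites and describe themselves in AI terms. The keyword list and the flagging logic are illustrative assumptions, not any EDR vendor's rule format. Extensions with localized names (values like `__MSG_appName__`) would need their locale files resolved first, which this sketch does not do.

```python
import json
import re

# Match patterns that grant access to every site the user visits.
BROAD_HOST_PATTERNS = {"<all_urls>", "*://*/*", "http://*/*", "https://*/*"}
# Assumed keyword heuristic; tune for your environment.
AI_TOKENS = {"ai", "gpt", "chatgpt", "llm", "copilot", "summarize", "summarizer"}


def has_broad_host_access(manifest: dict) -> bool:
    """Check host_permissions, permissions, and content_scripts for all-site access."""
    patterns = set(manifest.get("host_permissions", []))
    # Some permission entries are objects, so keep only strings.
    patterns.update(p for p in manifest.get("permissions", []) if isinstance(p, str))
    for script in manifest.get("content_scripts", []):
        patterns.update(script.get("matches", []))
    return bool(patterns & BROAD_HOST_PATTERNS)


def looks_ai_related(manifest: dict) -> bool:
    """Whole-word keyword match on name and description."""
    text = f"{manifest.get('name', '')} {manifest.get('description', '')}".lower()
    return bool(set(re.findall(r"[a-z0-9]+", text)) & AI_TOKENS)


def should_flag(manifest: dict) -> bool:
    return has_broad_host_access(manifest) and looks_ai_related(manifest)


summarizer = json.loads("""{
  "manifest_version": 3,
  "name": "QuickSum AI Summarizer",
  "description": "Summarize any page with AI",
  "host_permissions": ["<all_urls>"]
}""")
helper = json.loads(
    '{"manifest_version": 3, "name": "Email Signature Helper", "permissions": ["storage"]}'
)
print(should_flag(summarizer), should_flag(helper))
```

Matching whole words rather than substrings keeps a name like "Email Signature Helper" from tripping the "ai" keyword.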
DLP and CASB Content Inspection
Modern DLP (Microsoft Purview, Forcepoint, Netskope) can inspect content uploaded to AI endpoints and block based on classification. If an employee attempts to paste customer PII, credit card numbers, source code, or classified project content into ChatGPT, DLP can block the paste at the browser level or the upload at the network edge. This is the control that converts shadow AI from an undetected risk into a governed channel with guardrails.
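The classification step can be pictured with a deliberately small example. The sketch below labels text that contains a Luhn-valid payment card number or a US Social Security number pattern. These two detectors are assumptions chosen for illustration; commercial DLP engines use much richer classifiers, and nothing here reflects how Purview, Forcepoint, or Netskope implement theirs.

```python
import re

# Illustrative detectors only.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){13,19}\b")
SSN_LIKE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify(text: str) -> set:
    """Return the set of sensitive-data labels found in text."""
    labels = set()
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if luhn_valid(digits):
            labels.add("payment_card")
    if SSN_LIKE.search(text):
        labels.add("ssn_like")
    return labels


print(classify("Please summarize: card 4111 1111 1111 1111, SSN 123-45-6789"))
print(classify("Order number 4111 1111 1111 1112 shipped"))
```

The Luhn check is what keeps an arbitrary 16-digit order number from being treated as a card number.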
POLICY
Creating an Acceptable Use Policy
A usable AI acceptable use policy (AUP) has six sections. Each addresses a common failure mode Armorstack has observed in mid-market deployments.
1. Scope. Explicitly define what tools and services the policy covers. Do not write “artificial intelligence” and assume employees know what that means. List the categories: generative AI chat tools, AI coding assistants, AI embedded in SaaS, AI-powered browser extensions, AI agents with tool access. Update quarterly as the market changes.
2. Sanctioned tools. Name the enterprise alternatives the organization provides: Microsoft 365 Copilot, ChatGPT Enterprise, Claude for Work, etc. Explain where to find access, how to request access, and who approves exceptions. If you do not give employees a legitimate path, they will find an illegitimate one.
3. Prohibited uses. Specify data categories that must never be submitted to any AI tool regardless of sanctioned or unsanctioned status: source code, customer PII, healthcare PHI, payment card data, confidential M&A or financial content, attorney-client privileged material, HR records. Include consequences: disciplinary action up to termination for willful violations.
4. Attribution and disclosure. Address when AI-assisted work must be disclosed — to clients, in regulatory submissions, in academic or legal contexts. This is the section that matters most for professional services firms, healthcare providers, and regulated entities.
5. Accuracy and verification. Require human review of AI-generated outputs before client or external distribution. LLMs hallucinate. A policy that does not make users responsible for verifying AI output is a policy that will eventually produce an embarrassing client-facing mistake.
6. Reporting and incident response. Name the channel for employees to report suspected AI incidents (a Copilot that behaved oddly, a colleague's inappropriate use, accidental pasting of confidential data into a public LLM). Treat self-reports with proportional response — a scared employee who self-reports is an intelligence source, not a disciplinary target. The goal is to learn about incidents as they happen, not to prosecute the reporters.
CONTROLS
Technical Controls
Policy without enforcement is aspiration. Three technical control categories turn an AUP into an operational program.
DLP on the Egress Path
Deploy DLP rules that inspect content submitted to AI endpoints for classified data types — source code markers, customer record patterns, payment card formats, regulated content categories. When a violation is detected, block the submission, notify the user with an explanation, and log the event. Microsoft Purview, Forcepoint, and Netskope all ship with AI-endpoint-aware rule templates as of 2026.
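The block, notify, and log sequence can be sketched as a single decision function. Everything here is hypothetical: the function name, the message wording, and the log fields are placeholders, and the `labels` argument stands in for whatever classifications the DLP engine produces.

```python
import json
from datetime import datetime, timezone


def enforce(user, destination, labels, now=None):
    """Decide on one submission to an AI endpoint.

    Returns (allowed, user_message, log_record) where log_record is a JSON string.
    """
    now = now or datetime.now(timezone.utc)
    blocked = bool(labels)
    message = None
    if blocked:
        # Tell the user why, and point them at the sanctioned path.
        message = (
            "Submission blocked: content matched "
            + ", ".join(sorted(labels))
            + ". Use the sanctioned AI tool or contact security."
        )
    record = json.dumps({
        "ts": now.isoformat(),
        "user": user,
        "destination": destination,
        "labels": sorted(labels),
        "action": "block" if blocked else "allow",
    })
    return (not blocked), message, record


allowed, message, record = enforce("alice", "chatgpt.com", {"payment_card"})
print(allowed, message)
print(record)
```

Logging allowed submissions as well as blocked ones gives the audit trail a complete picture of AI-endpoint traffic.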
URL Filtering and Warning Banners
Block consumer AI endpoints on corporate networks when a sanctioned alternative exists. Use warning banners (“this site processes data outside the corporate environment”) for lower-risk cases where full blocking is not yet warranted. Allowlist sanctioned endpoints explicitly so that policy exceptions are visible and auditable.
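A minimal sketch of that allow, warn, or block decision follows. The sanctioned hostname `llm.corp.example` and the per-domain flags are made up for the example; in practice this logic lives in the secure web gateway's policy engine rather than in custom code.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    WARN = "warn"
    BLOCK = "block"


# Hypothetical sanctioned endpoint, allowlisted explicitly so it is auditable.
ALLOWLIST = {"llm.corp.example"}

# Consumer AI domain -> does a sanctioned alternative exist? (assumed values)
CONSUMER_AI = {
    "chatgpt.com": True,
    "claude.ai": True,
    "perplexity.ai": False,
}

BANNER = "This site processes data outside the corporate environment."


def decide(host: str):
    """Return (Action, banner_text_or_None) for a requested hostname."""
    host = host.rstrip(".").lower()
    if host in ALLOWLIST:
        return Action.ALLOW, None
    for domain, has_alternative in CONSUMER_AI.items():
        if host == domain or host.endswith("." + domain):
            if has_alternative:
                return Action.BLOCK, None
            return Action.WARN, BANNER
    return Action.ALLOW, None  # not a known AI endpoint


for h in ["llm.corp.example", "chatgpt.com", "www.perplexity.ai", "example.org"]:
    print(h, decide(h)[0].value)
```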
Enterprise LLM Allowlist and Tenant-Bound Access
Deploy enterprise Copilot, ChatGPT Enterprise, or Claude for Work with tenant-bound access that enforces SSO, conditional access, and data residency. Train users that the enterprise-sanctioned tool is the one they should use, and make it at least as good an experience as the consumer alternative. Friction is the enemy of policy compliance — an enterprise Copilot deployment that is harder to access than public ChatGPT will drive users back to shadow AI despite the policy.
CULTURE
Training and Culture Change
The most durable shadow AI defense is a workforce that understands the risk and makes the right choice because they want to, not because they have to. This is a training and communication program, not a tooling purchase. Organizations that lead with policy and blocking without the cultural layer produce resentment and workarounds. Organizations that lead with communication and sanctioned alternatives produce compliance.
Effective AI training programs share a common structure: one 30-minute module at onboarding that covers the risk, the policy, and the sanctioned tools; quarterly refreshers keyed to specific emerging risks or incidents in the news; a simple channel (a dedicated Teams channel, an email alias) where employees can ask questions and raise novel situations; and leadership communication from the CEO or CIO explicitly modeling correct behavior — including acknowledging when and how leadership uses AI tools, which signals the organization's values.
INCIDENT RESPONSE
When Shadow AI Has Already Exposed Data
Assume it has already happened. Run the five-step response playbook proactively for the first discovered incident, then let that serve as the template for subsequent incidents.
Step 1: Scope the exposure. What data was submitted? To which vendor? From which user account? Over what timeframe? DNS logs establish which vendor was contacted and when; CASB, proxy, and DLP logs add the user, session, and upload detail. Interview the employee if cooperative; use endpoint artifacts if not.
Step 2: Assess legal and regulatory obligations. GDPR Article 33 requires notifying the supervisory authority within 72 hours of becoming aware of a personal data breach, unless the breach is unlikely to result in a risk to individuals. Wisconsin, like every other U.S. state, has a breach notification law covering regulated personal information. Customer contracts frequently mandate notice for confidential data exposure. Healthcare clients have HIPAA implications; defense contractors have DFARS cyber incident reporting obligations. General counsel drives this step; IT and security support.
Step 3: Request deletion from the AI vendor. Where vendor terms of service permit, formally request deletion of the submitted data from training corpora and conversation history. OpenAI, Anthropic, and Google all have data subject request processes. Document the request and the response for audit.
Step 4: Rotate and remediate. If credentials, API keys, or other secrets appeared in the exposed content, rotate them. If source code was exposed, conduct a secrets scan on the exposed content and rotate anything found. Treat the AI vendor's training corpus as a potential breach source.
Step 5: Lessons-learned retrospective. Run a blameless post-incident review focused on the control gaps that allowed the exposure. Update policy, tighten DLP rules, expand training content, and make the sanctioned alternative more accessible. File the retrospective — auditors and regulators increasingly ask for evidence of institutional learning after AI incidents.
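The secrets scan in Step 4 can start as a simple pattern match over the exposed text. The sketch below is a minimal version under stated assumptions: it checks only three well-known formats (AWS access key IDs, classic GitHub personal access tokens, and PEM private key headers), and the function and label names are hypothetical. A dedicated secrets scanner covers far more formats and should be preferred where available.

```python
import re

# Illustrative subset of secret formats; not exhaustive.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "private_key": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
}


def scan(text: str):
    """Return (line_number, label, redacted_prefix) for each suspected secret."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in SECRET_PATTERNS.items():
            for match in pattern.finditer(line):
                # Keep only a short prefix so the report does not re-leak the secret.
                findings.append((lineno, label, match.group()[:4] + "..."))
    return findings


# AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
exposed = """aws_key = "AKIAIOSFODNN7EXAMPLE"
timeout = 30
-----BEGIN RSA PRIVATE KEY-----"""

for finding in scan(exposed):
    print(finding)
```

Every finding is a rotation candidate; a clean scan does not prove the content was free of secrets, only that none of the listed formats appeared.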
ARMORSTACK APPROACH
How Armorstack Handles Shadow AI Risk
Shadow AI governance is a cross-portfolio engagement at Armorstack. VERITY leads the advisory work: AUP development, governance design, board-level AI risk reporting, sanctioned-tool selection. CORE manages the technical implementation: DLP tuning, CASB deployment, identity integration for enterprise LLM access, browser-extension hardening. SENTRY operates the detection and incident response layer: continuous monitoring of AI traffic, alerting on policy violations, running the incident response playbook when exposures occur.
Clients typically engage Armorstack with an initial Shadow AI Discovery — a two-to-four-week assessment that produces a baseline inventory of AI usage across the organization, a mapped risk register, and a prioritized remediation roadmap. From there the managed engagement folds into ongoing VERITY/SENTRY operations. The philosophical commitment: shadow AI is not a project you finish. It is a steady-state risk surface you manage forever, and the organizational capability to manage it is what you are actually building.
FREQUENTLY ASKED
Shadow AI: Q&A
What exactly counts as shadow AI?
Shadow AI is any AI tool or service an employee uses for work purposes without IT or security approval. The most common examples are public ChatGPT, Claude, Gemini, Perplexity, and browser-extension AI assistants accessed through personal accounts on corporate devices or personal devices with corporate data. It is the AI equivalent of shadow IT and shares the same root cause: employees need capability faster than governance can sanction it.
How is shadow AI different from shadow IT?
Shadow IT typically involves unsanctioned SaaS applications that store data in a third-party system. Shadow AI adds a distinct risk: the AI vendor may use the submitted data to train future models, meaning data can leak into outputs seen by other customers. Samsung's 2023 source-code leak and the Amazon internal Q&A leak in 2024 both illustrate this training-data-contamination class of risk, which has no direct equivalent in traditional shadow IT.
What is the fastest way to detect shadow AI in our environment?
DNS and proxy log analysis is the fastest first pass. Query outbound traffic to the top 30 consumer AI domains (chat.openai.com, claude.ai, gemini.google.com, perplexity.ai, copilot.microsoft.com in non-tenant contexts, etc.) and correlate with user identity. A CASB like Netskope or Microsoft Defender for Cloud Apps accelerates this and adds policy enforcement. In Armorstack's discovery engagements, most clients find 5 to 15 times more shadow AI usage than leadership expected.
Should we ban public AI tools entirely?
Outright bans rarely work — they drive shadow usage underground onto personal devices where detection is impossible. The credible playbook is a sanctioned enterprise alternative (Microsoft 365 Copilot, enterprise ChatGPT, enterprise Claude), paired with a clear acceptable use policy, technical blocks on consumer AI when sanctioned equivalents exist, and user training on why the distinction matters. The goal is to channel demand, not suppress it.
How should we respond if shadow AI has already exposed confidential data?
Treat it as a data breach and work the playbook: identify what data was exposed and to which vendor, assess legal and contractual obligations (GDPR breach notification, state data breach laws, customer contractual notice requirements), request deletion from the AI vendor where terms of service permit, document the exposure for audit, rotate any credentials that may have been in the leaked content, and run a lessons-learned retrospective that drives policy and technical updates.