AI Vendor Risk Assessment: A Questionnaire That Works
What an AI vendor risk assessment is, and why generic TPRM is not enough
An AI vendor risk assessment is a structured review of a third party whose product touches your data via large language models, machine learning systems, or autonomous agent functionality. It extends — but does not replace — the third-party risk management (TPRM) workflow your organization is already running.
A generic TPRM questionnaire is no longer sufficient for a simple reason: the questions that matter for AI vendors (training data handling, model isolation, prompt logging, retention of inferred data, fine-tuning rights, sub-processor disclosure for the underlying foundation model) are not in standard TPRM templates. The result is an assessment process that approves AI vendors after the same depth of review as a static SaaS app, and leaves the organization absorbing the risk described in The Observability Gap.
This piece publishes the 22-question assessment we issue on behalf of clients, with notes on what a defensible answer looks like.
Why it matters
Three reasons AI vendor risk warrants its own assessment track.
1. AI vendors handle your data differently than non-AI vendors. Prompts are a new class of data the vendor receives. Retrieval payloads are another. The vendor’s retention, logging, and use of those payloads — including for model training — needs to be reviewed before the contract closes, not after.
2. The vendor’s own dependencies are now part of your supply chain. Most AI features are built on a foundation model from a small number of providers. Your vendor’s posture is, in part, a function of their upstream provider’s posture. The questionnaire has to surface that chain.
3. Existing certifications do not cover the AI surface. A SOC 2 Type II report or an ISO 27001 certification does not, by itself, demonstrate that the vendor’s AI features are being run with appropriate isolation, logging, or training-data controls. Those certifications are necessary but not sufficient for AI vendors.
When to issue the assessment
Three triggers should escalate a vendor into the AI track.
- The vendor has launched any LLM or generative AI feature that touches your data, prompts, or documents — even if the feature is “optional.”
- The vendor’s contract or DPA was last reviewed before mid-2024, the period in which most SaaS platforms began shipping AI features. Existing agreements likely do not contemplate model-layer use.
- The vendor handles regulated data (PHI, PCI, CUI, FERPA, GLBA) and has any AI feature in scope. Sector-specific obligations make the AI assessment non-optional.
Once a vendor is in the AI track, its responses are judged on three axes:
- Critical pass rate. All critical (C) questions answered clearly with a defensible posture. Anything less is a hold.
- High coverage rate. Target 80% or more on high (H) questions. Below 60% triggers an executive escalation.
- Documentation completeness. SOC 2 report, DPA, BAA, sub-processor list, model card, and AI-specific addendum all available.
A Shadow AI Discovery typically produces the working list of vendors to screen against the triggers above. See Shadow AI Detection for the discovery method.
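The three triggers can be screened mechanically across a vendor inventory. The sketch below is a minimal illustration, not a prescribed tool: the `Vendor` fields are hypothetical, and `AI_CONTRACT_CUTOFF` uses 2024-07-01 as an assumed stand-in for "mid-2024". Each trigger is evaluated independently, as the list reads.

```python
from dataclasses import dataclass
from datetime import date

# Assumption: 2024-07-01 stands in for "mid-2024"; set this to your own policy date.
AI_CONTRACT_CUTOFF = date(2024, 7, 1)

# Regulated data classes named in the trigger list.
REGULATED = {"PHI", "PCI", "CUI", "FERPA", "GLBA"}


@dataclass
class Vendor:
    name: str
    has_ai_feature: bool            # any LLM/generative feature touching your data, optional or not
    contract_last_reviewed: date    # last review of the contract or DPA
    data_classes: frozenset[str] = frozenset()


def ai_track_triggers(v: Vendor) -> list[str]:
    """Return the triggers (hypothetical labels) that escalate this vendor into the AI track."""
    hits = []
    if v.has_ai_feature:
        hits.append("ai_feature")
    if v.contract_last_reviewed < AI_CONTRACT_CUTOFF:
        hits.append("stale_contract")
    if v.has_ai_feature and REGULATED & v.data_classes:
        hits.append("regulated_data")
    return hits
```

An empty list means the vendor stays on the standard TPRM track for now; any hit routes it to the AI assessment.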
The 22-question AI vendor risk assessment
We organize the questionnaire into five sections. Each question has a tier: C (critical — disqualifying if unanswered or weak), H (high), M (medium). A defensible mid-market vendor should answer all C questions clearly and provide documentation on H questions.
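If you bolt these questions onto an existing TPRM tool, it helps to keep the tier assignments as data rather than prose, so scoring and subsetting stay consistent. A minimal sketch: the structure is our assumption, and the tiers are transcribed from the 22 questions that follow (8 critical, 8 high, 6 medium).

```python
from collections import Counter

# Tier per question number, transcribed from the questionnaire.
# C = critical (disqualifying if unanswered or weak), H = high, M = medium.
TIERS = {
    1: "C", 2: "C", 3: "C", 4: "H", 5: "H", 6: "M",   # Section 1: data handling
    7: "C", 8: "H", 9: "H", 10: "M", 11: "M",         # Section 2: model and architecture
    12: "C", 13: "C", 14: "H", 15: "H",               # Section 3: security and incident response
    16: "C", 17: "C", 18: "H", 19: "M",               # Section 4: compliance and governance
    20: "H", 21: "M", 22: "M",                        # Section 5: operational
}


def questions_in_tier(tier: str) -> list[int]:
    """Question numbers in the given tier, in questionnaire order."""
    return sorted(q for q, t in TIERS.items() if t == tier)


def tier_counts() -> Counter:
    """How many questions sit in each tier."""
    return Counter(TIERS.values())
```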
Section 1 — Data handling (questions 1-6)
1. (C) What customer data does the AI feature receive?
Look for: explicit list — prompts, attached documents, retrieved records, inferred metadata, conversation history. A vendor that cannot enumerate this list is not ready for assessment.
2. (C) Is customer data ever used to train, fine-tune, or improve the underlying model?
Look for: a clear “no” by default, with an opt-in path documented if applicable. The vendor should distinguish between aggregated telemetry, fine-tuning data, and reinforcement-learning feedback data.
3. (C) What is the retention period for prompts, outputs, and any associated metadata?
Look for: stated retention period (e.g., 30 days, 90 days), with the ability to negotiate shorter retention contractually.
4. (H) Is data segregated between customers at the model layer?
Look for: confirmation that one customer’s prompts cannot influence another customer’s outputs (no shared context windows, no shared retrieval indexes).
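What a "yes" to question 4 should mean in practice can be illustrated with a toy retrieval index that applies the tenant filter before any ranking happens. This is a minimal sketch, not any vendor's design: the class and field names are hypothetical, and keyword overlap stands in for embedding similarity.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    tenant_id: str
    text: str


@dataclass
class TenantScopedIndex:
    """Toy retrieval index in which every read is scoped to a single tenant."""
    docs: list[Doc] = field(default_factory=list)

    def add(self, tenant_id: str, text: str) -> None:
        self.docs.append(Doc(tenant_id, text))

    def search(self, tenant_id: str, query: str, k: int = 3) -> list[str]:
        terms = set(query.lower().split())
        # Tenant filter runs before scoring, so another tenant's documents
        # never enter the candidate set.
        candidates = [d for d in self.docs if d.tenant_id == tenant_id]
        scored = [(len(terms & set(d.text.lower().split())), d.text) for d in candidates]
        hits = [(s, text) for s, text in scored if s > 0]
        hits.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in hits[:k]]
```

The design point is ordering: filtering after ranking, or sharing one unfiltered index, is the pattern question 4 is meant to rule out.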
5. (H) Where is data processed and stored, and is regional residency configurable?
Look for: explicit list of regions, including foundation-model provider regions, and documented residency options for regulated workloads.
6. (M) What encryption is applied in transit and at rest, including for prompts and embeddings?
Look for: TLS 1.2+ in transit, AES-256 at rest, customer-managed key (CMK) options for sensitive deployments.
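Of the items in question 6, only the in-transit floor can be spot-checked from outside; at-rest encryption and CMK support still need documentation. A minimal sketch using Python's standard `ssl` module; the function names are our own.

```python
import socket
import ssl


def strict_client_context() -> ssl.SSLContext:
    """Client context that verifies certificates and refuses anything below TLS 1.2."""
    ctx = ssl.create_default_context()  # certificate and hostname verification enabled
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def negotiated_tls_version(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect to host and return the negotiated protocol string.

    Raises ssl.SSLError if the handshake fails, including when the server
    cannot negotiate TLS 1.2 or newer.
    """
    ctx = strict_client_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()
```

Calling `negotiated_tls_version("api.vendor.example")` (a placeholder hostname) against the vendor's AI endpoint returns a string such as `TLSv1.3`, or raises if the endpoint cannot meet the floor.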
Section 2 — Model and architecture (questions 7-11)
7. (C) Which foundation model(s) power the AI feature, and which provider hosts the model?
Look for: explicit named models (e.g., Claude 4.x, GPT-4o, Gemini 1.5) and named hosting providers (Anthropic, OpenAI, AWS Bedrock, Azure OpenAI, etc.). The chain should be transparent.
8. (H) What is the vendor’s process for model updates, and how is customer behavior tested before rollout?
Look for: documented model-update governance, customer notification on material changes, regression-testing process.
9. (H) Is the AI feature deterministic or stochastic, and how is variability managed for high-stakes use cases?
Look for: acknowledgement of stochasticity, documented temperature/sampling controls, deterministic-mode availability if relevant.
10. (M) Are there documented controls for prompt injection and indirect prompt injection?
Look for: explicit references to layered controls — input filtering, retrieval-source allowlisting, output filtering, action gating. See Prompt Injection Prevention for the eight-layer taxonomy.
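Of the layers listed, retrieval-source allowlisting is the simplest to illustrate. A minimal sketch, assuming retrieved content is identified by URL; `ALLOWED_HOSTS` and its entries are hypothetical. Matching is on exact hostname over HTTPS, so look-alike hosts such as `docs.example.com.attacker.net` are rejected.

```python
from urllib.parse import urlsplit

# Hypothetical customer-approved retrieval sources (exact hostnames).
ALLOWED_HOSTS = frozenset({"docs.example.com", "kb.example.com"})


def is_allowed_source(url: str, allowed: frozenset = ALLOWED_HOSTS) -> bool:
    """True only for HTTPS URLs whose hostname is exactly on the allowlist."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    # .hostname is lowercased, with any port or userinfo stripped.
    return (parts.hostname or "") in allowed


def filter_retrieval(urls: list[str]) -> list[str]:
    """Drop retrieved items from non-allowlisted sources before they reach the prompt."""
    return [u for u in urls if is_allowed_source(u)]
```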
11. (M) For agent or tool-use features, what is the action authorization model?
Look for: per-action authorization, allowlisted action set, audit logs, customer-controlled approval gates for high-impact actions.
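A defensible answer to question 11 describes something like the gate below: every action is checked against an allowlist, high-impact actions need approval, and every decision is logged. This is a minimal sketch; the action names and the rule that the approver must differ from the requester are our assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Hypothetical action catalogue for an agent feature.
ALLOWED_ACTIONS = frozenset({"read_ticket", "draft_reply", "send_email", "delete_record"})
HIGH_IMPACT = frozenset({"send_email", "delete_record"})


@dataclass
class ActionGate:
    """Per-action authorization with an allowlist, an approval gate, and an audit log."""
    audit_log: list[dict] = field(default_factory=list)

    def authorize(self, user: str, action: str, approved_by: Optional[str] = None) -> bool:
        if action not in ALLOWED_ACTIONS:
            allowed, reason = False, "not_allowlisted"
        elif action in HIGH_IMPACT and (approved_by is None or approved_by == user):
            # Assumption: high-impact actions need approval from a second person.
            allowed, reason = False, "approval_required"
        else:
            allowed, reason = True, "ok"
        # Every decision, allowed or denied, lands in the audit trail.
        self.audit_log.append({
            "ts": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "action": action,
            "approved_by": approved_by,
            "allowed": allowed,
            "reason": reason,
        })
        return allowed
```

Logging denials as well as approvals matters: the denied attempts are often the evidence of a prompt-injection attempt.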
Section 3 — Security and incident response (questions 12-15)
12. (C) Will the vendor provide their SOC 2 Type II report, ISO 27001 certificate, and any AI-specific attestations?
Look for: current report (within 12 months), no major exceptions, AI-relevant control coverage.
13. (C) What is the AI-specific incident response process?
Look for: notification SLA (typically 72 hours or better), explicit handling of model-layer incidents (data leakage via inference, prompt injection compromise, supply chain compromise of the foundation model).
14. (H) Is there a published vulnerability disclosure program covering the AI feature?
Look for: VDP scope explicitly including the AI surface, named contact, response SLA.
15. (H) How is access to prompts, outputs, and model logs by vendor employees controlled and logged?
Look for: least-privilege access, MFA, audit logs of vendor employee access to customer data.
Section 4 — Compliance and governance (questions 16-19)
16. (C) Does the vendor support contractual no-train terms, a BAA, and DPA addenda for AI processing?
Look for: standard willingness to execute these documents, with AI-specific terms (no training, retention, sub-processor disclosure).
17. (C) Are sub-processors — including the foundation model provider — explicitly disclosed and contractually bound?
Look for: complete sub-processor list, disclosed in contract or DPA, with customer notification on additions.
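Notification on additions is easier to enforce if you keep a snapshot of each vendor's sub-processor list and diff it on every refresh. A minimal sketch; the function names are our own, the provider names in the test are illustrative, and names are compared case-insensitively after trimming whitespace.

```python
def _index(names: list[str]) -> dict[str, str]:
    """Map a trimmed, case-insensitive key to the display name."""
    return {n.strip().casefold(): n.strip() for n in names if n.strip()}


def subprocessor_changes(previous: list[str], current: list[str]) -> dict[str, list[str]]:
    """Compare two snapshots of a vendor's sub-processor list."""
    prev, curr = _index(previous), _index(current)
    return {
        "added": sorted(curr[k] for k in curr.keys() - prev.keys()),
        "removed": sorted(prev[k] for k in prev.keys() - curr.keys()),
    }


def needs_reassessment(changes: dict[str, list[str]]) -> bool:
    """Any addition or removal counts as a sub-processor change."""
    return bool(changes["added"] or changes["removed"])
```

A non-empty diff doubles as an out-of-cycle reassessment trigger.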
18. (H) Is the vendor aligned with NIST AI RMF, EU AI Act risk tiers, or comparable frameworks?
Look for: explicit framework mapping, model cards, governance documentation. See NIST AI RMF Implementation.
19. (M) For regulated data (PHI, PCI, CUI, FERPA), are AI-specific compliance controls in place?
Look for: sector-specific certifications, BAA terms that cover AI processing, documented evidence of regulator-acceptable handling.
Section 5 — Operational (questions 20-22)
20. (H) What logging and audit-trail data is available to the customer for the AI feature?
Look for: customer-accessible logs of prompts, outputs, retrievals, agent actions. Without this, Layer 2 of the Observability Gap cannot be closed for that vendor.
21. (M) What is the customer’s ability to disable, restrict, or audit the AI feature post-purchase?
Look for: feature-level disable, role-based access controls, ability to restrict the feature to specific user populations.
22. (M) What is the vendor’s roadmap and end-of-life policy for the AI feature?
Look for: stable product positioning, customer-protective end-of-life policy, data export rights on sunset.
How to score and decide
We score the questionnaire on three axes: critical pass rate, high coverage rate, and documentation completeness.
The output is a tiered recommendation: approve, approve with conditions (e.g., contractual addenda, configuration restrictions, sunset clause), or reject. We then feed the result into the client’s existing TPRM register so AI risk integrates with the broader vendor risk view rather than running in a parallel spreadsheet.
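The three axes can be expressed as a small scoring function. A minimal sketch under stated assumptions: `defensible` is the set of question numbers the reviewer judged defensible, the tier map is passed in, and treating 60 to 80 percent H coverage or missing documents as "approve with conditions" is our reading rather than a fixed rule. Rejection stays a human decision taken after a hold or escalation.

```python
from dataclasses import dataclass

REQUIRED_DOCS = frozenset({
    "SOC 2", "DPA", "BAA", "sub-processor list", "model card", "AI-specific addendum",
})


@dataclass
class Score:
    critical_pass: bool
    high_coverage: float     # fraction of H questions answered defensibly
    missing_docs: list[str]
    recommendation: str


def score_vendor(tiers: dict[int, str], defensible: set[int], docs: set[str]) -> Score:
    critical = [q for q, t in tiers.items() if t == "C"]
    high = [q for q, t in tiers.items() if t == "H"]
    critical_pass = all(q in defensible for q in critical)
    high_coverage = sum(q in defensible for q in high) / len(high) if high else 1.0
    missing = sorted(REQUIRED_DOCS - docs)

    if not critical_pass:
        rec = "hold"                      # anything short of all C questions is a hold
    elif high_coverage < 0.60:
        rec = "executive escalation"
    elif high_coverage < 0.80 or missing:
        rec = "approve with conditions"   # assumption: gaps become contract or config conditions
    else:
        rec = "approve"
    return Score(critical_pass, high_coverage, missing, rec)
```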
Common questions
Q: Can we just bolt these questions onto our existing TPRM questionnaire?
A: Yes — and we encourage that. The point is not a separate process. It is an extended set of questions that get asked of any vendor in the AI track, integrated into the same TPRM workflow, scored against the same risk register.
Q: What if the vendor refuses to answer a C question?
A: Treat the refusal as the answer. A vendor who will not disclose foundation-model provider, training-data use, or sub-processor list is a vendor whose risk you cannot quantify. Hold the deal pending escalation.
Q: Our vendors keep saying “we use AWS Bedrock so we’re fine.” Is that sufficient?
A: Bedrock answers some questions (hosting, residency, encryption, no-train by default for foundation models). It does not answer the questions about the vendor’s own application-layer handling of prompts, outputs, retention, or agent actions. The vendor still owes you the application-layer answers.
Q: How often should we re-assess?
A: Annually as a baseline. Trigger an out-of-cycle reassessment on (a) any material model change, (b) any sub-processor change, (c) any AI-related incident affecting the vendor or its foundation-model provider.
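The re-assessment rule combines a calendar baseline with event triggers. A minimal sketch; the event labels are hypothetical, and "annually" is approximated as 365 days.

```python
from datetime import date, timedelta

# Hypothetical labels for the three out-of-cycle triggers.
OUT_OF_CYCLE = {"material_model_change", "subprocessor_change", "ai_incident"}


def reassessment_due(last_assessed: date, today: date, events: set[str]) -> tuple[bool, str]:
    """Return (due, reason). Event triggers take precedence over the annual baseline."""
    triggered = sorted(events & OUT_OF_CYCLE)
    if triggered:
        return True, "out-of-cycle: " + ", ".join(triggered)
    if today - last_assessed >= timedelta(days=365):
        return True, "annual baseline"
    return False, "current"
```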
Q: We have hundreds of vendors. Where do we start?
A: Risk-rank by data sensitivity, then by AI-feature criticality. Top decile gets the full 22-question assessment in cycle 1. Mid-tier gets an 8-question subset (the C questions). Tail-tier gets noted but deferred.
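The triage can be sketched as a sort followed by slicing. The 1-to-5 scales are hypothetical, and because the mid-tier boundary is not defined here, the sketch assumes the next 40 percent after the top decile (`mid_tier_pct`); tune both to your portfolio. Integer arithmetic keeps the cut points from drifting on floating-point rounding.

```python
from dataclasses import dataclass


@dataclass
class VendorRisk:
    name: str
    data_sensitivity: int  # hypothetical 1-5 scale, 5 = most sensitive data
    ai_criticality: int    # hypothetical 1-5 scale, 5 = most critical AI feature


def triage(vendors: list[VendorRisk], mid_tier_pct: int = 40) -> dict[str, list[str]]:
    """Rank by data sensitivity, then AI-feature criticality, and bucket the list."""
    ranked = sorted(vendors, key=lambda v: (v.data_sensitivity, v.ai_criticality), reverse=True)
    n = len(ranked)
    top = (n + 9) // 10                          # top decile, rounded up
    mid = top + (n * mid_tier_pct + 99) // 100   # assumed mid-tier share, rounded up
    return {
        "full_22": [v.name for v in ranked[:top]],
        "critical_subset": [v.name for v in ranked[top:mid]],
        "deferred": [v.name for v in ranked[mid:]],
    }
```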
Q: Where does Armorstack help?
A: Two engagement types. (1) A questionnaire-as-a-service offering — we issue the assessment on the client’s behalf, score the responses, and feed results into their TPRM. (2) A vCISO-led TPRM-AI integration — embedding the AI track into the client’s existing TPRM operating model. Both run from the VERITY portfolio.
Get help
If you want this 22-question assessment delivered against a specific vendor list, Armorstack runs the engagement as a fixed-scope service — we issue, score, and integrate the results into your TPRM register. Book a 30-minute discovery call at armorstack.ai/contact/ or call 877-890-5508.
Last reviewed: 2026-05-01. Authored by Dale Boehm, CEO Armorstack. CISA + CDPP.