NIST AI Risk Management. Trustworthy AI Governed by Design.
AI Risk Management Framework
NIST AI 100-1 implementation across the full AI lifecycle. Govern organizational AI risk culture. Map context-specific impacts and stakeholders. Measure trustworthiness characteristics with quantitative and qualitative metrics. Manage risk treatment with continuous monitoring. Governance evidence generated from connected infrastructure, not retrospective documentation.
AI Governance
AI risk management starts with governance structure. Not a checklist of ethical principles.
The NIST AI Risk Management Framework defines four core functions for managing AI risk across the entire lifecycle: Govern, Map, Measure, Manage. Most organizations treat AI governance as a policy statement. They publish responsible AI principles, designate an ethics board, and consider the matter addressed. The AI RMF requires more. It requires organizational structures that integrate AI risk into existing enterprise risk management, measurement systems that quantify trustworthiness characteristics, and management processes that respond to measured risks with documented treatment decisions. Redoubt Forge operationalizes that structure. Connect your AI systems, assess against all four functions, and generate governance evidence from your running infrastructure.
The NIST AI Risk Management Framework (AI 100-1) was published in January 2023 as a voluntary, rights-preserving framework for organizations designing, developing, deploying, evaluating, or acquiring AI systems. Congress directed its development in the National Artificial Intelligence Initiative Act of 2020, and federal adoption was reinforced by Executive Order 13960 (Promoting the Use of Trustworthy Artificial Intelligence in the Federal Government, December 2020) and Executive Order 14110 (Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence, October 2023). The framework is not a compliance mandate in the traditional regulatory sense. It is a risk management structure that organizations adopt to demonstrate responsible AI governance. Federal agencies are directed to align their AI activities with the framework. Private sector organizations increasingly adopt it as a governance baseline, particularly those operating in regulated industries, those pursuing federal contracts involving AI systems, and those subject to emerging state-level AI legislation that references the NIST framework as a recognized standard.
The AI RMF identifies seven characteristics of trustworthy AI systems. Valid and reliable: the system performs as intended under expected and unexpected conditions. Safe: the system does not endanger human life, health, property, or the environment. Secure and resilient: the system withstands adversarial attacks, maintains confidentiality and integrity of data, and recovers from disruptions. Accountable and transparent: the organization can explain who is responsible for AI decisions and how those decisions are made. Explainable and interpretable: stakeholders can understand the AI system's outputs and the reasoning behind them. Privacy-enhanced: the system protects individual privacy throughout data collection, processing, and output generation. Fair with harmful bias managed: the system does not produce outputs that systematically disadvantage particular groups, and mechanisms exist to detect and mitigate bias when it occurs. These characteristics are not independent. They interact, compete, and require trade-off decisions that the governance structure must address.
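Because the characteristics interact and compete, a governance tool needs them as first-class objects it can attach to identified risks and measured metrics. A minimal illustrative sketch in Python (the enumeration and its names are an assumption for this example, not part of the framework):

```python
from enum import Enum

class Trustworthiness(Enum):
    """The seven AI RMF trustworthiness characteristics, used here as
    tags on risks and metrics so trade-offs can be tracked explicitly."""
    VALID_AND_RELIABLE = "valid and reliable"
    SAFE = "safe"
    SECURE_AND_RESILIENT = "secure and resilient"
    ACCOUNTABLE_AND_TRANSPARENT = "accountable and transparent"
    EXPLAINABLE_AND_INTERPRETABLE = "explainable and interpretable"
    PRIVACY_ENHANCED = "privacy-enhanced"
    FAIR_WITH_HARMFUL_BIAS_MANAGED = "fair with harmful bias managed"
```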
Unlike prescriptive frameworks such as NIST 800-53, the AI RMF does not define a catalog of specific controls with implementation requirements. It provides a structure for AI governance organized into four core functions: Govern, Map, Measure, and Manage. Each function contains categories and subcategories that describe outcomes the organization should achieve, but the specific mechanisms for achieving those outcomes are left to the organization's discretion based on its risk context. This flexibility is both the framework's strength and its implementation challenge. Organizations must determine which subcategories apply to their AI systems, define metrics and thresholds for measuring trustworthiness, and establish governance processes that integrate AI risk management with their existing enterprise risk management structures. The framework applies across the full AI lifecycle: from initial conception and design through development, testing, deployment, operation, monitoring, and eventual decommissioning. It applies to any organization that develops AI systems, deploys third-party AI systems, or uses AI capabilities embedded in products and services they acquire.
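The outcome-oriented structure can be pictured as a small hierarchy: functions contain categories, categories contain subcategories, and each subcategory states an outcome while the implementing mechanism is left to the organization. A minimal sketch, with hypothetical field names:

```python
from dataclasses import dataclass, field

@dataclass
class Subcategory:
    identifier: str      # e.g. "GV.1.1"
    outcome: str         # what the framework says should be achieved
    mechanism: str = ""  # how this organization achieves it (org-defined)

@dataclass
class Category:
    identifier: str      # e.g. "GV.1"
    description: str
    subcategories: list[Subcategory] = field(default_factory=list)

@dataclass
class Function:
    name: str            # "Govern", "Map", "Measure", or "Manage"
    categories: list[Category] = field(default_factory=list)

# Example: one Govern subcategory, outcome fixed, mechanism left open.
govern = Function("Govern", [
    Category("GV.1", "policies and procedures", [
        Subcategory("GV.1.1", "legal and regulatory requirements "
                              "involving AI are understood and documented"),
    ]),
])
```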
AI systems introduce risk categories that traditional cybersecurity and compliance frameworks were never designed to address. Bias in training data or model architecture produces systematically unfair outputs that affect real people: loan denials, hiring decisions, risk scores, medical recommendations. Hallucination occurs when generative models produce confident, plausible outputs that are factually wrong, creating liability when those outputs inform business decisions or public-facing responses. Model drift degrades performance over time as the statistical relationships in production data diverge from training data distributions, causing a system that tested well to perform poorly after deployment. Opacity in complex model architectures makes it difficult or impossible to explain why a particular output was generated, undermining accountability and stakeholder trust. Autonomy risk increases as organizations delegate consequential decisions to AI systems without adequate human oversight mechanisms. These risks compound when AI systems interact with other AI systems, creating cascading failure modes that no single system's risk assessment anticipated.
Traditional security frameworks address confidentiality, integrity, and availability of information systems. They cover access control, encryption, audit logging, incident response, and vulnerability management. These controls remain necessary for AI systems, but they are not sufficient. NIST 800-53 does not contain controls for measuring model bias. FedRAMP does not require documentation of training data provenance. CMMC does not assess whether an AI system's outputs are explainable to affected stakeholders. SOC 2 Trust Service Criteria do not address the unique supply chain risks of pre-trained foundation models, fine-tuned adapters, or third-party inference endpoints. An AI system can satisfy every control in NIST 800-53 Moderate and still produce biased outputs, hallucinate in production, drift from its validated performance baseline, and operate with no mechanism for affected parties to understand or contest its decisions. The security posture is intact. The AI risk posture is unmanaged. Organizations need both: traditional security controls AND AI-specific governance structures that address the risks unique to AI systems.
The regulatory landscape reinforces this urgency. The EU AI Act establishes mandatory requirements for high-risk AI systems, including conformity assessments, transparency obligations, and human oversight mechanisms, with substantial penalties for non-compliance. In the United States, state-level AI legislation is accelerating: Colorado's SB 24-205 requires developers and deployers of high-risk AI systems to implement risk management frameworks. Multiple states have proposed or enacted laws governing automated decision-making, algorithmic discrimination, and AI transparency. Federal agencies are subject to OMB memoranda requiring AI governance aligned with the NIST AI RMF. Executive Order 14110 directs agencies to develop AI governance structures and report on AI-related risks within their operations. Defense contractors face emerging requirements to demonstrate responsible AI governance for systems used in defense applications. Organizations that wait for mandatory compliance deadlines to establish AI governance structures will face the same scramble that characterized early CMMC adoption: insufficient time, insufficient tooling, and insufficient institutional knowledge to stand up governance programs under regulatory pressure. The time to build AI governance infrastructure is before the mandate arrives, not after.
The Govern function is the foundation of the AI RMF. It is the only function that applies across the entire organization rather than to individual AI systems. Govern establishes the policies, processes, roles, and organizational culture that enable effective AI risk management. This includes defining who has authority over AI risk decisions, how AI risk management integrates with existing enterprise risk management structures, what organizational values guide AI development and deployment, and how the organization ensures that personnel involved in AI activities have the knowledge and skills required. The Govern function contains six categories: GV.1 (policies and procedures), GV.2 (accountability structures), GV.3 (workforce diversity and AI expertise), GV.4 (organizational culture), GV.5 (stakeholder engagement processes), and GV.6 (third-party and supply chain risk management). Each category contains subcategories that define specific governance outcomes. Unlike the other three functions, Govern is not applied per-system. It is applied once at the organizational level and inherited by every AI system the organization operates.
Effective AI governance requires integration with existing risk management structures, not a parallel bureaucracy. Organizations already have enterprise risk management programs, information security governance committees, compliance functions, and change management processes. The AI RMF's Govern function asks organizations to extend those structures to address AI-specific risks rather than creating isolated AI ethics boards with no authority or operational connection. AI risk policies must align with the organization's existing risk appetite statements. AI risk decisions must flow through established governance hierarchies. AI risk metrics must integrate with existing risk dashboards and reporting cycles. AI workforce competency requirements must map to existing human resources frameworks for training, certification, and performance evaluation. Organizations that create a separate AI governance silo discover that their AI policies conflict with their security policies, their AI risk appetite is never reconciled with their enterprise risk appetite, and their AI governance committee issues guidance that operational teams have no mechanism to implement.
Rampart captures the organizational governance structure as a living document set within the compliance workspace. AI governance policies, role definitions, accountability matrices, and stakeholder engagement records are stored as structured evidence with version history and approval chains. Artificer guides organizations through governance establishment by asking targeted questions: Who has authority to approve AI system deployments? What criteria determine whether an AI application is high-risk? How does your organization define acceptable bias thresholds? What mechanisms exist for affected stakeholders to contest AI-generated decisions? Artificer adapts its questions based on the organization's existing governance maturity: an organization with an established enterprise risk management program receives questions focused on extending that program to cover AI risks, while an organization building governance from scratch receives foundational questions about risk appetite, organizational structure, and decision authority. The governance documentation in Rampart is not static policy text. It is connected evidence: when a governance policy requires annual review of AI risk assessments, Sentinel tracks the review schedule and escalates when the deadline approaches without a completed review.
The Map function identifies the context in which an AI system operates. Context determines risk. The same AI model deployed for internal document summarization carries different risks than the same model deployed for medical triage recommendations. Map requires organizations to articulate the AI system's intended purpose, the stakeholders affected by its outputs (both directly and indirectly), the known limitations of the underlying technology, the potential beneficial and harmful impacts across different populations, and the legal and regulatory requirements that apply given the deployment context. The Map function contains five categories: MP.1 (context is established and understood), MP.2 (AI system categorization), MP.3 (capabilities, targeted usage, goals, and expected benefits and costs), MP.4 (risks and benefits mapped for all system components, including third-party software and data), and MP.5 (characterization of impacts to individuals, groups, communities, organizations, and society). Each category requires the organization to move beyond abstract risk statements and document the specific risks that apply to this specific AI system in this specific deployment context.
Risk identification for AI systems requires domain expertise that sits at the intersection of machine learning, the application domain, and the affected population. A fraud detection model may exhibit higher false positive rates for transactions originating from specific geographic regions, creating disparate impact along demographic lines. A natural language processing system may perform poorly on non-standard English dialects, creating accessibility barriers for specific populations. A recommendation system may optimize for engagement metrics that conflict with user wellbeing. A predictive maintenance model trained on historical data from one facility may produce unreliable predictions when deployed to a facility with different equipment, operating conditions, or maintenance histories. These risks are not discoverable through generic security assessments. They require structured analysis of the AI system's training data, model architecture, deployment context, and affected stakeholder groups. Risk identification must also account for risks that emerge from interactions between AI systems, third-party model dependencies, and the cumulative impact of multiple AI-informed decisions on the same individuals over time.
Rampart provides structured templates for AI system context mapping that align with the Map function's categories and subcategories. Each AI system registered in the platform receives a context profile that captures its intended purpose, deployment environment, data sources, model architecture (at the level of detail needed for risk assessment, not proprietary model internals), known limitations documented by the development team, and identified stakeholder groups with their relationship to the system's outputs. Artificer helps identify context-specific risks by analyzing the system profile and asking probing questions: Does this system's output directly influence decisions about individuals? What happens when the system produces an incorrect output? Who is affected, and do they have a mechanism to identify and contest the error? Artificer draws on the organization's existing risk register and governance policies to suggest risks that may apply based on similar systems, the same data domains, or analogous deployment contexts. The risk identification process produces a structured risk register for each AI system that feeds directly into the Measure and Manage functions, creating a traceable chain from identified risk to measured metric to management action.
The Measure function quantifies AI risks identified during the Map phase. Measurement transforms abstract risk statements into actionable data: not "this model may exhibit bias" but "the false positive rate for demographic group A is 12% compared to 3% for demographic group B, exceeding the organization's defined threshold of 5% disparity." The Measure function contains four categories: MS.1 (appropriate methods and metrics), MS.2 (AI system evaluation and testing), MS.3 (mechanisms for tracking identified risks over time), and MS.4 (measurement approach feedback). Measurement must address each of the seven trustworthiness characteristics relevant to the system's context. Validity and reliability require performance metrics: accuracy, precision, recall, F1 scores, calibration curves, and out-of-distribution detection rates. Fairness requires disaggregated performance metrics across protected demographic groups: equalized odds ratios, demographic parity differences, and disparate impact ratios. Explainability requires interpretability measures: feature importance rankings, attention visualizations, counterfactual explanations, and stakeholder comprehension assessments.
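As an illustration of the disaggregated measurement described above, the sketch below computes per-group false positive rates and flags when the worst-case disparity exceeds an organization-defined threshold. Function names and the 5% default are assumptions for the example, mirroring the scenario just quoted:

```python
import numpy as np

def false_positive_rate(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """FPR = FP / (FP + TN), computed on binary labels and predictions."""
    negatives = y_true == 0
    return float((y_pred[negatives] == 1).mean()) if negatives.any() else float("nan")

def fpr_disparity(y_true, y_pred, groups, max_disparity: float = 0.05):
    """Disaggregate FPR by demographic group and flag threshold breaches."""
    rates = {
        g: false_positive_rate(y_true[groups == g], y_pred[groups == g])
        for g in np.unique(groups)
    }
    disparity = max(rates.values()) - min(rates.values())
    return rates, disparity, disparity > max_disparity

# Example from the text: group A at 12% FPR versus group B at 3% yields a
# 9-point disparity, breaching the organization's 5% threshold.
```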
Measurement is not a one-time validation exercise. AI systems operate in dynamic environments where data distributions shift, user behavior evolves, and the real-world context changes. A model that tests well before deployment may degrade within weeks as production data diverges from training distributions. Bias that was within acceptable thresholds at launch may amplify through feedback loops as the system's outputs influence the data it subsequently processes. Performance metrics computed on a held-out test set reflect the system's behavior on that specific data distribution; they make no guarantee about performance on tomorrow's production traffic. Continuous measurement requires infrastructure: monitoring pipelines that compute trustworthiness metrics on production data, alerting systems that fire when metrics breach defined thresholds, and comparison frameworks that track metric trajectories over time. The measurement infrastructure must also account for the quality and representativeness of the data used for evaluation. Metrics computed on biased evaluation datasets produce misleading results, and the organization must document the known limitations of its measurement approach as part of the Measure function's requirements.
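One common building block for such monitoring pipelines is a two-sample distribution test per feature, comparing training data against a recent production window. A minimal sketch using SciPy's Kolmogorov-Smirnov test; the significance level is an assumption, and the alerting integration is omitted:

```python
from scipy.stats import ks_2samp

def feature_drift(train_values, prod_values, alpha: float = 0.01) -> dict:
    """Flag a production feature whose distribution differs
    significantly from its training distribution."""
    result = ks_2samp(train_values, prod_values)
    return {
        "statistic": float(result.statistic),
        "p_value": float(result.pvalue),
        "drifted": result.pvalue < alpha,
    }
```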
Vanguard scans AI system components for security vulnerabilities, dependency risks, and code quality issues that affect trustworthiness. Vulnerable dependencies in model serving infrastructure affect the secure and resilient characteristic. Insufficient input validation in inference endpoints creates adversarial attack surfaces. Hardcoded credentials in model training pipelines compromise data privacy. Sentinel monitors AI system behavior in production through connected infrastructure: inference latency distributions, error rates, output confidence distributions, and data pipeline health indicators. When Sentinel detects anomalous patterns, such as a sudden shift in output distribution, a spike in low-confidence predictions, or degraded throughput on the inference endpoint, it maps the anomaly to the relevant trustworthiness characteristics and surfaces it in Citadel's action queue. Rampart tracks measurement results over time, maintaining a historical record of every trustworthiness metric for every AI system, so the organization can demonstrate to regulators, auditors, and stakeholders that measurement is continuous, not a point-in-time certification exercise.
The Manage function translates measured risks into treatment decisions. For each identified and measured risk, the organization must decide: mitigate (implement controls or modifications to reduce the risk), accept (acknowledge the risk within the organization's risk appetite with documented justification), transfer (shift the risk to another party through contractual mechanisms, insurance, or outsourcing), or avoid (eliminate the risk by not deploying the AI system or removing the risk-producing capability). The Manage function contains four categories: MG.1 (risk prioritization and response), MG.2 (strategies to maximize benefits and minimize negative impacts, including mechanisms to supersede or deactivate AI systems), MG.3 (management of risks and benefits from third parties), and MG.4 (documentation and monitoring of risk treatments, including response, recovery, and communication plans). Treatment decisions must align with the governance policies established in the Govern function. A treatment decision that accepts a bias level exceeding the organization's defined threshold requires escalation through the governance hierarchy. A treatment decision that mitigates a privacy risk through data anonymization must be verified through the Measure function to confirm the mitigation is effective.
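A treatment decision, in other words, is a structured record, not a memo. A minimal sketch of what such a record might capture, including the escalation rule for acceptances above threshold (all names hypothetical):

```python
from dataclasses import dataclass
from enum import Enum

class Treatment(Enum):
    MITIGATE = "mitigate"
    ACCEPT = "accept"
    TRANSFER = "transfer"
    AVOID = "avoid"

@dataclass
class TreatmentDecision:
    risk_id: str
    treatment: Treatment
    justification: str
    measured_value: float   # the risk level that prompted the decision
    threshold: float        # the organization's defined limit
    approved_by: str | None = None

    def requires_escalation(self) -> bool:
        """Accepting a risk whose measured value exceeds the defined
        threshold must flow up through the governance hierarchy."""
        return (self.treatment is Treatment.ACCEPT
                and self.measured_value > self.threshold)
```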
Continuous management of AI risk requires monitoring infrastructure that detects when risks evolve beyond their treated state. A bias risk that was mitigated through training data augmentation may re-emerge as production data distributions shift. A reliability risk that was mitigated through ensemble averaging may re-emerge as one model in the ensemble degrades faster than the others. A privacy risk that was mitigated through differential privacy mechanisms may re-emerge as new attack techniques reduce the effective privacy guarantee. Risk management is not a decision made at deployment time and filed in a governance document. It is an ongoing process that monitors treated risks, detects when treatment effectiveness degrades, escalates when metrics breach thresholds, and triggers re-evaluation of treatment decisions. The Manage function also requires organizations to plan for AI system decommissioning: what happens to the data, the model artifacts, the downstream systems that depend on the AI system's outputs, and the stakeholders who rely on its continued operation.
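Continuing the sketch above, monitoring treated risks reduces to periodically re-comparing current measurements against each decision's threshold and escalating the breaches for re-evaluation:

```python
def risks_needing_reevaluation(decisions: list[TreatmentDecision],
                               current: dict[str, float]) -> list[TreatmentDecision]:
    """Return treated risks whose current measured value has drifted
    back above the defined threshold, triggering re-evaluation."""
    return [d for d in decisions if current.get(d.risk_id, 0.0) > d.threshold]
```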
Sentinel monitors AI systems continuously and detects when measured risks drift beyond their treated parameters. When a trustworthiness metric breaches its defined threshold, Sentinel generates an event that maps to the specific Manage subcategory, triggering the response workflow documented in the governance structure. Rampart tracks every treatment decision with its justification, approval chain, and linked evidence: the measured risk level that prompted the decision, the treatment selected, the expected residual risk after treatment, and the monitoring plan for verifying treatment effectiveness. When treatment decisions require updating, Rampart maintains the full decision history so the organization can demonstrate to regulators that risk management is deliberate and documented, not reactive. Citadel surfaces management actions in the action queue, prioritized by risk severity and organizational impact. When Sentinel detects drift in a trustworthiness metric, the action appears in Citadel with the measured values, the defined thresholds, the treatment history, and recommended response options generated by Artificer based on the organization's governance policies and the specific risk context.
Model cards document an individual model's intended use, performance metrics, known limitations, training data characteristics, and evaluation results. System cards document the complete AI system: the model or models it incorporates, the data pipelines that feed them, the business logic that interprets their outputs, the human oversight mechanisms, and the deployment infrastructure. Impact assessments evaluate the potential effects of the AI system on individuals, communities, and society, including disproportionate impacts on specific demographic groups. These are not abstract documentation exercises. They are governance artifacts required by the AI RMF, referenced by emerging regulations, and increasingly demanded by enterprise customers, federal contracting officers, and external auditors. An organization that cannot produce a current model card for a deployed AI system cannot demonstrate the transparency and accountability characteristics required by the Govern function. An organization that cannot produce a current impact assessment cannot demonstrate that it has completed the Map function.
Documentation requirements span the entire AI lifecycle, and they compound. During development: training data documentation (provenance, known biases, representativeness assessments, preprocessing decisions), architecture decisions (model selection rationale, hyperparameter choices, trade-off analysis between trustworthiness characteristics), and validation results (test set composition, disaggregated performance metrics, adversarial robustness testing). During deployment: deployment context documentation (production environment specifications, access controls, monitoring configuration, rollback procedures), human oversight documentation (who monitors the system, what authority they have to intervene, what triggers intervention), and stakeholder notification documentation (how affected parties are informed that AI is involved in decisions that affect them). During operation: monitoring records (continuous trustworthiness metrics, incident reports, drift detection logs), update records (model retraining events, data pipeline modifications, configuration changes), and governance records (periodic review results, treatment decision updates, policy compliance assessments). This documentation must be maintained, not just created. A model card written at development time and never updated after deployment fails to satisfy the AI RMF's continuous governance requirements.
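Reduced to its data, a model card might look like the sketch below; the staleness check captures the maintenance requirement just described, where any evidence newer than the card invalidates it. Field names are illustrative, not a standard schema:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class ModelCard:
    model_name: str
    intended_use: str
    known_limitations: list[str]
    training_data_provenance: str
    evaluation_results: dict[str, float]  # e.g. disaggregated metrics
    last_updated: datetime

    def is_stale(self, latest_evidence: datetime) -> bool:
        """A card needs review whenever any underlying evidence
        (retraining, config change, new production metrics) postdates it."""
        return latest_evidence > self.last_updated
```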
Artificer generates governance documentation from evidence collected across the platform. Model cards are assembled from development artifacts, validation results stored in Rampart, and production metrics tracked by Sentinel. System cards incorporate infrastructure information from Garrison's inventory, security scan results from Vanguard, and access control configurations from connected identity providers. Impact assessments draw on the risk register maintained in Rampart, stakeholder mappings from the Map function, and measurement results from the Measure function. Artificer does not fabricate documentation. It synthesizes governance artifacts from evidence that already exists in the platform, then presents them for human review and approval. When the underlying evidence changes, such as a model retrained on new data, a deployment configuration modified, or a performance metric trending downward, Artificer flags the affected governance documents as requiring update. The documentation lifecycle is tied to the evidence lifecycle, ensuring that governance artifacts remain current without requiring manual reconciliation between what the documents say and what the systems actually do.
The AI RMF does not exist in isolation. It maps to NIST 800-53 through AI governance overlays that extend traditional security controls to address AI-specific risks. NIST 800-53 controls in the Program Management (PM), Risk Assessment (RA), System and Services Acquisition (SA), and System and Information Integrity (SI) families contain provisions that apply to AI systems when interpreted through the AI RMF lens. PM-9 (Risk Management Strategy) extends to encompass AI risk management strategy. RA-3 (Risk Assessment) extends to encompass AI-specific risk identification, including bias, drift, and opacity. SA-4 (Acquisition Process) extends to encompass due diligence on third-party AI components, pre-trained models, and inference services. SI-4 (System Monitoring) extends to encompass continuous monitoring of AI trustworthiness metrics. Organizations that have already implemented NIST 800-53 controls have a foundation for AI governance. The AI RMF overlay adds the AI-specific subcategories, metrics, and processes that traditional security controls do not address.
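The overlay relationship can be expressed directly as a mapping from 800-53 controls to their AI-specific extensions. The control identifiers below are the real ones named above; the representation itself is an illustrative sketch, not an official overlay:

```python
# AI RMF overlay sketch: each 800-53 control, read through the AI RMF,
# extends to an AI-specific concern.
AI_OVERLAY: dict[str, str] = {
    "PM-9": "risk management strategy encompasses AI risk",
    "RA-3": "risk assessment covers bias, drift, and opacity",
    "SA-4": "acquisition due diligence covers pre-trained models and inference services",
    "SI-4": "system monitoring covers trustworthiness metric thresholds",
}
```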
The relationship between the AI RMF and emerging AI-specific regulations is increasingly consequential. NIST AI 600-1 (the Generative AI Profile) extends the AI RMF with specific guidance for generative AI systems, addressing risks unique to large language models, image generators, and other generative technologies: content provenance, synthetic media identification, training data copyright considerations, and the amplified potential for misinformation. NIST also publishes crosswalk documents mapping the AI RMF to international AI governance frameworks, supporting organizations that operate across multiple jurisdictions. The EU AI Act mandates conformity assessments, transparency obligations, and human oversight requirements for high-risk AI systems. Organizations that implement the AI RMF proactively position themselves to satisfy EU AI Act requirements because the underlying governance structures, measurement systems, and documentation practices overlap substantially. State-level AI legislation in the United States increasingly references the NIST AI RMF as a recognized governance standard, creating a de facto compliance benefit for organizations that adopt the framework before mandatory requirements take effect.
AI governance does not replace existing compliance obligations. It extends them. Rampart maintains the cross-reference engine that connects AI RMF subcategories to controls in every other framework in the catalog. When an organization satisfies Govern subcategory GV.1.1 (legal and regulatory requirements are identified) for an AI system, Rampart maps that governance activity to related controls in NIST 800-53 (PM-9, PM-28), FedRAMP (where applicable), and the EU AI Act's governance requirements. When the organization completes a Map function assessment for an AI system, the context documentation simultaneously contributes evidence toward NIST 800-53 RA-3 (Risk Assessment) and any sector-specific requirements that mandate AI impact assessments. The cross-framework computation operates continuously: as AI governance evidence accumulates in the platform, Rampart recalculates readiness percentages across all activated frameworks. An organization that establishes AI governance under the AI RMF does not start from zero when EU AI Act compliance becomes mandatory or when a federal contract requires documented AI risk management. The governance infrastructure is already in place. The evidence already exists. The cross-framework mappings are already computed.
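That cross-framework computation can be sketched as a set operation over a crosswalk: each satisfied AI RMF subcategory contributes evidence to the target-framework controls it maps to, and readiness is the covered fraction. The mappings and names below are illustrative, not the platform's actual data:

```python
def framework_readiness(satisfied_subcategories: set[str],
                        crosswalk: dict[str, set[str]],
                        target_controls: set[str]) -> float:
    """Fraction of a target framework's controls that receive evidence
    from satisfied AI RMF subcategories via the crosswalk."""
    covered: set[str] = set()
    for subcategory in satisfied_subcategories:
        covered |= crosswalk.get(subcategory, set()) & target_controls
    return len(covered) / len(target_controls) if target_controls else 0.0

# Example from the text: satisfying GV.1.1 contributes toward PM-9 and PM-28.
crosswalk = {"GV.1.1": {"PM-9", "PM-28"}}
print(framework_readiness({"GV.1.1"}, crosswalk, {"PM-9", "PM-28", "RA-3"}))  # ~0.67
```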
Something is being forged.
The full platform is under active development. Reach out to learn more or get early access.