NIST AI RMF in Practice: GOVERN, MAP, MEASURE, MANAGE in an Agile Team

The NIST AI Risk Management Framework (AI RMF) 1.0, published in January 2023, is voluntary, but it is the closest thing the United States has to a national AI governance standard. It has been referenced in federal AI executive orders, adopted by CISA and other federal agencies, and is increasingly specified in enterprise procurement requirements. Understanding it is no longer optional for teams building AI products for regulated industries or government customers.

The problem is that NIST AI RMF documentation reads as if it were written for the board of a large financial institution, not for the engineering team building the model. This guide translates the four functions (GOVERN, MAP, MEASURE, MANAGE) into concrete artefacts and sprint-cycle activities that a 5–15-person AI product team can actually implement.

The NIST AI RMF Structure

NIST AI RMF 1.0 organises AI risk management into four functions. These are not sequential phases—they are concurrent and interdependent activities that together form a risk management programme:

GOVERN: Establishes organisational culture, policies, processes, and accountabilities for AI risk management. This is the governance layer that makes the other three functions possible.
MAP: Contextualises AI risks for specific systems. Identifies the purpose, deployment context, affected populations, and likely risks for each AI system in scope.
MEASURE: Analyses and quantifies AI risks using defined metrics. Tracks performance, bias, drift, and robustness on an ongoing basis.
MANAGE: Responds to identified risks. Maintains a risk register, implements mitigations, accepts residual risks, and manages incidents.

GOVERN is unusual because it wraps around the other three functions—it sets the organisational context that makes MAP, MEASURE, and MANAGE coherent across multiple systems and teams. Without GOVERN, you have ad hoc risk activities on individual systems but no organisational learning, no consistency, and no escalation path.

GOVERN in Practice

GOVERN requires your organisation to establish AI risk management as an explicit organisational practice with ownership, policy, and process. The minimum viable GOVERN artefacts for a product team are:

1. AI Use Case Policy

A short (1–3 page) document that defines:

What types of AI use cases are in scope for risk management (typically: any AI system that makes or substantially informs decisions affecting users, customers, or third parties).
The approval process for new AI use cases (who reviews, what criteria trigger escalation to senior review).
The risk tier definitions your team uses (high / medium / low, or a more granular scale).
The review cadence for existing AI systems.
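One way to make the tier definitions unambiguous is to encode them as a small function that every new use case passes through during approval. The sketch below assumes a hypothetical screen (does the system inform decisions about people, is the domain consequential, is it consumer-facing, is its output applied without human review); the questions, their ordering, and the `classify` and `needs_senior_review` names are illustrative and are not part of NIST AI RMF.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RiskTier(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass
class UseCase:
    # Hypothetical screening questions; replace with your own policy criteria.
    informs_decisions: bool     # makes or substantially informs decisions about people
    consequential_domain: bool  # e.g. credit, employment, health, housing
    consumer_facing: bool       # outputs reach end users directly
    fully_automated: bool       # no human reviews the output before it takes effect


def classify(use_case: UseCase) -> Optional[RiskTier]:
    """Return the risk tier, or None if the use case is out of scope."""
    if not use_case.informs_decisions:
        return None
    if use_case.consequential_domain:
        return RiskTier.HIGH
    if use_case.consumer_facing or use_case.fully_automated:
        return RiskTier.MEDIUM
    return RiskTier.LOW


def needs_senior_review(tier: Optional[RiskTier]) -> bool:
    """Example escalation trigger: only high-tier use cases go to senior review."""
    return tier is RiskTier.HIGH
```

Under these example rules, `classify(UseCase(True, False, True, False))` returns `RiskTier.MEDIUM`: the system informs decisions and is consumer-facing, but the domain is not flagged as consequential.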

2. AI Ownership Matrix

A table or RACI (Responsible, Accountable, Consulted, Informed) matrix that maps each AI system to: the product owner accountable for its outcomes, the engineering lead responsible for its technical operation, the compliance or legal contact for risk escalation, and the executive sponsor who accepts residual risk. Without this matrix, risk findings fall into organisational gaps where nobody acts on them.
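Keeping the matrix as structured data rather than a slide makes gaps easy to detect. A minimal sketch, assuming one row per system and the four roles listed above; the `Ownership` class and `gaps` helper are hypothetical names.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Ownership:
    system: str
    product_owner: Optional[str]       # accountable for outcomes
    engineering_lead: Optional[str]    # responsible for technical operation
    compliance_contact: Optional[str]  # contact for risk escalation
    executive_sponsor: Optional[str]   # accepts residual risk


def gaps(matrix: list[Ownership]) -> dict[str, list[str]]:
    """Map each system to the roles that have no named person."""
    result: dict[str, list[str]] = {}
    for row in matrix:
        missing = [
            f.name
            for f in fields(row)
            if f.name != "system" and not getattr(row, f.name)
        ]
        if missing:
            result[row.system] = missing
    return result


matrix = [
    Ownership("churn-model", "alice", "bob", None, "carol"),
    Ownership("credit-model", "dan", "erin", "frank", "grace"),
]
print(gaps(matrix))  # {'churn-model': ['compliance_contact']}
```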

3. Incident Escalation Path

A defined process for what happens when an AI system produces harmful, biased, or unexpected outputs at scale. Who is notified? Within what timeframe? What is the decision authority to suspend the system? This does not need to be elaborate—a one-page flowchart is sufficient for most teams—but it must exist and be known to all relevant staff.
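The escalation path can also be written down as data, which keeps the flowchart and any paging automation in sync. The sketch below assumes three hypothetical severity levels with illustrative notification deadlines and suspension authorities; none of these values come from NIST, and your GOVERN policy should set its own.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class EscalationStep:
    notify: tuple[str, ...]  # roles to notify
    within: timedelta        # deadline for notification after detection
    can_suspend: str         # role with authority to suspend the system


# Illustrative values only.
ESCALATION_PATH = {
    "sev1": EscalationStep(
        ("engineering_lead", "product_owner", "compliance_contact", "executive_sponsor"),
        timedelta(hours=1),
        "engineering_lead",
    ),
    "sev2": EscalationStep(
        ("engineering_lead", "product_owner", "compliance_contact"),
        timedelta(hours=24),
        "product_owner",
    ),
    "sev3": EscalationStep(("engineering_lead",), timedelta(days=5), "product_owner"),
}


def notification_deadline(severity: str, detected_at: datetime) -> datetime:
    """Latest time by which everyone on the path must have been notified."""
    return detected_at + ESCALATION_PATH[severity].within
```

Giving the engineering lead authority to suspend a sev1 system, rather than waiting for an executive, is one possible design choice: it trades some business disruption for faster containment.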

GOVERN Integration in Sprints

GOVERN activities are mostly one-time setup plus regular reviews, so they sit alongside the two-week sprint cycle rather than inside every sprint: create the policy and ownership matrix once, review and update them every quarter (roughly every six or seven sprints), and run an annual GOVERN audit to confirm they remain accurate and are being followed.
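A scheduled check can surface overdue GOVERN reviews at sprint planning, so they land in the backlog instead of relying on memory. This sketch assumes "quarterly" means 91 days and "annual" means 365 days; the `overdue` function is a hypothetical helper.

```python
from datetime import date, timedelta

# Assumed interpretation of the review cadence as fixed day counts.
QUARTERLY = timedelta(days=91)
ANNUAL = timedelta(days=365)


def overdue(last_review: date, last_audit: date, today: date) -> list[str]:
    """Return the GOVERN activities that are overdue as of `today`."""
    items = []
    if today - last_review > QUARTERLY:
        items.append("quarterly policy and ownership review")
    if today - last_audit > ANNUAL:
        items.append("annual GOVERN audit")
    return items


print(overdue(date(2024, 1, 1), date(2023, 1, 1), date(2024, 5, 1)))
# ['quarterly policy and ownership review', 'annual GOVERN audit']
```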

MAP in Practice

MAP is where risk management becomes system-specific. For each AI system in scope, MAP requires you to develop a contextual understanding of the system's purpose, operating environment, affected populations, and risk profile. The key artefact is a per-system context card.

The AI System Context Card

This is a 1–2 page structured document (or database record) for each AI system, containing:

System name and version
Intended purpose: What task does the system perform? What decisions does it inform or make?
Data sources: What data does the system consume? Where does it come from? Who controls it?
Affected populations: Who is impacted by the system's outputs? Are any groups potentially disadvantaged?
Deployment context: Where and by whom is the system used? Is it internal-only, B2B, or consumer-facing?
Risk tier: High, medium, or low based on your GOVERN policy criteria.
Known limitations: What is the system not designed to do? What conditions cause degraded performance?
Dependencies: Third-party models, APIs, or data providers the system depends on.
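Storing the context card as a typed record lets a pipeline reject incomplete cards before review. A minimal sketch, assuming the fields listed above map one-to-one to attributes; the `ContextCard` class, the allowed tier strings, and the rule that only `dependencies` may be empty are assumptions.

```python
from dataclasses import dataclass, field, fields

ALLOWED_TIERS = {"high", "medium", "low"}


@dataclass
class ContextCard:
    system_name: str
    version: str
    intended_purpose: str
    data_sources: list[str]
    affected_populations: list[str]
    deployment_context: str   # e.g. "internal", "B2B", "consumer"
    risk_tier: str            # must match the GOVERN policy tiers
    known_limitations: list[str]
    dependencies: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.risk_tier not in ALLOWED_TIERS:
            raise ValueError(f"unknown risk tier: {self.risk_tier!r}")


def incomplete_fields(card: ContextCard) -> list[str]:
    """Required fields left empty; `dependencies` may legitimately be empty."""
    return [
        f.name
        for f in fields(card)
        if f.name != "dependencies" and not getattr(card, f.name)
    ]


card = ContextCard(
    system_name="churn-model",
    version="1.2.0",
    intended_purpose="Flag accounts likely to cancel for retention outreach",
    data_sources=["billing_db"],
    affected_populations=[],
    deployment_context="internal",
    risk_tier="low",
    known_limitations=["Not validated for accounts younger than 30 days"],
)
print(incomplete_fields(card))  # ['affected_populations']
```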

MAP Integration in Sprints

Create a context card before deploying any new AI system. Update it when the system's purpose, data sources, or deployment context materially changes. In a sprint cycle, this typically happens as part of the discovery phase for new AI features: the context card is created during sprint planning and reviewed by the product owner and engineering lead before the feature enters development.
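Both the update trigger and the pre-development review can be checked mechanically. This sketch assumes the card is stored as a plain dictionary, treats changes to purpose, data sources, or deployment context as material (the three triggers named above), and models the review as a set of reviewer roles; the function names are hypothetical.

```python
from typing import Optional

# The three triggers named in the text; extend as your policy requires.
MATERIAL_FIELDS = ("intended_purpose", "data_sources", "deployment_context")
REQUIRED_REVIEWERS = {"product_owner", "engineering_lead"}


def material_changes(previous: dict, proposed: dict) -> list[str]:
    """Fields whose change should trigger a context-card update and re-review."""
    return [k for k in MATERIAL_FIELDS if previous.get(k) != proposed.get(k)]


def ready_for_development(card: Optional[dict], reviewed_by: set[str]) -> bool:
    """Definition-of-ready gate: a card exists and both required roles reviewed it."""
    return card is not None and REQUIRED_REVIEWERS <= reviewed_by


previous = {
    "intended_purpose": "Flag accounts likely to cancel",
    "data_sources": ["billing_db"],
    "deployment_context": "internal",
    "version": "1.0",
}
proposed = dict(previous, version="1.1", data_sources=["billing_db", "support_tickets"])
print(material_changes(previous, proposed))  # ['data_sources']
```

A version bump alone is not treated as material here; only a change to one of the three listed fields sends the card back for review.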

MEASURE in Practice

MEASURE is where risk becomes quantified. It requires ongoing evaluation of AI system performance against defined metrics across multiple risk dimensions. For a detailed treatment of MEASURE metrics and tooling, see our dedicated MEASURE function guide. Here is the sprint-level integration:

Metrics to Track Per System

Accuracy: F1, precision/recall, or AUC depending on the task. Tracked on a representative held-out test set updated periodically.
Fairness: Demographic parity, equalised odds, or individual fairness metrics depending on the use case. Measured across protected characteristics relevant to the deployment context.
Drift: Population stability index (PSI) or KL divergence on input distributions, with alerts when inputs shift significantly from the training distribution.
Robustness: Out-of-distribution performance, adversarial test results (at least annually for high-risk systems).
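Two of these metrics are simple enough to sketch without a library, which helps when agreeing thresholds with non-specialists. PSI sums (a - e) * ln(a / e) over bins, where e and a are the reference and current shares of each bin; demographic parity difference is the largest gap in positive-prediction rate between groups. The decile binning, the small floor for empty bins, and the function names below are assumptions, and maintained implementations exist in libraries such as Evidently AI. A commonly quoted rule of thumb treats PSI above 0.25 as a significant shift, but thresholds should be set per system.

```python
import math
from bisect import bisect_right


def psi(
    reference: list[float], current: list[float], bins: int = 10, floor: float = 1e-6
) -> float:
    """Population stability index of `current` relative to `reference`.

    Bin edges are quantiles of the reference sample (deciles when bins=10).
    """
    ref = sorted(reference)
    edges = [ref[len(ref) * i // bins] for i in range(1, bins)]

    def shares(sample: list[float]) -> list[float]:
        counts = [0] * bins
        for x in sample:
            counts[bisect_right(edges, x)] += 1
        # Floor empty bins so the log term stays finite.
        return [max(c / len(sample), floor) for c in counts]

    e, a = shares(reference), shares(current)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def demographic_parity_difference(y_pred: list[int], groups: list[str]) -> float:
    """Largest gap in positive-prediction rate between any two groups."""
    rates = []
    for g in set(groups):
        preds = [p for p, grp in zip(y_pred, groups) if grp == g]
        rates.append(sum(preds) / len(preds))
    return max(rates) - min(rates)


train = [float(i) for i in range(1000)]
print(round(psi(train, train), 6))  # 0.0
print(demographic_parity_difference([1, 1, 0, 0, 1, 0, 0, 0], ["a"] * 4 + ["b"] * 4))  # 0.25
```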

MEASURE Integration in Sprints

Set up continuous monitoring pipelines in the first sprint after deployment. In each sprint retrospective, review the monitoring dashboard for any metric alerts triggered in the previous two weeks. Any metric breach triggers a risk register entry (MANAGE). Quarterly: run a full MEASURE review including fairness metrics and robustness tests. Annually: update the held-out test set with recent production data and re-evaluate all metrics.
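The rule that every metric breach becomes a risk register entry is easy to automate before the retrospective. A sketch under stated assumptions: the threshold values, metric names, and the `RegisterEntry` shape are hypothetical, and a real pipeline would write to whatever system holds your register.

```python
from dataclasses import dataclass
from datetime import date

# (metric, "min" or "max", limit): illustrative thresholds only.
THRESHOLDS = [("f1", "min", 0.80), ("psi", "max", 0.25), ("dp_diff", "max", 0.10)]


@dataclass
class RegisterEntry:
    system: str
    description: str
    source: str
    raised_on: date


def entries_for_breaches(
    system: str, metrics: dict[str, float], today: date
) -> list[RegisterEntry]:
    """Create one draft register entry per threshold breach."""
    entries = []
    for name, kind, limit in THRESHOLDS:
        value = metrics.get(name)
        if value is None:
            continue  # metric not computed this period
        breached = value < limit if kind == "min" else value > limit
        if breached:
            entries.append(
                RegisterEntry(
                    system=system,
                    description=f"{name}={value:.3f} breached {kind} limit {limit}",
                    source="MEASURE alert",
                    raised_on=today,
                )
            )
    return entries


for entry in entries_for_breaches("churn-model", {"f1": 0.85, "psi": 0.31}, date(2024, 3, 1)):
    print(entry.description)  # psi=0.310 breached max limit 0.25
```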

MANAGE in Practice

MANAGE is the response function. It requires maintaining a risk register of identified AI risks, implementing mitigations, formally accepting residual risks, and running an incident response process when things go wrong in production.

The AI Risk Register

A risk register entry for an AI system should contain:

Risk description: what could go wrong and under what conditions.
Risk source: where did this risk come from (MAP assessment, MEASURE alert, incident, third-party audit)?
Likelihood and impact ratings (qualitative: low/medium/high).
Current mitigations: what controls are in place?
Residual risk rating: likelihood and impact after mitigations.
Risk owner: who is responsible for tracking this risk?
Residual risk acceptance: who signed off that the residual risk is acceptable? Date of sign-off.
Review date: when will this risk entry be reassessed?
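The fields above map naturally onto a record type, and the qualitative ratings can be combined consistently with a small lookup. The sketch assumes a 3x3 likelihood-by-impact grid scored as a product (1 to 9) with illustrative cut-offs; many organisations use their own heat map instead, and the class and field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

LEVELS = {"low": 1, "medium": 2, "high": 3}


def rating(likelihood: str, impact: str) -> str:
    """Combine qualitative likelihood and impact into an overall rating."""
    score = LEVELS[likelihood] * LEVELS[impact]  # 1, 2, 3, 4, 6 or 9
    if score >= 6:
        return "high"
    if score >= 3:
        return "medium"
    return "low"


@dataclass
class RiskEntry:
    description: str
    source: str               # MAP assessment, MEASURE alert, incident, audit
    likelihood: str
    impact: str
    mitigations: list[str]
    residual_likelihood: str
    residual_impact: str
    owner: str                # a named person, not a team
    review_date: date
    accepted_by: Optional[str] = None
    accepted_on: Optional[date] = None

    @property
    def inherent_rating(self) -> str:
        return rating(self.likelihood, self.impact)

    @property
    def residual_rating(self) -> str:
        return rating(self.residual_likelihood, self.residual_impact)

    def is_accepted(self) -> bool:
        return self.accepted_by is not None and self.accepted_on is not None
```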

Incident Response Playbook

A brief playbook for AI incidents should define: what constitutes an AI incident (harmful output, significant bias discovery, data breach involving training data, adversarial attack); how to detect incidents (monitoring alerts, user reports, internal testing); who is notified and in what timeframe; what the suspension criteria are; and how post-incident review feeds back into MAP and MEASURE.
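Encoding the incident categories and suspension criteria makes the on-call decision faster and easier to audit afterwards. The sketch below uses the four incident types listed above; the suspension rules, the 100-user threshold, and all names are hypothetical and should be replaced by whatever your playbook specifies.

```python
from dataclasses import dataclass
from enum import Enum


class IncidentType(Enum):
    HARMFUL_OUTPUT = "harmful output"
    BIAS_DISCOVERY = "significant bias discovery"
    TRAINING_DATA_BREACH = "data breach involving training data"
    ADVERSARIAL_ATTACK = "adversarial attack"


@dataclass
class Incident:
    kind: IncidentType
    detected_by: str     # e.g. "monitoring alert", "user report", "internal testing"
    affected_users: int
    ongoing: bool        # is the system still producing the problem?


def should_suspend(incident: Incident, risk_tier: str, user_threshold: int = 100) -> bool:
    """Illustrative suspension criteria; the real ones belong in your playbook."""
    if not incident.ongoing:
        return False
    if incident.kind in (IncidentType.TRAINING_DATA_BREACH, IncidentType.ADVERSARIAL_ATTACK):
        return True  # security incidents: contain first, investigate second
    if risk_tier == "high":
        return True
    return incident.affected_users >= user_threshold


def follow_ups(incident: Incident) -> list[str]:
    """Post-incident actions that feed back into MAP, MEASURE and MANAGE."""
    return [
        "Record the failure mode under known limitations in the context card (MAP)",
        f"Add a monitor or regression test for this {incident.kind.value} (MEASURE)",
        "Open or update the corresponding risk register entry (MANAGE)",
    ]
```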

MANAGE Integration in Sprints

The risk register is reviewed in sprint planning to confirm no open high-risk items are blocking deployment. New risks identified during the sprint (from MEASURE alerts, user reports, or code review) are added to the register before the sprint closes. Residual risk sign-off for new AI system deployments happens in the sprint review before release.
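The sprint-planning and sprint-review checks can run as a single release gate over the register. This sketch assumes each entry records a status, a residual rating, and who accepted the residual risk, and it treats any open entry without sign-off as blocking; whether an accepted high residual risk should still block release is a policy decision the sketch leaves out. Names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Risk:
    id: str
    system: str
    residual_rating: str          # "low" | "medium" | "high"
    status: str                   # "open" | "closed"
    accepted_by: Optional[str] = None


def release_blockers(system: str, register: list[Risk]) -> list[str]:
    """Reasons a release of `system` should not proceed (empty list means clear)."""
    reasons = []
    for r in register:
        if r.system != system or r.status != "open":
            continue
        if r.accepted_by is None:
            label = "high residual risk" if r.residual_rating == "high" else "residual risk"
            reasons.append(f"{r.id}: {label} not signed off")
    return reasons


register = [
    Risk("R-1", "churn-model", "high", "open"),
    Risk("R-2", "churn-model", "medium", "open", accepted_by="carol"),
    Risk("R-3", "churn-model", "high", "closed"),
    Risk("R-4", "credit-model", "high", "open"),
]
print(release_blockers("churn-model", register))
# ['R-1: high residual risk not signed off']
```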

Tooling That Helps

Credo AI: Policy-driven governance platform that can hold your GOVERN policies, MAP context cards, and MANAGE risk register, with tracking against NIST AI RMF subcategories. Best for teams with regulatory exposure requiring documented compliance evidence.
Weights & Biases: Experiment tracking that naturally captures MEASURE data (model performance metrics, training run metadata). Not a governance platform, but integrates well with one.
Evidently AI: Open-source library for MEASURE activities: data drift detection, model performance monitoring, fairness metrics. Produces reports that can be exported as MEASURE artefacts.
AWS AI Service Cards / Google Model Cards: Useful MAP artefact templates if you are building on top of cloud AI services. Supplement with your own context card for the system you are building.
Notion / Confluence + a risk register template: For smaller teams, a structured document system with a risk register template is sufficient for MANAGE. The tooling matters less than the process consistency.

Common Mistakes

Conflating GOVERN with MAP: GOVERN is organisational policy; MAP is system-specific context. Teams that only do MAP without GOVERN have no escalation path and no consistency across systems. Teams that only do GOVERN without MAP have policies with nothing underneath them.
Treating MEASURE as a one-time evaluation: Running bias tests at model launch and never again is one of the most common failures. MEASURE is explicitly an ongoing function—models drift, data distributions shift, and fairness properties can degrade silently over time.
Risk register entries without owners: A risk with no named owner is a risk that nobody will act on. Every entry must have a person (not a team) responsible for it.
Residual risk acceptance without documented sign-off: Engineering leads should not be the final authority on residual risk acceptance for high-risk AI systems. The GOVERN ownership matrix should define who has the authority to accept risk at each tier.
Copying another team's context card: MAP is system-specific. A context card for a churn prediction model cannot be reused for a credit scoring model, even if they use similar techniques. Affected populations, data sources, and risk profiles are different.
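Two of these mistakes (owners who are teams rather than people, and acceptance by someone without authority) can be caught by a simple lint over register entries. This sketch assumes entries are dictionaries, that team aliases can be recognised by a naming pattern (a hypothetical convention; a lookup in your staff directory would be more reliable), and that the ownership matrix defines which roles may accept residual risk at each tier. The authority table is illustrative only.

```python
import re

# Hypothetical naming convention for team aliases.
TEAM_ALIAS = re.compile(r"(^team[-_])|([-_]team$)|(@lists\.)", re.IGNORECASE)

# Illustrative acceptance authority per tier, as a GOVERN ownership matrix might define it.
ACCEPTANCE_AUTHORITY = {
    "high": {"executive_sponsor"},
    "medium": {"executive_sponsor", "product_owner"},
    "low": {"executive_sponsor", "product_owner", "engineering_lead"},
}


def lint_entry(entry: dict) -> list[str]:
    """Return problems found in one register entry (empty list means clean)."""
    problems = []
    owner = entry.get("owner") or ""
    if not owner:
        problems.append("no risk owner")
    elif TEAM_ALIAS.search(owner):
        problems.append(f"owner {owner!r} looks like a team, not a person")
    tier = entry["risk_tier"]
    role = entry.get("accepted_by_role")
    if role is None or entry.get("accepted_on") is None:
        problems.append("residual risk acceptance not documented")
    elif role not in ACCEPTANCE_AUTHORITY[tier]:
        problems.append(f"{role} cannot accept {tier}-tier residual risk")
    return problems
```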
