Credo AI is a governance platform, not a model evaluation tool. That distinction determines how you set it up and what you should expect it to do. Its core workflow is: define your AI systems as records in the platform, attach those records to policy packs (frameworks like EU AI Act or NIST AI RMF), collect evidence against each policy control, and generate a compliance status report. Evidence collection is mostly manual or semi-automated via the Python SDK. The platform does not evaluate your model itself; it records the metrics that the SDK computes in your pipeline and pushes to it.
This guide walks through the setup process for a team configuring Credo AI for EU AI Act and NIST AI RMF compliance, including the Python SDK integration pattern and the most common configuration mistakes.
Step 1: Create Your AI System Record
Every AI system you want to track in Credo AI must be registered as an AI system record. This is the central object in Credo AI's data model—everything else (policy packs, assessments, evidence) is linked to it.
Navigate to AI Systems → Create New and fill in:
- System name: Use a consistent naming convention (e.g., prod-credit-scoring-v2). This will be the reference in your technical file and any audit reports.
- Description: Intended purpose, a brief technical description, and the business unit responsible. This maps to EU AI Act Article 11 (Annex IV section 1) and NIST AI RMF MAP context requirements.
- Risk tier: Credo AI supports custom risk tiers; align these with your GOVERN-level risk classification policy. If you have classified the system as high-risk under EU AI Act, note this in the system record.
- Owner: Assign the product owner or engineering lead responsible for this system's governance. This creates accountability linkage in the platform.
- Tags: Use tags to group systems by regulatory framework, business unit, or deployment environment. Useful for filtering compliance status across a portfolio.
Create one AI system record per distinct AI system. Do not aggregate multiple models or use cases into a single record—Credo AI's assessment and evidence model is designed per-system, and aggregating creates ambiguity that will cause problems during external audit.
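Before entering records in the platform, it can help to keep a local inventory that enforces the naming convention and the required fields. The sketch below is illustrative only: `SystemRecord`, `NAME_PATTERN`, and the `<env>-<use-case>-v<N>` convention are assumptions made for this example and are not part of Credo AI's API.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Hypothetical convention: <environment>-<use-case>-v<major version>
NAME_PATTERN = re.compile(r"^(prod|staging|dev)-[a-z0-9]+(?:-[a-z0-9]+)*-v\d+$")


@dataclass
class SystemRecord:
    """Local mirror of the fields entered in the Create New form."""

    name: str
    description: str
    risk_tier: str
    owner: str
    tags: list[str] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Return human-readable issues; an empty list means the record looks complete."""
        issues = []
        if not NAME_PATTERN.match(self.name):
            issues.append(f"name {self.name!r} does not follow <env>-<use-case>-v<N>")
        for attr in ("description", "risk_tier", "owner"):
            if not getattr(self, attr).strip():
                issues.append(f"{attr} is empty")
        return issues


# Example values are placeholders, not real systems or people.
record = SystemRecord(
    name="prod-credit-scoring-v2",
    description="Scores consumer credit applications; owned by Retail Lending.",
    risk_tier="high",
    owner="jane.doe@example.com",
    tags=["eu-ai-act", "nist-ai-rmf", "prod"],
)
print(record.problems())  # prints []
```

Keeping one `SystemRecord` per distinct system mirrors the one-record-per-system rule and makes aggregation mistakes visible before they reach the platform.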
Step 2: Connect Your Model via SDK
Without SDK integration, model evaluation results reach Credo AI only as manually uploaded documents. The Python SDK enables automated push of model evaluation metrics directly from your evaluation pipeline, which is significantly more efficient and produces more auditable evidence.
Install the SDK:
pip install credoai-lens
Basic SDK integration pattern for a classification model evaluation:
import credoai.lens as cl
from credoai.artifacts import ClassificationModel, TabularData

# Wrap your model and test data
model = ClassificationModel(
    name="credit-scoring-v2",
    model_like=your_sklearn_pipeline
)
eval_data = TabularData(
    name="holdout-test-q1-2026",
    X=X_test,
    y=y_test,
    sensitive_features=X_test[["gender", "age_group"]]
)

# Create a Lens instance linked to your Credo AI system record
lens = cl.Lens(
    model=model,
    assessment_data=eval_data,
    # Use the system ID from Step 1
    credo_ai_connect={"use_case_id": "your-system-id"}
)

# Run evaluations
lens.run(["ModelFairness", "Performance", "DataEquity"])

# Push results to Credo AI
lens.push()
The SDK computes performance metrics (accuracy, precision, recall, F1, AUC), fairness metrics (demographic parity, equalised odds), and data equity metrics, then pushes the results directly to the linked AI system record in Credo AI as evidence artefacts. Each pushed result appears as a machine-generated evidence item in the platform, timestamped and linked to the model version.
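To make the two fairness metrics named above concrete, here is a minimal pure-Python sketch of one common definition of each: demographic parity difference as the largest gap in positive-prediction rate between groups, and equalised odds difference as the larger of the true-positive-rate gap and the false-positive-rate gap. This is an assumption about the definitions for illustration; it is not the SDK's implementation, and the function names are hypothetical.

```python
def _positive_rate(preds):
    """Share of predictions equal to 1 (assumes a non-empty list of 0/1 values)."""
    return sum(preds) / len(preds)


def demographic_parity_difference(y_pred, groups):
    """Largest gap in positive-prediction rate between any two groups."""
    rates = []
    for g in sorted(set(groups)):
        rates.append(_positive_rate([p for p, grp in zip(y_pred, groups) if grp == g]))
    return max(rates) - min(rates)


def equalised_odds_difference(y_true, y_pred, groups):
    """Larger of the TPR gap and the FPR gap across groups."""
    gaps = []
    for label in (1, 0):  # label 1 yields TPR per group, label 0 yields FPR per group
        rates = []
        for g in sorted(set(groups)):
            preds = [
                p for t, p, grp in zip(y_true, y_pred, groups)
                if grp == g and t == label
            ]
            rates.append(_positive_rate(preds))
        gaps.append(max(rates) - min(rates))
    return max(gaps)


# Toy data: four records in group "a", four in group "b".
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

print(demographic_parity_difference(y_pred, groups))      # prints 0.5
print(equalised_odds_difference(y_true, y_pred, groups))  # prints 0.5
```

In the toy data, group "a" receives a positive prediction 75% of the time and group "b" 25% of the time, which is where the 0.5 parity gap comes from. A group with no positive or no negative labels would cause a division by zero here, so real evaluation data needs enough examples per group and label.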
SDK Limitations to Know
- The SDK supports scikit-learn compatible models natively. For TensorFlow, PyTorch, or custom inference pipelines, you need to wrap the model in a compatible interface.
- LLM-based systems require a different evaluation approach; the standard Lens evaluators are designed for classification and regression models. Credo AI's LLM evaluation is a separate module with different setup requirements.
- The SDK evaluates on the data you provide at evaluation time—it does not connect to production traffic. For production monitoring evidence, you need to export production metrics from a monitoring tool and upload them separately.
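The first limitation above usually comes down to exposing `predict` and `predict_proba` methods around whatever inference call you already have. The following is a minimal sketch under that assumption; `PredictWrapper` and `toy_score` are hypothetical names, and whether this interface is sufficient for your SDK version should be checked against its documentation.

```python
import math


class PredictWrapper:
    """Adapts a per-row scoring callable to a predict / predict_proba method pair."""

    def __init__(self, score_fn, threshold=0.5):
        self.score_fn = score_fn    # callable: feature row -> P(positive class)
        self.threshold = threshold  # decision threshold applied in predict()

    def predict_proba(self, X):
        """Return [P(class 0), P(class 1)] for each row."""
        out = []
        for row in X:
            p = float(self.score_fn(row))
            out.append([1.0 - p, p])
        return out

    def predict(self, X):
        """Return hard 0/1 labels by thresholding the positive-class probability."""
        return [int(p1 >= self.threshold) for _, p1 in self.predict_proba(X)]


def toy_score(row):
    """Hypothetical stand-in for a PyTorch or TensorFlow forward pass."""
    return 1.0 / (1.0 + math.exp(-sum(row)))


wrapped = PredictWrapper(toy_score)
print(wrapped.predict([[2.0, 1.0], [-3.0, 0.5]]))  # prints [1, 0]
```

scikit-learn estimators return NumPy arrays rather than lists, so a production wrapper would typically convert the outputs accordingly and batch the forward pass instead of scoring row by row.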
Step 3: Configure the Policy Pack
A policy pack is a set of governance controls organised by regulatory framework. Credo AI provides pre-built policy packs for EU AI Act, NIST AI RMF, and several other frameworks. You activate a policy pack on an AI system record to create an assessment instance.
EU AI Act Policy Pack Configuration
Navigate to AI System → Assessments → Add Policy Pack → EU AI Act. Before activating, configure the scope parameters:
- Risk classification: Set whether the system is high-risk (full Chapter III obligations apply), GPAI (Chapter V), limited risk (Article 50 transparency obligations), or minimal risk. The policy pack will filter which controls are applicable based on this selection.
- Provider vs deployer role: Set whether your organisation is the provider (developer) or deployer of the system. Provider obligations (Article 16, which covers requirements such as the Article 13 instructions for use) differ from deployer obligations (Article 26); the policy pack adjusts accordingly.
- Deployment geography: Confirm EU market scope. Required for the policy pack to flag the full set of applicable obligations.
Once activated, the EU AI Act policy pack generates a control checklist mapped to specific articles. Key control groups include:
- Article 9: Risk management system documentation controls.
- Article 10: Data governance controls (training data, bias analysis).
- Article 11 + Annex IV: Technical file completeness controls.
- Article 13: Instructions for use documentation controls.
- Article 14: Human oversight design controls.
- Article 15: Accuracy and robustness documentation controls.
- Article 43: Conformity assessment procedure controls.
- Article 72: Post-market monitoring controls.
NIST AI RMF Policy Pack Configuration
Navigate to AI System → Assessments → Add Policy Pack → NIST AI RMF. Configuration options:
- Profile selection: Full framework or a sector-specific profile (financial services, healthcare). Start with the full framework unless you have a specific sector profile requirement.
- Risk tier: Maps to GOVERN-level risk classification. Higher-tier systems activate more monitoring and documentation controls.
- Subcategory scope: You can enable or suppress specific subcategories. For an initial implementation, activate all subcategories and suppress after gap assessment rather than pre-filtering. Suppressing subcategories without documented rationale creates audit exposure.
The NIST AI RMF policy pack organises controls by function (GOVERN, MAP, MEASURE, MANAGE) and subcategory (e.g., GOVERN-1.1, MAP-2.3, MEASURE-2.2). Evidence uploaded against each control is linked to the corresponding subcategory in the framework documentation.
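The advice to suppress subcategories only with a documented rationale can be enforced in whatever local record you keep of scope decisions. The sketch below is one way to do that; `SubcategoryScope` is a hypothetical helper for your own records, not a Credo AI object, and the rationale text is a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class SubcategoryScope:
    """Tracks which subcategories are active and why any were suppressed."""

    active: set = field(default_factory=set)
    suppressed: dict = field(default_factory=dict)  # subcategory -> rationale

    def suppress(self, subcategory, rationale):
        """Move a subcategory out of scope, refusing to do so without a rationale."""
        if subcategory not in self.active:
            raise KeyError(f"{subcategory} is not an active subcategory")
        if not rationale or not rationale.strip():
            raise ValueError("a documented rationale is required to suppress a subcategory")
        self.active.remove(subcategory)
        self.suppressed[subcategory] = rationale.strip()


# Start with everything active, then suppress after the gap assessment.
scope = SubcategoryScope(active={"GOVERN-1.1", "MAP-2.3", "MEASURE-2.2"})
scope.suppress("MAP-2.3", "Placeholder rationale recorded during gap assessment.")

print(sorted(scope.active))   # prints ['GOVERN-1.1', 'MEASURE-2.2']
print(scope.suppressed)
```

The `suppressed` mapping doubles as the audit trail: each out-of-scope subcategory carries the reason it was removed.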
Step 4: Run Assessments and Collect Evidence
With the policy pack configured, Credo AI generates an assessment instance: a list of all applicable controls with their current evidence status (no evidence / evidence submitted / evidence reviewed / control satisfied). Work through the control list systematically:
Automatically Populated Evidence (via SDK)
If you have completed Step 2, the following control types will already have machine-generated evidence from the SDK push:
- Performance metrics (accuracy, precision, recall, F1, AUC).
- Fairness metrics (demographic parity gap, equalised odds gap by protected characteristic).
- Data equity metrics (representation analysis across sensitive feature groups).
Evidence Requiring Manual Upload
The following control types require manual evidence upload regardless of SDK integration:
- Risk management system documentation (Article 9 / GOVERN controls): upload your risk management SOP or policy document.
- Technical file / Annex IV documentation (Article 11): upload the technical file PDF or link to your version-controlled document repository.
- Instructions for use (Article 13): upload the current version of the instructions for use document.
- Human oversight design (Article 14): upload the human oversight protocol or design specification.
- Post-market monitoring plan (Article 72 / MANAGE controls): upload the monitoring plan document.
- Conformity assessment record (Article 43): upload the self-assessment declaration or notified body certificate.
- Internal audit records (GOVERN controls): upload internal audit reports and corrective action records.
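Working through the control list is easier with a simple tally of where each control stands. The sketch below models the four evidence statuses described in this step and reports coverage and gaps; the `EvidenceStatus` enum, the helper functions, and the example statuses are illustrative assumptions, not data exported from the platform.

```python
from collections import Counter
from enum import Enum


class EvidenceStatus(Enum):
    NO_EVIDENCE = "no evidence"
    SUBMITTED = "evidence submitted"
    REVIEWED = "evidence reviewed"
    SATISFIED = "control satisfied"


# Example snapshot of an assessment; statuses are made up for illustration.
assessment = {
    "Article 9: risk management system": EvidenceStatus.SATISFIED,
    "Article 11: technical file": EvidenceStatus.REVIEWED,
    "Article 13: instructions for use": EvidenceStatus.SUBMITTED,
    "Article 14: human oversight": EvidenceStatus.NO_EVIDENCE,
    "Article 72: post-market monitoring": EvidenceStatus.NO_EVIDENCE,
}


def coverage(controls):
    """Fraction of controls that have at least some evidence attached."""
    with_evidence = sum(
        1 for status in controls.values() if status is not EvidenceStatus.NO_EVIDENCE
    )
    return with_evidence / len(controls)


def gaps(controls):
    """Controls that still have no evidence, sorted by name."""
    return sorted(
        name for name, status in controls.items()
        if status is EvidenceStatus.NO_EVIDENCE
    )


print(Counter(status.value for status in assessment.values()))
print(coverage(assessment))  # prints 0.6
print(gaps(assessment))
```

Note that coverage counts evidence attached, not controls satisfied; only the `SATISFIED` status indicates that a reviewer has accepted the evidence.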
Step 5: Generate the Compliance Report and Set a Review Cadence
Once the assessment has sufficient evidence coverage, Credo AI can generate a compliance status report showing which controls are satisfied, which have evidence under review, and which have gaps. This report can be exported as PDF for board reporting, regulatory submission preparation, or enterprise customer due diligence.
Set a review cadence before you close the initial setup:
- Monthly: Review any SDK-generated evidence alerts (if metrics have breached thresholds). Update evidence for any AI system that has had a model update or deployment change.
- Quarterly: Run a full assessment review. Are all controls still satisfied? Have any new controls become applicable (e.g., the system was deployed to a new context that triggers additional obligations)? Update the risk classification if the system's use case has changed.
- Annually: Re-run the full SDK evaluation suite on an updated test set. Review the policy pack for any framework updates (Credo AI updates policy packs as regulatory guidance evolves). Produce an annual compliance report for board or senior management review.
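If you track the cadence outside the platform (for example, to create calendar reminders), the next due dates can be derived from the last review date. This is a small standard-library sketch; `add_months`, `CADENCE_MONTHS`, and `next_reviews` are hypothetical helper names, and the dates are examples.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Shift a date by whole months, clamping the day to the target month's length."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


# Review intervals from the cadence above, expressed in months.
CADENCE_MONTHS = {"monthly": 1, "quarterly": 3, "annual": 12}


def next_reviews(last_review):
    """Return the next due date for each review type."""
    return {name: add_months(last_review, m) for name, m in CADENCE_MONTHS.items()}


schedule = next_reviews(date(2026, 1, 31))
for name, due in schedule.items():
    print(f"{name}: {due.isoformat()}")
# monthly: 2026-02-28
# quarterly: 2026-04-30
# annual: 2027-01-31
```

Clamping to the end of the month avoids invalid dates such as 31 February when the last review fell on the 31st.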
Integration Patterns
CI/CD Integration via GitHub Actions
Run Credo AI Lens evaluations automatically as part of your model deployment pipeline:
# .github/workflows/model-governance.yml
name: Model Governance Check
on:
  push:
    paths:
      - 'models/**'
      - 'training/**'
jobs:
  governance:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Run Credo AI evaluation
        env:
          CREDO_API_KEY: ${{ secrets.CREDO_API_KEY }}
        run: |
          pip install credoai-lens
          python scripts/run_governance_eval.py
The run_governance_eval.py script runs the Lens evaluation suite on the held-out test set and pushes results to Credo AI. Any assessment that breaches a defined threshold (e.g., fairness gap exceeds 5%) can be configured to fail the CI check, blocking deployment until the governance issue is resolved.
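The threshold gate inside such a script can be kept independent of the SDK: compare the computed metrics against limits and return a non-zero exit code on any breach. The sketch below assumes the metrics are already available as a dictionary; the metric names, the AUC floor, and the function names are hypothetical, while the 5% fairness gap comes from the example above.

```python
def find_breaches(metrics, limits):
    """Compare metrics against limits of the form name -> ("max" | "min", bound)."""
    breaches = []
    for name, (kind, bound) in limits.items():
        value = metrics.get(name)
        if value is None:
            # A missing metric is treated as a failure rather than silently skipped.
            breaches.append(f"{name}: metric missing")
        elif kind == "max" and value > bound:
            breaches.append(f"{name}: {value:.3f} exceeds maximum {bound:.3f}")
        elif kind == "min" and value < bound:
            breaches.append(f"{name}: {value:.3f} is below minimum {bound:.3f}")
    return breaches


def exit_code(metrics, limits):
    """Print each breach and return 1 if any exist, otherwise 0."""
    breaches = find_breaches(metrics, limits)
    for breach in breaches:
        print("GOVERNANCE CHECK FAILED:", breach)
    return 1 if breaches else 0


# Hypothetical limits and metrics for illustration.
LIMITS = {
    "demographic_parity_gap": ("max", 0.05),
    "auc": ("min", 0.80),
}
metrics = {"demographic_parity_gap": 0.08, "auc": 0.91}

print(exit_code(metrics, LIMITS))  # prints the breach message, then 1
```

Ending the script with `sys.exit(exit_code(metrics, LIMITS))` makes the workflow step fail on a breach, because GitHub Actions treats any non-zero exit code from a `run` step as a failure.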
REST API for Evidence Upload
For evidence types that cannot be automated via SDK (document uploads, manual assessment responses), Credo AI's REST API supports programmatic evidence upload. This enables integration with your existing document management system: when a new version of the technical file is published in your DMS, a webhook can trigger an API call to update the corresponding evidence item in Credo AI automatically.
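The webhook-to-API flow described above can be sketched as a handler that builds an HTTP request when the DMS publishes a new document version. Everything specific in this example is an assumption: the base URL, the route, the payload fields, and the bearer-token header are placeholders, so take the real endpoint and authentication scheme from Credo AI's API documentation.

```python
import json
import urllib.request

# Placeholder base URL; the ".invalid" domain guarantees it never resolves.
API_BASE = "https://credo.example.invalid/api"


def build_evidence_request(system_id, control_id, document_url, version, api_key):
    """Build (but do not send) a POST request describing a new evidence version."""
    payload = {
        "control_id": control_id,      # hypothetical field names
        "document_url": document_url,
        "version": version,
    }
    return urllib.request.Request(
        url=f"{API_BASE}/systems/{system_id}/evidence",  # hypothetical route
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_evidence_request(
    system_id="your-system-id",
    control_id="article-11-technical-file",
    document_url="https://dms.example.invalid/technical-file/v7",
    version="7",
    api_key="dummy-key",
)
print(req.get_method(), req.full_url)
# Sending is a separate, explicit step:
# urllib.request.urlopen(req, timeout=10)
```

Separating request construction from sending keeps the handler testable without network access and makes it easy to log exactly what would be uploaded.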
Common Setup Mistakes
- Treating assessment completion as compliance: The most damaging mistake. A completed Credo AI assessment means you have collected and organised evidence against a policy pack's controls. It does not mean you are legally compliant with the EU AI Act or NIST AI RMF. Compliance is a legal determination; Credo AI is an evidence management tool.
- Not setting a review cadence: Credo AI assessments that are completed once and never reviewed become stale within months. Set calendar reminders or configure Credo AI's review reminder feature before closing the initial assessment.
- Missing the deployer context: If your organisation deploys a third-party AI system (rather than developing its own), you have different obligations than a provider. The policy pack must be configured for the deployer role; the Article 26 deployer controls differ from the provider controls (for example, the Article 13 instructions for use) and require different evidence.
- Aggregating multiple models into one system record: Creates assessment ambiguity. A fine-tuned model is technically a different system from the base model it was derived from. Create separate records and link them via the platform's provenance features.
- Uploading draft documents as evidence: Evidence in Credo AI should reflect the approved, version-controlled state of your documentation. Uploading working drafts creates a misleading compliance picture and can cause problems if the platform's outputs are shared externally.
- Not configuring threshold alerts: SDK-generated metrics without alert thresholds do not trigger any action. Configure thresholds for each metric in Credo AI so that a degradation in fairness or accuracy automatically flags the assessment for review.