The governance wrapper is the last line of defence before any engine output reaches a product. It maps internal labels to legally safe language, runs harm-detection gates, and flags outputs that require human review. No adapter bypasses it.
The Problem the Governance Wrapper Solves
An engine might output score: 0.91, label: "deceptive_pattern_detected". That label, surfaced directly in a CRM, is a defamation risk. In a legal platform, it could prejudice a client hearing. The governance wrapper replaces it with "guarded communication posture observed", a behaviorally accurate phrase that does not create clinical or legal liability.
The replacement is not cosmetic. It is a policy decision encoded in a version-controlled language map.
The Safe Language Map
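// governance/safe-language-map.js
const SAFE_LANGUAGE_MAP = {
  'deceptive_pattern_detected': 'guarded communication posture observed',
  'emotionally_dysregulated': 'elevated stress indicators present',
  'lying_detected': 'significant inconsistency signals present',
  'manipulative_tactics_used': 'influence-seeking communication patterns noted',
  'high_litigation_risk': 'elevated procedural complexity indicators',
  'credibility_compromised': 'consistency of account warrants attention',
};

// Returns a copy of engineOutput with its label swapped for the mapped safe phrase.
// Outputs whose label has no entry in the map are returned unchanged.
function applyLanguageMap(engineOutput, map) {
  // Own-property check, so a label such as 'constructor' never resolves to an inherited member.
  const hasMapping = Object.prototype.hasOwnProperty.call(map, engineOutput.label);
  const safeLabel = hasMapping ? map[engineOutput.label] : undefined;
  return safeLabel
    ? { ...engineOutput, label: safeLabel, originalLabel: '[REDACTED]' }
    : engineOutput;
}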
The returned payload carries only the '[REDACTED]' placeholder in originalLabel. The raw engine label is stored only in the internal audit log, never in any response payload or database column accessible to product UIs.
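The listing above redacts the label but does not show the audit write itself. A minimal sketch of how the two could be paired follows. The names applyLanguageMapWithAudit, auditSink, and DEMO_MAP are hypothetical, and the in-memory array stands in for whatever audit store the system actually uses.

```javascript
// Hypothetical sketch: applyLanguageMapWithAudit, auditSink and DEMO_MAP are
// illustrative assumptions, not part of the original module.
const DEMO_MAP = {
  'lying_detected': 'significant inconsistency signals present',
};

function applyLanguageMapWithAudit(engineOutput, map, auditSink) {
  // Own-property check so inherited keys such as 'constructor' are never treated as mappings.
  const hasMapping = Object.prototype.hasOwnProperty.call(map, engineOutput.label);
  if (!hasMapping) return engineOutput;

  const safeLabel = map[engineOutput.label];

  // The raw label is written to the audit sink only.
  auditSink.push({
    originalLabel: engineOutput.label,
    safeLabel,
    score: engineOutput.score,
  });

  // The payload that leaves the wrapper carries the placeholder, never the raw label.
  return { ...engineOutput, label: safeLabel, originalLabel: '[REDACTED]' };
}

// Usage: an in-memory array stands in for the internal audit log.
const auditLog = [];
const payload = applyLanguageMapWithAudit(
  { score: 0.91, label: 'lying_detected' },
  DEMO_MAP,
  auditLog
);
console.log(payload.label);
console.log(auditLog.length);
```

Passing the sink as an argument keeps the mapping function free of storage details, so a test can check that the raw label appears in the log and nowhere in the serialized payload.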
Harm Detection Gates
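const HARM_GATES = [
  // Psychological scores above the clinical threshold need a human reviewer.
  { test: out => out.score > 0.95 && out.domain === 'psychological',
    action: 'require_human_review',
    reason: 'clinical-threshold-exceeded' },
  // Diagnostic vocabulary is never released. The type check avoids a throw on a missing label.
  { test: out => typeof out.label === 'string' &&
      (out.label.includes('mental') || out.label.includes('disorder')),
    action: 'block',
    reason: 'diagnostic-language-prohibited' },
  // High-scoring legal outputs need a human reviewer.
  { test: out => out.domain === 'legal' && out.score > 0.85,
    action: 'require_human_review',
    reason: 'high-consequence-legal-output' },
];

// Returns null when the output is blocked; otherwise returns the output,
// flagged for review if a review gate matched. `context` is not read by any current gate.
function runHarmGates(output, context) {
  const matched = HARM_GATES.filter(gate => gate.test(output));
  // A block outranks a review flag, whatever the order of the gates in the list.
  if (matched.some(gate => gate.action === 'block')) return null; // output is dropped entirely
  const review = matched.find(gate => gate.action === 'require_human_review');
  return review
    ? { ...output, requiresHumanReview: true, reviewReason: review.reason }
    : output;
}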
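The two stages can be composed into a single entry point. The sketch below is one possible arrangement, not the wrapper's confirmed design. governOutput and the DEMO_* tables are hypothetical, trimmed to a few entries, and the ordering (gates first, language map second) is an assumption based on the block gate matching raw engine labels.

```javascript
// Hypothetical composition sketch: governOutput and the DEMO_* tables are
// illustrative assumptions, reduced to a few entries for brevity.
const DEMO_GATES = [
  { test: out => typeof out.label === 'string' && /mental|disorder/.test(out.label),
    action: 'block',
    reason: 'diagnostic-language-prohibited' },
  { test: out => out.domain === 'legal' && out.score > 0.85,
    action: 'require_human_review',
    reason: 'high-consequence-legal-output' },
];

const DEMO_LANGUAGE_MAP = {
  'credibility_compromised': 'consistency of account warrants attention',
};

function governOutput(output) {
  // Assumption: gates inspect the raw engine label, so they run before the language map.
  const matched = DEMO_GATES.filter(gate => gate.test(output));
  if (matched.some(gate => gate.action === 'block')) return null;

  const review = matched.find(gate => gate.action === 'require_human_review');
  const gated = review
    ? { ...output, requiresHumanReview: true, reviewReason: review.reason }
    : output;

  // Swap the internal label for its safe phrase when a mapping exists.
  const hasMapping = Object.prototype.hasOwnProperty.call(DEMO_LANGUAGE_MAP, gated.label);
  return hasMapping
    ? { ...gated, label: DEMO_LANGUAGE_MAP[gated.label], originalLabel: '[REDACTED]' }
    : gated;
}

// Usage: one output flagged for review and relabelled, one blocked outright.
const flagged = governOutput({ domain: 'legal', score: 0.9, label: 'credibility_compromised' });
const blocked = governOutput({ domain: 'psychological', score: 0.5, label: 'possible_mood_disorder' });
console.log(flagged);
console.log(blocked); // null
```

Callers must treat a null return as "nothing to display" rather than as an error, since a blocked output is dropped entirely.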