Agentic AI in Courts: Unpacking the COMPAS Bias Scandal and Recidivism Prediction Failures



Key Takeaways

  • The COMPAS algorithm, an AI tool used in U.S. courts, was found to be nearly twice as likely to falsely label Black defendants as high-risk for reoffending compared to white defendants.
  • This bias stemmed from training the AI on historically biased criminal justice data, compounded by the algorithm's "black box" nature, which blocked independent auditing and undermined transparency.
  • As AI becomes more autonomous, the COMPAS failure serves as a critical warning: we must mandate transparency, rigorous auditing, and absolute human oversight for any AI used in high-stakes decisions.

Picture this: a judge is deciding someone’s future—bail, sentencing, parole. To help, they turn to an AI tool designed to be objective. But what if that tool, built to eliminate human bias, was quietly perpetuating the very prejudice it was meant to solve?

That’s not a sci-fi plot. An investigation found that one such algorithm was nearly twice as likely to falsely label Black defendants as future criminals as it was to mislabel white defendants. This is the story of COMPAS, and it’s a brutal wake-up call.

The Allure of the Algorithm: Why Courts Turned to AI

The Promise of Unbiased Recidivism Prediction

Let's be honest, the idea is seductive. The justice system is plagued by human error and bias. A purely data-driven system that could predict recidivism—the likelihood of someone reoffending—was seen as a revolutionary step toward fairness.

The goal was to replace gut feelings and implicit biases with cold, hard numbers. The system was supposed to only look at the data, not the defendant's skin color. Or so we thought.

Defining Risk Assessment Tools in Sentencing

These tools, like COMPAS (Correctional Offender Management Profiling for Alternative Sanctions), weren't making the final call. They were designed as "decision-support" systems.

They would take in data—prior convictions, age, and answers from a questionnaire—and spit out a risk score. A judge could then use this score to inform their decisions. In theory, it added a layer of objectivity; in practice, it created a veneer of scientific certainty over a deeply flawed process.

Case Study: The COMPAS System and the ProPublica Exposé

What is COMPAS and How Did It Work?

Developed by Northpointe (now Equivant), COMPAS became one of the most widely used risk-assessment tools in the U.S. justice system. It drew on 137 data points from a questionnaire and criminal records to generate risk scores, including for general and violent recidivism. The problem? The inner workings of the algorithm were a trade secret, a proprietary black box.

The Bombshell Findings: Documenting Racial Bias

In 2016, the investigative journalism outlet ProPublica dropped a bombshell. They analyzed the risk scores assigned to over 7,000 people in Broward County, Florida, and tracked them for two years to see who actually reoffended.

The findings were staggering. While the algorithm's overall accuracy was mediocre for both groups (only somewhat better than a coin flip), the types of mistakes it made were profoundly biased.

The Data Breakdown: False Positives and Skewed Error Rates

The data speaks for itself:

  • False Positives (Labeled High-Risk, But Didn't Reoffend): The algorithm was almost twice as likely to get it wrong for Black defendants. 45% of Black defendants who did not reoffend were incorrectly flagged as high-risk, compared to just 23% for white defendants.
  • False Negatives (Labeled Low-Risk, But Did Reoffend): The error flipped. White defendants were nearly twice as likely to be mislabeled as low-risk. 48% of white reoffenders were flagged as safe, compared to only 28% of Black reoffenders.

The system was systematically overestimating the risk posed by Black individuals while underestimating the risk of white individuals. It created a data-driven narrative that reinforced existing racial disparities.
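
To see where numbers like these come from, here is a minimal sketch of the kind of error-rate breakdown ProPublica published. It assumes a table shaped roughly like their released compas-scores-two-years.csv, with columns `race`, `decile_score` (1-10), and `two_year_recid`; those column names and the convention of treating decile scores of 5 or above as "higher risk" are assumptions about the data layout, not an official specification.

```python
import pandas as pd

# Assumed schema, loosely modeled on ProPublica's published COMPAS data:
# one row per defendant, with `race`, `decile_score` (1-10), and
# `two_year_recid` (1 if the person reoffended within two years).
df = pd.read_csv("compas-scores-two-years.csv")

# Treat medium and high decile scores (5+) as "higher risk".
df["high_risk"] = df["decile_score"] >= 5

def error_rates(group: pd.DataFrame) -> pd.Series:
    """False positive and false negative rates within one group."""
    did_not_reoffend = group[group["two_year_recid"] == 0]
    reoffended = group[group["two_year_recid"] == 1]
    return pd.Series({
        # Flagged high-risk but did not reoffend
        "false_positive_rate": did_not_reoffend["high_risk"].mean(),
        # Flagged low-risk but did reoffend
        "false_negative_rate": (~reoffended["high_risk"]).mean(),
    })

print(df.groupby("race").apply(error_rates))
```

A per-group breakdown like this is exactly what surfaces the asymmetry described above.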

Deconstructing the Bias: Where Did the Algorithm Go Wrong?

Poisoned Data: The 'Garbage In, Garbage Out' Principle

This is the cardinal sin of machine learning. An AI model is only as good as the data it's trained on. The U.S. justice system has a long, documented history of racial bias in policing and arrests.

If you train an algorithm on historical data that reflects these biases, the AI will learn to associate race with criminality. It doesn't "know" it's being racist; it's just identifying patterns in biased data.
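
To make that concrete, here is a toy simulation (not COMPAS, and every number in it is invented) of how a model trained on skewed historical labels inherits the skew even when race is never an input: two groups reoffend at exactly the same rate, but one is policed more heavily, so more of its offenses get recorded, and a "race-blind" proxy feature carries that bias straight into the risk scores.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20_000

# Two groups with the SAME underlying reoffense rate (30% each).
group = rng.integers(0, 2, n)
true_reoffense = rng.random(n) < 0.30

# Historical labels: group 1 is policed more heavily, so its reoffenses
# are recorded far more often. The label measures policing, not behavior.
recorded = true_reoffense & (rng.random(n) < np.where(group == 1, 0.9, 0.5))

# A "race-blind" proxy feature (e.g., prior police contacts) that correlates
# with group membership because of the same uneven policing.
prior_contacts = rng.poisson(lam=np.where(group == 1, 3.0, 1.0))

# Train on the biased labels using only the proxy feature.
X = prior_contacts.reshape(-1, 1)
model = LogisticRegression().fit(X, recorded)
scores = model.predict_proba(X)[:, 1]

# Despite identical true reoffense rates, group 1 ends up scored riskier.
print("mean risk score, group 0:", round(scores[group == 0].mean(), 3))
print("mean risk score, group 1:", round(scores[group == 1].mean(), 3))
```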

This is a classic case of rushing to deploy a system without proper oversight, much like the Technical Debt Tsunami we see when AI-generated code is deployed without human review. The core principle is the same: unchecked automation can embed and scale hidden flaws.

The Black Box Problem: When Transparency is Sacrificed

Because COMPAS was proprietary, its developers refused to reveal the exact calculations. This "black box" nature made it impossible for independent researchers to challenge its logic. We were asked to trust an algorithm whose decision-making process was a secret, even when it was influencing people's freedom.

The Mathematical Impossibility of 'Fairness': Competing Definitions

Here’s the mind-bending part. Northpointe defended its algorithm, claiming it was fair based on a metric called "predictive parity." In other words, a high-risk score corresponded to roughly the same probability of reoffending regardless of race.

The problem is that it is mathematically impossible for an algorithm to satisfy both predictive parity and have equal false positive/negative rates across groups with different base rates. You must choose which definition of "fairness" to prioritize, and COMPAS chose the one that let glaring racial disparities slide.
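
The tension can be made precise. For any classifier, the false positive rate (FPR), false negative rate (FNR), positive predictive value (PPV), and a group's base rate of reoffending, p, are tied together by a single identity, a result formalized in the fairness literature (notably by Chouldechova in 2017):

\[
\mathrm{FPR} = \frac{p}{1-p} \cdot \frac{1-\mathrm{PPV}}{\mathrm{PPV}} \cdot \bigl(1-\mathrm{FNR}\bigr)
\]

If two groups reoffend at different base rates, then holding PPV and FNR equal across them (Northpointe's notion of fairness) forces their false positive rates apart, while equalizing the error rates instead forces predictive parity to break. Short of a perfect predictor or identical base rates, no amount of tuning escapes the trade-off.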

Beyond COMPAS: The Coming Wave of Agentic AI in Law

From Predictive Tools to Autonomous Agents: Raising the Stakes

COMPAS was just a predictive tool. But the technology is evolving at a terrifying pace. We are moving from simple models to truly agentic AI systems that can analyze evidence, draft legal arguments, and even propose judicial outcomes.

The lessons from COMPAS are more critical than ever. As I've written about in Agentic AI Super Agents in 2026 Federal Workflows, if a simple risk score can go this wrong, imagine the damage when autonomous agents are influencing federal-level decisions without ethical guardrails.

Key Lessons from Recidivism Failures for Future Legal AI

The COMPAS scandal taught us three invaluable lessons:

  1. Data is Not Neutral: Historical data is a reflection of a biased world and cannot be used blindly.
  2. Transparency is Non-Negotiable: "Trust me, it works" is not an acceptable answer when freedom is on the line.
  3. Fairness is a Social, Not Just Technical, Problem: We can't let engineers in a lab decide what "fairness" means.

Accountability and Oversight in an Age of Agentic Systems

When an agentic system makes a biased recommendation, who is accountable? This is a huge, unanswered question. It touches on thorny issues similar to the debate over whether agentic systems can claim patent rights, raising fundamental questions of responsibility for non-human decisions.

Conclusion: Forging a Path for Just Technology in the Courts

The Imperative of Human-in-the-Loop Oversight

I’m a tech enthusiast, not a tech utopian. The single most important lesson from COMPAS is that we cannot remove human oversight from critical decision-making loops. AI should be a tool to augment human judgment, not replace it.

A Call for Algorithmic Audits and Radical Transparency

The path forward isn't to ban AI but to demand a new standard. Any AI tool used in the justice system must be subject to independent, third-party audits. Its source code, training data, and weighting models should be open for public inspection.
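
What might one automated check inside such an audit look like? Here is a minimal, hypothetical sketch: it compares false positive rates across groups in a scored dataset and flags any gap above a tolerance. The function name, column names, and the 5-percentage-point tolerance are illustrative assumptions, not a legal or regulatory standard.

```python
from itertools import combinations

import pandas as pd

def audit_error_rate_parity(df: pd.DataFrame,
                            group_col: str = "race",
                            label_col: str = "reoffended",
                            pred_col: str = "high_risk",
                            tolerance: float = 0.05) -> list[str]:
    """Flag group pairs whose false positive rates differ by more than `tolerance`."""
    # False positive rate per group: share of non-reoffenders flagged high-risk.
    fpr = (
        df[df[label_col] == 0]
        .groupby(group_col)[pred_col]
        .mean()
    )
    findings = []
    for a, b in combinations(fpr.index, 2):
        gap = abs(fpr[a] - fpr[b])
        if gap > tolerance:
            findings.append(f"FPR gap of {gap:.2f} between {a} and {b}")
    return findings
```

An independent auditor could run checks like this on a held-out validation set and publish the findings alongside the vendor's accuracy claims, rather than taking "it's fair" on faith.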

The COMPAS scandal wasn't just a technical failure; it was a moral one. It showed how easily we use the language of innovation to hide old biases. We must demand transparency, enforce accountability, and never, ever trust a black box with human lives.



Recommended Watch

📺 Data Story Case Studies (How ProPublica Analyzed the COMPAS Recidivism Algorithm)
📺 ProPublica finds that Algorithms used in Risk Assessment tools are biased - A-1 Nick

💬 Thoughts? Share in the comments below!
