ProPublica’s 2016 article "Machine Bias: There’s Software Used Across the Country to Predict Future Criminals. And it’s Biased Against Blacks. shows that the COMPAS algorithm designed by Northpointe, now Equivant, is racially biased, specifically, the false positive rates and the false negative rates are not the same when they are calculated separately for black people and white people. The data that ProPublica used to arrive at this result is publicly available on GitHub here.

This note explores ProPublica’s data and plots a few histograms that illustrate the racial breakdown in the data and the risk score score distribution by race. The R code is hidden by default. You can inspect the code by clicking “code” on the right and follow along if you’d like.

Loading and filtering data

library(dplyr)
library(ggplot2)
library(grid)
library(gridExtra)
library(tidyverse)

After loading the raw data, we see the following number of rows:

raw_data <- read.csv("data/compas-scores-two-years.csv") #might have to change the path here
nrow(raw_data)

## [1] 7214

By filtering the data a bit, the number of rows becomes:

df <- dplyr::select(raw_data, age, c_charge_degree, race, age_cat, score_text, sex, priors_count, 
                    days_b_screening_arrest, decile_score, is_recid, two_year_recid, c_jail_in, c_jail_out) %>% 
        filter(days_b_screening_arrest <= 30) %>%
        filter(days_b_screening_arrest >= -30) %>%
        filter(is_recid != -1) %>%
        filter(c_charge_degree != "O") %>%
        filter(score_text != 'N/A')
nrow(df)

## [1] 6172

So we are working with about 6,000 data points, each representing a unique individual with characteristics such as age, prior arrests, etc.

Race breakdown

The database is mostly constituted of blacks (50%), whites (35%) and Hispanics (10%). Asians and native-Americans are a small percentage.

race <- ggplot(df, aes(x=race)) + 
          geom_bar(aes(y=..count../sum(..count..)), colour="black", fill="grey100") +
          xlab("race") + 
          ylab("proportion")
race

Recidivism

The recidivism rate is different across races. It is higher among blacks (almost 50%), while it is comparably lower among whites and Hispanics (about 40%) and even lower among Asians (about 25%).

Recidivism is defined as a rearrest for a criminal offense that occurred within two years after the COMPAS risk assessment took place. Arrest is therefore used as a proxy for criminal activity.

Recidivism (i.e. re-arrest) is coded in the database as “1” and absence of recidivism (i.e. no recorded re-arrest) is coded as “0”.

pblack <- ggplot(data=filter(df, race =="African-American"), aes(x=two_year_recid)) + 
          geom_histogram(aes(y=..count../sum(..count..)), binwidth = 1,  colour="black", fill="grey45")  +                           xlab("recidivism") + 
          ylab("frequency") + 
          ylim(0, 1) +
          scale_x_continuous(breaks = seq(0, 1, by = 1)) +
          theme(plot.title = element_text(hjust = 0.5)) + 
          theme(text=element_text(size = 15))

pwhite <- ggplot(data=filter(df, race =="Caucasian"), aes(x=two_year_recid)) + 
          geom_histogram(aes(y=..count../sum(..count..)), binwidth = 1,  colour="black", fill="grey100")  +                          xlab("recidivism") + 
          ylab("frequency") + 
          ylim(0, 1) +
          scale_x_continuous(breaks = seq(0, 1, by = 1)) +
          theme(plot.title = element_text(hjust = 0.5)) + 
          theme(text=element_text(size = 15))

phispanic <- ggplot(data=filter(df, race =="Hispanic"), aes(x=two_year_recid)) + 
          geom_histogram(aes(y=..count../sum(..count..)), binwidth = 1,  colour="black", fill="grey80")  +                           xlab("recidivism") + 
          ylab("frequency") + 
          ylim(0, 1) +
          scale_x_continuous(breaks = seq(0, 1, by = 1)) +
          theme(plot.title = element_text(hjust = 0.5)) + 
          theme(text=element_text(size = 15))

pasian <- ggplot(data=filter(df, race =="Asian"), aes(x=two_year_recid)) + 
          geom_histogram(aes(y=..count../sum(..count..)), binwidth = 1,  colour="black", fill="grey20")  +                           xlab("recidivism") + 
          ylab("frequency") + 
          ylim(0, 1) +
          scale_x_continuous(breaks = seq(0, 1, by = 1)) +
          theme(plot.title = element_text(hjust = 0.5)) + 
          theme(text=element_text(size = 15))

grid.arrange(pblack, pwhite, phispanic, pasian, ncol = 4, top=textGrob("Recidivism: Black, White, Hispanic, Asian", vjust= 0.4, gp=gpar(fontsize=15,font=4)))

Risk scores distribution by race

COMPAS assigns each individual a score between 1 (very low risk of recidivism) and 10 (high risk of recidivism). We can plot the distribution of these scores by race. The scores of blacks are evenly distributed across all values from 1 to 10. The scores of Hispanics tend to be more concentrated towards lower values. This trend is even more apparent for whites.

pblack_s <- ggplot(data=filter(df, race =="African-American"), aes(x=decile_score)) + 
          geom_histogram(aes(y=..count../sum(..count..)), binwidth = 1,  colour="black", fill="grey45")  +                xlab("risk scores") + 
          ylab("frequency") + 
          ylim(0, 0.35) +
          ggtitle("Blacks") +
          scale_x_continuous(breaks = seq(0, 10, by = 1)) +
          theme(plot.title = element_text(hjust = 0.5)) + 
          theme(text=element_text(size = 15))

pwhite_s <- ggplot(data=filter(df, race =="Caucasian"), aes(x=decile_score)) + 
          geom_histogram(aes(y=..count../sum(..count..)), binwidth = 1,  colour="black", fill="grey100")  +                xlab("risk scores") + 
          ylab("frequency") + 
          ylim(0, 0.35) +
          ggtitle("Whites") +
          scale_x_continuous(breaks = seq(0, 10, by = 1)) +
          theme(plot.title = element_text(hjust = 0.5)) + 
          theme(text=element_text(size = 15))

phispanic_s <- ggplot(data=filter(df, race =="Hispanic"), aes(x=decile_score)) + 
          geom_histogram(aes(y=..count../sum(..count..)), binwidth = 1,  colour="black", fill="grey80")  +                           xlab("risk scores") + 
          ylab("frequency") + 
          ylim(0, 0.35) +
          ggtitle("Hispanics") +
          scale_x_continuous(breaks = seq(0, 10, by = 1)) +
          theme(plot.title = element_text(hjust = 0.5)) + 
          theme(text=element_text(size = 15))

grid.arrange(pblack_s, pwhite_s, phispanic_s, ncol = 3, top=textGrob("Scores overall", vjust= 0.4, gp=gpar(fontsize=15,font=4)))

Risk scores distribution by race (non re-offenders only)

Focus now on those individuals in the database for whom no further criminal activity (i.e. no re-arrest) was recorded within a period of two years. Call them non re-offenders. The distribution of the scores for these individuals should be different, mostly concentrated toward lower values. If they did not re-offend, COMPAS should have been able to predict that and assign them lower scores. This is the case for the three racial groups – a sign that COMPAS is, to some extent, tracking criminal behavior (i.e. re-arrest). However, the scores are more clearly concentrated towards lower values for whites and Hispanics, and much less clearly so for blacks.

Risk scores distribution by race (re-offenders only)

Next, let’s focus on those individuals in the database for whom criminal activity (i.e. re-arrest) was recorded within a period of two years. Call them re-offenders. The distribution of the scores for these individuals should be concentrated toward higher values. If they did re-offend, COMPAS should have been able to predict that and assign them higher scores. This is the case for blacks, but not for whites and Hispanics – a sign that COMPAS is, to some extent, tracking future criminal behavior (i.e. re-arrest) for blacks, but much less so for whites and Hispanics.

Racial disparities in false positives classification

As ProPublica showed, the false positive rate is higher for blacks compared to whites. ProPublica did not compute the false positive rate for Hispanics. It is roughly the same as that of whites and higher than that of blacks. Details below.

The first problme here is that COMPAS does not return a yes/no decision, but only a risk score between 1 and 10. So how we do compute false negatives and false positives? We can force a yes/no decision by stipulating that anyone with a score of, say, 5 or higher is classified as a re-offenders So, a false positive occurs when someone who is not a re-offenders is classified as such by COMPAS. The false positive classification rate is defined as the following fraction: \(\frac{\# [\text{ individual is classified as re-offender but is not}]}{\# [\text{ individual is not a re-offender}]}\).

This is the conditional probability \(Pr(\text{ individual is classified as re-offender} \textit{ given that } \text{individual is not a reoffender})\).

The false positive classification rate is 41% for blacks, 21% for whites and 18% for Hispanics. There is a significant difference between blacks, on one hand, and whites and Hispanics on the other hand. Exact values and calculations are listed below:

black_I <- subset(df, race =="African-American" & is_recid==0)
black_I_Cg <- subset(df, race =="African-American" & is_recid==0 & decile_score>4)

fp_black <- nrow(black_I_Cg)/nrow(black_I)

print(paste0("Compas false positive rate for Blacks (>4): ", fp_black))

## [1] "Compas false positive rate for Blacks (>4): 0.414407988587732"

white_I <- subset(df, race =="Caucasian" & is_recid==0)
white_I_Cg <- subset(df, race =="Caucasian" & is_recid==0 & decile_score>4)

fp_white <- nrow(white_I_Cg)/nrow(white_I)

print(paste0("Compas false positive rate for Whites (>4): ", fp_white))

## [1] "Compas false positive rate for Whites (>4): 0.216436126932465"

hispanic_I <- subset(df, race =="Hispanic" & is_recid==0)
hispanic_I_Cg <- subset(df, race =="Hispanic" & is_recid==0 & decile_score>4)

fp_hispanic <- nrow(hispanic_I_Cg)/nrow(hispanic_I)

print(paste0("Compas false positive rate for Hispanics (>4): ", fp_hispanic))

## [1] "Compas false positive rate for Hispanics (>4): 0.182692307692308"

The disparity does not go away if the threshold is increased to at least 7 instead of 5. The false positive classification rate becomes 21% for blacks, 8% for whites and 10% for Hispanics. The false positive rate decreased across all groups - as one would expect since the decision threshold has been increased - but the racial disparity persists.

black_I <- subset(df, race =="African-American" & is_recid==0)
black_I_Cg <- subset(df, race =="African-American" & is_recid==0 & decile_score>6)

fp_black <- nrow(black_I_Cg)/nrow(black_I)

print(paste0("Compas false positive rate for Blacks (>6): ", fp_black))

## [1] "Compas false positive rate for Blacks (>6): 0.221825962910128"

white_I <- subset(df, race =="Caucasian" & is_recid==0)
white_I_Cg <- subset(df, race =="Caucasian" & is_recid==0 & decile_score>6)

fp_white <- nrow(white_I_Cg)/nrow(white_I)

print(paste0("Compas false positive rate for Whites (>6): ", fp_white))

## [1] "Compas false positive rate for Whites (>6): 0.0781122864117168"

hispanic_I <- subset(df, race =="Hispanic" & is_recid==0)
hispanic_I_Cg <- subset(df, race =="Hispanic" & is_recid==0 & decile_score>6)

fp_hispanic <- nrow(hispanic_I_Cg)/nrow(hispanic_I)

print(paste0("Compas false positive rate for Hispanics (>6): ", fp_hispanic))

## [1] "Compas false positive rate for Hispanics (>6): 0.0993589743589744"

Age

Why is there such a large racial disparity in false positives? Perhaps, the racial disparity is attributable to age differences between races. Blacks tend to be younger than whites and Hispanics, and data and common sense suggest that younger people tend to commit more crime. Could this be the explanation?

black_age <- ggplot(data=filter(df, race =="African-American"), aes(x=age_cat)) + 
          # geom_histogram(aes(y=..count../sum(..count..)), colour="black", fill="grey45")  +                
          geom_bar(aes(y=..count../sum(..count..)), colour="black", fill="grey45") +
          xlab("age") + 
          ylab("frequency") + 
          ggtitle("Blacks") +
          ylim(0, 1) +
          theme(plot.title = element_text(hjust = 0.5)) + 
          theme(text=element_text(size = 10)) +
          theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
          scale_x_discrete(limits = c("Less than 25", "25 - 45", "Greater than 45"))

white_age <- ggplot(data=filter(df, race =="Caucasian"), aes(x=age_cat)) + 
          # geom_histogram(aes(y=..count../sum(..count..)), colour="black", fill="grey100")  +                
          geom_bar(aes(y=..count../sum(..count..)), colour="black", fill="grey100") +
          xlab("age") + 
          ylab("frequency") + 
          ggtitle("Whites") +
          ylim(0, 1) + 
          theme(plot.title = element_text(hjust = 0.5)) + 
          theme(text=element_text(size = 10)) +
          theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
          scale_x_discrete(limits = c("Less than 25", "25 - 45", "Greater than 45"))

hispanic_age <- ggplot(data=filter(df, race =="Hispanic"), aes(x=age_cat)) + 
          # geom_histogram(aes(y=..count../sum(..count..)), colour="black", fill="grey80")  +                
          geom_bar(aes(y=..count../sum(..count..)), colour="black", fill="grey100") +
          xlab("age") + 
          ylab("frequency") +
          ggtitle("Hispanics") +
          ylim(0, 1) + 
          theme(plot.title = element_text(hjust = 0.5)) + 
          theme(text=element_text(size = 10)) +
          theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
          scale_x_discrete(limits = c("Less than 25", "25 - 45", "Greater than 45"))

grid.arrange(black_age, white_age, hispanic_age, ncol = 3, top=textGrob("Age distribution by race", vjust= 0.4, gp=gpar(fontsize=15,font=4)))

Risk scores distribution by age

The score distributions for blacks and whites become more similar if we compare individuals of the same age, say younger than 25.

Racial disparities in false positives classification by age

The disparity in false positive rates shrink significantly (though does not disappear entirely) if we compares whites and blacks of the same age group, say individuals younger than 25. Interestingly, the false positive classification rates go up significantly for both groups, to 60% for young blacks and 48% for young whites.

black_young_I <- subset(df, race =="African-American" & is_recid==0 & age_cat=="Less than 25")
black_young_I_Cg <- subset(df, race =="African-American" & is_recid==0 & decile_score>4 & age_cat=="Less than 25")

fp_young_black <- nrow(black_young_I_Cg)/nrow(black_young_I)

print(paste0("COMPAS false positive rate for blacks (< 25): ", fp_young_black))

## [1] "COMPAS false positive rate for blacks (< 25): 0.591836734693878"

white_young_I <- subset(df, race =="Caucasian" & is_recid==0 & age_cat=="Less than 25")
white_young_I_Cg <- subset(df, race =="Caucasian" & is_recid==0 & decile_score>4 & age_cat=="Less than 25")

fp_young_white <- nrow(white_young_I_Cg)/nrow(white_young_I)

print(paste0("COMPAS false positive rate for whites (< 25): ", fp_young_white))

## [1] "COMPAS false positive rate for whites (< 25): 0.48502994011976"

What else?

Age does not eliminate the racial disparity in false positive entirely. While it might be part of the explanation, it cannot be the whole the story. What else is missing?

Exploring the ProPublica COMPAS data

Marcello Di Bello