In criminal justice systems, credit markets, employment arenas, higher education admissions processes and even social media networks, data-driven algorithms now drive decision-making in ways that touch our economic, social and civic lives. These software systems rank, classify, associate or filter information, using human-crafted or data-induced rules that allow for consistent treatment across large populations.
But while there may be efficiency gains from these techniques, they can also harbor biases against disadvantaged groups or reinforce structural discrimination. In terms of criminal justice, for example, is it fair to make judgments on an individual’s parole based on statistical tendencies measured across a wide group of people? Could discrimination arise from applying a statistical model developed for one state’s population to another, demographically different population?
The public needs to understand the bias and power of algorithms used in the public sphere, including by government agencies. An effort I am involved with, called algorithmic accountability, seeks to make the influences of those sorts of systems clearer and more widely understood.
Existing transparency techniques, when applied to algorithms, could enable people to monitor, audit and criticize how those systems are functioning – or not, as the case may be. Unfortunately, government agencies seem unprepared for inquiries about algorithms and their uses in decisions that significantly affect both individuals and the public at large.
Opening algorithms to public scrutiny
Last year the federal government began studying the pros and cons of using computerized data analysis to help determine prison inmates' likelihood of reoffending upon release. Scoring individuals as low-, medium-, or high-risk can help with housing and treatment decisions, identifying people who can safely be sent to a minimum security prison or even a “halfway house,” or who would benefit from a particular type of psychological care.
That information can make the justice process more efficient and less expensive, and even reduce prison crowding. Some studies have shown that treating low-risk offenders like high-risk offenders leads them to internalize the idea that they are “sick” criminals in need of treatment for their deviant behavior. Separating the two groups can thus reduce the development of negative behaviors that would lead to recidivism upon release.
Data and algorithms for scoring inmates' reoffending risk are already used extensively by states for managing pretrial detention, probation, parole and even sentencing. But it’s easy for them to go unnoticed – they often look like unassuming bureaucratic paperwork.
Typically the algorithms are boiled down to simplified score sheets that are filled out by public servants with little understanding of the underlying calculations. For instance, a case worker might evaluate a prisoner using a form where the case worker marks down that the prisoner had been convicted of a violent crime, was young at the time of the first arrest, and had not graduated from high school or gotten a GED. Those factors and other characteristics about the person and the crime result in a score that suggests whether the inmate might be eligible for parole review.
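To make the idea of a score sheet concrete, here is a minimal sketch in Python of how a form like the one described above could turn a few checked boxes into a score and a low-, medium- or high-risk band. The three items mirror the example in the text, but the point values, the age cutoff of 18 and the band thresholds are all invented for illustration; they do not come from any real instrument.

```python
from dataclasses import dataclass


@dataclass
class ScoreSheet:
    """Answers a case worker might mark on a (hypothetical) form."""
    violent_conviction: bool
    age_at_first_arrest: int
    finished_high_school_or_ged: bool


# Assumed point values: invented for this sketch, not taken from a real tool.
POINTS_VIOLENT_CONVICTION = 3
POINTS_YOUNG_FIRST_ARREST = 2   # applied when first arrest was before age 18 (assumed cutoff)
POINTS_NO_DIPLOMA = 1


def risk_score(sheet: ScoreSheet) -> int:
    """Add up the points for every item that applies."""
    score = 0
    if sheet.violent_conviction:
        score += POINTS_VIOLENT_CONVICTION
    if sheet.age_at_first_arrest < 18:
        score += POINTS_YOUNG_FIRST_ARREST
    if not sheet.finished_high_school_or_ged:
        score += POINTS_NO_DIPLOMA
    return score


def risk_band(score: int) -> str:
    """Map a total score to a band using assumed thresholds."""
    if score <= 1:
        return "low"
    if score <= 3:
        return "medium"
    return "high"


example = ScoreSheet(
    violent_conviction=True,
    age_at_first_arrest=16,
    finished_high_school_or_ged=False,
)
total = risk_score(example)
print(total, risk_band(total))
```

Even in a toy version like this, the questions that matter are visible: who chose the point values and thresholds, and on what evidence?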
The form itself, as well as its scoring system, often discloses key features about the algorithm, like the variables under consideration and how they come together to form an overall risk score. But what’s also important for algorithmic transparency is to know how such forms were designed, developed and evaluated. Only then can the public know whether the factors and calculations involved in arriving at the score are fair and reasonable, or uninformed and biased.
Using the Freedom of Information Act
Our primary tool for getting our hands on those forms, and their supporting material, is the law: specifically, freedom of information laws. They are among the most powerful mechanisms the public has at its disposal for ensuring transparency in government. At the federal level, the Freedom of Information Act (FOIA) allows the public to formally request – and expect to receive in return – documents from the federal government. Analogous statutes exist in each state.
Enacted in 1966, FOIA was created before the widespread use of computing, and well before reams of data were routinely fed into software systems to manage individuals and make predictions. There has been some initial research into whether FOIA is able to facilitate the disclosure of software source code. But a question remains about whether current laws are responsive to the needs of the 21st-century public: can we FOIA algorithms?
A case study in algorithm transparency
I set out to answer this question at the Philip Merrill College of Journalism at the University of Maryland, where I am an assistant professor. In the fall of 2015, my colleague Sandy Banisky and I guided the students in her media law class in submitting FOIA requests to each of the 50 states. We asked for documents, mathematical descriptions, data, validation assessments, contracts and source code related to algorithms used in criminal justice, such as for parole and probation, bail or sentencing decisions.
As a semester-long project, the effort was necessarily constrained by time, with plenty of hurdles and relatively few successes. As with many journalists' investigations, even figuring out whom to ask – and how – was a challenge. Different agencies may be responsible for different areas of the criminal justice system (sentencing might be done by courts, but parole management done by a Department of Corrections).
Even after identifying the right person, students found government officials used different terminology that made it hard to communicate what information they wanted. At times, students had to work hard to explain “criminal justice algorithms” to a not-so-data-savvy public servant. In retrospect, it might have been more effective to ask for “risk assessment tools,” as that is a term often used by state governments.
Handling the answers
Some states, such as Colorado, flat-out denied our request, saying that the algorithms were contained in software, which was not considered a “document” that open government laws required officials to make public. Different states have different rules about disclosing software use. This has sometimes surfaced in the courts, such as a 2004 suit against the city of Detroit over whether the formula for calculating water fees charged to an adjacent city should be made public.
In our own efforts, we received only one mathematical description of a criminal justice algorithm: Oregon disclosed the 16 variables and their weights in a model used there to predict recidivism. The state of North Dakota released an Excel spreadsheet showing the equation used for determining dates when inmates would be eligible to be considered for parole. From Idaho and New Mexico we received documents with some descriptions of the recidivism-risk assessments those states used, but no details about how they were developed or validated.
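To see why a disclosure of variables and weights, like Oregon's, is useful, consider the rough Python sketch below. It shows how anyone holding the weights could work out how much each variable pushes a person's predicted risk up or down. Everything in it is an assumption made for illustration: the three variable names, their weights, the intercept and the choice of a logistic model are hypothetical and are not the Oregon model.

```python
import math

# Hypothetical variables and weights, invented for this sketch.
WEIGHTS = {
    "prior_arrests": 0.30,
    "age_at_release": -0.04,
    "prior_revocations": 0.50,
}
INTERCEPT = -1.0  # assumed baseline term


def contributions(person: dict) -> dict:
    """Return each variable's weighted contribution to the overall score."""
    return {name: WEIGHTS[name] * person[name] for name in WEIGHTS}


def predicted_risk(person: dict) -> float:
    """Combine the contributions and squash the sum into a 0-1 probability
    with the logistic function (an assumed model form)."""
    z = INTERCEPT + sum(contributions(person).values())
    return 1.0 / (1.0 + math.exp(-z))


person = {"prior_arrests": 4, "age_at_release": 30, "prior_revocations": 1}

# List the variables from most to least influential for this person.
ranked = sorted(contributions(person).items(), key=lambda kv: abs(kv[1]), reverse=True)
for name, value in ranked:
    print(f"{name}: {value:+.2f}")
print(f"predicted risk: {predicted_risk(person):.2f}")
```

With the weights in hand, an outside reviewer can ask which factors dominate a score and whether any of them stand in for characteristics such as race or income. Without them, that audit is not possible.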
Nine states based their refusal to disclose details about their criminal justice algorithms on the claim that the information was actually owned by a company. The implication is that releasing the algorithm would harm the firm that developed it. A common recidivism-risk questionnaire, called the LSI-R, turns out to be a commercial product, protected by copyright. States such as Hawaii and Maine claimed that this prevented its disclosure to the public.
Louisiana said its contract with the developer of a new risk assessment technique barred the release of the requested information for six months. The state of Kentucky cited its contract with a philanthropic foundation as the reason it could not disclose more details. Concerns about proprietary information may be legitimate, but given that the government routinely contracts with private companies, how do we balance those concerns against an explainable and indeed legitimate justice system?
Much-needed FOIA reform is currently under deliberation by Congress. This provides an opportunity for the law to be modernized, but the proposed changes still do little to accommodate the growing use of algorithms in government. Algorithmic transparency information might be codified into reports that the government generates and makes public on a regular basis, as part of business as usual.
As a society we should require that public information officers be trained so they are literate and indeed fluent in the terminology they may encounter when the public is asking for algorithms. The federal government might even create a new position for an “algorithms czar,” an ombudsman whose task it would be to communicate about and field inquiries into government automation.
None of the documents we received in our research told us how criminal justice risk assessment forms were developed or evaluated. As algorithms govern more and more of our lives, citizens need – and must demand – more transparency.
About The Author
Nicholas Diakopoulos, Tow Fellow, Tow Center for Digital Journalism at Columbia University; Assistant Professor of Journalism, University of Maryland. His research is in computational and data journalism with an emphasis on algorithmic accountability, narrative data visualization, and social computing in the news.