Police Traffic Stop Bias Analysis

In 2017, I did an analysis of Montgomery County (Maryland) Police Department traffic stop data. These data are available at Data Montgomery, Montgomery County's open data portal. The data set contained information on traffic stops, including location, reason for the stop, race of the driver, whether a search was conducted and, if so, whether contraband was found.

The 2017 analysis clearly showed that Blacks and Latinos (whom the data categorized as Hispanics) were disproportionately stopped and searched, without contraband being found at a higher rate.

The Montgomery County Office of Legislative Oversight released a report in 2022, Analysis of dataMontgomery Traffic Violations Dataset, covering the same ground for the years 2018 through 2022. That report also found disproportionate stops and citations for Blacks and Latinos. The report did not examine searches.

However, these analyses are based on benchmark tests (simply counting stops and/or searches by race) and outcome tests (whether contraband was found during a search). It is not certain whether the disproportionate impact was due to bias or to some other factor. Researchers with the Stanford Open Policing Project performed a nationwide analysis of traffic stop data (which did not include Montgomery County, Maryland or Washington, DC) and also found evidence of bias, reported in A large-scale analysis of racial disparities in police stops across the United States. That study applies not just benchmark and outcome analyses, but also a threshold test, which incorporates both the rate at which searches occur and the success rate of those searches to infer the standard of evidence (the threshold) applied when determining whom to search (more precisely, the inferred probability of finding contraband that officers require before deciding to initiate a search). Significant differences in the thresholds used across races are a strong indicator of bias.
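As a rough sketch of the model behind the threshold test (following the Stanford paper; the notation here is mine): for each race r and location d, an officer is assumed to observe a signal p, the perceived probability that a search will turn up contraband, drawn from a race- and location-specific distribution, and to search whenever p exceeds a threshold t_rd. The observed search rate and hit rate are then

$$
s_{rd} = \Pr(p > t_{rd}), \qquad h_{rd} = \mathbb{E}\left[\, p \mid p > t_{rd} \,\right],
$$

and Bayesian inference on these two observed quantities recovers the latent thresholds; consistently lower thresholds for one group indicate a lower bar for searching drivers of that group.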

I revisited my earlier analysis, updating it with recent data (2023) and applying the threshold test developed by the Stanford researchers Camelia Simoiu, Sam Corbett-Davies and Sharad Goel in their paper The Problem of Infra-marginality in Outcome Tests for Discrimination, implemented in Python and R and available in the Stanford Policy Lab Open Policing Project GitHub repository stanford-policylab/opp.

I implemented benchmark and outcome tests in R, and called the Stanford researchers' R function to run the threshold test. My code may be found in the R Markdown file MoCo_Traffic_Stop_Threshold.rmd. Here I report my results.

In summary, stops and searches in 2023 were still clearly disproportionate by race, and there is evidence of bias in stop and search decisions, but the pattern differs from 2017 in some surprising ways, and the evidence from the threshold test in particular is not uniformly strong. The main points:

After the statistical tests, I fit a random forest model to the 2023 data to predict whether a search was conducted. Ranking the most important predictor variables, race was significant but only tenth; location variables were more predictive. Given that, I then looked at the searches done in each police district, which raises some interesting questions (see below).

Detailed statistical results:

I performed a benchmark test (were stops disproportionate by race?), an outcome test (were Blacks or Latinos searched more often than Whites without finding more contraband?) and a threshold test.

Benchmark Test

The data show that in 2023 Blacks were still stopped 1.5x as often as their share of the general population would indicate, while Hispanics (Latinos) were stopped 1.26x as often.
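For reference, the benchmark calculation itself is simple. Below is a minimal sketch in R, not the exact code in MoCo_Traffic_Stop_Threshold.rmd: the file name and population shares are placeholders, and the column names are assumptions about the dataMontgomery export.

```r
# Minimal sketch of the benchmark test: compare each group's share of stops
# to its share of the county population. The file name and the population
# shares are placeholders, not the values used in the actual analysis.
library(dplyr)

stops <- read.csv("Traffic_Violations_2023.csv")   # hypothetical file name

pop_share <- data.frame(
  Race      = c("WHITE", "BLACK", "HISPANIC"),
  pop_share = c(0.43, 0.18, 0.20)                  # placeholder Census shares
)

stops %>%
  count(Race, name = "n_stops") %>%
  mutate(stop_share = n_stops / sum(n_stops)) %>%
  inner_join(pop_share, by = "Race") %>%
  mutate(stop_ratio = stop_share / pop_share)      # the 1.5x and 1.26x figures
```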

Outcome Test

Items to notice here are that Blacks and Hispanics are searched, when stopped, about 3x as often as Whites (6% and 7% vs 2%), yet contraband is not found more often when searching Blacks and Hispanics (the hit rate for Blacks and Whites is about the same, while the hit rate for Hispanics is actually lower).
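The underlying tabulation is the per-race search rate (searches per stop) and hit rate (contraband found per search). A sketch of that computation, again with assumed column names and codings:

```r
# Sketch of the outcome test tabulation: search rate per stop and hit rate
# per search, by race. The Search.Conducted / Contraband.Found column names
# and the "Yes" coding are assumptions about the dataMontgomery schema.
library(dplyr)

stops %>%
  group_by(Race) %>%
  summarise(
    n_stops     = n(),
    n_searches  = sum(Search.Conducted == "Yes", na.rm = TRUE),
    search_rate = n_searches / n_stops,
    hit_rate    = sum(Contraband.Found == "Yes", na.rm = TRUE) / n_searches
  ) %>%
  arrange(desc(search_rate))
```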

Threshold Test

This test takes some explanation. The average threshold figures indicate that while officers decide to search a White driver when they believe there is a 43.33% chance of finding contraband, they need only a 40.77% chance before searching a Black driver. This suggests that police are "quicker" to search Blacks. But the thresholds are close, and their confidence intervals overlap substantially, so the threshold test does not indicate, at least with high confidence, bias against Blacks in search decisions. For Hispanics, however, the threshold is much lower, at 36%, with little overlap in the confidence intervals. The threshold test does suggest a strong likelihood of bias against Hispanics in search decisions.

Limitations

There are many limitations in these kinds of analyses, including data quality, missing data on factors other than bias that could explain the disproportionate stop and search rates, and limitations of the threshold test itself, which is an inference that rests on certain statistical assumptions.

Random Forest Model and its Consequences

I fit a random forest model to the data to predict whether a search was conducted, in order to see which predictor variables the model would consider most important.  The model had a predictive accuracy of over 97%.  The code for the model can be found in MoCoTrafficSearchRF.rmd.
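The fit itself is standard. A compressed sketch using the randomForest package follows; the full code, including preprocessing of dates, times and the high-cardinality location fields, is in MoCoTrafficSearchRF.rmd, and the sketch uses only the ten predictors listed in the table below.

```r
# Sketch of the random forest fit. Preprocessing (parsing Date.Of.Stop and
# Time.Of.Stop, recoding high-cardinality location factors) is omitted and
# assumed to have been done already; ntree is an assumed value.
library(randomForest)

stops$Search.Conducted <- factor(stops$Search.Conducted)

rf_fit <- randomForest(
  Search.Conducted ~ Time.Of.Stop + Violation.Type + Longitude + Geolocation +
    Latitude + Location + Arrest.Type + SubAgency + Date.Of.Stop + Race,
  data       = stops,
  ntree      = 500,
  importance = TRUE          # required for MeanDecreaseAccuracy
)

importance(rf_fit)           # per-class and overall importance, as tabled below
```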

The top ten important predictor variables were as follows:

Accuracy: 0.9710396 

                        No       Yes MeanDecreaseAccuracy MeanDecreaseGini
Time.Of.Stop   0.006948418 0.2980195          0.019667276        366.78871
Violation.Type 0.003925750 0.2638124          0.015288994        124.53951
Longitude      0.011921277 0.1828931          0.019398104        255.26528
Geolocation    0.032538078 0.1403309          0.037251121        263.17211
Latitude       0.031980453 0.1360651          0.036533896        262.21105
Location       0.004584250 0.1224058          0.009734600        255.47231
Arrest.Type    0.001741790 0.1193220          0.006886409         81.62156
SubAgency      0.004681766 0.1174461          0.009610867        106.18332
Date.Of.Stop   0.003904850 0.1141753          0.008724385        234.58751
Race           0.002246621 0.1054858          0.006764351         74.41948

Race was in the top ten in importance, but many of the location variables were more important. (Violation type is warning, citation or arrest, while arrest type indicates the kind of arrest if one is made.)

Because of the predictive value of location, I then looked at the number of searches by police district:

Note that the 3rd District (Silver Spring area) and the HQ/Special Operations "district" account for almost half of all searches in the county, with the 4th District (Wheaton area) not far behind. It is worth asking why so many searches are done in Silver Spring and, to a lesser extent, in Wheaton. Is there a bias against those areas or against the drivers (more likely to be drivers of color) in those districts, or are there other good reasons to undertake more searches there? This question is even more stark when looking at the next chart, which shows that the districts have roughly the same number of stops; thus the rate of searches is much higher in the 3rd and 4th Districts. Why?

Another question is why HQ/Special Operations conducts so many stops and searches.
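Both questions start from a simple tabulation of stops and searches by district. A sketch, assuming (as the importance table suggests) that SubAgency identifies the district or HQ unit:

```r
# Sketch of the per-district comparison: stops, searches and search rate by
# district, assuming SubAgency encodes the district / HQ unit.
library(dplyr)

stops %>%
  group_by(SubAgency) %>%
  summarise(
    n_stops     = n(),
    n_searches  = sum(Search.Conducted == "Yes", na.rm = TRUE),
    search_rate = n_searches / n_stops
  ) %>%
  arrange(desc(search_rate))
```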

More detailed importance charts

For those wanting more detail on predictor variable importance, here are two more charts. The first shows how many times each variable was the root decision point (the more often, the more important) and its mean minimal depth across all trees (the lower the depth, the more important).

The second shows the minimal depth distribution across all trees. Besides being very colorful, it shows that Race, while just below Location in mean minimal depth, is actually the root or at depth 1 more often, and appears at depth 3 or lower about as often as Location.
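Charts like these can be generated with the randomForestExplainer package; as a sketch, using the rf_fit object from the earlier random forest sketch:

```r
# Sketch of the minimal-depth charts with randomForestExplainer, applied to
# the rf_fit object from the earlier sketch.
library(randomForestExplainer)

md_frame  <- min_depth_distribution(rf_fit)      # minimal depth of each variable in each tree
imp_frame <- measure_importance(rf_fit)          # includes times_a_root and mean_min_depth

plot_min_depth_distribution(md_frame, k = 10)    # the colorful depth-distribution chart
plot_multi_way_importance(imp_frame,
                          x_measure = "mean_min_depth",
                          y_measure = "times_a_root")
```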