
How a British Police Risk-Scoring System Grew Into a Black Box—and Why Officials Quietly Walked Away

Bristol’s predictive policing system collected sensitive data on hundreds of thousands of people—then officials abandoned some models they couldn’t trust.

In short

A Bristol police-council data project used machine learning to score thousands of residents for risk, but internal reviews later found some models were unreliable and difficult to justify. The case is now raising wider questions about transparency, bias and the future of predictive policing in the UK.

  • Bristol’s Think Family Database combined highly sensitive data on nearly half a million residents with machine-learning risk scores.
  • At least two models were later abandoned after officials concluded they could no longer trust the results.
  • Reviews found major transparency gaps, including missing source code and unclear documentation.
  • The case highlights broader concerns about bias, legitimacy and the expanding use of AI in UK policing.

A sprawling police and council data project in Bristol was built to spot danger before it happened. Instead, newly surfaced records suggest parts of the system became so opaque, inconsistent and difficult to defend that officials quietly abandoned some of its most ambitious risk models after deciding they could no longer trust the results.

The database at the center of the effort, known as Think Family, held highly sensitive information on nearly half a million people in Bristol and the surrounding area. It pulled together police intelligence, housing records, school attendance, mental health information, family support data and other personal details in an attempt to help authorities identify children and adults facing harm. On top of that vast repository, staff built machine-learning tools that assigned risk scores to thousands of people, including children flagged as vulnerable to exploitation and adults assessed as likely to offend, miss court, go missing or experience domestic abuse.

What emerged was not a simple early-warning tool, but a regional experiment in predictive governance that reached deep into public services. Internal reviews, records obtained through public information requests and interviews with current and former staff indicate that the project delivered some practical benefits for front-line workers. Yet the same material also shows sustained concern about accuracy, transparency and fairness, culminating in the quiet removal of at least two models after council staff concluded the outputs were unreliable.

Those findings matter well beyond Bristol. Avon and Somerset Police has been one of the more visible UK forces experimenting with data-driven policing, and its former chief constable now leads the College of Policing, the body helping set standards for forces across England and Wales. At a time when British policing is moving toward wider use of artificial intelligence and predictive analytics, the Bristol case offers a detailed look at what can happen when ambition runs ahead of oversight.

What the Think Family Database was meant to do

The project began with a fairly straightforward question: how can agencies dealing with vulnerable families see the full picture early enough to intervene effectively? Child welfare workers, police officers and council staff often hold separate pieces of the puzzle. A school may register chronic absences. Police may have information about domestic abuse in the home. Housing officers may know a family is falling behind on rent. None of those signs, alone, necessarily tells the whole story.

Supporters of the Bristol initiative argued that combining those records could help frontline workers identify hidden patterns and prioritize help before problems became crises. In that sense, the project grew out of a familiar public-sector instinct: if data is fragmented, merge it; if risk is hard to see, score it.

The resulting system was vast. Launched in 2016 by Bristol City Council and Avon and Somerset Police, the Think Family Database eventually held sensitive material on close to half a million residents. The data types in the system included police logs, housing status, mental health records, free school meal eligibility, teenage pregnancy information and participation in parenting courses.

Officials then layered machine-learning models over that material to create individual risk assessments. One former police data scientist described the process in strikingly casual terms during a 2022 event about child exploitation: the data, in his telling, was poured into a “bucket,” stirred through a statistical process and turned into a score for each person.

That description captures both the promise and the problem of the project. From the outside, the model may have looked like a neat, data-driven answer to complex social problems. In practice, it depended on assumptions about which variables mattered, whether the data was current and accurate, and whether the relationships the algorithm detected were meaningful or merely statistical noise.
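The "bucket" description can be made concrete with a toy example. The sketch below is hypothetical: the feature names, weights and intercept are invented for illustration and are not taken from the Bristol models, whose source code reviewers later could not locate. It shows a logistic-style score, in which every choice of variable and weight shapes the final number.

```python
import math

# Hypothetical binary indicators and weights, invented for illustration only.
WEIGHTS = {
    "persistent_absence": 1.2,
    "child_in_need": 0.9,
    "police_contact": 1.5,
    "mental_health_concern": 0.7,
}
BIAS = -3.0  # hypothetical intercept

def risk_score(record: dict) -> float:
    """Combine 0/1 indicators into a probability-like score between 0 and 1."""
    z = BIAS + sum(w * record.get(name, 0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))

child_a = {"persistent_absence": 1, "child_in_need": 1}
child_b = {"persistent_absence": 1, "child_in_need": 1, "police_contact": 1}
print(round(risk_score(child_a), 3), round(risk_score(child_b), 3))
```

In this invented setup, a single police-contact flag more than doubles child_b's score relative to child_a's. The number reflects the designer's choice of variables and weights as much as anything about the child.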

How predictive policing took root in Bristol

To understand why Bristol pushed so hard into predictive analytics, it helps to look at the environment in which the project was born. By the middle of the 2010s, Avon and Somerset Police was under strain. Budgets were shrinking across UK policing, and leadership instability and criticism over failures to protect some domestic abuse victims had created pressure to find a better, more efficient way to allocate resources.

In that climate, predictive analytics looked attractive. If officers could use data to forecast where danger was likely to emerge, perhaps they could intervene earlier, prioritize scarce resources more effectively and justify those decisions in an era of austerity.

At Bristol City Council, senior staff were reasoning along similar lines. Gary Davies, a former police chief superintendent who later led a team working with children and families at the council, recalled that practitioners often saw the consequences of family breakdown only once they had become obvious. The harder problem, he said, was identifying the households moving toward crisis before the situation became visible in a meeting room or through an emergency referral.

That thinking led to a joint effort between the council and police known as Insight Bristol. Beginning in 2015, a small team from both bodies worked out of a police station and tried to merge information from across the public sector. The goal was to create a shared view of children and families so that schools, social workers, health services and police could act from the same set of information.

Inside the logic of “legal gateways”

One important feature of the project was how it handled consent. Officials did not ask every resident for permission before placing their data into the system. Instead, the project relied on what staff called legal gateways: statutory routes that permit data-sharing when an agency believes it is necessary to meet legal responsibilities, such as safeguarding children.

Davies argued that the idea of individual consent was not always realistic in this context, because the authorities were already required to retain certain records. In his view, suggesting that residents had freely opted in would create a misleading impression of choice.

That rationale did not eliminate public unease. At first, residents could not opt out; later, the council began offering an opt-out through council tax letters. Even so, the broader question remained: even if the data sharing was legally defensible, was it publicly legitimate?

That distinction would become increasingly important as scrutiny of the models intensified.

The models multiplied

What began as a family-support database evolved into a much broader predictive analytics program. According to records reviewed for this investigation, Avon and Somerset Police developed at least 23 separate models. These included tools designed to estimate the risk that someone would commit burglary, fail to appear in court, go missing or become a victim of domestic abuse.

One senior officer even referred to a “league table” of the area’s most dangerous criminals, an apparent reference to the Offender Management App, which was intended to store data on roughly 300,000 people across the region.

The system was not just about offenders. Some of the most controversial models targeted children and families. Among them was a model aimed at identifying children at risk of sexual exploitation. Another assessed the likelihood of child criminal exploitation. These tools relied on a mixture of police intelligence and data from schools, housing services, councils and charities.

By design, the models were supposed to detect patterns that human staff might miss. In practice, that ambition created a powerful incentive to ingest more and more data. The larger and more varied the dataset, the more convincing the promise of precision seemed.

But a bigger data pool does not automatically produce better predictions. It can also produce more confusion, more proxy variables and more hidden bias.

What the risk models actually used

The child sexual exploitation model, often referred to as the CSE model, drew on multiple public-sector datasets, including information held by the police, the council and other agencies. Barnardo’s provided anonymized information on 1,000 children known to have been sexually abused. The idea was to use those cases as a template, identifying children with similar attributes or patterns of vulnerability.

The scoring system considered factors such as whether a child was identified as “in need,” whether they were persistently absent from school and whether there were mental health concerns. It also analyzed social connections, looking for links to people already considered vulnerable or believed to be possible perpetrators of exploitation.
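One plausible way to use known cases "as a template" is similarity matching: compare a child's indicators with those of past cases, then add weight for links to people already flagged. The sketch below assumes that approach purely for illustration. The records do not say which algorithm the CSE model used, and every name, case and weight here is hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Share of indicators two records have in common (0 = none, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def template_similarity(child: set, templates: list) -> float:
    """Highest similarity between a child's indicators and any known template case."""
    return max((jaccard(child, t) for t in templates), default=0.0)

def network_links(person: str, edges: set, flagged: set) -> int:
    """Count direct links between a person and anyone already flagged."""
    return sum(
        1 for a, b in edges
        if (a == person and b in flagged) or (b == person and a in flagged)
    )

def combined_score(person, indicators, templates, edges, flagged, link_weight=0.1):
    """Hypothetical blend: template similarity plus a fixed bonus per flagged contact."""
    return (template_similarity(indicators, templates)
            + link_weight * network_links(person, edges, flagged))

# Invented template cases and social links.
templates = [
    {"child_in_need", "persistent_absence", "mental_health_concern"},
    {"persistent_absence", "missing_episode"},
]
edges = {("c1", "p9"), ("c2", "c3")}
flagged = {"p9"}

print(combined_score("c1", {"child_in_need", "persistent_absence"}, templates, edges, flagged))
```

The design rewards any child who shares indicators with past cases, or who merely knows someone already flagged, whether or not that child is actually at risk.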

The child criminal exploitation model later introduced by Bristol City Council used a similarly broad range of inputs. Those included whether a family received housing support, whether rent arrears had been recorded and whether a child qualified for free school meals.

These features were intended to help identify risk. But critics have long warned that many such variables can act as stand-ins for poverty or instability rather than direct indicators of criminality or abuse.

“The variables being used can in practice be proxies for poverty,” researchers at Cardiff University’s Data Justice Lab said in an earlier review of UK citizen-scoring systems.

That observation goes to the heart of the debate. If an algorithm flags hardship because hardship correlates with later contact with public services, it may appear effective while actually reinforcing existing disadvantage.
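A toy example shows how this can happen. Suppose a hypothetical score gives weight to free school meals and rent arrears, both of which track household income. Two children with the same school attendance record then receive different scores purely because one family is poorer. The weights are invented, and nothing here describes the Bristol models' actual coefficients.

```python
# Hypothetical weights; the last two inputs are poverty-linked indicators.
WEIGHTS = {
    "persistent_absence": 1.0,
    "child_in_need": 1.0,
    "free_school_meals": 0.8,
    "rent_arrears": 0.6,
}
POVERTY_PROXIES = {"free_school_meals", "rent_arrears"}

def score(record: dict) -> float:
    """Sum the weights of every indicator present in the record."""
    return sum(w for name, w in WEIGHTS.items() if record.get(name))

def proxy_share(record: dict) -> float:
    """Fraction of a record's score contributed by poverty-linked inputs."""
    total = score(record)
    if total == 0:
        return 0.0
    return sum(WEIGHTS[n] for n in POVERTY_PROXIES if record.get(n)) / total

better_off = {"persistent_absence": True}
poorer = {"persistent_absence": True, "free_school_meals": True, "rent_arrears": True}
print(score(better_off), score(poorer), round(proxy_share(poorer), 2))
```

In this sketch, more than half of the poorer child's score comes from the two poverty-linked inputs, even though the school attendance signal is identical for both children.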

Why residents didn’t know much about the system

One of the most striking features of the Bristol project is how little ordinary residents appear to have known about it for years. John Pegram, who leads a local police accountability group, said he learned of the Offender Management App only in 2023, long after it had been built. Once he found out, he suspected he might have been included.

That suspicion led him to ask police how his data was being used. At first, he was refused details. Later, after legal help was brought in, the police confirmed that he was on the app but would not say much more.

Pegram’s experience mirrors a wider concern raised in the investigation: people may be subject to algorithmic scoring without knowing it, without seeing the data used against them and without understanding whether a score can affect their dealings with the state.

For affected residents, that uncertainty can be especially troubling. If the government is using a score to shape decisions about a family, a child or a criminal case, people may want to know three basic things:

  • What data was used?
  • How was the score generated?
  • What effect did the score have on decisions?

In Bristol, those answers were often difficult to obtain.

Warnings emerged early

Concerns about the project were not limited to outsiders. Internal advisory bodies and external reviewers were flagging problems well before the models were abandoned.

In 2016, Avon and Somerset Police’s ethics committee reviewed the approach and urged caution. Members warned that the force needed to think carefully about which data it relied on and how variables were selected. They also emphasized the risk of bias. Importantly, the committee said the public should be told why such processing was taking place and how it worked.

That advice turned out to be prescient. As the models expanded, transparency did not keep pace.

By 2021, officials from the Centre for Data Ethics and Innovation were reportedly hearing about “ethical tensions” in the project. Their concerns included the fact that large volumes of sensitive information had been gathered through legal gateways rather than through a process designed to build trust with the public. Their conclusion was sharp: legality is not the same as legitimacy.

That distinction became a recurring theme in later reviews. Even if authorities had the legal power to collect and share the data, they still needed to justify the method in democratic, practical and ethical terms.

When the models stopped making sense

The turning point appears to have come when the risk scores stopped being useful enough to trust.

According to the records obtained for this investigation, at least two risk-scoring models were quietly dropped after Bristol City Council staff concluded they could no longer rely on them. Public documents suggest those were the CSE model and the child criminal exploitation (CCE) model, both of which had been central to the child-protection side of the initiative.

The clearest account of what went wrong comes from a detailed Social Finance review commissioned by Bristol City Council and nearby Somerset Council. The report, which was more than 100 pages long, found that the Think Family Database and related visualizations had practical value for child protection workers and could speed up responses. But it also concluded that the risk-scoring models were the weakest part of the program.

According to the review, accuracy issues undercut their usefulness. Council staff expressed doubts about the outputs, and some said the models were no longer fit for operational use.

One especially telling clue came from an email cited in the records. A staff member noted that people with recent sexual victimization were ranking below those associated with burglary offenses, a result that badly shook confidence in the model.
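Checks of this kind can be automated. The sketch below is a hypothetical sanity test, not something the records say Bristol used: it sorts people by model score and lists every pair in which someone in a category practitioners consider more urgent ranks below someone in a less urgent one. The severity ordering, names and scores are all invented.

```python
# Hypothetical severity ordering supplied by practitioners (higher = more urgent).
SEVERITY = {
    "recent_sexual_victimization": 3,
    "missing_episode": 2,
    "burglary_association": 1,
}

def rank_inversions(people):
    """Return (ranked_higher, ranked_lower) name pairs that contradict SEVERITY.

    `people` is a list of (name, category, model_score) tuples.
    """
    ranked = sorted(people, key=lambda p: p[2], reverse=True)
    inversions = []
    for i, (name_hi, cat_hi, _) in enumerate(ranked):
        for name_lo, cat_lo, _ in ranked[i + 1:]:
            if SEVERITY[cat_lo] > SEVERITY[cat_hi]:
                inversions.append((name_hi, name_lo))
    return inversions

people = [
    ("A", "recent_sexual_victimization", 0.41),
    ("B", "burglary_association", 0.67),
    ("C", "missing_episode", 0.52),
]
print(rank_inversions(people))
```

An empty result would not prove a model is accurate, but a long list of inversions like the one flagged in the email is a cheap, early warning that the ranking no longer matches professional judgment.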

Another sign of trouble was a change in the data feeding the system. At one stage, police stopped using Bristol City Council data and tried to extend the model across the wider Avon and Somerset boundary, which includes five councils. That shift reduced the richness of the inputs. Instead of relying on a broad range of social indicators, the algorithm increasingly had access only to police-held data, which was not enough to recreate the same performance.

Staff then reported that some children who should have appeared as vulnerable were missing from the results altogether.
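A simplified sketch shows why narrowing the inputs can make children vanish from the results. It assumes, hypothetically, an additive score with a fixed flagging threshold. When council-held fields are unavailable and only police data remains, a child whose risk signals sat mainly in school and housing records drops below the threshold. Field names, weights and the threshold are invented.

```python
# Hypothetical additive weights, split by the agency that holds each field.
POLICE_WEIGHTS = {"police_intelligence": 1.5, "missing_report": 1.0}
COUNCIL_WEIGHTS = {"persistent_absence": 1.0, "child_in_need": 1.0, "housing_support": 0.5}
THRESHOLD = 2.0  # hypothetical cut-off for appearing in the results

def score(record: dict, use_council_data: bool = True) -> float:
    """Sum weights for present indicators, optionally ignoring council-held fields."""
    weights = dict(POLICE_WEIGHTS)
    if use_council_data:
        weights.update(COUNCIL_WEIGHTS)
    return sum(w for name, w in weights.items() if record.get(name))

def flagged(records: dict, use_council_data: bool = True) -> set:
    """Names of everyone whose score reaches the threshold."""
    return {name for name, rec in records.items()
            if score(rec, use_council_data) >= THRESHOLD}

records = {
    "child_1": {"police_intelligence": True, "missing_report": True},
    "child_2": {"persistent_absence": True, "child_in_need": True, "housing_support": True},
}
full = flagged(records)
police_only = flagged(records, use_council_data=False)
print(sorted(full - police_only))  # children lost once council data is removed
```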

One council worker told reviewers they were uneasy about relying on the system because they could not see clearly where the numbers came from or how the model had been built.

Another was even more direct, saying they would not present the model’s output in a meeting because they were not confident it was accurate enough.

What staff said they saw in practice

The feedback from front-line staff suggests the technology gradually lost credibility. People who once felt it mirrored what they already knew later said the outputs no longer matched reality.

Some workers said the same vulnerable young people kept failing to appear in the rankings. Others described spending hours checking names, emailing colleagues and manually verifying what the score had produced. Over time, that extra work became so burdensome that some simply stopped using the outputs as a guide.

That pattern is a familiar risk in public-sector analytics. A tool may begin as an aid to judgment, but if staff repeatedly find they must check, override or interpret it by hand, the promised efficiency gains evaporate.

The data problems behind the scenes

The Social Finance review identified a second, more fundamental problem: the project lacked adequate documentation.

When the reviewers attempted to test the models themselves, they found they could not locate source code or clear records of the variables used to create the systems. Without that information, they could not fully evaluate how the models were built or how they might behave under different conditions.

That is a serious weakness for any high-stakes algorithm, especially one affecting children and families. If neither the creator nor the reviewer can reconstruct the decision process in detail, external scrutiny becomes almost impossible.

Professor Rob Procter of the University of Warwick, who acted as an expert consultant on the review, said the documentation was not detailed enough to explain how the models were made. He argued that any similar project should be accompanied by much stronger transparency and a public conversation about whether the approach is justified at all.

The absence of documentation also complicates accountability. If a model is abandoned, officials may still be unable to explain exactly why it failed or who was responsible for the choice of design variables, thresholds and updates.

A wider UK pattern of predictive policing

Bristol did not develop its tools in isolation. Across the United Kingdom, police forces have been experimenting with predictive analytics and related AI systems for years, often with mixed results.

Kent Police was the first UK force to test one of the best-known predictive-policing products. It later canceled its contract with the US firm PredPol, saying it had been difficult to show the tool reduced crime. Durham Constabulary faced criticism for using sociodemographic data to predict the risk of reoffending.

In that context, Avon and Somerset at times appeared to be a relative success story. It had a broad system, active leadership support and a strong public narrative about using evidence and data to target intervention. But the Bristol documents suggest that some of the apparent success was based on incomplete information and unresolved operational problems.

The broader lesson is that predictive policing is often presented as a technical fix for public safety challenges that are actually social, political and institutional. Data may help agencies see patterns, but it cannot by itself resolve the question of which patterns should matter, who should be scored and what should happen when the model is wrong.

Andy Marsh and the national push for AI in policing

The Bristol revelations arrive at a moment when British policing is being urged to embrace artificial intelligence more broadly.

Andy Marsh, the former chief constable of Avon and Somerset, now heads the College of Policing, the body that helps set professional standards across England and Wales. He has spoken enthusiastically about the role of AI in law enforcement.

Marsh has said his organization is examining about 100 AI tools already in use by police, including systems linked to predictive policing, and that effective ones should be tested rigorously and then expanded quickly across the service.

His broader argument is that police forces should not wait too long to adopt tools that may improve efficiency. Yet the Bristol case suggests that speed can come at the cost of scrutiny, especially when systems are built on sensitive data, used over many years and not explained clearly to the public.

That tension is likely to shape the next phase of UK policing. Authorities want smarter methods for managing workload, targeting prevention and coping with limited budgets. At the same time, they face rising demands for evidence that AI systems are fair, accurate and independently checked.

Why the Bristol case matters for AI governance

This story is not only about one local project. It speaks to a larger governance challenge that cuts across health, education, social care and law enforcement: when public agencies use AI to sort people into risk categories, how do they prove the system is reliable, proportional and lawful in the fullest democratic sense?

The Bristol experience suggests several lessons.

1. Accuracy is not optional

A predictive system can only be justified if its outputs are meaningfully better than guesswork or existing practice. If staff conclude that the model merely repeats what they already know, or worse, misorders cases in ways that seem absurd, the system loses legitimacy.
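One way to make "better than guesswork or existing practice" measurable is to compare the model's ranking with a simple baseline on past cases whose outcomes are known. A common metric is the area under the ROC curve (AUC): the probability that a randomly chosen true case is ranked above a randomly chosen non-case, where 0.5 is chance. The sketch below computes AUC directly from that definition; the scores, labels and baseline rule are invented.

```python
def auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

labels = [1, 0, 1, 0, 0, 1]                   # invented known outcomes
model_scores = [0.9, 0.4, 0.35, 0.5, 0.2, 0.7]  # invented model output
baseline_flags = [1, 0, 1, 1, 0, 0]           # hypothetical existing practitioner rule

print(auc(model_scores, labels), auc(baseline_flags, labels))
```

In this invented example the model edges out the baseline. On real data the comparison would need cases held out from training and some measure of uncertainty; a model that cannot beat the existing rule adds complexity without adding information.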

2. Data breadth can become a liability

Adding more variables may improve apparent coverage, but it can also import bias, proxy measures and stale information. Without careful design, the model may track deprivation, service contact or prior police attention rather than actual risk.

3. Documentation matters as much as the model

Transparent records of inputs, code, thresholds and changes over time are essential. Without them, neither staff nor outside reviewers can properly assess the system.
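A minimal sketch of such a record, loosely in the spirit of published "model card" practice, might capture the inputs, the code version, the threshold and a dated change log, plus a fingerprint that lets reviewers confirm which version they are examining. The class, field names and values below are hypothetical and are not drawn from any Bristol document.

```python
import hashlib
import json
from dataclasses import dataclass, field, asdict

@dataclass
class ModelRecord:
    name: str
    code_version: str        # e.g. a version-control commit identifier
    input_variables: list
    threshold: float
    change_log: list = field(default_factory=list)

    def log_change(self, date: str, description: str) -> None:
        """Append a dated entry describing what changed and why."""
        self.change_log.append({"date": date, "change": description})

    def fingerprint(self) -> str:
        """SHA-256 hash of the whole record, so reviewers can check they hold the same version."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

record = ModelRecord(
    name="example_risk_model",        # hypothetical
    code_version="abc1234",           # hypothetical commit id
    input_variables=["persistent_absence", "child_in_need"],
    threshold=0.5,
)
before = record.fingerprint()
record.log_change("YYYY-MM-DD", "Hypothetical entry: inputs restricted to police-held data")
after = record.fingerprint()
print(before != after)  # any documented change alters the fingerprint
```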

4. Legal permission is not enough

Authorities can be within their legal rights and still lose public trust. Residents are more likely to accept data-sharing when they understand how it works and what safeguards are in place.

5. Front-line staff are the real test

If workers do not trust a score, they will either ignore it or waste time checking it. In either case, the promised efficiency gain disappears.

Chronology of the Bristol predictive analytics project

The sequence below shows how the project developed from a family-support database into a larger and more controversial predictive system.

  • 2014: Avon and Somerset Police faced budget pressures and criticism over domestic abuse failures. Why it mattered: created pressure to find a more data-driven way to manage risk.
  • 2015: Insight Bristol staff from the council and police began working together. Why it mattered: marked the start of cross-agency data sharing for family support.
  • 2016: Think Family Database launched; the police ethics committee reviewed the predictive work. Why it mattered: the database and early models began formal operation under cautious internal oversight.
  • 2018: External researchers warned that model variables could be proxies for poverty. Why it mattered: highlighted fairness and bias concerns.
  • 2019: A child criminal exploitation model was introduced, and the force publicly promoted predictive analytics. Why it mattered: showed expansion from child protection into broader operational policing.
  • 2021: National ethics officials reportedly flagged tensions around the project. Why it mattered: raised questions about legitimacy and public trust.
  • 2023: Council staff concluded some models were not fit for operational use and stopped using them. Why it mattered: signaled the collapse of confidence in the most ambitious risk scores.
  • 2024: John Pegram sought information on how his data was being used. Why it mattered: illustrated the continuing opacity of the system for residents.

The trust problem at the center of the story

For all the technical language surrounding machine learning, this is ultimately a story about trust. Authorities asked residents to accept that a large, secretive pool of data could be used in their interests, even as the system remained difficult to inspect from the outside.

Some people inside the project believed the outcome justified the method. Gary Davies said the database improved understanding of risk and helped workers respond faster. From his perspective, the system made it easier to connect the dots and protect children who might otherwise have been missed.

But the voices in the review records show a competing reality: staff worried about errors, missing cases and models that no longer inspired confidence. Investigators could not fully recreate the systems because essential documentation was missing. Residents often had no idea whether they were in the database or how a score might affect them.

Those are not minor administrative flaws. They go to the heart of whether algorithmic public services can operate legitimately at all.

What happens next

The Bristol case is likely to feed a broader debate over AI use in policing and child protection. Supporters of predictive analytics will point to the practical value of shared data and argue that, in a world of scarce resources, authorities should use every available tool to identify risk early.

Critics will say the project shows what happens when “early warning” becomes overconfidence: data sharing expands, scoring systems multiply, documentation falls behind and the public is asked to trust results it cannot inspect.

Either way, the issue is no longer theoretical. Hundreds of pages of records suggest that Bristol’s system was not a clean technological success story, but a messy public-sector experiment that exposed the limits of predictive AI when applied to real people, real services and real decisions.

As more police forces and public agencies adopt AI tools, the Bristol experience provides a warning that is difficult to ignore: if a model cannot be explained, tested and trusted, it should not be allowed to shape people’s lives in the dark.

That lesson may be especially important in child welfare and policing, where the consequences of false positives, false negatives and hidden bias can ripple through families for years. The question now is whether policymakers will treat Bristol as a cautionary tale—or as a template to be repeated at larger scale.
