Voice as Biomarker
'You sound a bit depressed' we might say to a friend
Not only because of what they say but how they say it
Perhaps their speech is duller than usual, tailing off between words
Lacking their usual lively intonation
There are many ways to boil a voice down into data points
Low-level spectral features, computed from snippets as short as twenty milliseconds
That quantify the dynamism of amplitude, frequency and energy
And those longer-range syllabic aspects that human ears are tuned to
Such as pitch and intensity
A voice distilled into data
Becomes the training material for machine learning algorithms
And there are many efforts being made to teach machines
To deduce our mental states from voice analysis
The bet is that the voice is a source of biomarkers
Distinctive data features that correlate to health conditions
Especially the emergence of mental health problems
Such as depression, PTSD, Alzheimer's and others
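As a rough sketch of what this feature extraction can look like in practice, here is a minimal example assuming the open-source librosa library, where the file name, frame sizes and feature choices are illustrative rather than any product's actual pipeline

```python
# A rough sketch of low-level acoustic feature extraction, assuming the
# open-source librosa library; file name, frame sizes and feature choices
# are illustrative, not any particular product's pipeline.
import librosa
import numpy as np

y, sr = librosa.load("voice_sample.wav", sr=16000)   # hypothetical recording

frame, hop = 320, 160   # roughly twenty-millisecond snippets at 16 kHz

# Short-term spectral features: spectral shape and signal energy per frame
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=16, n_mels=40,
                            n_fft=frame, hop_length=hop)
rms = librosa.feature.rms(y=y, frame_length=frame, hop_length=hop)

# A longer-range, syllable-scale feature: estimated pitch (fundamental frequency)
f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr, hop_length=hop)

# Stack everything into one matrix of numbers, one column per time step
n = min(mfcc.shape[1], rms.shape[1], len(f0))
features = np.vstack([mfcc[:, :n], rms[:, :n], f0[np.newaxis, :n]])
print(features.shape)   # (18, number_of_frames) -- a voice distilled into data
```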
And of course there's the words themselves
We've already trained machines to recognise them
Thanks to the deep neural network architecture called Long Short-Term Memory (LSTM)
We can command our digital assistants to buy something on Amazon
Rule-based modelling never captured the complexity of speech
But give neural networks enough examples
They will learn to parrot and predict any complex pattern
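A minimal sketch of such a network, written here in PyTorch purely as an illustrative assumption, shows the shape of the idea: a sequence of feature frames goes in and a classification comes out, though a real speech recogniser is considerably more elaborate than this

```python
# A minimal, illustrative sequence model using an LSTM in PyTorch; the layer
# sizes, feature count and two output classes are assumptions for the sketch,
# not a description of any real speech or screening system.
import torch
import torch.nn as nn

class SpeechLSTM(nn.Module):
    def __init__(self, n_features=18, hidden=64, n_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)  # reads the frames in order
        self.head = nn.Linear(hidden, n_classes)                   # maps to output classes

    def forward(self, x):              # x: (batch, frames, n_features)
        _, (h, _) = self.lstm(x)       # final hidden state summarises the sequence
        return self.head(h[-1])        # logits over the classes

model = SpeechLSTM()
dummy_clips = torch.randn(4, 200, 18)  # four clips of 200 feature frames each
print(model(dummy_clips).shape)        # torch.Size([4, 2])
```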
And voice data is plentiful
So perhaps machines can be trained to detect symptoms
Of different kinds of stress or distress
And this can be looped into an appropriate intervention
To prevent things from getting worse
As data, the features of speech become tables of numbers
Each chunk of sound becomes a row of digits
Perhaps sixteen numbers from a Fourier Transform
And others for types of intensity and rhythmicity
For machine learning to be able to learn
Each row must end in a classification: a number that tags a known diagnosis
Presented with enough labelled examples, the algorithm will produce a probabilistic model
That predicts the likelihood of a future speaker developing the same condition
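As a sketch of that arrangement, assuming scikit-learn and entirely synthetic numbers, the rows, labels and choice of logistic regression below are illustrative rather than any startup's actual method

```python
# Synthetic numbers, purely for illustration: rows of acoustic features ending
# in a label, fed to a probabilistic classifier via scikit-learn.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))    # 200 rows, sixteen numbers per chunk of sound
y = rng.integers(0, 2, size=200)  # each row ends in a label: 0 or 1, a tagged diagnosis

model = LogisticRegression(max_iter=1000).fit(X, y)

future_speaker = rng.normal(size=(1, 16))
print(model.predict_proba(future_speaker))  # a probability of the condition, not a fact
```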
It's very clever to model the hair cells in the human ear as forced damped oscillators
And to apply AI algorithms that learn models through backpropagation
But we should ask why we want machines to listen out for signs of distress
Why go to all this trouble when we could do the listening ourselves
One reason is the rise in mental health problems
At the same time as available services are contracting
Bringing professional and patient together costs time and money
But we can acquire and analyse samples of speech via our network infrastructures
Machine listening offers the prospect of early intervention
Through a pervasive presence beyond anything that psychiatry could have previously imagined
Machine learning's skill at pattern finding means it can be used for prediction
As Thomas Insel says, "We are building digital smoke alarms for people with mental illness"
Disruption
Insel is a psychiatrist, neuroscientist and former Director of the US National Institute of Mental Health
Where he prioritised the search for a preemptive approach to psychosis
By "developing algorithms to analyze speech as an early window into the disorganization of thought"
He jumped ship to Google to pursue a big data approach to mental health, then founded a startup called Mindstrong
Which uses smartphone data to 'transform brain health' and 'detect deterioration early'
The number of startups looking for traction on mental states
Through the machine analysis of voice
Suggests a restructuring of the productive forces of mental health
Such that illness will be constructed by a techno-psychiatric complex
HealthRhythms, for example, was founded by psychiatrist David Kupfer
Who chaired the task force that produced DSM-5, the so-called 'bible of psychiatry'
Which defines mental disorders and the diagnostic symptoms for them
The HealthRhythms app uses voice data to calculate a "log of sociability" to spot depression and anxiety
Sonde Health screens acoustic changes in the voice for mental health conditions
With a focus on post-natal depression and dementia
"We're trying to make this ubiquitous and universal" says the CEO
Their app is not just for smartphones but for any voice-based technology
Meanwhile Sharecare scans your calls and reports if you seemed anxious
Founder Jeff Arnold describes it as 'an emotional selfie'
Like Sonde Health, the company works with health insurers
While HealthRhythms' clients include pharmaceutical companies
It's hardly a surprise that Silicon Valley sees mental health as a market ripe for Uber-like disruption
Demand is rising, orthodox services are being cut, but data is more plentiful than it has ever been
There's a mental health crisis that costs economies millions
So it must be time to 'move fast and break things'
But as Simondon and others have tried to point out
The individuation of subjects, including ourselves, involves a certain technicity
Stabilising a new ensemble of AI and mental health
Will change what it is to be considered well or unwell
Samaritans Radar
There's little apparent concern among the startup-funder axis
That all this listening might silence voices
Their enthusiasm is not haunted by the story of the Samaritans Radar
When an organisation which should have known better got carried away by the promise of tech surveillance
This was a Twitter app developed in 2014 by the Samaritans
The UK organisation which runs a 24-hour helpline for anyone feeling suicidal
You signed up for the app and it promised to send you email alerts
Whenever someone you follow on Twitter appeared to be distressed
If any of their tweets matched a list of key phrases
It invited you to get in touch with them
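At its core this was keyword matching, something like the schematic sketch below, in which the phrase list and the alert function are placeholders rather than the app's real list or code

```python
# Schematic reconstruction only: the phrase list and the alert function are
# placeholders, not Samaritans Radar's actual contents or code.
WATCH_PHRASES = ["placeholder phrase one", "placeholder phrase two"]

def check_tweet(tweet_text, author, send_alert):
    """Email the follower if a tweet matches any watched phrase."""
    if any(phrase in tweet_text.lower() for phrase in WATCH_PHRASES):
        # The alert goes to the follower who signed up, not to the person
        # whose tweet was flagged -- they were never informed.
        send_alert(f"{author} may need your support: consider getting in touch")

check_tweet("example tweet containing placeholder phrase one", "@someone", print)
```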
In engineering terms, this is light years behind the sophistication of Deep Learning
But it's a salutary tale about unintended impacts
Thanks to the inadequate involvement of service users in its production
It ignored the fact that the wrong sort of well-meaning intervention at the wrong time might actually make things worse
Or that malicious users could use the app to target and troll vulnerable people
Never mind the consequences of false positives
When the app misidentified someone as distressed
Or the concept of consent
Given that the people being assessed were not even made aware that this was happening
All riding roughshod over the basic ethical principle of 'do no harm'
Although Twitter is a nominally public space
People with mental health issues had been able to hold supportive mutual conversations
With a reasonable expectation that this wouldn't be put in a spotlight
Allowing them to reach out to others who might be experiencing similar things
One consequence of the Samaritans Radar was that many people with mental health issues
Who had previously found Twitter a source of mutual support
Declared their intention to withdraw
Or simply went silent
As with the sorry tale of the Samaritans Radar
Without the voices of mental health users and survivors
The hubris that goes with AI has the potential to override the Hippocratic oath
Fairness and Harm
The ubiquitous application of machine learning's predictive power
In areas with real world consequences, such as policing and the judicial system
Is stirring an awareness that its oracular insights
Are actually constrained by complexities that are hard to escape
The simplest of which is data bias
A programme that only knows the data it is fed
And which is only fed data containing a racist bias
Will make racist predictions
This should already be a red flag for our mental health listening machines
Mental health diagnoses are already skewed with respect to race
People from Black and minority ethnic backgrounds are disproportionately likely to be diagnosed
And the questions about why are still very much open and contested
But surely, proponents will say, one advantage of automation in general
Is to encode fairness and bypass the fickleness of human bias
To apply empirical and statistical knowledge directly
And cut through the subjective distortions of face-to-face prejudice
Certainly, as the general dangers of reproducing racism and sexism have become clear
There have been conscientious efforts from engineers in one corner of machine learning
To automate ways to de-bias datasets
But here's the rub
Even when you know there's the potential for discrimination
It's mathematically impossible to produce all-round fairness
If you're designing a parole algorithm to predict whether someone will reoffend
You can design it so that its accuracy for high-risk offenders is the same for white and black people
But if the overall base rates are different
There will be more false positives among black people, which can be considered a harm
Because more black people who would not go on to reoffend will be refused bail than white people
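A worked sketch with invented numbers makes the arithmetic plain: hold precision and recall identical for two groups, and the group with the higher measured base rate necessarily ends up with the higher false positive rate

```python
# Invented numbers, purely to illustrate the base-rate arithmetic:
# precision and recall are identical for both groups, yet the false
# positive rate (the share of non-reoffenders wrongly flagged) differs.
def false_positive_rate(base_rate, precision, recall, n=100_000):
    positives = base_rate * n                    # people who would reoffend
    negatives = n - positives                    # people who would not
    true_pos = recall * positives                # correctly flagged as high risk
    false_pos = true_pos * (1 - precision) / precision  # wrongly flagged
    return false_pos / negatives

for group, base_rate in [("group with 50% base rate", 0.5),
                         ("group with 30% base rate", 0.3)]:
    fpr = false_positive_rate(base_rate, precision=0.7, recall=0.6)
    print(f"{group}: false positive rate = {fpr:.3f}")
# group with 50% base rate: false positive rate = 0.257
# group with 30% base rate: false positive rate = 0.110
```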
Machine learning's probabilistic predictions are the result of a mathematical fit
The parameters of which are selected to optimise on specific metrics
There are many mathematical ways to define fairness (perhaps twenty-one of them)
And you can't satisfy them all at the same time
Proponents might argue that with machinic reasoning
We should be able to disentangle the reasons for various predictions
So we can make policy choices
About the various trade-offs
But there's a problem with neural networks
Which is that their reasoning is opaque
Obscured by the multiplicity of connections across their layers
Where the weightings are derived from massively parallel calculations
If we apply this deep learning to reveal what lies behind voice samples
Taking different tremors as proxies for the contents of consciousness
The algorithm will be tongue-tied
If asked to explain its diagnosis
And we should ask who these methods will be most applied to
Since to apply machinic methods we need data
Data visibility is not evenly distributed across society
Institutions will have much more data about you if you are part of the welfare system
Than if you come from a comfortable middle-class family
What's apparent from the field of child protection
Where algorithms are also seen as promising objectivity and pervasive preemption
Is that the weight of harms from unsubstantiated interventions
Will fall disproportionately on the already disadvantaged
With the net effect of 'automating inequality'
If only we could rely on institutions to take a restrained and person-centred approach
But certainly, where there is potential for financial economies
The history of voice analysis is not promising
Local authorities in the UK were still applying Voice Stress Analysis to detect housing benefit cheats
Years after solid scientific evidence showed that its risk predictions were 'no better than horoscopes'
Machine learning is a leap in sophistication from such crude measures
But as we've seen it also brings new complexities
As well as an insatiable dependency on more and more data
Listening Machines
Getting mental health voice analysis off the ground faces the basic challenge of data
Most algorithms only perform well when there's a lot of it to train on
They need voice data labelled as being from people who are unwell and those who are not
So that the algorithm can learn the patterns that distinguish them
The uncanny success of Facebook's facial recognition algorithms
Came from having huge numbers of labelled faces at hand
Faces that we, the users, had kindly labelled for them
As belonging to us, or by tagging our friends
Without realising we were also training a machine
"if the product is free, you are the training data"
One approach to gathering labelled data is the kind of clever surveillance trick
Used by a paper investigating 'The Language of Social Support in Social Media
And its Effect on Suicidal Ideation Risk'
Where they collected comments from Reddit users in mental health subreddits like
r/depression, r/mentalhealth, r/bipolarreddit, r/ptsd, r/psychoticreddit
And tracked how many could be identified as subsequently posting in
A prominent suicide support community on Reddit called r/SuicideWatch
Whether or not the training demands of voice algorithms
Are solved by the large scale collection of passive data
The strategies of the Silicon Valley startups make it clear
That the deployment of these apps will have to be pervasive
To fulfil the hopes for scaling and early identification
Society is already struggling to grapple
With the social effects of pervasive facial recognition
Whether mainstreamed in China's system of social credit
Or marketed to US police forces by Amazon
Where it has at least led to some resistance from employees themselves
The democratic discourse around voice analysis seems relatively hushed
And yet we are increasingly embedded in a listening environment
With Siri and Alexa and Google Assistant and Microsoft's Cortana
And Hello Barbie and My Friend Cayla and our smart car
And apps and games on our smartphones that request microphone access
Where might our voices be analysed for signs of stress or depression
In a way that can be glossed as legitimate under the General Data Protection Regulation
On our work phone? our home assistant? while driving? when calling a helpline?
When will using an app like HealthRhythms, which 'wakes up when an audio stream is detected'
Become compulsory for people receiving any form of psychological care?
Let's not forget that in the UK we already have Community Treatment Orders for mental health
Surveillance is the inexorable logic of the data-diagnostic axis
Merging with the beneficent idea of Public Health Surveillance
With its agenda of epidemiology and health & safety
But never quite escaping the long history of sponsorship of speech recognition research
By the Defense Advanced Research Projects Agency (DARPA)
A history that Apple doesn't often speak of
That it acquired Siri from SRI International
Who'd developed it through a massive AI contract with DARPA
As the Samaritans example made clear
We should pause before embedding ideas like 'targeting' in social care
Targeting people for preemptive intervention is fraught with challenges
And foregrounds the core questions of consent and 'do no harm'
Before we imagine that "instead of waiting for traditional signs of dementia and getting tested by the doctor
The smart speakers in our homes could be monitoring changes in our speech as we ask for the news, weather and sports scores
And detecting the disease far earlier than is possible today"
We need to know how to defend against the creation of a therapeutic Stasi
Epistemic Injustice
It might seem far-fetched to say that snatches of chat with Alexa
Might be considered as significant as a screening interview with a psychiatrist or psychologist
But this is to underestimate the aura of scientific authority
That comes with contemporary machine learning
What algorithms offer is not just an outreach into daily life
But the allure of neutrality and objectivity
That by abstracting phenomena into numbers that can be statistically correlated
In ways that enable machines to imitate humans
Quantitative methods can be applied to areas that were previously the purview of human judgement
Big data seems to offer sensitive probes of signals beyond human perception
Of vocal traits that are "not discernable to the human ear"
Nor 'degraded' by relying on self-reporting
It doesn't seem to matter much that this use of voice
Pushes the possibility of mutual dialogue further away
Turning the patient's opinions into noise rather than signal
Machinic voice analysis of our mental states
Risks becoming an example of epistemic injustice
Where an authoritative voice comes to count more than our own
With the algorithmic analysis of how someone speaks causing others to "give a deflated level of credibility to a speaker's word"
Of course we could appeal to the sensitivity and restraint of those applying the algorithms
Context is everything when looking at the actual impact of AI
Knowing whether it is being adopted in situations where existing relations of power
Might indicate the possibility of overreach or arbitrary application
The context of mental health certainly suggests caution
Given that the very definition of mental health is historically varying
The asymmetries of power are stark, because treatment can be compulsory and detention is not uncommon
And the life consequences of being in treatment or missing out on treatment can be severe
Mental health problems can be hugely challenging for everyone involved
And in the darkest moments of psychosis or mania
People are probably not going to have much to say about how their care should be organised
But, in between episodes, who is better placed to help shape specific ideas for their care
Than the person who experiences the distress
They have the situated knowledge
The danger with all machine learning
Is the introduction of a drone-like distancing from messy subjectivities
With the risk that this will increase thoughtlessness
Through the outsourcing of elements of judgement to automated and automatising systems
The voice as analysed by machine learning
Will become a technology of the self in Foucault's terms
Producing new subjects of mental health diagnosis and intervention
Whose voice spectrum is definitive but whose words count for little
User Voice
The lack of user voice in mental health services has been a bone of contention since the civil rights struggles of the 1960s
With the emergence of user networks that put forward alternative views
Seeking to be heard over the stentorian tones of the psychiatric establishment
Forming alliances with mental health workers and other advocates
Groups like Survivors Speak Out, The Hearing Voices Network, the National Self-Harm Network and Mad Pride
Putting forward demands for new programmes and services
Proposing strategies such as 'harm minimization' and 'coping with voices'
Making the case for consensual, non-medicalised ways to cope with their experiences
And forming collective structures such as Patients' Councils
While these developments have been supported by some professionals
And some user participation has been assimilated as the co-production of services
The validity of user voice, especially the collective voice, is still precarious within the mental health system
And is undermined by coercive legislation and reductionist biomedical models
The introduction of machinic listening
That dissects voices into quantifiable snippets
Will tip the balance of this wider apparatus
Towards further objectification and automaticity
Especially in this era of neoliberal austerity
And yet, ironically, it's only the individual and collective voices of users
That can rescue machine learning from talking itself into harmful contradictions
That can limit its hunger for ever more data in pursuit of its targets
And save classifications from overshadowing uniquely significant life experiences
Designing for justice and fairness, not just for optimised classifications
Means discourse and debate have to invade the spaces of data science
Each layer of the neural networks must be balanced by a layer of deliberation
Each datafication by caring human attentiveness
If we want the voices of the users to be heard over the hum of the data centres
They have to be there from the start
Putting the incommensurability of their experiences
Alongside the generalising abstractions of the algorithms
And asking how, if at all
The narrow intelligence of large-scale statistical data-processing machines
Could support more Open Dialogue, where speaking and listening aim for shared understanding
More Soteria-type houses based on a social model of care
The development of progressive user-led community mental health services
And an end to the cuts
Computation and Care
As machine learning expands into real world situations
It turns out that interpretability is one of its biggest challenges
Even DARPA, the military funder of so much research in speech recognition and AI
Is panicking that targeting judgements will come without any way to interrogate the reasoning behind them
Experiments to figure out how AI image recognition actually works
Probed the contents of intermediate layers in the neural networks
By amplifying the patterns those layers detect and recursively feeding the outputs back in
Producing the hallucinatory images of 'Inceptionism'
We are developing AI listening machines that can't explain themselves
That hear things of significance in their own layers
Which they can't articulate to the world but that they project outwards as truths
How would these AI systems fare if diagnosed against DSM-5 criteria?
And if objectivity, as some post-Relativity philosophers of science have proposed
Consists of invariance under transformation
What happens if we transform the perspective of our voice analysis
Looking outwards at the system rather than inwards at the person in distress
To ask what our machines might hear in the voices of the psychiatrists who are busy founding startups
Or in the voices of politicians justifying cuts in services because they paid off the banks
Or in the voice of the nurse who tells someone forcibly detained under the Mental Health Act
"This ain't a hotel, love"
It's possible that prediction is not a magic bullet for mental health
And can't replace places of care staffed by people with time to listen
In a society where precarity, insecurity and austerity don't fuel generalised distress
Where everyone's voice is not analysed but heard
In a context which is collective and democratic
The dramas of the human mind have not been scientifically explained
And the nature of consciousness still slips the net of neuroscience
Still less should we restructure the production of truths about the most vulnerable
On computational correlations
The real confusion behind the Confusion Matrix
That table of classifier performance which tallies the true and false positives and negatives
Is that the impact of AI in society doesn't pivot on the risk of false positives
But on the redrawing of boundaries that we experience as universal fact
The rush towards listening machines tells us a lot about AI
And the risk of believing it can transform intractable problems
By optimising dissonance out of the system
If human subjectivities are inextricably co-constructed with the tools of their time
We should ask instead how our new forms of calculative cleverness
Can be stitched into an empathic technics
That breaks with machine learning as a mode of targeting
And wreathes computation with ways of caring.