This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
Despite decades of research to better understand suicide risk and to develop detection and prevention methods, suicide is still one of the leading causes of death globally. While large-scale studies using real-world evidence from electronic health records can identify who is at risk, they have not been successful at pinpointing when someone is at risk. Personalized social media and online search history data, by contrast, could provide an ongoing real-world datastream revealing internal thoughts and personal states of mind.
We conducted this study to determine the feasibility and acceptability of using personalized online information-seeking behavior in the identification of risk for suicide attempts.
This cohort survey study collected three types of data from participants with a prior suicide attempt: their attitudes about using web search data for suicide prevention purposes, the dates of their lifetime suicide attempts, and an optional one-time download of their past web searches on Google. The study was conducted at the University of Washington School of Medicine Psychiatry Research Offices. The main outcomes were participants’ opinions on internet search data for suicide prediction and intervention and any potential change in online information-seeking behavior proximal to a suicide attempt. Individualized nonparametric association analysis was used to assess the magnitude of difference in web search data features derived from time periods proximal (7, 15, 30, and 60 days) to the suicide attempts versus the typical (baseline) search behavior of participants.
A total of 62 participants who had attempted suicide in the past agreed to participate in the study. Internet search activity varied from person to person (participant-level medians ranged from 2 to 24 searches per day). Changes in online search behavior proximal to suicide attempts were evident up to 60 days before the attempt. For a subset of attempts (7/30, 23%), search features showed associations from 2 months to a week before the attempt. The top 3 search constructs associated with attempts were online searching patterns (9/30 attempts, 30%), semantic relatedness of search queries to suicide methods (7/30 attempts, 23%), and anger (7/30 attempts, 23%). Participants (40/59, 68%) indicated that use of this personalized web search data for prevention purposes was acceptable with noninvasive potential interventions such as connection to a real person (eg, friend, family member, or counselor); however, concerns were raised about detection accuracy, privacy, and the potential for overly invasive intervention.
Changes in online search behavior may be a useful and acceptable means of detecting suicide risk. Personalized analysis of online information-seeking behavior showed notable changes in search behavior and search terms that are tied to early warning signs of suicide and are evident 2 months to 7 days before a suicide attempt.
Worldwide, suicide is the 18th leading cause of death, resulting in nearly 800,000 lives lost annually [
One of the most important challenges facing suicide prevention researchers—as well as clinical providers—is to identify warning signs for suicidal behavior [
Machine learning and natural language processing methods have recently been applied to social media data as a means of identifying suicide risk [
The use of individualized web searches for proximal risk assessment of suicide attempts is also aligned with the Fluid Vulnerability Theory [
However, one limitation to using search data for suicide risk prediction is the concern about privacy and the use of data not intended for public consumption. The intent of social media is to share life events and information publicly whereas search queries are often meant for personal, nonpublic use, and thus privacy concerns about using these data are important to understand [
The purpose of this study is to examine the feasibility of using data from internet searches to identify suicide risk. The first objective of this study was to determine whether changes in internet search behavior (frequency of search queries and queries categorized to known warning signs) were evident within 60 days of a documented suicide attempt. The second objective of this study was to determine how comfortable individuals with lived experience of suicide are with the use of internet search data for early identification.
Participants with a prior confirmed suicide attempt were recruited from an ongoing randomized clinical trial [
Participants were asked to complete a 30-minute semistructured interview about their concerns and suggestions for using internet search data as a means of preventing suicide. They were offered the option to provide a one-time confidential data download of their online Google search history. Participants who opted to participate in the study were reimbursed US $30 regardless of whether or not they agreed to share their Google search history.
Google Takeout is a web-based interface developed by Google that allows users of Google apps to download their data into an exportable file. Using prior work [
The Suicide Attempt and Self-Injury Count, a brief version of the Suicide Attempt Self-Injury Interview [
The interview was developed by the research team and focused on the acceptability of using internet information to detect the risk of suicide and to prevent suicide (
Participants’ web searches were used to generate behavioral (online information-seeking pattern) and semantic (meaning of search content) features. For behavioral features, we generated daily summaries of participants' search history, such as the average number of searches per day and the time of day when searches were conducted. For semantic features, we applied distributional models of semantics [
Because of a small sample size and high level of heterogeneity in web search data across participants, we used an individualized analytical approach (
Participants’ responses to structured survey questions were summarized using summary statistics. Semistructured responses were transcribed verbatim during participant interviews and anonymized prior to analysis. We used a mixed methods approach, combining quantitative and qualitative data with the function of expansion [
Of 150 individuals, 99 were eligible to participate in this study and were approached, and 62 consented to participate in the qualitative interview. Of the 62 who consented, 26 (42%) were able to provide web search data. Reasons for not providing data were technical issues in downloading search data (17/62, 27%), unwillingness to share Google searches (15/62, 24%), and not having a Google account (4/62, 6%) (
Study CONSORT flow diagram.
Demographic characteristics.
Characteristics | Approached (n=99) | Consented (n=62) | Search data downloaded (n=26) | P value |
Age (years) at enrollment, mean (SD) | 33.10 (12.45) | 34.94 (13.15) | 29.62 (9.15) | .18 |
Gender, n (%) | | | | .58 |
Male | 50 (50.5) | 33 (53.2) | 15 (57.7) | |
Female | 38 (38.4) | 21 (33.9) | 5 (19.2) | |
Other | 5 (5.1) | 4 (6.5) | 3 (11.5) | |
Transgender | 6 (6.1) | 4 (6.5) | 3 (11.5) | |
Race, n (%) | | | | .83 |
White | 66 (66.7) | 43 (69.4) | 21 (80.8) | |
Mixed | 20 (20.2) | 14 (22.6) | 3 (11.5) | |
Asian | 7 (7.1) | 2 (3.2) | 2 (7.7) | |
Black or African American | 4 (4.0) | 3 (4.8) | 0 (0.0) | |
American Indian or Alaska Native | 1 (1.0) | 0 (0.0) | 0 (0.0) | |
Native Hawaiian or Other Pacific Islander | 1 (1.0) | 0 (0.0) | 0 (0.0) | |
Marital status, n (%) | | | | .94 |
Single/never married | 72 (72.7) | 42 (67.7) | 18 (69.2) | |
Divorced | 12 (12.1) | 10 (16.1) | 3 (11.5) | |
Married | 9 (9.1) | 4 (6.5) | 2 (7.7) | |
Separated | 5 (5.1) | 5 (8.1) | 3 (11.5) | |
Widowed | 1 (1.0) | 1 (1.6) | 0 (0.0) | |
Education, n (%) | | | | .96 |
Some college, associate’s degree, or technical training | 53 (53.5) | 33 (53.2) | 16 (61.5) | |
Bachelor’s or graduate degree | 22 (22.2) | 16 (25.8) | 4 (15.4) | |
High school graduate or GED | 15 (15.2) | 9 (14.5) | 5 (19.2) | |
Some high school | 7 (7.1) | 3 (4.8) | 1 (3.8) | |
Other | 2 (2.0) | 1 (1.6) | 0 (0.0) | |
Income (US $), n (%)a | | | | .98 |
Less than $5000 | 9 (10.8) | 5 (9.4) | 3 (14.3) | |
$5000-9999 | 11 (13.3) | 8 (15.1) | 2 (9.5) | |
$10,000-24,999 | 23 (27.7) | 12 (22.6) | 4 (19.0) | |
$25,000-49,999 | 21 (25.3) | 16 (30.2) | 7 (33.3) | |
More than $50,000 | 15 (18.1) | 9 (17.0) | 4 (19.0) | |
None | 4 (4.8) | 3 (5.7) | 1 (4.8) | |
aData were missing for n=16, n=9, and n=5 individuals for approached, consented, and data downloaded, respectively.
In total, search history data for 24,397 days were collected with 349,922 individual search queries from 26 study participants. The median time span of the search history data across participants was 1348 days (range 220-4752 days); however, the actual number of days when participants searched online was much lower than the data collection period and varied widely (median 898.5 days, range 75-2759 days) (
Search data characteristics across participants: (a) span (in days) of search data collected from participants (in grey) and the number of days (blue) on which participants made at least one search. (b) Median daily number of web searches performed by the participants. The error bars indicate the 25th and 75th percentile. (c) Proportions of participants' web searches stratified by time of day.
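The per-participant summaries named in this caption (span of data, days with at least one search, median daily searches with 25th and 75th percentiles, and share of searches by time of day) could be computed along the following lines. This is a minimal sketch rather than the study's pipeline: the function names, the input (a list of search timestamps for one participant), and especially the hour boundaries of the time-of-day bins are assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime
from statistics import median, quantiles


def time_of_day(hour):
    """Map an hour (0-23) to a bin. Boundaries are hypothetical."""
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    if 18 <= hour < 24:
        return "evening"
    return "late night"


def summarize_searches(timestamps):
    """Summarize one participant's searches from a list of datetimes."""
    if not timestamps:
        raise ValueError("at least one search timestamp is required")
    # Number of searches on each day that has at least one search.
    per_day = Counter(ts.date() for ts in timestamps)
    counts = sorted(per_day.values())
    # Span from the first to the last search day, inclusive.
    span_days = (max(per_day) - min(per_day)).days + 1
    if len(counts) >= 2:
        q1, _, q3 = quantiles(counts, n=4, method="inclusive")
    else:
        q1 = q3 = counts[0]
    by_period = Counter(time_of_day(ts.hour) for ts in timestamps)
    total = len(timestamps)
    return {
        "span_days": span_days,
        "active_days": len(per_day),
        "median_daily_searches": median(counts),
        "iqr_daily_searches": (q1, q3),
        "time_of_day_share": {k: v / total for k, v in by_period.items()},
    }
```

Note that in this sketch the median and percentiles are taken over active days only; days without any search are not counted as zeros, which is one of several reasonable choices.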
Our analysis revealed idiosyncratic online information seeking behavior across individuals; the number of searches per day varied between 2 searches and 24 searches (median 8.5 searches). The time of day when participants conducted online searches also varied (morning: 0%-4.35%; late night: 0%-37.5%;
Of the semantic search content that we mapped to known suicide warning signs, we identified a small proportion of search queries (median 1.2%, range 0.06%-21.47%) with a proximity
Cue term sets developed to represent selected warning signs and a subset of top search queries that map to each of the warning signs.
Warning sign | Cue terms | Retrieved search queries |
Alcohol use | whiskey; alcohol; aa; wine; alcoholic; beer | “aa meetings”; “how much beer to get drunk”; “wine hangover vs. hard alcohol”; “alcohol poisoning”; “alcoholics anonymous”a |
Preparation of personal affairs | will; affairs; suicide+note | “writing a suicide note”; “living will”; “write your will online” |
Suicide communication | hotline; help; suicide+communicate | “what does suicide hotline do”; “suicide crisis text line”; “suicide text line”; “emergency room si suicidal ideation” |
Suicide methods (preparation) | overdose; gun; lethal | “sleeping pill overdose suicide”; “is ambien lethal”; “where can I get suicide pills”a; “where to buy a gun in Seattle”; “cheap guns”a |
Burdensomeness | burden | “discussing work burdens marriage”a |
No reason to live | hopeless; live; persist | “I don’t want to live anymore” |
Anger | hostile; rage; anger | “fits of rage”; “depression and rage”; “serious anger marijuana” |
Anxiety | scared; fearful; afraid; anxiety; anxious; jittery; | “ocd anxiety”; “apprehensive”a; “social anxiety”; “marijuana for anxiety”; “why do I have so much anxiety”; “phobia of diseases”a |
Emptiness | numb; hollow; feeling+empty | “I feel so empty”; “I like the feeling of being sad” |
Interpersonal problem | conflict; divorce; fight; breakup; loss | “final divorce decree cost”; “infidelity and custody”a; |
aFound using distributional semantic approaches (ie, queries do not contain any of the manually defined cue terms) illustrating the capacity of distributional semantics approaches to identify related concepts expressed in different terms.
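To illustrate how a query with no cue term can still be retrieved, the sketch below scores a query by its cosine similarity to cue-term vectors. It is an illustration under stated assumptions, not the study's model: the 3-dimensional vectors are invented toy values, the function names are hypothetical, and averaging word vectors is only one simple way to represent a query. A real application would use vectors learned from a large text corpus.

```python
import math

# Toy word vectors, invented for illustration only.
TOY_VECTORS = {
    "anxiety": (0.9, 0.1, 0.0),
    "anxious": (0.85, 0.15, 0.05),
    "apprehensive": (0.8, 0.2, 0.1),
    "beer": (0.0, 0.1, 0.9),
    "recipe": (0.1, 0.9, 0.1),
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def embed(text, vectors=TOY_VECTORS):
    """Average the vectors of the known words; None if no word is known."""
    known = [vectors[w] for w in text.lower().split() if w in vectors]
    if not known:
        return None
    return tuple(sum(dim) / len(known) for dim in zip(*known))


def relatedness(query, cue_terms, vectors=TOY_VECTORS):
    """Highest cosine similarity between the query and any cue term."""
    q = embed(query, vectors)
    if q is None:
        return 0.0
    sims = [cosine(q, vectors[c]) for c in cue_terms if c in vectors]
    return max(sims, default=0.0)
```

With these toy values, `relatedness("apprehensive", ["anxiety", "anxious"])` is high even though the query contains neither cue term, whereas `relatedness("beer recipe", ["anxiety", "anxious"])` is low.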
On average, 58% of attempts (n=30; range 15/30, 50% to 19/30, 63%) were found to be associated (–log10(placement value)≥2) with at least one search feature in 1 of the 4 proximal time periods (7, 15, 30, and 60 days).
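The threshold –log10(placement value)≥2 corresponds to a placement value of at most 0.01. As a rough sketch of the idea, the code below treats the placement value of a proximal-period feature as the share of that participant's baseline values that are at least as large. The exact definition used in the study is not reproduced here, so this one-sided form, the +1 correction, and the function names are assumptions made for illustration.

```python
import math


def placement_value(proximal, baseline):
    """Share of baseline values at least as large as the proximal value.

    The +1 correction keeps the result strictly positive so that
    -log10 is always defined (an assumption, not taken from the study).
    """
    if not baseline:
        raise ValueError("baseline must not be empty")
    at_least = sum(1 for b in baseline if b >= proximal)
    return (at_least + 1) / (len(baseline) + 1)


def is_associated(proximal, baseline, threshold=2.0):
    """True when -log10(placement value) meets the threshold."""
    return -math.log10(placement_value(proximal, baseline)) >= threshold


# Hypothetical baseline: one feature value per baseline window.
baseline = list(range(199))
print(is_associated(500, baseline))  # proximal value above every baseline value
print(is_associated(100, baseline))  # proximal value near the baseline middle
```

One consequence of the +1 correction is that a baseline needs at least 99 values before the threshold of 2 can be reached, which shows why a long search history per participant matters for this kind of individualized comparison.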
Summary of individualized association analysis for 11 high-level search constructs over 4 suicide attempt–proximal periods: (a) 7, (b) 15 (c) 30, and (d) 60 days.
Baseline distributions for 4 example search features (each indicated by a red circle in
Three primary themes were identified regarding the acceptability of using search history for suicide prevention: utility, accuracy, and privacy (
Illustrative quotes of participant responses to use of internet history in suicide prevention.
Theme | Illustrative quotation | Respondents reporting the theme, n (%) |
Useful | “It’d be a good way to help people get resources that they don’t otherwise know about.” |
40/59 (68) |
Detection accuracy concerns | “No problem with that as long as they did it right. I wouldn’t want the SWAT team to show up at my door...” |
34/59 (58) |
Privacy concerns | “I'm chronically in private mode, because I don't want Google or other tech companies knowing I'm looking at this. If I'm ever in public, I don't want my search results to be seen by others.” |
19/59 (32) |
When presented with potential prevention interventions, participants favored interventions that provided a direct link to either a crisis counselor (35/61, 57%), friends or family (33/60, 55%), peers (30/61, 49%), or to a self-guided meditation video (33/61, 54%) (
This is one of the first studies to examine and describe the nature of individualized internet search data with an eye toward suicide prevention. We found that while search queries and behavior do change prior to suicide attempts, there is considerable variation between individuals, with some participants searching online more frequently, and others seeking information online sporadically prior to attempts. Additionally, search queries over time are highly individualized, and for some attempts, changes in search behavior and queries related to risk are evident 60 days before the attempt, with a majority evident 2 weeks before the attempt. Search content associated with risk windows also varied, although some content was highly prevalent across time points such as queries expressing anger or suicide methods. Although these findings suggest that the use of internet searches for risk prediction will be complicated due to the intraindividual variation, it may still be possible to develop a personalized temporal risk profile or a digital phenotype [
We found that participants felt using internet search data to predict and intervene in suicide was potentially helpful, but they also harbored some important reservations. Participants felt that any intervention based on search history or social media algorithms would need to be highly accurate and respect personal privacy. The interventions themselves should be active (link to a friend), rather than passive (suggestion to contact a hotline). Importantly, participants were particularly concerned about the use of emergency services as a means of intervention.
This study represents the first step in understanding the potential utility of online search data for suicide prevention. The next steps will require a study with a much larger sample size due to the intraindividual variation in search signal differences, in addition to interindividual variation in search terms and search behaviors prior to attempts. Expansion of the semantic feature space may also further refine predictive signals. While these results demonstrate that a personalized analytical approach can identify patterns of search behaviors that are evident up to 2 months before an attempt, larger studies are needed to assess potential representational bias and further refine high-risk signatures from online search data.
This study is a preliminary cohort study and thus has limitations. First, this was a small sample. Although individualized analysis of web search data indicates the potential benefits for understanding real-world risk factors of suicide and when someone may be at a higher risk, future research should explore the cohort-level predictive ability of data, including optimization of analytical parameters (eg, selection of threshold to indicate a meaningful association between web searches and suicide risk factors). Second, participants in this study were at very high suicide risk with both a lifetime suicide attempt and a recent episode requiring hospitalization. A larger prospective study of people with varying levels of suicide risk, including those without a history of suicidal ideation, is warranted to ensure that the search patterns and terms found here are unique to the imminent risk of suicide. Understanding the perspectives of other individuals on the sharing of web search data and the appropriateness of intervention will also be crucial prior to any deployment of prediction algorithms and related suicide prevention efforts. Finally, we asked participants their perspectives about hypothetical intervention scenarios based on internet search informed risk prediction. It is likely that participants’ acceptability of such interventions will differ when they are faced with them during a crisis event.
Although this is a preliminary study, the findings are promising and suggest a potentially useful and timely method for utilizing search data to detect the risk of suicide. If handled appropriately, this method of risk detection is seen as acceptable by those with a lived experience of suicide. Suicide is a serious public health problem, one that has the potential to escalate during these times of health, societal, and economic challenges. Methods that can quickly identify risk and intervene to prevent a suicide event could help reduce the public health burden of suicide.
A brief schematic overview of the gTAP workflow.
Interview Guide for Exploring Technology for Suicide Prevention AFS Sub-Study.
Description of Search Data Features.
Schematic overview of search data featurization and non-parametric association analysis to compare the difference between a typical “baseline” search behavior to a proximal period before the documented suicide attempt.
Survey participants were asked to rate helpfulness and comfortableness with each of the following potential intervention options. Values represent percentages of all participants that responded to each question.
This work was supported in part by grants from the National Institute of Mental Health (P50 MH115837, R01 MH102304, and R33 MH110509).
PAA, AP, and KC designed the study. AP and TC led the search and Google Search data analyses with contributions from PH and CB. KH and TH conducted all study interviews. PAA, HH, and TH contributed to the qualitative analysis of the study and interpretation of the descriptive analyses. PAA, KC, AP, TC, and HH were involved in writing the first draft of the manuscript. PAA obtained study funding. All authors critically revised the manuscript.
PAA reports consulting with Verily Life Sciences on mental health and technology projects. HH was an employee of Verily Life Sciences prior to study contribution.