Published on in Vol 21, No 5 (2019): May

Proactive Suicide Prevention Online (PSPO): Machine Identification and Crisis Management for Chinese Social Media Users With Suicidal Thoughts and Behaviors

Original Paper

1Institute of Psychology, Chinese Academy of Sciences, Beijing, China

2Department of Psychology, University of Chinese Academy of Sciences, Beijing, China

3Department of Social and Behavioural Sciences, City University of Hong Kong, Hong Kong, China (Hong Kong)

4Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China (Hong Kong)

Corresponding Author:

Tingshao Zhu, PhD

Institute of Psychology

Chinese Academy of Sciences

No.16, Lincui Road



Phone: 86 150 1096 5509


Background: Suicide is a great public health challenge. Two hundred million people attempt suicide in China annually. Existing suicide prevention programs depend on suicidal individuals taking the initiative to seek help, but many of them have low motivation to do so. We propose that a proactive and targeted suicide prevention strategy can prompt more people with suicidal thoughts and behaviors to seek help.

Objective: The goal of the research was to test the feasibility and acceptability of Proactive Suicide Prevention Online (PSPO), a new approach based on social media that combines proactive identification of suicide-prone individuals with specialized crisis management.

Methods: We first located a microblog group online. Experts analyzed the group's comments on a suicide note to provide a training set for machine learning models for suicide identification. The best-performing model was then used to automatically identify posts that suggested suicidal thoughts and behaviors. Next, a microblog direct message containing crisis management information, including measures covering suicide-related issues, depression, help-seeking behavior, and acceptability, was sent to users whom the model had identified as at risk of suicide. Trained counselors provided tailored crisis management to those who replied to the message. The Simplified Chinese Linguistic Inquiry and Word Count was also used to analyze the psycholinguistic features of users’ posts in the 1-month periods before and after consultation.

Results: A total of 27,007 comments made in April 2017 were analyzed. Among these, 2786 (10.32%) were classified as indicative of suicidal thoughts and behaviors. The performance of the detection model was good, with high precision (.86), recall (.78), F-measure (.86), and accuracy (.88). Between July 3, 2017, and July 3, 2018, we sent out a total of 24,727 direct messages to 12,486 social media users, and 5542 (44.39%) responded. Over one-third of the users who were contacted completed the questionnaires included in the direct message. Of the valid responses, 89.73% (1259/1403) reported suicidal ideation, but more than half (725/1403, 51.67%) reported that they had not sought help. The 9-Item Patient Health Questionnaire (PHQ-9) mean score was 17.40 (SD 5.98). More than two-thirds of the participants (968/1403, 69.00%) thought the PSPO approach was acceptable. Moreover, 2321 users replied to the direct message. Comparing word-use frequencies in their microblog posts 1 month before and 1 month after the consultation, we found that the frequency of death-oriented words significantly declined while the frequency of future-oriented words significantly increased.

Conclusions: The PSPO model is suitable for identifying populations that are at risk of suicide. When followed up with proactive crisis management, it may be a useful supplement to existing prevention programs because it has the potential to increase the accessibility of antisuicide information to people with suicidal thoughts and behaviors but a low motivation to seek help.

J Med Internet Res 2019;21(5):e11705



Approximately one million people die by suicide globally every year. Aside from being a great challenge to public health, suicide also causes significant economic losses and aggravates labor shortages. It is estimated that by 2020, approximately 1.53 million people will die from suicide annually, and the number of people who attempt suicide will be 10 to 20 times greater [1]. Suicide is the leading cause of death among young people aged between 15 and 29 years [2]. In China, 200 million people attempt suicide annually, with two-thirds aged between 15 and 34 years [3]. Suicide prevention is thus crucial, particularly for young people.

Current suicide prevention methods practiced globally include school-based screening, screening by a primary care provider, and gatekeeper training, all of which target the general population and passively wait for people in need to come forward [4]. However, many studies have found that existing methods have a rather weak effect on suicide prevention, because most people experiencing suicidal thoughts and behaviors tend not to participate in these activities and have low motivation to seek help [5-8]. For example, only 17% of suicidal people in low-income countries, such as China, receive treatment in a timely manner [6]. The main reasons for not seeking help include a lack of perceived need for services, high self-reliance, stigma, and structural factors such as time and cost [6-8]. However, at-risk individuals' subjective judgment of their own needs may be unreliable, and a strong tendency toward self-reliance may lead to severe depressive symptoms and suicidal ideation among young people [9]. Passive suicide prevention approaches require suicidal individuals to actively seek help [10], as in the Columbia suicide screening program, in which students completed surveys at school to obtain help [11]. In contrast, a proactive approach, in which the program itself takes the initiative to identify suicidal people and invite them to use specific services, may increase service use among this hidden population [10].

As in most developing countries, mental health care in China is at an early stage of development [12]. Moreover, because of China’s large population and uneven distribution of resources, it is hard to implement school-based screening or maintain primary care provider screenings nationally [13,14]. Gatekeeper training is still at an early stage [15], and thus new suicide prevention methods are urgently needed.

The internet has become an indispensable part of life for many people. Accordingly, researchers have started to use people’s self-generated online messages to identify suicidal thoughts and behaviors, through either manual [16] or machine learning analysis [17]. However, identification is only the first step in suicide prevention. Although the internet has been used to manage suicidal crises [18,19], more effort is needed to prevent suicide. Previous studies have used the internet simply as a platform for existing services, so they share the shortcomings of traditional prevention methods, in particular their reliance on at-risk individuals to seek help.

Half of the Chinese population uses the internet. Approximately two-fifths (40.9%) of China’s netizens use the Sina microblog, the Chinese counterpart of Twitter [20]. As on Twitter, microbloggers (users of microblogs) can publish posts publicly. They can also send direct messages to other users that only the sender and receiver can see. Microbloggers can follow other users and reply to, comment on, repost, or like others’ posts. An average of 139 million new posts are generated on the Sina microblog daily, and most (82%) microbloggers are under 30 years old [21]. These conditions provide an opportunity to prevent suicide among young people in China, because existing findings suggest that young people feel they can freely discuss suicide-related topics on social networks [22,23]. The challenge is that the proportion of suicide-related posts is extremely low, which makes manual identification nearly impossible; researchers have therefore called for more effort to build automated or semiautomated suicide ideation detectors that facilitate timely help and support for people at risk of suicide [24]. Suicidal people must first be identified, and machine learning models can assist with both suicide identification and prevention.

To our knowledge, no previous research has combined proactive identification of suicidal individuals via social media with specialized crisis management. Moreover, a growing number of studies have demonstrated that (1) there is a shortage of online suicide prevention approaches [25], (2) unidirectional, monologic suicide prevention information distributed by professional institutions is insufficient, and more dialogic communication is needed between professionals and people at risk of suicide [26], (3) suicidal status can be detected from human language [27], and (4) applying machine learning algorithms to suicide identification can yield large improvements (such as higher recall) [17]. Language is an explicit behavior that reflects mental status, and people with suicide ideation are more likely to talk about suicide than people without it [28]. Recently, researchers have started to use online longitudinal data to evaluate improvements after people receive psychosocial support services [29,30]. For example, people who used more future-tense words online were found to have benefited more from online social support [31]. Previous studies have also shown that higher future orientation is associated with less suicide ideation [32,33]. In our view, a reduction in death-oriented language and an increase in future-oriented language may therefore serve as indicators of reduced suicide risk.

Figure 1. Study protocol for evaluating the Proactive Suicide Prevention Online (PSPO).

We proposed a new internet-based approach, Proactive Suicide Prevention Online (PSPO), for the identification and prevention of suicidal thoughts and behaviors (see Figure 1). We identified a microblog group online and manually annotated their comments on a suicide note to train a machine learning model. Next, the model was used to automatically identify posts that suggested suicidal thoughts and behaviors. We proactively provided crisis management in the form of emotional and informational support to microbloggers identified as at risk. Finally, we used the language changes in their posts as criteria to evaluate the efficacy of the PSPO approach. Based on the research gaps summarized above, we aimed to evaluate (1) the performance of PSPO in identifying high-risk social media users with suicidal thoughts and behaviors, (2) the acceptability of a proactive approach that offered help to social media users with suicidal thoughts and behaviors, (3) the improvement of PSPO in prompting suicidal social media users to seek help in comparison with traditional suicide prevention methods, and (4) the efficacy of PSPO for suicidal social media users in terms of changes in their language use (ie, reduced death-oriented words and increased future-oriented words).

Data Collection

Identifying At-Risk Individuals

A microblogger, Zoufan, died by suicide due to depression on March 17, 2012, and her death attracted wide attention online. Since then, her microblog has become a “secret garden” where suicidal people share their feelings and thoughts. By July 24, 2018, more than 1.3 million comments had been posted on her online suicide note, many of them containing suicide-related content. This microblog group comprises people with and without suicide ideation. We analyzed the comments left on Zoufan’s online suicide note. Comments posted from April 1 to 28, 2017, were obtained through the official Sina microblog application programming interface and manually annotated as the training set. Comments posted from July 3, 2017, to July 3, 2018, were also obtained and automatically classified by the developed machine learning model. Further details on building the machine learning model (eg, the coding system for suicidal thoughts and behaviors in posts and feature selection for the machine learning models) are provided in Data Analysis.
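The two collection windows amount to a simple date filter. The sketch below is illustrative only: it assumes each comment has already been retrieved as a hypothetical `(comment_id, posted_on)` pair, and it does not reproduce any Sina API call.

```python
from datetime import date

# Collection windows stated in the text; both bounds are inclusive.
TRAIN_START, TRAIN_END = date(2017, 4, 1), date(2017, 4, 28)
DETECT_START, DETECT_END = date(2017, 7, 3), date(2018, 7, 3)


def assign_window(posted_on: date) -> str:
    """Return the study window that a comment's posting date falls into."""
    if TRAIN_START <= posted_on <= TRAIN_END:
        return "train"   # manually annotated training set
    if DETECT_START <= posted_on <= DETECT_END:
        return "detect"  # classified automatically by the trained model
    return "unused"      # outside both windows


def split_comments(comments):
    """Group hypothetical (comment_id, posted_on) pairs by study window."""
    groups = {"train": [], "detect": [], "unused": []}
    for comment_id, posted_on in comments:
        groups[assign_window(posted_on)].append(comment_id)
    return groups
```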

Providing Crisis Management

All microbloggers who were identified by the machine learning model as expressing suicidal thoughts and behaviors were invited to join the study via direct message. There were no exclusion criteria because we aimed to reach out and provide support to as many suicidal social media users as possible.

The direct message, designed in our previous study [34], included (1) a brief introduction to the project; (2) URLs for assessment protocols on suicidal thoughts and behaviors, depressive symptoms, and help-seeking behaviors; (3) emotional support (empathy, recommendations such as having regular physical exercise and a healthy diet, and encouragement); (4) informational support (the URL for this study, along with referrals to hospitals and hotline services); and (5) details regarding the availability of one-to-one counseling by contacting counselors via direct messaging (see Multimedia Appendix 1).

If a user replied to the direct message, counselors provided support that was targeted to the user’s specific problem. Twelve certified counselors (2 men and 10 women, mean age 23.08 [SD 1.08] years) with experience in handling suicidal cases were trained to provide counseling services through direct messaging.

Direct Message Assessment Protocol

Suicidal thoughts and behaviors were assessed with two items chosen based on previous research [35] together with the 9-Item Patient Health Questionnaire (PHQ-9) [36], which also measures depressive symptoms. The two items were “Do you have a plan to commit suicide?” and “Have you ever attempted suicide?” Participants answered yes or no, and those who answered yes to the first item were asked to indicate whether their plan was specific or vague. A sample PHQ-9 item is “having little interest or pleasure in doing things.” Participants rated the frequency of each of the 9 symptoms over the past 2 weeks on a 4-point Likert scale (0 = not at all, 3 = nearly every day); total PHQ-9 scores range from 0 to 27, with higher scores indicating more severe depressive symptoms. The Chinese version has been shown to have good psychometric properties [37], and its internal consistency in this study was .84.
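The PHQ-9 scoring rule can be sketched as follows. The severity bands are the widely used PHQ-9 cutoffs (5, 10, 15, and 20), not something the paper specifies, although they agree with its reading of a mean of 17.40 as moderately severe. The paper also does not name its internal-consistency coefficient; the `cronbach_alpha` helper assumes Cronbach's alpha.

```python
from statistics import pvariance

# Widely used PHQ-9 bands (assumption; upper bound of each band, inclusive).
SEVERITY_BANDS = [(4, "minimal"), (9, "mild"), (14, "moderate"),
                  (19, "moderately severe"), (27, "severe")]


def phq9_total(items):
    """Sum 9 item ratings, each 0 (not at all) to 3 (nearly every day)."""
    if len(items) != 9 or any(r not in (0, 1, 2, 3) for r in items):
        raise ValueError("PHQ-9 needs exactly 9 ratings in 0..3")
    return sum(items)  # ranges from 0 to 27


def phq9_severity(total):
    """Map a total (or a mean total) to a severity label."""
    for upper, label in SEVERITY_BANDS:
        if total <= upper:
            return label
    raise ValueError("total out of range")


def cronbach_alpha(responses):
    """Cronbach's alpha; responses is a list of respondents, each a list
    of k item scores: alpha = k/(k-1) * (1 - sum(item var) / total var)."""
    k = len(responses[0])
    item_vars = [pvariance([r[i] for r in responses]) for i in range(k)]
    total_var = pvariance([sum(r) for r in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Population and sample variances give the same alpha here, because the ratio of item variance to total variance is unchanged by the choice.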

Help-seeking behavior was assessed with two items [35]: “What kind of psychological treatment have you received before?” and “Have you sought help when you had suicidal ideation?” Participants who answered positively to either question rated the effectiveness of the help they had received on a 7-point Likert scale (1 = totally disagree, 7 = totally agree). A rating of 3 or lower was recorded as indicating that the previous help had not been useful.

Acceptability was measured with one item (“How acceptable do you find this proactive help?”) and rated on a 7-point Likert scale (1 = totally disagree, 7 = totally agree). We considered a rating of 4 or higher to indicate that the program was acceptable.
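The two cut-off rules (a previous-help rating of 3 or lower is coded as not useful; an acceptability rating of 4 or higher is coded as acceptable) can be written as small helpers. This is a minimal sketch; the function names are hypothetical and the ratings are assumed to be integers on the 7-point scale.

```python
from statistics import mean

NOT_USEFUL_MAX = 3  # previous-help rating <= 3 -> "former help not useful"
ACCEPTABLE_MIN = 4  # acceptability rating >= 4 -> "program acceptable"


def former_help_useful(rating):
    """True/False for a 7-point rating; None if no help was ever sought."""
    if rating is None:
        return None
    if not 1 <= rating <= 7:
        raise ValueError("7-point Likert rating expected")
    return rating > NOT_USEFUL_MAX


def summarize_acceptability(ratings):
    """Return (share of ratings >= 4, mean rating) for 7-point scores."""
    acceptable = sum(1 for r in ratings if r >= ACCEPTABLE_MIN)
    return acceptable / len(ratings), mean(ratings)
```

With the counts reported in Results, the acceptable share is 968/1403, or 69.00%.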

One-to-One Counseling

The counselor training was based on problem-solving therapy [38], which begins by identifying a person’s problem and then helps them identify feasible solutions to a specific issue by setting goals and comparing the pros and cons of each plausible solution. A concrete, feasible plan is then made to help the client overcome the problem. The goal for counselors in crisis management was to persuade suicidal microbloggers to seek professional services and to provide them with appropriate referrals; beyond this, the interaction between microbloggers and counselors depended on the microbloggers’ needs. The 3-hour training included a theoretical explanation of problem-solving therapy and practice in applying it to this online setting. In addition to monthly supervision of the counselors by psychiatrists and senior counselors, we formed an online chat group where counselors could discuss problems they encountered in consultations at any time.

Because the data used in this study were all publicly available, traditional informed consent was not appropriate. In the identification part of the study, the data were anonymized during analysis to minimize inadvertent disclosure of personal information or of clues to an individual’s online identity. In the crisis management part of the study, participants voluntarily gave informed consent when agreeing to take part in the program. The project received ethical approval from the Institutional Review Board of the Institute of Psychology, Chinese Academy of Sciences (approval number H16003).

Data Analysis

Building Machine Learning Models for Suicide Recognition

The first step in supervised learning was to obtain a training set. To achieve good model performance, we chose not to use crowdsourcing [39]; instead, 5 psychology postgraduates with expertise in analyzing suicide annotated the microbloggers’ comments, following the same annotation process as in our previous study [34]. Expressing a death wish or writing about suicide was coded as suicide ideation. A suicide plan has been defined as suicide-related communication, reflecting its interpersonal nature, that is often expressed verbally as a description of how one might advance from ideation to action [40]; considering the dialogic nature of social media, we therefore operationalized a suicide plan as discussion of the act of dying (eg, a death kit, the place and time of death, or making a will). Attempted suicidal behavior within the preceding 2 weeks accompanied by current suicide ideation, or the possibility of executing a suicide plan in the coming 1 to 2 weeks, was coded as a suicide attempt. Posts were ranked as follows: 0 = no suicide risk; 1 = risk of suicide ideation but no detailed plan made; 2 = risk of suicide plan not requiring emergency aid; and 3 = significant risk of suicide attempt requiring emergency aid.
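The four-level ranking and its reduction to the binary target used for detection (any level above 0 counts as suicidal thoughts and behaviors, matching the positive/negative labeling described next) can be sketched as an enumeration. The names are hypothetical illustrations of the coding scheme, not the annotators' tooling.

```python
from collections import Counter
from enum import IntEnum


class SuicideRisk(IntEnum):
    """Annotation levels of the coding scheme described above."""
    NONE = 0      # no suicide risk
    IDEATION = 1  # suicide ideation, no detailed plan
    PLAN = 2      # suicide plan, emergency aid not required
    ATTEMPT = 3   # significant risk of attempt, emergency aid required


def to_binary_label(level: SuicideRisk) -> int:
    """Collapse the 4 levels into the binary detection target:
    any level above NONE counts as suicidal thoughts and behaviors."""
    return int(level > SuicideRisk.NONE)


def label_summary(levels):
    """Count annotated levels and the resulting positive/negative labels."""
    per_level = Counter(SuicideRisk(lv) for lv in levels)
    binary = Counter(to_binary_label(SuicideRisk(lv)) for lv in levels)
    return per_level, binary
```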

Comments identified as indicating suicidal thoughts and behaviors were labeled as positive training samples, and 10,000 posts without suicidal thoughts and behaviors were randomly selected as negative training samples. A second strategy for improving model performance was theory-based feature selection. Researchers in computer science have tended to select predictive features through arbitrary linguistic analyses, such as n-grams and sentiment analyses, without a theoretical or empirical basis [41,42]. In this study, we selected features by combining data-driven features (n-grams) derived from social media data with features based on domain knowledge and theoretical guidance [43,44]. We used a knowledge-based generic suicide-related lexicon [45] that had been manually developed by a panel of domain experts. The theory-motivated features were personality and depression, the factors most commonly cited as relevant to suicide [46-48]. Personal traits such as fearfulness, social inhibition, shyness, pessimism, immaturity, and lack of internal organization have been associated with psychotic suicide attempts [43]. Moreover, there has been significant progress in predicting personality and depression from social media data [43,44,49,50], and depression has also been used to predict suicidal behaviors [51]. Incorporating predicted personality traits and depression as features in a suicide ideation detection model is therefore both theoretically sound and technically feasible. To our knowledge, no prior studies have used the combination of the lexicon and predictive features (including personality traits and depression) included in our machine learning models.
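A minimal sketch of how the three feature sets compared in Table 2 could be assembled, and how the training set pairs all positives with a random sample of negatives. Several details are assumptions not stated in the paper: character bigrams stand in for its n-grams (Chinese has no spaces and the tokenizer is unspecified), the lexicon is a caller-supplied set of terms, and the theory-motivated features are precomputed scores from hypothetical personality and depression models.

```python
import random
from collections import Counter


def char_ngrams(text, n=2):
    """Character n-grams; a stand-in for the paper's n-grams (assumption)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_features(text, feature_set, lexicon=frozenset(), theory_scores=None):
    """Sparse feature dict for feature set "A", "B", or "C":
    A = n-grams; B = A + lexicon hits; C = B + theory-motivated scores."""
    if feature_set not in ("A", "B", "C"):
        raise ValueError("feature_set must be 'A', 'B', or 'C'")
    feats = dict(Counter(f"ng:{g}" for g in char_ngrams(text)))
    if feature_set in ("B", "C"):
        # Hypothetical lexicon: count occurrences of each expert-listed term.
        feats["lex:hits"] = sum(text.count(term) for term in lexicon)
    if feature_set == "C":
        # Scores from separate (hypothetical) personality/depression models.
        for name, value in (theory_scores or {}).items():
            feats[f"theory:{name}"] = value
    return feats


def sample_training_set(positives, negatives, n_negative=10_000, seed=0):
    """All annotated positive comments plus a random sample of negatives."""
    rng = random.Random(seed)
    chosen = rng.sample(negatives, min(n_negative, len(negatives)))
    return [(text, 1) for text in positives] + [(text, 0) for text in chosen]
```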

We built a binary classification model to determine whether a comment indicated suicidal thoughts and behaviors. We trained the detection model with support vector machine (SVM), decision tree, random forest, and logistic regression algorithms using 10-fold cross-validation, because these algorithms are the most widely used methods for predicting psychological characteristics and emotions and for detecting suicide ideation [45,52]. Model performance was evaluated with four metrics: precision, recall, F-measure, and accuracy [53].
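The evaluation protocol can be sketched without any machine learning library: split the data into 10 folds, train on 9, test on the held-out fold, and report the mean and SD of each metric across folds (the form used in Table 2). This is a sketch under stated assumptions: F-measure is taken to be F1, folds are contiguous (so the data are assumed to be shuffled beforehand), and `fit_predict` is a hypothetical hook where an SVM, decision tree, random forest, or logistic regression classifier would be plugged in.

```python
from statistics import mean, stdev


def metrics(y_true, y_pred):
    """Precision, recall, F-measure (assumed F1), and accuracy for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall,
            "f_measure": f1, "accuracy": (tp + tn) / len(pairs)}


def k_fold_indices(n, k=10):
    """Yield (train_idx, test_idx) for k contiguous folds whose sizes
    differ by at most one."""
    start = 0
    for fold in range(k):
        size = n // k + (1 if fold < n % k else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def cross_validate(X, y, fit_predict, k=10):
    """Mean and SD of each metric across folds."""
    fold_scores = []
    for train, test in k_fold_indices(len(y), k):
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in test])
        fold_scores.append(metrics([y[i] for i in test], preds))
    return {m: (mean([s[m] for s in fold_scores]),
                stdev([s[m] for s in fold_scores]))
            for m in fold_scores[0]}
```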

Using Language Frequency Changes as an Efficacy Indicator for Crisis Management

Because few participants completed the online survey, we were unable to collect PHQ-9 data 1 month after the program and instead used changes in word frequency as an efficacy indicator for crisis management. To examine microbloggers’ language changes between 1 month before the start of the program and 1 month after its completion, we used the Simplified Chinese Linguistic Inquiry and Word Count (SCLIWC), an amended version of the text analysis program LIWC designed to perform well on Simplified Chinese microblog text [54]. SCLIWC comprises 7 main categories and 64 subcategories, 2 of which are death-oriented and future-oriented words; we scored every participant’s posts on these 2 subcategories. Each category score was calculated as the ratio of words in the category to all words in the posts, and the direction of change in the 2 subcategory scores served as a measure of the program’s efficacy.
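The category score described above (words in a category divided by all words, expressed here as a percentage to match Table 5) and its pre/post change can be sketched as follows. This assumes posts have already been segmented into words; the real SCLIWC dictionaries are not reproduced, and the two word sets are hypothetical English stand-ins drawn from the examples in Table 5.

```python
def category_score(tokens, category_words):
    """Percentage of tokens belonging to a category (LIWC-style ratio)."""
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in category_words)
    return 100.0 * hits / len(tokens)


def pre_post_change(pre_posts, post_posts, category_words):
    """Post-minus-pre change in a user's category score; each post is a
    list of tokens. Negative values mean the category was used less."""
    pre_tokens = [tok for post in pre_posts for tok in post]
    post_tokens = [tok for post in post_posts for tok in post]
    return (category_score(post_tokens, category_words)
            - category_score(pre_tokens, category_words))


# Hypothetical stand-ins for the SCLIWC subcategories.
DEATH_WORDS = {"die", "suicide"}
FUTURE_WORDS = {"after", "soon", "future"}
```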

Machine Learning Models for Suicide Recognition

From April 1 to 28, 2017, four weekly sessions of manual annotation were conducted. Of the 27,007 comments, 10.32% (2786/27,007) were identified as indicating suicidal thoughts and behaviors. Of these, 81.44% (2269/2786), 13.75% (383/2786), and 4.81% (134/2786) were coded as suicide ideation, suicide plan, and suicide attempt, respectively (Table 1).

Table 2 presents the means and SDs of the performance of the detection models built with each classification algorithm using the full feature set (C) and two baseline feature sets (A and B). The SVM models performed best overall. Using a Tukey honestly significant difference post hoc test, we compared the performance of the SVM models constructed with each feature set. The precision of the model with set C was lower than that with set A (t=–6.32, P<.001); however, its recall (t=12.07, P<.001), F-measure (t=5.48, P<.001), and accuracy (t=3.32, P=.004) were all significantly higher. Compared with the SVM model using set B, the model using set C again had lower precision (t=–5.80, P<.001) but significantly higher recall (t=12.23, P<.001) and F-measure (t=3.87, P=.001), while the accuracies of the two models were equivalent (t=1.34, P=.20).

Of the 387,823 comments made between July 3, 2017, and July 3, 2018, 24,727 (6.38%) were identified as being indicative of suicidal thoughts and behaviors by the machine learning model.

Table 1. Manual identification of suicidal comments posted in April 2017.
| Date | Comments, n | Suicidal thoughts and behaviors, n (%) | Suicide ideation, n (%) | Suicide plan, n (%) | Suicide attempt, n (%) |
|---|---|---|---|---|---|
| 4/01-4/07 | 6975 | 849 (12.17) | 702 (82.69) | 107 (12.60) | 40 (4.71) |
| 4/08-4/14 | 6201 | 682 (11.00) | 561 (82.26) | 90 (13.20) | 31 (4.55) |
| 4/15-4/21 | 6467 | 563 (8.71) | 457 (81.17) | 82 (14.56) | 24 (4.26) |
| 4/22-4/28 | 7364 | 692 (9.40) | 549 (79.33) | 104 (15.03) | 39 (5.64) |
| Total | 27,007 | 2786 (10.32) | 2269 (81.44) | 383 (13.75) | 134 (4.81) |
Table 2. Performance of the machine learning models.
| Model performance and feature set | SVMᵃ, mean (SD) | DTᵇ, mean (SD) | RFᶜ, mean (SD) | LRᵈ, mean (SD) |
|---|---|---|---|---|
| Precision | | | | |
| Aᵉ | .88 (.01) | .84 (.02) | .87 (.01) | .87 (.01) |
| Bᶠ | .88 (.01) | .76 (.01) | .85 (.01) | .87 (.01) |
| Cᵍ | .85 (.01) | .76 (.01) | .85 (.01) | .88 (.01) |
| Recall | | | | |
| A | .78 (.02) | .68 (.05) | .75 (.02) | .79 (.02) |
| B | .80 (.01) | .75 (.01) | .74 (.01) | .79 (.01) |
| C | .85 (.01) | .75 (.01) | .73 (.01) | .80 (.01) |
| F-measure | | | | |
| A | .83 (.01) | .74 (.03) | .80 (.01) | .83 (.01) |
| B | .84 (.01) | .76 (.01) | .79 (.01) | .83 (.01) |
| C | .85 (.01) | .76 (.01) | .78 (.01) | .84 (.01) |
| Accuracy | | | | |
| A | .85 (.01) | .79 (.02) | .83 (.01) | .85 (.01) |
| B | .86 (.01) | .78 (.01) | .82 (.01) | .85 (.01) |
| C | .86 (.01) | .78 (.01) | .82 (.01) | .86 (.01) |

ᵃSVM: support vector machine.

ᵇDT: decision tree.

ᶜRF: random forest.

ᵈLR: logistic regression.

ᵉA: n-gram features.

ᶠB: n-gram features + domain knowledge features.

ᵍC: n-gram features + domain knowledge features + theory-motivated features.

Crisis Management

We sent direct messages to the 12,486 microbloggers (some microbloggers had posted multiple comments) whom the machine learning model identified as having expressed suicidal thoughts and behaviors in the 24,727 comments. A total of 34.58% (4318/12,486) of these individuals completed the assessment protocol, yielding 1403 valid samples (mean age 21.66 [SD 3.26] years). Females significantly outnumbered males (χ²₁=647.33, P<.001), and most participants were students or employed, single, and college graduates (see Table 3).
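The reported sex difference can be checked with a Pearson chi-square goodness-of-fit test. The paper does not state its null hypothesis; assuming an equal 50/50 expected split of the 1403 participants (701.5 expected per group), the statistic for 225 males and 1178 females comes out at 647.33, matching the reported value. For 1 df, the upper-tail P value equals erfc(sqrt(x/2)), which the standard library can compute.

```python
from math import erfc, sqrt


def chi_square_gof(observed, expected_props=None):
    """Pearson chi-square goodness-of-fit statistic and degrees of freedom.
    expected_props defaults to equal proportions across categories."""
    total = sum(observed)
    k = len(observed)
    props = expected_props or [1 / k] * k
    expected = [total * p for p in props]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, k - 1


def p_value_df1(stat):
    """Upper-tail P value for a chi-square statistic with 1 df."""
    return erfc(sqrt(stat / 2))


stat, df = chi_square_gof([225, 1178])  # males, females (Table 3)
print(f"chi2({df}) = {stat:.2f}, P = {p_value_df1(stat):.1e}")
```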

In terms of suicide risk, most respondents (1259/1403, 89.73%) reported thoughts that they would be better off dead or of hurting themselves in some way. Nearly half of all respondents (699/1403, 49.82%) had a suicide plan: 6.34% (89/1403) reported a specific plan and 43.48% (610/1403) a vague plan. Of the 1403 participants, 545 (38.85%) had previously attempted suicide. The mean PHQ-9 score was 17.40 (SD 5.98), which indicates moderately severe depressive symptoms. Two-thirds of the participants (924/1403, 65.86%) had never received any kind of psychological treatment. Just over half (725/1403, 51.67%) had not sought help from anyone, 12.19% (171/1403) had sought help from professionals (such as a psychiatrist, therapist, or general practitioner), and 36.14% (507/1403) had sought help from people around them (such as family and friends). The 678 participants who had sought help before (678/1403, 48.33%) rated the efficacy of that help at a mean of 2.60 (SD 1.43), and 77.00% (522/678) of them considered it to have been of no use.

On a 7-point Likert scale, nearly 70% (968/1403, 69.00%) of participants rated proactive help via direct messaging as acceptable (4 or more on the scale). The average score for all participants was 4.35 (SD 1.81).

Between July 3, 2017, and July 3, 2018, microbloggers logged into the study website to view prevention information 12,300 times. A total of 2321 users replied to a direct message at least once. Figure 2 shows the number of microbloggers who interacted with our counselors each month from July 3, 2017, to July 3, 2018; on average, 234.08 (SD 88.70) microbloggers interacted with our counselors per month. Table 4 summarizes the interactions between microbloggers and counselors, including the number of microblogger replies and the number of interaction days. Approximately 88% (2043/2321, 88.02%) of the microbloggers replied 10 times or fewer, and nearly 97% (2246/2321, 96.77%) interacted with our counselors for 5 days or fewer. A total of 1097 users both completed an assessment protocol and took part in a consultation. Of the 12,486 microbloggers who were contacted, 5542 (44.39%) responded to our direct message either by completing the assessment protocol or by interacting with the counselors. Earlier studies found that the percentages of college students seeking help from professionals were 5.1% for all college students, 14.4% for college students with mental health problems, and 4.5% for college students without mental health problems [55]. Compared with these rates under traditional approaches, we were able to prompt a larger proportion of the people whom the machine learning model identified as having posted suicidal content to seek help for their distress or suicide ideation.
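The reported counts are consistent with simple inclusion-exclusion: 4318 users completed the assessment protocol, 2321 replied to a direct message, and 1097 did both, so 4318 + 2321 - 1097 = 5542 responded in at least one way, or 44.39% of the 12,486 contacted. Reading the 1097 as the overlap of the two groups is our interpretation. The set-based helper is a hypothetical sketch for when user IDs rather than counts are available.

```python
def responded_count(completed, interacted, both):
    """Inclusion-exclusion: users who completed the assessment protocol,
    interacted with a counselor, or both."""
    return completed + interacted - both


def response_summary(contacted, completed, replied):
    """Set-based version for (hypothetical) sets of user IDs: returns the
    number of contacted users who responded in either way, and the rate."""
    responded = (completed | replied) & contacted
    return len(responded), len(responded) / len(contacted)


n = responded_count(completed=4318, interacted=2321, both=1097)
rate = n / 12_486
print(f"{n} responders, {100 * rate:.2f}% of contacted users")
```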

Table 3. Demographic characteristics of participants.
| Characteristic | Value, n (%) |
|---|---|
| Gender | |
| Male | 225 (16.04) |
| Female | 1178 (83.96) |
| Education level | |
| Junior middle school or below | 82 (5.84) |
| Senior high school | 272 (19.39) |
| College | 1006 (71.71) |
| Graduate or above | 43 (3.06) |
| Employment status | |
| Employed | 472 (33.64) |
| Unemployed | 229 (16.32) |
| Student | 603 (42.98) |
| Marital status | |
| Single/divorced | 1342 (95.65) |
| Married | 61 (4.35) |
Figure 2. Number of microbloggers who interacted with our counselors from July 3, 2017, to July 3, 2018.
Table 4. Interactions between microbloggers and counselors.
| Interactions | Value, n (%) |
|---|---|
| Number of microblogger replies | |
| ≤10 | 2043 (88.02) |
| 11-30 | 118 (5.08) |
| 31-50 | 46 (1.98) |
| 51-100 | 60 (2.59) |
| >100 | 54 (2.33) |
| Days of interactions with counselors | |
| ≤5 | 2246 (96.77) |
| 6-10 | 48 (2.07) |
| >10 | 27 (1.16) |
Table 5. Changes in frequency of language use pre- and postprogram.
| Category | Examples | Preprogram, % (SD) | Postprogram, % (SD) | t value | P value |
|---|---|---|---|---|---|
| Death-oriented words | Die/suicide/will | 0.37 (0.01) | 0.31 (0.01) | 2.21 | .03 |
| Future-oriented words | After/soon/future | 0.34 (0.01) | 0.34 (0.01) | –2.29 | .02 |

Finally, we used SCLIWC to detect language changes among the 2321 social media users who replied to the direct message. By tracing these microbloggers' accounts, we compared their microblog posts from the month before and the month after they received services from our counselors. After excluding users who did not complete 1 month of interaction with the counselors, 2031 microbloggers remained. As shown in Table 5, the frequency of death-oriented words significantly declined (P=.03), and the frequency of future-oriented words significantly increased (P=.02). In the month before the program, the number of posts per user ranged from 1 to 1013 (mean 30.59, SD 84.36); in the month after, it ranged from 1 to 1279 (mean 27.41, SD 74.04). A paired-samples t test showed that the change in the number of posts per user was not significant (t=1.92, P=.06).
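The paired-samples t test compares each user with themselves: it takes the per-user differences, divides their mean by their standard error, and uses n - 1 degrees of freedom. The sketch below computes only the statistic; the sign convention (before minus after) is an assumption, chosen because it matches Table 5, where the declining death-oriented category has a positive t and the rising future-oriented category a negative one. For P values, `scipy.stats.ttest_rel` computes the same statistic together with its P value.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Paired-samples t statistic and degrees of freedom for matched
    per-user values. Differences are before - after, so a positive t
    means the values were higher before the program."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [b - a for b, a in zip(before, after)]
    standard_error = stdev(diffs) / sqrt(len(diffs))
    return mean(diffs) / standard_error, len(diffs) - 1
```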

Principal Findings

In our study, we first identified a microblog group formed around the Sina microblog account of a microblogger who died by suicide, which proved to be an efficient way to reach a high-risk population. We then proactively sent direct messages inviting all microbloggers whom the machine learning model identified as having exhibited suicidal thoughts and behaviors to participate in our study. Our results provide preliminary evidence that automatic identification of suicidal thoughts and behaviors combined with proactive suicide prevention is acceptable and helpful.

For suicide ideation detection, recall is arguably more important than precision, because missing an at-risk user is generally costlier than sending a supportive message to someone who is not at risk. The results of the machine learning models demonstrate that, in general, incorporating theory-motivated features and features based on domain knowledge can improve the recall, F-measure, and accuracy of a model for detecting suicide ideation. The best results were .88, .85, .85, and .86 for precision, recall, F-measure, and accuracy, respectively, which demonstrates the utility of the model in identifying suicide posts. Multisource, theory-based feature selection [39,41] helps ensure that relevant features are included and redundant ones excluded; beyond this, our models outperformed earlier machine learning models for identifying posts with suicide content [52,56], mainly because we did not rely on crowdsourcing [39] but instead used postgraduate annotators specializing in suicide research. In addition, our dataset was almost twice as large as those used in similar earlier studies [56,57].

In this study, most individuals who completed the questionnaires were single women (employed or students) with a college degree. This is consistent with a previous study showing that women are more likely than men to disclose suicide ideation to health professionals and to use health services [6]. In that study, people with higher education and those who had never married also had significantly higher odds of receiving mental health treatment [6].

For the period from July 3, 2017, to July 3, 2018, 6.38% of the comments were identified as expressing suicidal thoughts and behaviors. The self-reports of the 1403 participants whom the model identified as showing suicidal tendencies support the utility of the machine learning model: the proportions reporting suicide ideation, a suicide plan, and a past suicide attempt were 89.73%, 49.82%, and 38.85%, respectively. These are much higher than the rates reported in a meta-analysis for the general Chinese population, 3.9% for suicide ideation and 0.8% for past suicide attempts [58]. Moreover, 6.34% of our participants reported a specific suicide plan. Their mean PHQ-9 score was 17.40 (SD 5.98), indicating moderately severe depressive symptoms. If suicide ideation can be identified as early as possible, at-risk individuals can be prevented from deteriorating to the point of making a specific suicide plan [59]. PSPO enables timely crisis management targeted only at those in need, without disturbing others.
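The interpretation of the mean PHQ-9 score relies on the standard severity bands for the 9-item total (each item scored 0 to 3, total 0 to 27) [36]. A minimal sketch of that mapping follows; the function names are hypothetical, and applying a band to a group mean, as done above, describes the average participant rather than each individual.

```python
# Upper bounds (exclusive) of the standard PHQ-9 severity bands.
PHQ9_BANDS = [
    (5, "minimal"),
    (10, "mild"),
    (15, "moderate"),
    (20, "moderately severe"),
    (28, "severe"),
]


def phq9_total(items):
    """Sum the nine item scores, each of which must be 0, 1, 2, or 3."""
    if len(items) != 9 or any(i not in (0, 1, 2, 3) for i in items):
        raise ValueError("PHQ-9 needs nine item scores in 0..3")
    return sum(items)


def phq9_severity(score):
    """Map a total (or a mean of totals) in [0, 27] to its severity band."""
    if not 0 <= score <= 27:
        raise ValueError("PHQ-9 totals range from 0 to 27")
    for upper, label in PHQ9_BANDS:
        if score < upper:
            return label


print(phq9_severity(17.40))
```

Using half-open intervals lets the same function classify both integer totals and non-integer group means such as 17.40.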

Despite the high rate of suicide ideation, 65.86% of participants who completed the questionnaires had never received any kind of psychological treatment, and 51.67% had not sought help from anyone regarding their suicidal thoughts. This is consistent with earlier studies [60,61] and may explain why 69.00% of the participants accepted PSPO: it offered a new way of accessing help to people who had previously faced barriers to seeking help for suicidal thoughts and behaviors. Another possible reason for the acceptability of PSPO is its anonymity.

This study provides some preliminary evidence for the efficacy of PSPO. Eliminating suicide ideation can be a long process, but suicide crisis management can serve as “emotional cardiopulmonary resuscitation” for people at risk. We sent 24,727 direct messages to 12,486 different social media users, of whom 5542 (44.39%) responded. Of these, 4318 completed the assessment protocol, and 2321 replied to the direct message. On average, 234.08 (SD 88.70) microbloggers interacted with our counselors each month; approximately 90% of them replied fewer than 10 times, and nearly 97% interacted with our counselors for less than 5 days. These results suggest that, compared with traditional passive methods, PSPO may considerably extend the reach of suicide prevention to people who have never sought help before. In addition, the prevention information on our website was viewed 12,300 times.

Finally, after interacting with the counselors, the microbloggers with suicidal thoughts appeared to change the language they used on social media significantly. In particular, the frequency of death-oriented words was significantly lower in the month after crisis management than in the month before. One possible reason is that the microbloggers felt the concern, social support, and empathy of the counselors; another is that these users began to seek help after the consultation. At the same time, the frequency of future-oriented words increased significantly, although the change was slight, possibly because future-oriented words make up a relatively small share of the overall vocabulary. Nevertheless, the increase may also signal that the users had less suicide ideation and had become more willing to accept support than before.
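SCLIWC, like LIWC, scores text by counting how many words fall into each dictionary category and expressing that count as a percentage of all words. The sketch below illustrates only that counting step; the tiny English lexicons and the name `category_rates` are hypothetical stand-ins, since the real SCLIWC dictionary is much larger, uses word stems, and runs on segmented Simplified Chinese text.

```python
from collections import Counter

# Hypothetical miniature lexicons, for illustration only.
LEXICONS = {
    "death": {"die", "dead", "death", "funeral"},
    "future": {"will", "tomorrow", "soon", "someday"},
}


def category_rates(tokens, lexicons=LEXICONS):
    """Percentage of tokens that belong to each category (LIWC-style)."""
    total = len(tokens)
    if total == 0:
        return {name: 0.0 for name in lexicons}
    counts = Counter(token.lower() for token in tokens)
    # A word may sit in several categories; each category is counted independently.
    return {
        name: 100.0 * sum(counts[word] for word in words) / total
        for name, words in lexicons.items()
    }


# One hypothetical user's month of posts, already tokenized.
month_before = ["i", "want", "to", "die", "nothing", "matters"]
month_after = ["i", "will", "try", "again", "tomorrow", "maybe"]
print(category_rates(month_before), category_rates(month_after))
```

Per-user rates computed this way for the month before and the month after counseling could then be compared across users with a paired test.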

Limitations and Future Work

In this study, we focused on only one microblog group. Future studies are needed to establish whether our machine learning model can be applied to other similar suicide groups and to other platforms, such as school bulletin boards, online suicide groups, or online self-help groups for suicidal thoughts and behaviors. Because our detection model for suicidal thoughts and behaviors is at an early stage of development, we built only a binary classifier, aimed mainly at finding candidates for primary crisis management. Multiclass classification could be adopted in the future to support customized suicide prevention for different social media users. Moreover, to detect suicide ideation we relied mainly on text features extracted from posts, although other social media behaviors, such as interactions with other users and posting frequencies and times, could also be effective predictors. Examining these factors may provide additional insight and guidance for building more effective models based on social media to detect suicide ideation.
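The behavioral signals mentioned above, such as posting frequency and posting times, are straightforward to derive from post timestamps. The sketch below is purely illustrative: the function name `posting_features`, the night window (midnight to 6 am), and the feature set are assumptions rather than features used in this study, and any such features would need validation before being added to a classifier.

```python
from datetime import datetime
from statistics import mean


def posting_features(timestamps, night_start=0, night_end=6):
    """Hypothetical per-user behavioral features from post timestamps."""
    if not timestamps:
        raise ValueError("need at least one post")
    ts = sorted(timestamps)
    # Observation span in days, floored at one day to avoid inflated rates.
    span_days = max((ts[-1] - ts[0]).total_seconds() / 86400, 1.0)
    night_posts = sum(1 for t in ts if night_start <= t.hour < night_end)
    gaps_hours = [(b - a).total_seconds() / 3600 for a, b in zip(ts, ts[1:])]
    return {
        "posts_per_day": len(ts) / span_days,
        "night_fraction": night_posts / len(ts),
        "mean_gap_hours": mean(gaps_hours) if gaps_hours else 0.0,
    }


posts = [datetime(2018, 1, 1, 2, 0), datetime(2018, 1, 1, 14, 0), datetime(2018, 1, 3, 2, 0)]
print(posting_features(posts))
```

Features like these could be concatenated with the text features per user, which would also make a multiclass model easier to explore later.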

Slightly more than one-third (34.58%) of the users completed the questionnaires provided in the direct message. Given the sensitivity of the subject, this relatively low response rate is understandable, and it is higher than the rates found in earlier studies (close to 10%) [62]. In future research, we plan to investigate differences between those who participated in the program and those who did not, collecting more first-hand data to develop a deeper understanding of their psychology and behavior. Our aim is to devote more human resources to lifesaving and thereby prevent more suicides.
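The 34.58% figure can be checked against the counts reported earlier in the Discussion: it equals the 4318 users who completed the assessment divided by the 12,486 users who were sent direct messages. A short calculation (the variable names are ours) reproduces this and the other stage percentages.

```python
USERS_CONTACTED = 12_486
MESSAGES_SENT = 24_727

# Stage counts as reported in the Discussion.
STAGES = {
    "responded": 5_542,
    "completed assessment": 4_318,
    "replied to direct message": 2_321,
}


def pct(part, whole):
    """Percentage rounded to two decimals, as reported in the paper."""
    return round(100 * part / whole, 2)


for stage, count in STAGES.items():
    print(f"{stage}: {count} ({pct(count, USERS_CONTACTED)}% of users contacted)")
print(f"direct messages per user: {MESSAGES_SENT / USERS_CONTACTED:.2f}")
```

The first two percentages match the 44.39% response rate and the 34.58% completion rate given in the text; the reply percentage (18.59%) is derived here and is not reported by the authors.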

We offered only primary crisis management information to microbloggers with suicidal thoughts and behaviors. Future studies will need more standardized and systematic emergency intervention protocols, mental health resources, and professional referrals to ensure reasonable retention. PSPO also provides an opportunity for longitudinal study: because follow-up is crucial in suicide intervention [63], the effectiveness of Web-based suicide prevention and intervention approaches, including PSPO, should be examined over time. We will therefore try to provide follow-up measures and actions for the identified users.

Finally, because this was a preliminary study, we relied mainly on changes in the frequency of future-oriented words to demonstrate the efficacy of PSPO. This evidence should be interpreted with caution, because the relationship between future-oriented words and reduced suicide risk still needs further verification. Future studies should use direct indicators of reductions in suicidal thoughts and behaviors to demonstrate improvement. Moreover, the results may partly reflect regression to the mean. Strategies at the design stage (eg, a randomized controlled trial, or repeated measurements of actual behavior, rather than only intention or attitude, at different time points) and at the analysis stage (eg, analysis of covariance) can help reduce the effect of regression to the mean [64,65].
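Regression to the mean arises whenever people are selected because of an extreme first measurement: their second measurement tends to be less extreme even if nothing changes. The simulation below is a minimal sketch with hypothetical parameters, not a model of this study's data. It enrolls only simulated users whose noisy baseline score exceeds a cutoff and re-measures them with no intervention, and the follow-up mean still falls toward the population mean. This is why a single-group pre-post comparison cannot, on its own, separate an intervention effect from this artifact.

```python
import random
from statistics import mean


def simulate_regression_to_mean(n=10_000, true_sd=1.0, noise_sd=1.0,
                                cutoff=1.5, seed=0):
    """Select on a noisy baseline, then re-measure with no intervention.

    Returns (baseline_mean, followup_mean, n_selected) for the selected group.
    """
    rng = random.Random(seed)
    baseline, followup = [], []
    for _ in range(n):
        level = rng.gauss(0.0, true_sd)             # stable underlying level
        first = level + rng.gauss(0.0, noise_sd)    # time-1 measurement
        second = level + rng.gauss(0.0, noise_sd)   # time-2 measurement, no change
        if first > cutoff:  # enroll only extreme scorers, as screening does
            baseline.append(first)
            followup.append(second)
    return mean(baseline), mean(followup), len(baseline)


b, f, k = simulate_regression_to_mean()
print(f"{k} selected: baseline mean {b:.2f}, follow-up mean {f:.2f}")
```

With equal true and noise variance, the expected follow-up mean among those selected is about half the baseline mean. In a randomized design, a control group selected the same way would show the same drop, so analysis of covariance on follow-up scores with baseline as a covariate can help separate the intervention effect [64,65].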

Conclusions

This paper presents PSPO, a proactive method for identifying social media users at risk of suicide, especially young people, and preventing suicide among them. The results indicate that PSPO is feasible for identifying populations at risk of suicide and providing effective crisis management, and that the identification of at-risk individuals is automatic and timely. The crisis management is also proactive, acceptable, and low cost. Our study may be a useful supplement to existing prevention programs, and suicide crisis management may increase public awareness of help-seeking for suicide risk, thereby improving the well-being of the population. This approach could help ease the mismatch between a huge population and weak psychological services, and help compensate for imperfect suicide prevention systems in large developing countries such as China.

Acknowledgments

The authors gratefully acknowledge the generous support of the National Basic Research Program of China (2014CB744600), China Social Science Fund (Y8JJ183010), National Social Science Fund of China (16AZD058), and the Research Grants Council of the Hong Kong Special Administrative Region, China (Collaborative Research Fund, project number C1031-18G). The sponsors had no role in the study design; the collection, analysis, and interpretation of data; in the writing of the report; or in the decision to submit the paper for publication.

Conflicts of Interest

None declared.

Multimedia Appendix 1

The direct message sent to users with suicidal thoughts and behaviors.

PDF File (Adobe PDF File), 67KB

  1. Bertolote JM, Fleischmann A. A global perspective in the epidemiology of suicide. Suicidologi 2015;7(2):1.
  2. World Health Organization. Preventing suicide: a global imperative. URL: [accessed 2019-04-22]
  3. Statistical Information Center of the Ministry of Health of the People's Republic of China. China Health Statistics Yearbook 2013. URL: [accessed 2019-04-22]
  4. Schaffer A, Sinyor M, Kurdyak P, Vigod S, Sareen J, Reis C, et al. Population-based analysis of health care contacts among suicide decedents: identifying opportunities for more targeted suicide prevention strategies. World Psychiatry 2016 Jun;15(2):135-145.
  5. Fountoulakis KN, Gonda X, Rihmer Z. Suicide prevention programs through community intervention. J Affect Disord 2011 Apr;130(1-2):10-16.
  6. Bruffaerts R, Demyttenaere K, Hwang I, Chiu W, Sampson N, Kessler RC, et al. Treatment of suicidal people around the world. Br J Psychiatry 2011 Jul;199(1):64-70.
  7. Hom MA, Stanley I, Joiner JT. Evaluating factors and interventions that influence help-seeking and mental health service utilization among suicidal individuals: a review of the literature. Clin Psychol Rev 2015 Aug;40:28-39.
  8. Han J, Batterham PJ, Calear AL, Randall R. Factors influencing professional help-seeking for suicidality. Crisis 2018 May;39(3):175-196.
  9. Labouliere CD, Kleinman M, Gould MS. When self-reliance is not safe: associations between reduced help-seeking and subsequent mental health symptoms in suicidal adolescents. Int J Environ Res Public Health 2015 Apr 01;12(4):3741-3755.
  10. World Health Organization. Towards evidence-based suicide prevention programmes. URL: [accessed 2019-04-22]
  11. Scott MA, Wilcox HC, Schonfeld IS, Davies M, Hicks RC, Turner JB, et al. School-based screening to identify at-risk students not already known to school professionals: the Columbia suicide screen. Am J Public Health 2009 Feb;99(2):334-339.
  12. Gao X, Jackson T, Chen H, Liu Y, Wang R, Qian M, et al. There is a long way to go: a nationwide survey of professional training for mental health practitioners in China. Health Policy 2010 Apr;95(1):74-81.
  13. Kong Y, Zhang J. Access to farming pesticides and risk for suicide in Chinese rural young people. Psychiatry Res 2010 Sep 30;179(2):217-221.
  14. Tang S, Meng Q, Chen L, Bekedam H, Evans T, Whitehead M. Tackling the challenges to health equity in China. Lancet 2008 Oct 25;372(9648):1493-1501.
  15. Liang T, Zhang X, Wang Z. A review on the suicide gatekeeper training. Adv Psychol Sci 2012 Aug 20;20(8):1287-1295.
  16. Cash SJ, Thelwall M, Peck SN, Ferrell JZ, Bridge JA. Adolescent suicide statements on MySpace. Cyberpsychol Behav Soc Netw 2013 Mar;16(3):166-174.
  17. Cheng Q, Li TM, Kwok C, Zhu T, Yip PS. Assessing suicide risk and emotional distress in Chinese social media: a text mining and machine learning study. J Med Internet Res 2017 Jul 10;19(7):e243.
  18. Barak A. Emotional support and suicide prevention through the Internet: a field project report. Comput Hum Behav 2007 Mar;23(2):971-984.
  19. Robinson J, Hetrick S, Cox G, Bendall S, Yuen HP, Yung A, et al. Can an Internet-based intervention reduce suicidal ideation, depression and hopelessness among secondary school students: results from a pilot study. Early Interv Psychiatry 2016 Feb;10(1):28-35.
  20. China Internet Network Information Center. 41st Statistical Report on Internet Development in China. URL: [accessed 2019-04-22]
  21. Microblog Sina. Sina microblog user development report. URL: [accessed 2019-04-22]
  22. Paul MJ, Dredze M. Discovering health topics in social media using topic models. PLoS One 2014;9(8):e103408.
  23. Barak A, Miron O. Writing characteristics of suicidal people on the Internet: a psychological investigation of emerging social environments. Suicide Life Threat Behav 2005 Oct;35(5):507-524.
  24. Kumar M, Dredze M, Coppersmith G, De Choudhury M. Detecting changes in suicide content manifested in social media following celebrity suicides. Proc ACM Conf Hypertext Soc Media 2015 Sep;2015:85-94.
  25. Jacob N, Scourfield J, Evans R. Suicide prevention via the Internet: a descriptive review. Crisis 2014 Jan 01;35(4):261-267.
  26. Westerlund M, Hadlaczky G, Wasserman D. The representation of suicide on the Internet: implications for clinicians. J Med Internet Res 2012 Sep 26;14(5):e122.
  27. Zhang L, Huang X, Liu T, Li A, Chen Z, Zhu T. Using linguistic features to estimate suicide probability of Chinese microblog users. Lecture Notes in Computer Science 2015:549-559.
  28. O'Dea B, Larsen ME, Batterham PJ, Calear AL, Christensen H. A linguistic analysis of suicide-related twitter posts. Crisis 2017 Feb 23;38(5):319-329.
  29. Saha K, Weber I, De Choudhury M. A social media based examination of the effects of counseling recommendations after student deaths on college campuses. Proc Int AAAI Conf Weblogs Soc Media 2018 Jun:320-329.
  30. Zhang Y, Ramesh A, Golbeck J, Sridhar D, Getoor L. A structured approach to understanding recovery and relapse in AA. 2018 Presented at: Proceedings of the 2018 World Wide Web Conference; 2018; Lyon.
  31. De Choudhury M, Kıcıman E. The language of social support in social media and its effect on suicidal ideation risk. Proc Int AAAI Conf Weblogs Soc Media 2017 May;2017:32-41.
  32. Hirsch JK, Duberstein PR, Conner KR, Heisel MJ, Beckman A, Franus N, et al. Future orientation and suicide ideation and attempts in depressed adults ages 50 and over. Am J Geriatr Psychiatry 2006 Sep;14(9):752-757.
  33. Chang EC, Yu EA, Lee JY, Hirsch JK, Kupfermann Y, Kahle ER. An examination of optimism/pessimism and suicide risk in primary care patients: does belief in a changeable future make a difference? Cogn Ther Res 2012 Dec 16;37(4):796-804.
  34. Tan Z, Liu X, Liu X, Cheng Q, Zhu T. Designing microblog direct messages to engage social media users with suicide ideation: interview and survey study on Weibo. J Med Internet Res 2017 Dec 12;19(12):e381.
  35. Li X, Zhang Y, Wang Z, Yang S, Fei L. Characteristics of non-psychiatric patients with major depressive episode from general hospitals. Chin Mental Health J 2006;20(3):171-175.
  36. Kroenke K, Spitzer RL. The PHQ-9: a new depression diagnostic and severity measure. Psychiatr Ann 2002 Sep 01;32(9):509-515.
  37. Yu X, Tam WWS, Wong PTK, Lam TH, Stewart SM. The Patient Health Questionnaire-9 for measuring depressive symptoms among the general population in Hong Kong. Compr Psychiatry 2012 Jan;53(1):95-102.
  38. D'Zurilla TJ, Goldfried MR. Problem solving and behavior modification. J Abnorm Psychol 1971 Aug;78(1):107-126.
  39. Burnap P, Colombo W, Scourfield J. Machine classification and analysis of suicide-related communication on Twitter. 2015 Presented at: Proceedings of the 26th ACM Conference on Hypertext & Social Media; 2015; Guzelyurt.
  40. Silverman MM, Berman AL, Sanddal ND, O'carroll PW, Joiner TE. Rebuilding the tower of Babel: a revised nomenclature for the study of suicide and suicidal behaviors. Part 2: Suicide-related ideations, communications, and behaviors. Suicide Life Threat Behav 2007 Jun;37(3):264-277.
  41. Coppersmith G, Leary R, Whyne E, Wood T. Quantifying suicidal ideation via language usage on social media. URL: [accessed 2018-07-26]
  42. Pestian J, Nasrallah H, Matykiewicz P, Bennett A, Leenaars A. Suicide note classification using natural language processing: a content analysis. Biomed Inform Insights 2010 Aug 04;2010(3):19-28.
  43. Park G, Schwartz HA, Eichstaedt JC, Kern ML, Kosinski M, Stillwell DJ, et al. Automatic personality assessment through social media language. J Pers Soc Psychol 2015 Jun;108(6):934-952.
  44. Wang X, Zhang C, Ji Y, Sun L, Wu L, Bao Z. A depression detection model based on sentiment analysis in micro-blog social network. Revised Selected Papers of PAKDD 2013 International Workshops on Trends and Applications in Knowledge Discovery and Data Mining 2013;7867:201-213.
  45. Huang X, Zhang L, Chiu D, Liu T, Li X, Zhu T. Detecting suicidal ideation in Chinese microblogs with psychological lexicons. 2014 Presented at: UIC-ATC-SCALCOM '14 Proceedings of the 2014 IEEE 11th Intl Conf on Ubiquitous Intelligence and Computing and 2014 IEEE 11th Intl Conf on Autonomic and Trusted Computing and 2014 IEEE 14th Intl Conf on Scalable Computing and Communications and Its Associated Workshops (UIC-ATC-ScalCom); 2014; Washington p. 182-187.
  46. Brezo J, Paris J, Turecki G. Personality traits as correlates of suicidal ideation, suicide attempts, and suicide completions: a systematic review. Acta Psychiatr Scand 2006 Mar;113(3):180-206.
  47. Harrington R. Depression, suicide and deliberate self-harm in adolescence. Brit Med Bull 2001;57(1):47-60.
  48. Vandivort DS, Locke BZ. Suicide ideation: its relation to depression, suicide and suicide attempt. Suicide Life-Threat Behav 1979;9(4):205-218.
  49. Golbeck J, Robles C, Edmondson M, Turner K. Predicting personality from Twitter. 2011 Presented at: Privacy, Security, Risk and Trust (PASSAT) and IEEE Third International Conference on Social Computing (SocialCom); 2011; Boston.
  50. Schwartz HA, Eichstaedt JC, Dziurzynski L, Kern ML, Blanco E, Kosinski M, et al. Toward personality insights from language exploration in social media. 2013 Presented at: AAAI Spring Symposium: Analyzing Microtext; 2013; Stanford.
  51. Passos IC, Mwangi B, Cao B, Hamilton JE, Wu M, Zhang XY, et al. Identifying a clinical signature of suicidality among patients with mood disorders: a pilot study using a machine learning approach. J Affect Disord 2016 Mar 15;193:109-116.
  52. Guan L, Hao B, Cheng Q, Yip PS, Zhu T. Identifying Chinese microblog users with high suicide probability using internet-based profile and linguistic features: classification model. JMIR Ment Health 2015;2(2):e17.
  53. Goutte C, Gaussier E. A probabilistic interpretation of precision, recall and F-score, with implication for evaluation. 2005 Presented at: European Conference on Information Retrieval; 2005; Santiago de Compostela p. 345-359.
  54. Gao R, Hao B, Li H, Gao Y, Zhu T. Developing simplified Chinese psychological linguistic analysis dictionary for microblog. 2013 Presented at: International Conference on Brain and Health Informatics; 2013; Maebashi p. 359-368.
  55. Liu F, Zhou N, Cao H, Fang X, Deng L, Chen W, et al. Chinese college freshmen's mental health problems and their subsequent help-seeking behaviors: a cohort design (2005-2011). PLoS One 2017;12(10):e0185531.
  56. Burnap P, Rana OF, Avis N, Williams M, Housley W, Edwards A, et al. Detecting tension in online communities with computational Twitter analysis. Technol Forecasting Soc Change 2015 Jun;95:96-108.
  57. Huang X, Li X, Zhang L, Liu T, Chiu D, Zhu T. Topic model for identifying suicidal ideation in Chinese. URL: [accessed 2018-07-26]
  58. Cao X, Zhong B, Xiang Y, Ungvari GS, Lai KYC, Chiu HFK, et al. Prevalence of suicidal ideation and suicide attempts in the general population of China: a meta-analysis. Int J Psychiatry Med 2015;49(4):296-308.
  59. Bryan CJ, Butner JE, Sinclair S, Bryan ABO, Hesse CM, Rose AE. Predictors of emerging suicide death among military personnel on social media networks. Suicide Life Threat Behav 2017 Jul 28:1.
  60. Chen J. Seeking help for psychological distress in urban China. J Community Psychol 2012 Apr 04;40(3):319-341.
  61. Ma X, Xiang Y, Cai Z, Li S, Xiang Y, Guo H, et al. Lifetime prevalence of suicidal ideation, suicide plans and attempts in rural and urban regions of Beijing, China. Aust N Z J Psychiatry 2009 Feb;43(2):158-166.
  62. Sueki H, Yonemoto N, Takeshima T, Inagaki M. The impact of suicidality-related internet use: a prospective large cohort study with young and middle-aged internet users. PLoS One 2014;9(4):e94841.
  63. Suominen K, Isometsä E, Suokas J, Haukka J, Achte K, Lönnqvist J. Completed suicide after a suicide attempt: a 37-year follow-up study. Am J Psychiatry 2004 Mar;161(3):562-563.
  64. Barnett AG, van der Pols JC, Dobson AJ. Regression to the mean: what it is and how to deal with it. Int J Epidemiol 2005 Feb;34(1):215-220.
  65. Clifton L, Clifton DA. The correlation between baseline score and post-intervention score, and its implications for statistical analysis. Trials 2019 Jan 11;20(1):43-48.

DT: decision tree
PSPO: Proactive Suicide Prevention Online
SCLIWC: Simplified Chinese Linguistic Inquiry and Word Count
PHQ-9: 9-Item Patient Health Questionnaire
RF: random forest
SVM: support vector machine

Edited by G Eysenbach; submitted 26.07.18; peer-reviewed by J Han, K Bentley; comments to author 08.10.18; revised version received 02.12.18; accepted 30.03.19; published 08.05.19


©Xingyun Liu, Xiaoqian Liu, Jiumo Sun, Nancy Xiaonan Yu, Bingli Sun, Qing Li, Tingshao Zhu. Originally published in the Journal of Medical Internet Research, 08.05.2019.

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication, as well as this copyright and license information must be included.