This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
Digital health social networks (DHSNs) are widespread, and the consensus is that they contribute to wellness by offering social support and knowledge sharing. The success of a DHSN is based on the number of participants and their consistent creation of externalities through the generation of new content. To promote network growth, it would be helpful to identify characteristics of superusers or actors who create value by generating positive network externalities.
The aim of the study was to investigate the feasibility of developing predictive models that identify potential superusers in real time. This study examined associations between posting behavior, 4 demographic variables, and 20 indication-specific variables.
Data were extracted from the custom structured query language (SQL) databases of 4 digital health behavior change interventions with DHSNs. Of these, 2 were designed to assist in the treatment of addictions (problem drinking and smoking cessation), and 2 for mental health (depressive disorder and panic disorder). To analyze posting behavior, 10 models were developed, and negative binomial regressions were conducted to examine associations between number of posts and demographic and indication-specific variables.
The DHSNs varied in number of days active (3658-5210), number of registrants (5049-52,396), number of actors (1085-8452), and number of posts (16,231-521,997). In the sample, all 10 models had low R² values (.013-.086), with few statistically significant demographic and indication-specific variables.
Very few variables were associated with social network engagement. Although some variables were statistically significant, they did not appear to be practically significant. Given the large number of study participants, the variation in DHSN theme, and the extensive time period, we did not find strong evidence that demographic characteristics or indication severity sufficiently explain the variability in number of posts per actor. Researchers should investigate alternative models that identify superusers or other individuals who create social network externalities.
Digital health social networks (DHSNs), otherwise known as discussion forums or peer-to-peer support groups, are in abundance [
In an era of increasing health costs [
As we increasingly rely on technology to help us look after our health, management science is playing a greater role in using data to measure efficiencies [
As a discipline, social network theory (SNT) maps social capital and the strength of relationships in networks. Within a network, nodes are individual actors, and ties are the relationships between nodes. For decades, disciplines such as economics, political science, public health, marketing, and finance have analyzed real world relationships within networks of actors [
Recently, SNT has shifted toward the topology of scale-free networks. This stream of research investigates whether network growth is random or whether networks evolve according to encoded and organized principles [
In the context of this study, actors are DHSN registrants who have created, at minimum, 1 post. From this perspective, 3 fundamental principles guide network growth.
The first is the network’s total number of posts. In most DHSNs, actor posts remain on the network, and each new post adds to the quantitative size and value of the community. Whether actors passively read, actively respond to, or agree or disagree with new content, the quantitative value of the network
Second is the number of actors in the network. If a network contains
Third, the mathematical relationship between these 2 quantities (positive network externalities and number of actors) represents a power law [
Monitoring nodes and ties, and monitoring topologies are important considerations for those who manage social networks. However, these tasks are retrospective as they examine a network’s past state. Methods to drive future growth and promote individual agency are required. As the creation of externalities governs the success of a network, it would be helpful to profile actors who create value by generating externalities [
The 4 interventions in this study [
Theoretical constructs and evidence-base.
Theoretical construct | Problem drinking | Depressive disorder | Panic disorder | Smoking cessation |
Brief intervention [ |
X | X | X | X |
Cognitive behavioral therapy [ |
X | X | ||
Gamification [ |
X | X | X | X |
Harm reduction [ |
X | X | ||
Health belief model [ |
X | X | X | X |
Motivational interviewing [ |
X | X | X | X |
Normative feedback [ |
X | X | ||
Social cognitive theory [ |
X | X | X | X |
Structured relapse prevention [ |
X | |||
Targeting and tailoring [ |
X | X | ||
Transtheoretical model [ |
X |
Four social networks.
Social network | Social network launch date | Data acquisition date | Number of days active | Number of subjects registered in program | Number of actors, n (%) | Number of actor posts^a | Number of subjects in analysis, n (%)^a |
Problem drinking | Dec 26, 2005 | Dec 31, 2015 | 3658 | 5049 | 1085 (21.49) | 16,231 | 4784 (94.75) |
Depressive disorder | Feb 6, 2003 | Dec 31, 2015 | 4712 | 11,675 | 2065 (17.69) | 20,516 | 1958 (16.77) |
Panic disorder | Jan 23, 2002 | Dec 31, 2015 | 5091 | 9783 | 3579 (36.58) | 61,743 | 6151 (62.87) |
Smoking cessation | Sep 26, 2001 | Dec 31, 2015 | 5210 | 52,396 | 8452 (16.13) | 521,997 | 12,061 (23.01) |
Total | n/a^b | n/a | 18,671 | 78,903 | 15,181 (19.24) | 620,487 | 25,178 (31.91) |
Mean | n/a | n/a | 4688 | 19,726 | 3795 (19.24) | 155,122 | 6239 (31.63) |
^a Moderator posts removed.
^b n/a: not applicable.
Demographic characteristics (age, gender, highest level of education obtained, current occupation), and indication-specific details (
Indication-specific data collected at registration.
Intervention | Indication-specific data | Measurement |
Problem drinking | Average drinks per day | Drop-down menu 0-30+ |
Problem drinking | Program goal: cut down, stop, unsure | Likert scale |
Depressive disorder | Depression rating over past 2 weeks | Likert scale 0-10 |
Depressive disorder | Level of distress over past 2 weeks | Likert scale 0-10 |
Depressive disorder | Level of interference over past 2 weeks | Likert scale 0-10 |
Depressive disorder | Tried cognitive behavior therapy in the past | Yes or no |
Depressive disorder | Currently being treated | Yes or no |
Depressive disorder | Using program with health care professional | Yes or no |
Panic disorder | Number of attacks over past 2 weeks | Drop-down menu 0-51+ |
Panic disorder | Average fear rating during attack | Likert scale 0-10 |
Panic disorder | Attack interference with average daily life | Drop-down menu 0-4 |
Panic disorder | Attack causing avoidance | Drop-down menu 0-4 |
Panic disorder | Tried cognitive behavior therapy in the past | Yes or no |
Panic disorder | Use of program with health care professional | Yes or no |
Smoking cessation | Smoking patterns: ≥1 cigarette per day, occasional smoker, recently quit | Drop-down menu |
Smoking cessation | Last cigarette: >24 hours, <24 hours | Radio button |
Smoking cessation | Cigarettes per day | Drop-down menu 0-100+ |
Smoking cessation | Total years smoked | Drop-down menu 0-75+ |
Smoking cessation | Minutes to first cigarette: >60, 31-60, 6-30, ≤5 | Drop-down menu |
Smoking cessation | Past year quit attempts >24 hours | Drop-down menu 0-10+ |
Smoking cessation | Number of cohabitant smokers | Drop-down menu 0-10+ |
Smoking cessation | Fagerstrom dependency score (very low, low, moderate, high, very high) | Internal calculation |
As a first step in profiling actors based on characteristics, and to investigate the feasibility of developing predictive models that identify superusers in real time, the objective of this study was to examine the association between number of posts and actor demographic and indication-specific variables inputted at registration.
Data were extracted from the custom SQL DHSN databases of the 4 digital health interventions. As they contained full data sets, samples totaling 24,954 registrants and 3285 actors were used in the analysis (
Sample size.
Intervention | Sample size | Sample size actors | Sample size posts |
Problem drinking | 4784 | 884 | 12,914 |
Depressive disorder | 1958 | 206 | 3190 |
Panic disorder | 6151 | 585 | 18,921 |
Smoking cessation | 12,061 | 1610 | 90,894 |
Total sample | 24,954 | 3285 | 125,919 |
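The extraction step described above, counting posts per registrant and treating anyone with at least 1 post as an actor, can be sketched as follows. This is only an illustration: the source does not describe the database schema, so the table and column names (`registrants`, `posts`, `is_moderator`, `registrant_id`) and the rows are hypothetical.

```python
import sqlite3

# Hypothetical schema and toy rows; names are illustrative, not the study's.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE registrants (id INTEGER PRIMARY KEY, is_moderator INTEGER);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, registrant_id INTEGER);
    INSERT INTO registrants VALUES (1, 0), (2, 0), (3, 0), (4, 1);
    INSERT INTO posts (registrant_id) VALUES (1), (1), (1), (2), (4), (4);
""")

# LEFT JOIN keeps registrants who never posted (count = 0).
# Moderator accounts are filtered out, mirroring the removal of moderator posts.
rows = conn.execute("""
    SELECT r.id, COUNT(p.id) AS n_posts
    FROM registrants AS r
    LEFT JOIN posts AS p ON p.registrant_id = r.id
    WHERE r.is_moderator = 0
    GROUP BY r.id
    ORDER BY r.id
""").fetchall()

posts_per_registrant = dict(rows)

# An actor is a registrant with at least 1 post.
actors = {rid: n for rid, n in posts_per_registrant.items() if n >= 1}
```

Keeping zero-post registrants in `posts_per_registrant` matters because one set of models is fitted on all registrants and the other on actors only.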
A total of 5 models were developed to explore whether posting behavior was associated with demographic characteristics and indication-specific severity among all registrants (
Regression models for all subjects.
Model | Equation |
1 | |
2 | |
3 | |
4 | |
5 |
Another 5 regression models were developed to explore whether posting behavior was associated with demographic characteristics and indication-specific severity among actors (
Regression models for actors.
Model | Equation | |
6 | ||
7 | ||
8 | ||
9 | ||
10 |
Dummy variables were created for categorical data, with 1 dummy variable per categorical variable excluded from each regression to serve as the reference category. Analyses were performed with Stata version 13 (StataCorp LP, College Station, TX, USA).
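The dummy coding described above can be illustrated with a small sketch. The analysis itself was run in Stata; the helper `dummy_code` and the sample values below are hypothetical and exist only to show how one category is left out as the reference.

```python
def dummy_code(values, reference):
    """Return one 0/1 indicator column per category, omitting the reference."""
    categories = sorted(set(values) - {reference})
    return {
        f"is_{cat}": [1 if v == cat else 0 for v in values]
        for cat in categories
    }


# Toy occupation column; "student" plays the role of the reference category.
occupation = ["student", "management", "teacher", "student", "teacher"]
dummies = dummy_code(occupation, reference="student")
```

A registrant in the reference category has 0 in every indicator column, so each coefficient is read as a difference from that category.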
As outlined in previous research conducted on the 4 DHSNs, the number of posts per actor is right skewed, indicating the presence of a power law [
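Right-skewed count data of this kind are usually overdispersed (variance well above the mean), which is the common reason for preferring negative binomial over Poisson regression. A minimal check is sketched below; the post counts are made up for illustration and are not taken from the study.

```python
from statistics import mean, median, variance

# Made-up posts-per-actor counts: most actors post a few times, a few post a lot.
posts = [1, 1, 1, 1, 2, 2, 3, 5, 40, 250]

m = mean(posts)
v = variance(posts)   # sample variance
dispersion = v / m    # close to 1 under a Poisson model; far above 1 here

# A mean well above the median is a simple sign of right skew.
right_skewed = m > median(posts)
```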
All data collection policies and procedures adhered to international privacy guidelines [
All 10 models had low R² values (see
A total of 4 independent demographic variables were included in each of the 10 models (
In 9 of the models, age was positively and significantly associated with number of posts (beta range=.130-.400). This means that as the age of registrants increased, number of posts increased marginally.
Education was positively and significantly associated with number of posts in 6 models (beta range=.082-.315). This means that within these 6 models, number of posts increased by less than 1 with every unit increase in education category.
Gender was negatively and significantly associated with number of posts in 4 models (beta range=−.766 to −.272). This means that within these 4 models, number of posts was lower by less than 1 for male registrants.
Registrants had the option of selecting 1 of 12 occupations. Compared with registrants who indicated that they were full-time students, occupation was positively associated with number of posts in 14 cases (beta range=.377-5.301), and negatively associated with number of posts in 19 cases (beta range=−2.609 to −.587).
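As a reading aid, negative binomial regression coefficients are conventionally reported on the log-count scale, in which case exponentiating a coefficient gives an incidence rate ratio (the multiplicative change in expected posts per 1-unit increase). The sketch below applies that conversion to the coefficient ranges quoted above. It assumes the reported betas are untransformed log-count coefficients, which the source does not state explicitly.

```python
import math


def incidence_rate_ratio(beta):
    """Multiplicative change in the expected count per 1-unit increase.

    Assumes beta is a log-count coefficient from a negative binomial model.
    """
    return math.exp(beta)


# Coefficient ranges quoted in the text (lower bound, upper bound).
age_irr = (incidence_rate_ratio(0.130), incidence_rate_ratio(0.400))
gender_irr = (incidence_rate_ratio(-0.766), incidence_rate_ratio(-0.272))
```

Under that assumption, a ratio above 1 means more expected posts and a ratio below 1 means fewer, relative to the comparison group or the previous unit.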
The variable
R² values for the 10 models.
Model | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
R² | 0.016 | 0.013 | 0.020 | 0.043 | 0.026 | 0.027 | 0.018 | 0.061 | 0.086 | 0.031 |
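The source does not say which R² statistic is reported. For count models fitted by maximum likelihood, a common choice is McFadden's pseudo-R², which compares the fitted model's log-likelihood with that of an intercept-only model. The sketch below assumes that definition; the log-likelihood values are hypothetical and chosen only to reproduce a value of the same size as Model 1.

```python
def mcfadden_r2(ll_model, ll_null):
    """McFadden pseudo-R2: 1 - (log-likelihood of model / log-likelihood of null)."""
    return 1.0 - ll_model / ll_null


# Hypothetical log-likelihoods, not taken from the study.
ll_null = -10000.0   # intercept-only model
ll_model = -9840.0   # model with predictors

r2 = mcfadden_r2(ll_model, ll_null)
```

On this definition, a value near 0 means the predictors improve the fit only slightly over the intercept-only model.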
Statistically significant demographic independent variables (all models).
Independent variable | Statistically significant coefficients (listed in model order, Models 1-10) | Percentage significant |
Gender | −.272, −.766, −.422, −.365 | 40 |
Age | .400, .234, .324, .130, .322, .136, .138, .285, .184 | 90 |
Education | .146, .315, .195, .082, .095, .139 | 60 |
Full-time student (reference) | | |
Stay at home mom or dad | −1.057 | 20 |
Management | .546, −1.675 | 20 |
Teacher or professor | −2.348, −1.139, .810, −2.609, −.949 | 50 |
Administrative, financial or clerical sales or service | .519, .377, .852, −.894 | 40 |
Technologist or technical occupation | .532 | 10 |
Farming, forestry, fishing or mining | 1.016, 5.301, .400 (.04), 3.793 | 30 |
Trades, transport or equipment operator | −1.564, −1.047, −.690, −.696 (.05) | 40 |
Processing, manufacturing or utilities | −.846, −.641 | 20 |
Unemployed at present or on work leave | .479, −.820, −.587 (.02) | 20 |
Professional services (eg, certified accountant, lawyer, doctor) | −.856 | 10 |
Occupation not listed | .703, .825, .488, −1.314, −.945 (.001), .647 | 60 |
In total, 10 indication-specific variables were tested for their association with posting behavior in the 2 addiction health interventions (
In the problem drinking intervention, registrants had the option of selecting 1 of the 3 program goals. Compared with registrants who indicated that they wanted to cut down,
Statistically significant indication-specific independent variables (addiction interventions).
Independent variables | Model 2 | Model 7 | Model 5 | Model 10 |
Cut down (reference) | n/a^a | n/a |
Quit drinking | .463 | n/a | n/a |
Not sure | −.460 | −.509 | n/a | n/a |
≥1 cigarette per day, occasional smoker, recently quit | n/a | n/a | .278 |
Last cigarette: >24 hours, <24 hours | n/a | n/a | .534 |
Cigarettes per day | n/a | n/a |
Total years smoked | n/a | n/a | .040 | .025 |
Minutes to first cigarette: >60, 31-60, 6-30, ≤5 | n/a | n/a | .705 | .625 |
Past year quit attempts >24 hours | n/a | n/a | −.048 | −.054 |
Number of cohabitant smokers | n/a | n/a |
Fagerstrom dependency score (very low, low, moderate, high, very high) | n/a | n/a | .657 | .651 |
^a n/a: not applicable.
In model 5, increased cigarette consumption (smoking patterns) (beta=.278,
In both models, increases in total years smoked (beta=.040,
Ten indication-specific variables were tested for their association with posting behavior in the 2 mental health interventions. Whether a participant had
In models 3, 4, and 9 posting behavior was positively and significantly associated with experience with CBT (beta= .851,
Statistically significant indication-specific independent variables (mental health).
Independent variables | Model 3 | Model 8 | Model 4 | Model 9 |
Depression rating past 2 weeks (0-10) | n/a^a | n/a |
Level of distress past 2 weeks (0-10) | n/a | n/a |
Level of interference past 2 weeks (0-10) | n/a | n/a |
Currently being treated | n/a | n/a |
Tried cognitive behavior therapy in the past | .851 | 1.118 | .870 |
Number of attacks over past 2 weeks | n/a | n/a | .054 |
Average fear rating during attack | n/a | n/a | −.099 |
Attack interference in average daily life | n/a | n/a | .406 | .224 |
Attack causing avoidance | n/a | n/a |
^a n/a: not applicable.
In the depression interventions, other than past CBT experience, there were no statistically significant associations with posting behavior.
In the panic disorder intervention, attacks interfering in average daily life were positively and significantly associated with posting behavior (beta=.406,
Despite some statistically significant demographic and indication-specific variables, all regressions had low R² values, and the association of these variables with superuser behavior was minimal. As mentioned previously, the models explain very little of the variance in number of posts.
Based on the results in 4 of the 10 models, females tend to post more than males. However, these results should be interpreted with caution, as the effect was minimal (beta range=−.766 to −.272) and statistically significant only in the all-subject models. These results also do not confirm the gender of superusers.
Age was positively and significantly associated with posting in 9 of the 10 models, although the increase is negligible and should be interpreted with caution (beta range=.130-.400). For example, the analysis did not consider whether seeking treatment for smoking cessation or for mental health issues also coincides with age.
Although the effect is minimal, higher education was related to increases in posting behavior in 6 of the 10 models (beta range=.082-.315). The relationship between education level and use of medical resources has a rich history in the literature and remains inconclusive. For example, one might assume that actors with higher education levels have better knowledge-seeking skills and make limited use of DHSNs, or conversely, that actors with lower education levels and fewer formal resources would use DHSNs with greater intensity.
A recent qualitative review on factors affecting therapeutic compliance found the effect of education level to be equivocal [
In the smoking cessation intervention, inexperienced quitters who have smoked longer, have increased dependency, and have recently quit, tend to post more. This supports past research indicating that the intervention’s DHSN primarily acts as a relapse prevention tool for new quitters [
It was interesting to note that
The results of this study suggest that demographic or indication-specific variables have limited association with the creation of externalities in DHSNs. What, if anything, may be associated with posting behavior? If superusers are key to the growth and sustainability of DHSNs, how can they be detected?
The real-time assessment of phenotype, or observable traits resulting from the interaction of an individual in an environment, has recently been recognized as key to the next frontier of medicine [
For example, a recent study identified the ability to use natural language processing to detect phenotypes in electronic health records [
DHSN content may contain rich sources of phenotypes, as a post or an actor’s profile may include avatars, images, badges or awards for participation, likes or other semiotic indicators of support from other members, or links to specific outside resources. Post content may be mined for specific keywords, phrases, or even tone. Time of post, time between posts, response to specific types of content or members, or other time-based interactions may also be indicative of specific behavior. Recent health care informatics research has also identified a relationship between increased systems use and outcomes, and a variety of unique system measures that may help categorize behaviors [
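As one illustration of the time-based indicators mentioned above, the sketch below derives a few simple features from an actor's post timestamps. The feature names, the 6 AM cutoff for "night" posts, and the sample timestamps are all assumptions made for this example; the source does not define any such measures.

```python
from datetime import datetime
from statistics import median


def timing_features(timestamps):
    """Summarize when, and how often, one actor posts (illustrative features)."""
    if not timestamps:
        raise ValueError("at least one post timestamp is required")
    ts = sorted(timestamps)
    # Hours elapsed between consecutive posts.
    gaps_h = [(b - a).total_seconds() / 3600 for a, b in zip(ts, ts[1:])]
    return {
        "n_posts": len(ts),
        "median_gap_hours": median(gaps_h) if gaps_h else None,
        # Share of posts made between midnight and 6 AM (arbitrary cutoff).
        "share_night_posts": sum(t.hour < 6 for t in ts) / len(ts),
    }


# Made-up timestamps for a single actor.
posts = [
    datetime(2015, 3, 1, 23, 30),
    datetime(2015, 3, 2, 1, 30),
    datetime(2015, 3, 2, 13, 30),
]
features = timing_features(posts)
```

Features like these could be computed as posts arrive, which is what a real-time approach to identifying likely superusers would need.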
A challenge is that even if phenotypes can be predicted, risk-stratifying behavior may prove difficult. However, the medication adherence literature, which generally classifies patients as full compliers, partial compliers, or noncompliers, may give insights on categorizing behavior similar to nonadherence [
In some respects, the low R2 values in the models and lack of statistically significant variables in this study expose the limitations of big data. Popular belief holds that large data sets of survey data will contain insights and intelligence that have been previously unobtainable [
The results of this study are from “real world” social networks and the main strengths are the longevity of the DHSNs, the number of posts, the 4 separate indications, and that 2 of the social networks in the study were focused on mental health, and the remaining 2 on addictions.
Ideally, data from this study would be derived from a randomized controlled experiment. However, it would be difficult, if not impossible, to recruit a study population and execute a study in a similar sample. We are not aware of any other study in the health care literature with such an extensive and complete dataset, and as such, results should be interpreted accordingly.
A strength and limitation is that the populations analyzed are self-selecting populations that actively sought help. In the context of this study it was helpful to have datasets of active and engaged participants. However, these results may not be indicative of populations of patients in health plans, hospital networks, or mass public health campaigns.
A limitation of this study is that demographic and indication-specific data were self-reported. Self-reported data are common in digital health studies, and the consensus is that such data are at least as reliable as data from pencil-and-paper questionnaires [
Given the large number of study participants, the variation in DHSN theme, and the extensive time period, we did not find strong evidence that demographic characteristics or indication severity sufficiently explain the variability in number of posts per actor. Researchers should investigate alternative methods and models that may identify individuals who promote DHSN growth.
DHSN: digital health social network
Trevor van Mierlo is the CEO & Founder of Evolution Health Systems. Evolution Health owns and manages digital health interventions, including the applications analyzed in this study.