This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
Online health communities (OHCs) provide a convenient and commonly used way for people to connect around shared health experiences, exchange information, and receive social support. Users often interact with peers via multiple communication methods, forming a multirelational social network. Use of OHCs is common among smokers, but to date, there have been no studies on users’ online interactions via different means of online communications and how such interactions are related to smoking cessation. Such information can be retrieved in multirelational social networks and could be useful in the design and management of OHCs.
To examine the social network structure of an OHC for smoking cessation using a multirelational approach, and to explore links between subnetwork position (ie, centrality) and smoking abstinence.
We used NetworkX to construct 4 subnetworks based on users’ interactions via blogs, group discussions, message boards, and private messages. We illustrated topological properties of each subnetwork, including its degree distribution, density, and connectedness, and compared similarities among these subnetworks by correlating node centrality and measuring edge overlap. We also investigated coevolution dynamics of this multirelational network by analyzing tie formation sequences across subnetworks. In a subset of users who participated in a randomized, smoking cessation treatment trial, we conducted user profiling based on users’ centralities in the 4 subnetworks and identified user groups using clustering techniques. We further examined 30-day smoking abstinence at 3 months postenrollment in relation to users’ centralities in the 4 subnetworks.
The 4 subnetworks have different topological characteristics, with message board having the most nodes (36,536) and group discussion having the highest network density (4.35×10⁻³). Blog and message board subnetworks had the most similar structures with an in-degree correlation of .45, out-degree correlation of .55, and Jaccard coefficient of .23 for edge overlap. A new tie in the group discussion subnetwork had the lowest probability of triggering subsequent ties among the same two users in other subnetworks: 6.33% (54,142/855,893) for 2-tie sequences and 2.13% (18,207/855,893) for 3-tie sequences. Users’ centralities varied across the 4 subnetworks. Among a subset of users enrolled in a randomized trial, those with higher centralities across subnetworks generally had higher abstinence rates, although high centrality in the group discussion subnetwork was not associated with higher abstinence rates.
A multirelational approach revealed insights that could not be obtained by analyzing the aggregated network alone, such as the ineffectiveness of group discussions in triggering social ties of other types, the advantage of blogs, message boards, and private messages in leading to subsequent social ties of other types, and the weak connection between one’s centrality in the group discussion subnetwork and smoking abstinence. These insights have implications for the design and management of online social networks for smoking cessation.
Over the past decade, many people have turned to the Internet to find health-related information and support. According to the Pew Research Center, 72% of adult Internet users in the United States use the Internet for health-related purposes. Of those, 26% have read or watched someone else’s experience about health or medical issues in the last 12 months and 16% have used the Internet to find others who might share the same health concerns in the last year [
Various aspects of OHCs have been studied, such as topics of online discussions [
This study adopted a multirelational perspective in examining the network structure and dynamics of a popular OHC for smoking cessation. We examined the structure of the social network, as well as the coevolution of different types of subnetworks. Numerous publications highlight the importance of social influences on a range of smoking behaviors, including initiation, cessation, and relapse, in offline settings. Thus, we also illustrated how users’ behavior patterns in different subnetworks were related to their smoking status, using outcome data available for a subset of OHC members enrolled in a randomized trial. Our primary goals were to characterize multirelational social networks in an OHC for smoking cessation, identify dynamic coevolution of multirelational networks, and explore potential links between users’ online social network engagement and health behavior using a multirelational approach. To our knowledge, this study is the first to analyze large-scale multirelational social networks among OHC users of a Web-based, smoking cessation program. Furthermore, while previous studies have enumerated social networks based only on users’ posting behaviors, our multirelational social network incorporated private behaviors as well by considering both posting and reading behaviors of users. This study lays the foundation for an ongoing series of analyses aimed at understanding and optimizing the multirelational behaviors of a large OHC for smoking cessation.
We conducted these analyses using longitudinal data from BecomeAnEX, a Web-based smoking cessation program developed and managed by Truth Initiative (formerly American Legacy Foundation). Launched in 2008, BecomeAnEX was developed in accordance with the Clinical Practice Guidelines for Treating Tobacco Use and Dependence [
The BecomeAnEX community is composed of thousands of current and former smokers who interact via 4 primary communication channels. Users can exchange private messages via the site; users who have opted in to receive email notifications are informed when they have received a new message. Message board posts are public communications made on a member’s profile page. All users have a community profile that can be customized with photos and personal information. Group discussions are threaded discussions among users with similar experiences or interests (eg, “March Quit Dates,” “Over 50 BecomeAnEXs”). Blogs are single entries made by users about their experiences, which appear in reverse chronological order on the site. Users can comment on others’ blog posts, creating threaded discussions similar to group discussions. Communications among members via blogs (and comments), message boards, and group discussions are all public and can be accessed by all BecomeAnEX users. Private messages occur only between two users. Blogs and group discussions elicit many-to-many communications, whereas posts on message boards and private messages are one-to-one communications. A community administrator addresses technical issues and spammers, but otherwise the community is largely unmoderated.
All user actions are date and time stamped and stored in a relational database. Before analysis, users’ identifiers were converted into alphanumeric strings using cryptographic hash functions, which makes this conversion infeasible to invert. The content of private messages was not included in the dataset to protect privacy.
The Python programming package NetworkX (v1.11) was used to construct and analyze social networks. The multirelational network consists of 4 subnetworks: private messages (PM), message boards (MB), group discussions (GD), and blogs (BL). In each subnetwork, a node represents an individual user, while a directed tie pointing from user A to user B means that B accessed information contributed by A or, in other words, information from A reached B. Taking the blog subnetwork as an example, if B posted a comment to one of A’s blogs then we assume B read (or at least skimmed) the original blog post, and so there is a tie pointing from A to B (A→B) indicating that A’s contribution has reached B. Similarly, if A’s clickstream (ie, the logs of clicking URLs) suggests that he or she has read that comment from B, then we add a B→A tie to reciprocate the earlier A→B tie. In such a directed network, a node’s in-degree refers to the number of other nodes that have ties pointing to it (ie, the number of people who may have influenced that user). Conversely, a node’s out-degree is the number of its outgoing ties (ie, the number of people that user has potentially influenced). A node’s total degree is the total number of its network neighbors irrespective of tie direction. By incorporating both posting (outgoing ties if a post was read by others) and reading (incoming ties) behaviors, our subnetworks can better capture how information flows among OHC users via each means of communication. When combining all nodes and ties in the 4 subnetworks, an aggregated network emerges, where a tie means two users have had some type of interaction in the community.
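The tie construction described above can be sketched in a few lines. This is a minimal illustration in plain Python rather than the NetworkX code used in the study; the record layout `(channel, author, reader)`, the sample records, and all function names are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical interaction records: (channel, author, reader).
# Each record means `reader` accessed content contributed by `author`,
# so the directed tie points author -> reader.
records = [
    ("BL", "A", "B"),  # B commented on (and so read) A's blog post
    ("BL", "B", "A"),  # A's clickstream shows A read B's comment
    ("MB", "A", "C"),
    ("PM", "C", "A"),
    ("GD", "B", "C"),
]

def build_subnetworks(records):
    """Return {channel: set of directed (source, target) ties}."""
    subnets = defaultdict(set)
    for channel, author, reader in records:
        if author != reader:  # skip users reading their own content
            subnets[channel].add((author, reader))
    return dict(subnets)

def in_degree(ties, node):
    """Number of distinct users with a tie pointing to `node`."""
    return len({src for src, dst in ties if dst == node})

def out_degree(ties, node):
    """Number of distinct users that `node` has a tie pointing to."""
    return len({dst for src, dst in ties if src == node})

def total_degree(ties, node):
    """Number of distinct neighbors, irrespective of tie direction."""
    incoming = {src for src, dst in ties if dst == node}
    outgoing = {dst for src, dst in ties if src == node}
    return len(incoming | outgoing)

subnets = build_subnetworks(records)
# Aggregated network: union of the ties in all subnetworks.
aggregated = set().union(*subnets.values())
```

In this toy example, user A has in-degree 2 and out-degree 2 in the aggregated network but a total degree of 2, because B and C each appear as both a source and a target.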
Our analysis proceeded in 4 steps. First, we conducted topological analysis to illustrate the characteristics of the 4 subnetworks. We examined the number of nodes with total degree greater than zero, the number of edges, density (defined as the number of actual ties divided by the number of possible ties), and the proportion of ties that were reciprocated. To compare the connectedness of the subnetworks, we identified the largest strongly connected component (LSCC). A strongly connected component is a subset of a network in which there is a directed path from every node to every other node. The LSCC is the one with the most nodes among all strongly connected components of a network. For each subnetwork, we also calculated the average shortest path length between nodes in its LSCC. In general, the larger the LSCC and the shorter the average path length within the LSCC, the more connected the network.
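A minimal plain-Python sketch of these topological measures follows. It is not the NetworkX code used in the study: the function names and the toy network are assumptions, and the component search (forward reachability intersected with backward reachability) is a simple approach suited to small graphs only. Reciprocity is computed here per connected pair of users, which appears consistent with the group discussion figure reported later in this article (32,928/923,578); other conventions exist.

```python
from collections import deque

def density(n_nodes, n_ties):
    """Actual directed ties divided by possible directed ties."""
    return n_ties / (n_nodes * (n_nodes - 1))

def reciprocated_pair_share(ties):
    """Share of connected user pairs with ties in both directions."""
    pairs = {frozenset(t) for t in ties}
    mutual = {frozenset((s, d)) for s, d in ties if (d, s) in ties}
    return len(mutual) / len(pairs)

def _reachable(start, adj):
    """All nodes reachable from `start` by breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def largest_scc(ties):
    """Largest strongly connected component, as a set of nodes."""
    fwd, rev = {}, {}
    for s, d in ties:
        fwd.setdefault(s, set()).add(d)
        rev.setdefault(d, set()).add(s)
    best, assigned = set(), set()
    for node in sorted(set(fwd) | set(rev)):
        if node in assigned:
            continue
        # Nodes that `node` can reach and that can reach `node`.
        comp = _reachable(node, fwd) & _reachable(node, rev)
        assigned |= comp
        if len(comp) > len(best):
            best = comp
    return best

def average_shortest_path_length(ties, component):
    """Mean directed shortest-path length over ordered pairs in `component`."""
    adj = {}
    for s, d in ties:
        if s in component and d in component:
            adj.setdefault(s, set()).add(d)
    total, pairs = 0, 0
    for src in component:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

# Toy network (hypothetical): A, B, C form a cycle; D only receives from C.
toy = {("A", "B"), ("B", "A"), ("B", "C"), ("C", "A"), ("C", "D")}
lscc = largest_scc(toy)
```

For the toy network, the LSCC is {A, B, C}, one of the four connected pairs is reciprocated, and the average shortest path length within the LSCC is 8/6.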
Second, we measured structural similarities among the subnetworks using 2 metrics: centrality correlations at the individual level and tie overlap at the network level. At the individual level, one’s centrality can be captured by in- and out-degrees. Higher degrees usually mean higher centralities. We correlated each node’s rank by in- and out-degrees in one subnetwork with the same node’s rank by in- and out-degrees in the other 3 subnetworks. A high correlation coefficient between two subnetworks suggests that individuals with high centrality in one subnetwork tend to have high centrality in another. At the network level, the tie overlap between two subnetworks was calculated with Jaccard coefficients [
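The two similarity measures can be illustrated with a short plain-Python sketch. The function names and sample data are assumptions; Spearman correlation is implemented as a Pearson correlation of average ranks, and the Jaccard coefficient is applied to sets of directed ties, which is one plausible reading of "tie overlap" rather than a confirmed detail of the study.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def jaccard(ties_a, ties_b):
    """Shared ties divided by ties present in either subnetwork."""
    union = ties_a | ties_b
    return len(ties_a & ties_b) / len(union) if union else 0.0

# Hypothetical in-degrees of the same 4 users in two subnetworks.
blog_in = [5, 3, 0, 8]
board_in = [4, 2, 1, 9]
rho = spearman(blog_in, board_in)

# Hypothetical tie sets for two subnetworks.
blog_ties = {("A", "B"), ("B", "C"), ("C", "A")}
board_ties = {("A", "B"), ("C", "D")}
overlap = jaccard(blog_ties, board_ties)
```

Here the two degree lists order the users identically, so the rank correlation is 1, and the tie sets share 1 of 4 distinct ties, giving a Jaccard coefficient of 0.25.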
Third, we used coevolution analysis to examine tie formation dynamics across the 4 subnetworks, building on the analyses of their static characteristics (ie, topology) and structural similarities. We were specifically interested in how the formation of a tie between two users in one subnetwork triggered the formation of ties between the same two users in other subnetworks. For each subnetwork, we calculated the probability that it hosted the first tie, across all pairs of nodes that were connected in any of the 4 subnetworks. We then analyzed the temporal sequence of tie formations for each pair and calculated the probabilities of forming subsequent ties in a second and a third subnetwork, given the subnetwork in which the first tie was formed, along with the most common tie sequences.
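The tie sequence analysis can be sketched as follows. This is an illustrative plain-Python version under stated assumptions: the event layout `(timestamp, channel, user_a, user_b)`, the sample events, and the function names are hypothetical, and user pairs are treated as unordered.

```python
from collections import Counter

# Hypothetical tie-formation events: (timestamp, channel, user_a, user_b).
events = [
    (1, "GD", "A", "B"),
    (2, "BL", "C", "D"),
    (3, "MB", "D", "C"),
    (4, "GD", "C", "D"),
    (5, "PM", "E", "F"),
    (6, "BL", "E", "F"),
    (7, "BL", "C", "D"),  # repeat tie in the same subnetwork; ignored
]

def tie_sequences(events):
    """Map each user pair to the ordered tuple of subnetworks in which
    the pair formed its first tie, second tie, and so on."""
    seqs = {}
    for _, channel, a, b in sorted(events):
        seq = seqs.setdefault(frozenset((a, b)), [])
        if channel not in seq:
            seq.append(channel)
    return {pair: tuple(seq) for pair, seq in seqs.items()}

def first_tie_stats(sequences):
    """P(subnetwork hosts the 1st tie) and P(2nd tie elsewhere | 1st tie)."""
    first = Counter(seq[0] for seq in sequences.values())
    second = Counter(seq[0] for seq in sequences.values() if len(seq) >= 2)
    total = len(sequences)
    return {
        ch: {
            "p_first": first[ch] / total,
            "p_second_given_first": second[ch] / first[ch],
        }
        for ch in first
    }

def top_sequence(sequences, first_channel, length):
    """Most common tie sequence of the given length starting in `first_channel`."""
    counts = Counter(
        seq[:length]
        for seq in sequences.values()
        if seq[0] == first_channel and len(seq) >= length
    )
    return counts.most_common(1)[0][0] if counts else None

sequences = tie_sequences(events)
stats = first_tie_stats(sequences)
```

In the toy data, the pair that first connected in a group discussion never forms a tie elsewhere, whereas the pairs that first connected via blog or private message both go on to a second subnetwork.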
Finally, user profiling was used to identify whether centralities in different subnetworks had different implications for abstinence rates. We used Gaussian mixture models (GMMs), an unsupervised clustering technique, to divide users into groups based on their centralities in the 4 subnetworks so that those with similar centralities across subnetworks were placed in the same group. As input to the profiling process, each user was represented by a vector of 8 elements: the user’s in-degree and out-degree in each of the 4 subnetworks. To determine the number of user groups (K), we fitted GMMs with K values from 2 to 10 and selected the value that best fit our data as determined by log-likelihood.
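The input to the profiling step and one possible way to pick K can be sketched as below. The GMM fitting itself is left to a statistics library and is not shown. Everything here is an assumption made for illustration: the channel order, the function names, the plateau rule (smallest K whose log-likelihood lies within a tolerance of the best value, relative to the observed spread), and the log-likelihood values, which are invented and are not the study's results.

```python
CHANNELS = ("MB", "BL", "GD", "PM")  # assumed column order

def profile_vector(degrees):
    """Build the 8-element profile from {channel: (in_degree, out_degree)}.
    Channels the user never used contribute zeros."""
    vec = []
    for ch in CHANNELS:
        in_deg, out_deg = degrees.get(ch, (0, 0))
        vec.extend([in_deg, out_deg])
    return vec

def choose_k_at_plateau(loglik_by_k, tolerance=0.05):
    """Smallest K whose log-likelihood is within `tolerance` of the best,
    measured as a fraction of the spread between worst and best."""
    best = max(loglik_by_k.values())
    worst = min(loglik_by_k.values())
    spread = best - worst
    if spread == 0:
        return min(loglik_by_k)
    for k in sorted(loglik_by_k):
        if (best - loglik_by_k[k]) / spread <= tolerance:
            return k

# Hypothetical user who read message boards and sent private messages.
vector = profile_vector({"MB": (3, 1), "PM": (0, 2)})

# Invented log-likelihoods for K = 2..10 (illustration only).
loglik = {2: -1000, 3: -900, 4: -820, 5: -760, 6: -700,
          7: -610, 8: -606, 9: -603, 10: -600}
chosen_k = choose_k_at_plateau(loglik)
```

With these invented values, the curve flattens after K=7, so the rule returns 7; a different tolerance or criterion could select a different K.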
The user profiling analysis was based on a subsample of N=1337 BecomeAnEX users who participated in a randomized smoking cessation trial (NCT01544153) and were assigned to the control arm (BecomeAnEX alone). The trial has been described in detail elsewhere [
The study protocol was approved by Chesapeake Institutional Review Board (protocol #CR00040526).
The dataset used in this study spanned the period from January 1, 2010, to May 31, 2015, and included records of both posting and reading behaviors of N=71,251 users who accessed content of the community on BecomeAnEX by clicking and reading a post (eg, a blog, a message board post, or a group discussion thread) or a private message. The community was migrated from a different platform before this period, which resulted in a slightly different user experience. Our analyses focus on this time frame given the stability of the social network feature set.
Descriptive statistics of the aggregated network and each of the 4 subnetworks are presented in
The topological analysis described in the previous section treated each subnetwork as independent. However, two individuals may be connected in more than one subnetwork in the online community. We computed how many pairs of nodes were connected in different subnetworks. As shown in
As shown in
Descriptive statistics of the aggregated network and the 4 subnetworks.
Characteristics | Aggregated | Blog | Message board | Group discussion | Private message |
Number of nodes with degree >0 | 71,251 | 27,461 | 36,536 | 14,827 | 34,996 |
Number of edges | 2,578,659 | 1,065,514 | 1,027,694 | 956,506 | 60,555 |
Density | 5.08×10⁻⁴ | 1.41×10⁻³ | 7.70×10⁻⁴ | 4.35×10⁻³ | 4.94×10⁻⁵ |
% of reciprocated ties | 18.22 | 23.61 | 29.62 | 3.57 | 8.94 |
% of nodes in the LSCCa | 35.64 | 35.00 | 41.61 | 26.48 | 6.87 |
Average shortest path length in LSCC | 2.86 | 2.29 | 2.68 | 2.40 | 3.74 |
aLSCC: largest strongly connected component.
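As a consistency check on the table above, the reported densities can be recomputed from the node and edge counts using the definition given in the Methods (actual ties divided by possible ties). The sketch assumes that the number of possible directed ties is n(n−1), with n taken as the number of nodes with degree greater than zero; the variable and function names are illustrative.

```python
# (nodes with degree > 0, edges, reported density), copied from the table.
table = {
    "Aggregated": (71_251, 2_578_659, 5.08e-4),
    "Blog": (27_461, 1_065_514, 1.41e-3),
    "Message board": (36_536, 1_027_694, 7.70e-4),
    "Group discussion": (14_827, 956_506, 4.35e-3),
    "Private message": (34_996, 60_555, 4.94e-5),
}

def directed_density(n_nodes, n_edges):
    """Edges divided by the n(n-1) possible directed ties."""
    return n_edges / (n_nodes * (n_nodes - 1))

computed = {
    name: directed_density(n, m) for name, (n, m, _) in table.items()
}

for name, (_, _, reported) in table.items():
    print(f"{name}: computed {computed[name]:.3e}, reported {reported:.2e}")
```

Under this assumption, each recomputed density agrees with the reported value to the precision shown in the table.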
Network degree distributions for the aggregated network and 4 subnetworks.
The number of node pairs with ties in different networks.
Node pairs | Number of node pairs |
Pairs connected in 1 subnetwork only | 1,807,720 |
Pairs connected in 2 subnetworks | 300,758 |
Pairs connected in 3 subnetworks | 66,591 |
Pairs connected in 4 subnetworks | 6251 |
Spearman rank correlation coefficients between individual nodes’ in-degree (above the diagonal) and out-degree (below the diagonal) across the 4 subnetworks.
Network | Blog | Message board | Group discussion | Private message |
Blog | — | .45a | .35a | −.23a |
Message board | .55a | — | .40a | −.10a |
Group discussion | .33a | .32a | — | −.10a |
Private message | .43a | .35a | .35a | — |
a
Tie overlap measured by Jaccard coefficients between the 4 subnetworks.
Subnetwork | Blog | Message board | Group discussion | Private message |
Blog | — | 0.23 | 0.05 | 0.02 |
Message board | — | — | 0.05 | 0.02 |
Group discussion | — | — | — | 0.01 |
As shown in
Probabilities (P) of subnetworks to host the first tie between two nodes, conditional probabilities of subsequent ties in other subnetworks, and top tie sequences.
Subnetwork hosting the 1st tie | P(hosting the 1st tie), % | P(forming 2nd ties in other subnetwork, given 1st tie), % | Top 2-tie sequence by P(sequence, given 1st tie), % | P(forming 3rd ties in other subnetwork, given 1st tie), % | Top 3-tie sequence by P(sequence, given 1st tie), % |
Blog | 33.67 | 27.22 | BLa→MBb | 4.37 | BL→MB→GDc |
Message board | 28.30 | 28.52 | MB→BL | 5.10 | MB→BL→GD |
Group discussion | 39.24 | 6.33 | GD→MB | 2.13 | GD→BL→MB |
Private message | 1.87 | 26.05 | PMd→BL | 3.65 | PM→MB→BL |
aBL: blog.
bMB: message board.
cGD: group discussion.
dPM: private message.
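The group discussion row of the table above can be reproduced from counts given elsewhere in this article. The sketch assumes that 855,893 (the denominator reported in the abstract) is the number of node pairs whose first tie formed in the group discussion subnetwork, and that the total number of connected pairs is the sum of the pair counts in the node-pair table; both readings are inferences from the reported figures, and the variable names are illustrative.

```python
# Node pairs by number of subnetworks in which they are connected.
pairs_by_subnetwork_count = {1: 1_807_720, 2: 300_758, 3: 66_591, 4: 6_251}
total_pairs = sum(pairs_by_subnetwork_count.values())

# Counts reported for pairs whose first tie was in group discussion.
gd_first = 855_893      # pairs with 1st tie in group discussion (assumed)
gd_two_tie = 54_142     # of those, pairs that formed a 2nd tie elsewhere
gd_three_tie = 18_207   # of those, pairs that formed a 3rd tie elsewhere

p_gd_first = 100 * gd_first / total_pairs
p_second_given_gd = 100 * gd_two_tie / gd_first
p_third_given_gd = 100 * gd_three_tie / gd_first

print(f"P(GD hosts 1st tie)   = {p_gd_first:.2f}%")
print(f"P(2nd tie | GD first) = {p_second_given_gd:.2f}%")
print(f"P(3rd tie | GD first) = {p_third_given_gd:.2f}%")
```

Under these assumptions, the three percentages round to the 39.24, 6.33, and 2.13 shown in the group discussion row.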
A Gaussian mixture model with K=7 generated the user groups that best fit our data: the log-likelihood reached a plateau at K=7. Adding more clusters (K=8, 9, and 10) increased the likelihood by only 0.4%-4%, whereas lower K values (K=2 to 6) reduced it by 16%-61%.
User groups and their average in- and out-degrees in 4 subnetworks.
User group | MBa in-degree | MB out-degree | BLb in-degree | BL out-degree | GDc in-degree | GD out-degree | PMd in-degree | PM out-degree | No. of users | 30-Day ppa at 3 monthse, % |
1. Super users | 118.8 | 150.1 | 176.9 | 183.7 | 118.9 | 30.2 | 6.3 | 6.3 | 18 | 55.6 (10/18) |
2. Regular contributors | 8.8 | 4.5 | 17.8 | 9.4 | 51.5 | 95.0 | 0.5 | 0.0 | 13 | 38.5 (5/13) |
3. Regular contributors | 11.1 | 19.1 | 24.8 | 25.8 | 17.8 | 0.0 | 0.7 | 0.3 | 88 | 30.7 (27/88) |
4. Lurkers | 3.9 | 1.0 | 12.6 | 0.0 | 56.4 | 0.0 | 0.4 | 0.0 | 68 | 14.7 (10/68) |
5. Lurkers | 3.0 | 0.7 | 14.4 | 0.0 | 0.0 | 0.0 | 0.4 | 0.0 | 118 | 14.4 (17/118) |
6. Inactive users | 0.0 | 0.9 | 0.0 | 0.0 | 0.0 | 0.0 | 1.2 | 0.0 | 210 | 9.5 (20/210) |
7. Inactive users | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 822 | 9.1 (75/822) |
aMB: message board.
bBL: blog.
cGD: group discussion.
dPM: private message.
eThe 30-day point prevalence abstinence (ppa) at 3 months calculated under intent-to-treat principle with nonresponders counted as smokers.
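The intent-to-treat calculation described in the footnote can be sketched as follows: every enrolled user stays in the denominator, and a missing response is treated the same as a report of smoking. The function name and the example split between smokers and nonresponders are hypothetical; the group sizes and abstinent counts are copied from the table above.

```python
def itt_abstinence_rate(responses):
    """Percentage abstinent under intent-to-treat.
    responses: True (abstinent), False (smoking), or None (no response).
    Nonresponders count as smokers, so everyone is in the denominator."""
    if not responses:
        raise ValueError("no participants")
    return 100 * sum(r is True for r in responses) / len(responses)

# Hypothetical breakdown for a group of 18 users: 10 abstinent,
# 5 reporting smoking, 3 nonresponders (the 5/3 split is invented).
group = [True] * 10 + [False] * 5 + [None] * 3
rate = itt_abstinence_rate(group)

# (number of users, number abstinent) per user group, from the table.
table = {1: (18, 10), 2: (13, 5), 3: (88, 27), 4: (68, 10),
         5: (118, 17), 6: (210, 20), 7: (822, 75)}
total_users = sum(n for n, _ in table.values())
rates = {g: round(100 * k / n, 1) for g, (n, k) in table.items()}
```

The group sizes sum to the 1337 trial participants in the profiling subsample, and the recomputed rates match the percentages in the table.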
Users in group 1 were highly connected users with many incoming and outgoing ties across the 4 subnetworks. Groups 2 and 3 represented regular contributors who not only read what others posted, but also contributed content that was read by others, although they were less connected than those in group 1. Groups 4 and 5 were “lurkers” who mainly read posts from others but contributed little or no content of their own. The largest 2 groups (groups 6 and 7) consisted of trial participants who never visited the BecomeAnEX community (but may have used other smoking cessation features or content on the website), although those in group 6 received private messages and visits to their message boards from an average of about 1 other user.
The overall comparison between user groups found that high degree centralities were associated with high abstinence rates. For example, well-connected users in group 1 had significantly higher abstinence rates than regular contributors in group 3 (
The multirelational network approach enabled the discovery of meaningful subgroups of participants, using information that would have been lost in an aggregated network analysis. For example, users in group 3 and group 4 had similar total degrees in the aggregated network (73.8 and 71.7, respectively). However,
In addition, the specific subnetwork in which users gained their centralities resulted in varying abstinence rates. For instance, having high in- and out-degrees in the group discussion subnetwork alone did not necessarily suggest high abstinence rates. Lurkers in group 4 had the second highest average in-degree in the group discussion subnetwork, but the abstinence rate in group 4 was not significantly different from that of otherwise similarly connected lurkers in group 5 (
Multidimensional scaling of the 7 user groups.
To our knowledge, this study is the first to analyze a smoking cessation OHC from the perspective of a multirelational social network. We constructed 4 subnetworks based on users’ interactions via 4 communication channels and illustrated the value of a multirelational approach through topological analysis, coevolution analysis, and user profiling analysis. We found that the subnetworks based on different types of relationships had different topological characteristics. Specifically, the blog subnetwork was the most connected. The blog and message board subnetworks were topologically similar, whereas the private message subnetwork was topologically distinct from others.
Coevolution analyses of subnetwork tie formation dynamics found that although the group discussion subnetwork was the most common subnetwork for the initial formation of ties between users, ties formed there also had the lowest probability of leading to additional ties in another subnetwork. This may have been because the many-to-many group-based interactions did not encourage relationship building at the dyadic (ie, one-to-one) level. By contrast, roughly a quarter of users who formed their first ties in one of the other subnetworks, including in the private message subnetwork, went on to form additional ties in a second subnetwork. When two BecomeAnEX users are first connected via private messages, it is likely to be via a welcome message from one member to another. Even though many such messages may be a mere formality, they do seem to encourage users to build more ties in other social networks, notably the blog and the message board subnetworks. However, because we did not use the content of private messages to protect users’ privacy, we cannot directly validate whether these messages were indeed welcome messages.
User profiling based on users’ centralities across the 4 subnetworks showed that users can have different centralities in different subnetworks. This further highlights the importance of examining subnetworks within OHCs. For example, although users with high centralities across all 4 subnetworks had high abstinence rates, aggregating these subnetworks into one network would have lost valuable information about users’ online and offline behaviors. In other words, having high total degrees, high in-degrees, or high out-degrees in the aggregated network was not necessarily related to abstinence. Instead, our multirelational approach revealed that the subnetwork in which a user gained his or her centralities mattered.
Analyzing centrality with a multirelational approach is likely to be particularly useful for researchers and website designers interested in improving the effectiveness of OHCs as health interventions. This approach is capable of identifying which communication channels are facilitative of desired outcomes and which channels are not. We found that high centrality in the blog and message board subnetworks was positively associated with abstinence, whereas high centrality in the group discussion subnetwork was not. Recall that the group discussion subnetwork has the lowest reciprocity rate of 3.57% (32,928/923,578). Having a high degree in this subnetwork does not necessarily mean the user interacted or bonded with more peers in the community. These findings suggest that the group discussion feature may not be contributing to the health behavior change goals of the OHC and may be a candidate for revision to serve a more useful function or removal so as to avoid distracting new users from more active and/or effective communication channels. These insights would have been obscured with an aggregated network analysis; the multirelational approach allowed the signal of blog and message board centrality to be distinguished from the noise of group discussion centrality.
These findings shed light on users’ online behaviors in a multirelational social network in an OHC for smoking cessation and inform community design or redesign, management, and interventions for smoking cessation and other health-risk behaviors using Web-based platforms for behavior change. For example, because the blog and the message board subnetworks were similar in structure and often triggered the formation of subsequent ties in each other, better integration of blogs and message boards may help users connect with each other more easily. Private messages can be a good way to welcome new users and encourage them to build more ties with peers using other means of communication, such as visiting message boards. Conversely, group discussions had the lowest probabilities of triggering subsequent ties in other subnetworks.
Our observation that users with higher centralities had higher abstinence rates is consistent with previous research on the role of online social networks in smoking cessation. Two recent studies [
Although previous social network research has adopted the multirelational approach to study online social networks, the focus was mainly on traditional network analysis tasks, such as node ranking, link prediction, network evolution, and community discoveries [
This research has a few limitations. First, we showed that users with different roles based on their centralities in subnetworks can have different abstinence rates, but we cannot make causal statements regarding the links between centralities in certain subnetworks and abstinence. Second, the user profiling analysis was based only on a group of users who enrolled in a randomized trial. Third, we considered only the social network among users and did not incorporate the textual content of their interactions. This would be an interesting direction for future work to better understand what users shared and talked about in OHCs. Finally, we did not assess or examine other social influences that could affect smoking behaviors, such as family, friends, health care providers, and social media channels. It is important to determine whether and how these offline sources of social support interact with network dynamics that occur within OHCs for smoking cessation.
Directions for future work include investigating how information flows between nodes via different channels of communication. Topic modeling techniques can be used to capture what people talked about in each communication channel to model and predict the coevolution of multirelational social networks. The outcome of topic modeling also has the potential to reveal the evolution of users into specific self-assigned roles within an online community (eg, “Elder,” “Conflict Resolver”). Future work with this network will seek to identify content, communication strategies, and network connections that improve abstinence outcomes.
This study represents one of the first efforts to study the structure and dynamics of a large-scale OHC for smoking cessation. Specifically, user behavior patterns in the subnetworks were found to be differentially associated with important outcomes, including formation of subsequent ties to the network as well as abstinence from smoking. Whereas blogs, message boards, and private messages are effective in triggering subsequent social ties in other subnetworks, group discussions are not. Centralities in the group discussion subnetwork are not indicative of smoking outcome either. The results highlight the value of the multirelational approach in analyzing large-scale online social networks among OHC users. Our research also contributes to multirelational social network analysis by showing that multirelational network analysis of online ties can provide valuable insights for understanding individual health behaviors.
GMM: Gaussian mixture model
LSCC: largest strongly connected component
OHC: online health community
This work is supported by the National Cancer Institute of the National Institutes of Health (#R01 CA192345).
SC, AMC, MSA, JLP, and ALG are employees of Truth Initiative, which runs the BecomeAnEX smoking cessation website.