This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
The most popular social networking site in the United States is Facebook, an online forum where circles of friends create, share, and interact with each other’s content in a nonpublic way.
Our objectives were to understand (1) the most commonly used terms and phrases relating to breast cancer screening, (2) the most commonly shared website links that other women interacted with, and (3) the most commonly shared website links, by age groups.
We used a novel proprietary tool from Facebook to analyze all of the more than 1.7 million unique interactions (comments on stories, reshares, and emoji reactions) and stories associated with breast cancer screening keywords that were generated by more than 1.1 million unique female Facebook users over the 1 month between November 15 and December 15, 2016. We report frequency distributions of the most popular shared Web content by age group and keywords.
On average, each of 59,000 unique stories during the month was reshared 1.5 times, commented on nearly 8 times, and reacted to more than 20 times by other users. Posted stories were most often authored by women aged 45-54 years. Users shared, reshared, commented on, and reacted to website links predominantly to e-commerce sites (12,200/33,600, 36% of interactions with the 10 most popular links), celebrity news (n=8800, 26%), and major advocacy organizations (n=4900, 15%; almost all accounted for by the American Cancer Society breast cancer site).
On Facebook, women shared and reacted to links to commercial and informative websites regarding breast cancer and screening. This information could inform patient outreach regarding breast cancer screening, indirectly through better understanding of key issues, and directly through understanding avenues for paid messaging to women authoring and reacting to content in this space.
Nearly 3 million women have a history of breast cancer today in the United States [
Online social media and social networks potentially provide an opportunity for women to become aware, or more aware, of breast cancer risk and screening options and methods. Such novel channels can allow women to share intimate information regarding their symptoms, signs, screening, diagnosis, and treatment with close friends and relatives. In this study, we explored content relating to breast cancer screening on the leading US online social networking platform. Our approach has several key differentiators from past and current work.
First, we listened rather than reaching out and teaching or communicating. We sought to illustrate that researchers can use an online platform to listen to users in a way that respects their privacy and does not identify them or any of their actual text. This social experience can be viewed through the lens of social normative theory, recognizing that these online channels allow users to build relationships and potentially influence the attitudes and behaviors of connected others [
Yet most research using Facebook, including our own, has hinged on outreach instead of listening. While an online social network is designed to be a social experience for its users, commercial outreach by advertisers and researchers is simple and cost effective. Such outreach methods exploit the personal and intimate setting afforded by the network and its highly tailored ability to finely target users based on expressed and inferred interests. For example, we reached more than 50,000 white, Latino, and Hispanic American women with an interest in maternity care in Los Angeles in part through targeted Facebook advertisements [
Second, Facebook is an intrinsically different platform from other online platforms. Recently, Rosenkrantz and colleagues provided an innovative and important look at how women perceive the mammography experience through examination of several hundred carefully selected tweets both before and after the screening [
However, these other platforms differ in use, beliefs, attitudes, experiences, typical audience, and context of use. Facebook allows its users to experience gratification from satisfying the need to belong and the need for self-presentation [
Third, the scale of our data source exceeds those of other studies leveraging Facebook data. Some studies have examined the rate of engagement with sampled posted Facebook content on breast cancer screening [
Yet there is a wide and deep penetration of Facebook in the United States. More than half of all American adults are users [
We believe that online investigations are crucial to understanding women’s experiences better, and to inform strategies that seek to deal with obstacles to improved utilization of screening. This pilot study is a cursory first step: an exploration of the terms and phrases used by female users on Facebook relating to breast cancer screening over a 1-month period. Our hypothesis was that adult women would be actively generating content and interacting with other users’ content on Facebook on the topic of breast cancer screening. Our objectives were to understand (1) the most commonly used terms and phrases relating to breast cancer screening, (2) the most commonly shared website links that other women interacted with, and (3) the most commonly shared website links, by age groups.
We contracted with Sysomos Scout (Sysomos, Toronto, ON), a commercial infomediary that resells Twitter, Facebook, blog, and other social media data [
We controlled searches using the proprietary tool’s user interface (
Sysomos matched these keywords to any Facebook
All counts for numbers of stories and interactions are unique, by Facebook’s construction of nonoverlapping categories of story, reshare, comment, and reaction. Counts of authors are more complex. Within a category, the number of authors is the unique number of authors. For example, if 45,000 women commented on an article, these are nonduplicated authors. Across categories this may not hold, as the same author may post several stories, comment on other stories, and react to many others.
Accordingly, we cannot add the numbers of authors across the different categories of interactions. For example, 1.1 million unique authors making reactions and the 0.4 million unique authors making comments cannot be added to obtain 1.5 million authors, because this resulting sum double counts women doing both. However, the actual total is no smaller than 1.1 million and no larger than 1.5 million. We conservatively report only the lower number and use phrases such as “…at least…” in reporting these totals.
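The bounding argument above can be sketched in a few lines. This is an illustration only, not part of the Sysomos tool: the function name `union_bounds` is hypothetical, and the two counts are the rounded figures quoted in the text.

```python
def union_bounds(category_counts):
    """Bound the number of distinct authors across categories,
    given only the unique-author count within each category."""
    counts = list(category_counts.values())
    lower = max(counts)  # reached if every smaller category is contained in the largest
    upper = sum(counts)  # reached only if no author appears in two categories
    return lower, upper


# Rounded figures quoted in the text (authors per interaction category).
authors = {"reactions": 1_100_000, "comments": 400_000}
lower, upper = union_bounds(authors)
print(f"between {lower:,} and {upper:,} unique authors")
```

Reporting only the lower bound, as we do, corresponds to the case in which every commenting author also reacted.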
Sysomos reported to us summary aggregate statistics such as totals, time-based trends such as subtotal by day, content-based subtotals, keyword prevalence, other word prevalence in context of keyword, and most popular website links that were posted or shared. Importantly, Facebook explicitly limits some aggregate data to just the top 10 items within a category and limits all aggregate data to items with at least 100 instances. This is due to confidentiality concerns and the ability otherwise to potentially reidentify individuals. We provide selected excerpts of these data, including tabular and graphical summaries.
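To make the effect of these two disclosure rules concrete, the following minimal sketch applies a minimum-instance threshold and a top-10 cutoff to a set of counts. The function `disclose` and the link counts are hypothetical and are not drawn from our data; the actual filtering is performed upstream by the data owner, and its implementation is not known to us.

```python
def disclose(counts, min_instances=100, top_n=10):
    """Keep items with at least `min_instances`, then return the `top_n` largest."""
    eligible = {item: n for item, n in counts.items() if n >= min_instances}
    ranked = sorted(eligible.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_n]


# Hypothetical link counts with a long tail (not real data).
raw = [5000, 3000, 1200, 800, 400, 250, 180, 150, 120, 110, 105, 99, 40, 3]
hypothetical = {f"link_{i:02d}": n for i, n in enumerate(raw)}

visible = disclose(hypothetical)
hidden_total = sum(hypothetical.values()) - sum(n for _, n in visible)
print(len(visible), "items visible;", hidden_total, "interactions hidden")
```

In this toy example one item clears the 100-instance threshold but still falls outside the top 10, which illustrates why the reported lists cannot describe the full frequency distribution.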
In this pilot study, we were most interested in the type of content that was being shared. Links to website content originate in a story. Such stories can be authored by women who embed a link in a posted story, or authored by a marketer or news media organization that uses a shortened (eg, bitly) Web address to allow ease of use and visibility. Sysomos allowed us to identify the actual 10 most popular links and the frequency of each, by interaction type and content of link.
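We clicked through all of these links and examined their content in detail. One study team member, a physician scientist (MH), manually categorized their content retroactively. This led us to identify 5 mutually exclusive and collectively exhaustive categories to which all shared links belonged. These categories were e-commerce related to breast cancer, celebrity breast cancer information, noncelebrity breast cancer information, breast cancer advocacy and charity, and unrelated content.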
This study was conducted using completely deidentified, aggregated summary data provided by a third party, and accordingly did not involve human participant research and did not require an institutional review board determination or approval in our institution.
Radiology
Doctor xray
Hospital Xray
Hospital Radiology
Radiologist
breast center
breast imaging center
breast cancer screening
breast screen guidelines
breast screening guidelines
Mammogram
Mammography
Breast Exam
Digital mammography
digitized mammographic image
Breast tomosynthesis
three-dimensional mammography
three-dimensional mammogram
3-D mammogram
3d mammogram
3d mammogram
breast imaging
breast image
full-field digital mammogram
Screening Mammography
Screening Mammogram
Diagnostic Mammography
Diagnostic Mammogram
3-D mammography
mastectomy
3d mammography
Lumpectomy
full-field digital mammography
digital breast tomosynthesis
3d mammography
breast tumor
Digital mammogram
breast needle biopsy
breast xray
breast x-ray
Xray of my breasts
X-ray of my breasts
X-ray of my breast
Doctor x-rayed my breasts
Hospital X-rayed my breasts
x-rayed my breasts
needle biopsy done of my breast
needle biopsy of my breast
breast lump
breast lumps
lump in my breast
BRCA tested positive
BRCA positive
family risk breast
high-risk breast
high-risk breasts
abnormal breast screen
abnormal breast x-ray
abnormal breast xray
dense breast
dense breasts
breast density
DCIS
ductal carcinoma
fatty breasts
fatty breast
breast cancer
Sysomos Subscriber Dashboard screenshot showing total authors, sex and age distributions, sentiment, top links shared, and top inferred topics (source: Sysomos).
More than 1.7 million unique interactions (comments on stories, reshares, and emoji reactions) and stories associated with the 69 breast cancer screening keywords were generated by at least 1.1 million Facebook users over the 30-day period from November 16, 2016 to December 15, 2016.
On average, each of the 59,000 unique stories during the month was reshared 1.5 times, commented on nearly 8 times, and reacted to more than 20 times by other users seeing the original content.
Stories and interactions were most often authored by women aged 45-54 years (
Sysomos Subscriber Dashboard screenshot showing trends in interaction types and age groups daily over the 30-day rolling time period (November 15-December 15, 2016) (source: Sysomos).
A search of mentions of “Doherty” in any link shared, reshared, or otherwise interacted with showed 6700 mentions by 6600 unique authors over the month, respectively 0.4% and 0.6% of the overall totals for the month.
Common terms relating to breast cancer and screening mammography mentioned in any context included “mammogram” (266,000 interactions, or 16% of the month’s total interactions), “lump” (26,600, 1.6%), “abnormal mammogram” (4400, 0.3%), “scars” (4000, 0.2%), “BRCA” (3800, 0.2%), “dense” (3200, 0.2%), “DCIS” (3000, 0.2%), “high risk” (2900, 0.2%), and “compression” (1000, 0.06%).
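The percentages above are shares of the month's total interactions. A short sketch of the arithmetic follows, assuming a denominator of exactly 1,700,000; the text reports "more than 1.7 million", so the true shares are marginally smaller. The name `share_pct` is illustrative.

```python
TOTAL_INTERACTIONS = 1_700_000  # assumption: the reported total is "more than 1.7 million"

# Term counts as quoted in the text (rounded in the source).
term_counts = {
    "mammogram": 266_000,
    "lump": 26_600,
    "abnormal mammogram": 4_400,
    "scars": 4_000,
    "BRCA": 3_800,
    "dense": 3_200,
    "DCIS": 3_000,
    "high risk": 2_900,
    "compression": 1_000,
}


def share_pct(count, total=TOTAL_INTERACTIONS):
    """Return `count` as a percentage of `total`."""
    return 100 * count / total


for term, count in term_counts.items():
    print(f"{term:>20}: {share_pct(count):.2f}%")
```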
Across all interactions, the 10 most popular links accounted for a total of 33,600 interactions, or almost 2% of monthly total interactions (
Links to
The next largest category was links to
The third largest category represented links to
Importantly, in this category were at least 700 shared links relating to mercola.com, a natural health advocacy site that presented a view against breast cancer screening, including multiple references to scientific studies and a recent article by Welch and colleagues [
Finally, more than 1 in 6 links were not in relation to breast cancer or screening terms. These presumably were stories, reshares, and comments in which a user conveyed multiple messages, some about breast cancer (hence they were selected by Sysomos) and some not about this.
We repeated our analyses to understand how interest and interactions changed across age groups (
Additionally, we noted that noncelebrity-related news and information about breast cancer represented a larger share among the youngest users (50%) than among older users. We also noted the apparent complete lack of interest among the age group 65 years and older in celebrity-related breast cancer information.
Distribution of most popular links by category and interaction type.
Top 10 links | Most popular reshared with others | Most commented on | Most reacted to | Overall most popular across all interactions |
E-Commerce related to breast cancer | 3100 (59%) | 400 (29%) | 9400 (32%) | 12,200 (36%) |
Celebrity breast cancer information | 300 (6%) | 500 (36%) | 8000 (28%) | 8800 (26%) |
Noncelebrity breast cancer information | 1000 (19%) | 300 (21%) | 1700 (6%) | 2100 (6%) |
Breast cancer advocacy and charity | 300 (6%) | 100 (7%) | 4500 (16%) | 4900 (15%) |
Unrelated content | 600 (11%) | 100 (7%) | 5400 (19%) | 5600 (17%) |
Total of top 10 link volume | 5300 | 1400 | 29,000 | 33,600 |
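The column percentages in the table above can be recomputed from the counts as a consistency check. The sketch below transcribes the counts and derives each column total and the whole-number share of each category; the names `TABLE_1` and `column_shares` are illustrative. Because the published counts are rounded to the nearest hundred, a recomputed share can differ from the published one by a percentage point: e-commerce reshares come out at 58% here against the published 59%, presumably because the published figure was derived from unrounded counts.

```python
# Counts transcribed from the table above (rounded to hundreds in the source).
# Column order: reshared, commented on, reacted to, overall.
TABLE_1 = {
    "E-commerce": (3100, 400, 9400, 12200),
    "Celebrity": (300, 500, 8000, 8800),
    "Noncelebrity": (1000, 300, 1700, 2100),
    "Advocacy and charity": (300, 100, 4500, 4900),
    "Unrelated": (600, 100, 5400, 5600),
}
COLUMNS = ("reshared", "commented on", "reacted to", "overall")


def column_shares(table, col):
    """Return (column total, {category: whole-number percentage}) for one column."""
    total = sum(row[col] for row in table.values())
    shares = {name: round(100 * row[col] / total) for name, row in table.items()}
    return total, shares


for col, label in enumerate(COLUMNS):
    total, shares = column_shares(TABLE_1, col)
    print(f"{label}: total={total:,} shares={shares}")
```

Note that the overall column is not the sum of the three interaction columns, since the 10 most popular links overall need not coincide with the 10 most popular links for each interaction type.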
Distribution of most popular links by category and age group.
Top 10 links | 18-24 years | 25-34 years | 35-44 years | 45-54 years | 55-64 years | ≥65 years | Overall most popular across all interactions |
E-Commerce related to breast cancer | 100 | 900 | 2200 | 3500 | 3400 | 3100 | 12,200 |
Celebrity breast cancer information | 200 | 900 | 2200 | 2500 | 1100 | 0 | 8800 |
Noncelebrity breast cancer information | 700 | 200 | 600 | 400 | 900 | 1600 | 2100 |
Breast cancer advocacy and charity | 0 | 0 | 300 | 600 | 1500 | 3100 | 4900 |
Unrelated content | 0 | 400 | 700 | 900 | 1700 | 1100 | 5600 |
Total of top 10 link volume | 1400 | 3200 | 6000 | 5900 | 8600 | 8900 | 33,600 |
In this novel pilot study, we examined aggregated mentions of terms and phrases, and shared website links among women in the United States on Facebook in relation to breast cancer screening over a 1-month window. We found substantial content posted by, shared among, and interacted with by large numbers of women. The most popular stories provided information on women undergoing treatment for breast cancer and information on online destinations to purchase small items and make small donations to further research.
We observed that the timing of upswings in interest often appeared to coincide with celebrity news, such as a picture shared by Shannen Doherty of herself about to receive radiotherapy for her breast cancer. In general, our work supports the importance of sharing of and commenting on stories about well-known celebrities with breast cancer [
It is well-known that the Internet allows a so-called long tail to form, in which many niche sites, topics, or products are, respectively, visited, mentioned, or bought by a small number of users, in contrast with more popular sites, topics, or products [
Yet, despite these restrictions, we found that there was a plurality of links to commercial e-commerce websites marketing items related to breast cancer themes, such as thebreastcancersite.com. We saw little sharing of original medical news content from formal online media or formal health information publishers, despite the positive impact this can have [
We also found less content than we had expected from some of the most prominent advocacy organizations, such as Susan G. Komen, although the American Cancer Society’s breast cancer site was the link with the second most frequent interactions. Finally, we saw fewer mentions than we had expected of terms anecdotally thought to be points of concern for women (eg, breast compression during imaging) and that had been found among their tweets in a recent innovative study by Rosenkrantz and colleagues [
As we continue to examine this new data source, we expect to obtain more detailed insights about what women are interacting about and how they are interacting regarding breast cancer terms. We expect that such data can inform the outreach of advocacy organizations, and can inform campaigns to improve rates of screening and to educate high-risk women concerning their options, among many other examples.
Methodologically, this study adds to our understanding of patients’ and consumers’ articulated thoughts and feelings about important public health initiatives such as breast cancer screening. We showed that summarized information is available from the world’s leading online social network, and note that this commercially available information is distinct from more easily analyzed public online social media. Given the greater demographic representativeness of Facebook, compared with other online social media and social networks [
While our study had several important strengths, including novelty, exhaustiveness, and national scale in the United States, there are several important limitations. Our data source, Sysomos, is a commercial reseller of data obtained indirectly from Facebook through another intermediary, Datasift. Data provenance, custody, and governance must be assumed but cannot be verified or guaranteed. For example, software errors could occur at each one of these handoffs, as well as within each segment of the data custody chain.
In particular, Facebook is the data owner, whose terms of service do not permit actual visualization of the original post or comment. To protect users’ privacy, all data were aggregated, deidentified, and mapped coarsely into topics. We therefore had no independent ability to confirm whether the reported statistics we obtained were accurate, representative, or exhaustive. Moreover, under our contract, data availability was limited to rolling 1-month lookback periods. Other restrictions motivated by privacy and imposed by the data owner include sampling only high-frequency items, limiting results to the top 10 items in a category, and masking results in which fewer than 100 Facebook users mentioned a term or shared a link. As a result, none of our results provide a full view of the frequency distribution.
Neither we nor other researchers can subsequently return to historical periods beyond examining reports that were downloaded contemporaneously. Similarly, only a 30-day rolling period of aggregated data is made available by Facebook, Datasift, and Sysomos. This clearly further limits replication and error checking. For research purposes, while substantial information remains, much is lost during this process. This weakness does not seem to be one that will be alleviated, given legitimate concerns regarding online privacy [
Finally, while our research is internally valid, the extent to which it is externally applicable is not known. The particular month of data we looked at was almost immediately after a polarizing general election in the United States, in which health-related conversations (eg, Affordable Care Act, Planned Parenthood, women’s right to choose) were widely occurring. In other months, there might have been fewer mentions of breast cancer screening terms. Our research also explicitly required women to have access to the Internet, be a member of Facebook, and use English in their interactions. There are clearly large parts of US society in which one or more of these requirements are not met.
Future researchers may exploit other less coarse methods for obtaining online social media and social network data. Companies operating online survey panels such as Knowledge Networks, Inc [
Examining novel data from the universe of mentions on the leading online social network regarding breast cancer screening-related terms provided an important, if initial and superficial, look at topics of great interest among all female Facebook users over 1 month. More work is needed using this novel data source and applying its insights to solving pressing public health problems, including inadequate screening for breast cancer.
Categorization of top 10 links by interaction type and by age category.
None declared.