Exploring the Health-Related Quality of Life of Patients Treated With Immune Checkpoint Inhibitors: Social Media Study

Background: Immune checkpoint inhibitors (ICIs) are increasingly used to treat several types of tumors. The impact of this emerging therapy on patients’ health-related quality of life (HRQoL) is usually assessed in clinical trials through standard questionnaires. However, this might not fully reflect the HRQoL of patients under real-world conditions. In parallel, users’ narratives from social media represent a potential new source of research data concerning HRQoL.

Objective: The aim of this study is to assess and compare coverage of ICI-treated patients’ HRQoL domains and subdomains in standard questionnaires from clinical trials and in a real-world setting from social media posts.

Methods: A retrospective study was carried out by collecting French-language social media posts written by internet users mentioning their experiences with ICIs between January 2011 and August 2018. Automatic and manual extractions were implemented to create a corpus in which domains and subdomains of HRQoL were classified. These annotations were compared with domains covered by 2 standard HRQoL questionnaires, the EORTC QLQ-C30 and the FACT-G.

Results: We identified 150 users who described their own experience with ICI (89/150, 59.3%) or that of their relative (61/150, 40.7%), with 137 users (91.3%) reporting at least one HRQoL domain in their social media posts. A total of 8 domains and 42 subdomains of HRQoL were identified: Global health (1 subdomain; 115 patients), Symptoms (13; 76), Emotional state (10; 49), Role (7; 22), Physical activity (4; 13), Professional situation (3; 9), Cognitive state (2; 2), and Social state (2; 2). The QLQ-C30 showed a wider global coverage of social media HRQoL subdomains than the FACT-G: 45% (19/42) and 29% (12/42), respectively. For both the QLQ-C30 and FACT-G questionnaires, coverage rates were particularly suboptimal for Symptoms (68/123, 55.3% and 72/123, 58.5%, respectively), Emotional state (7/49, 14% and 24/49, 49%, respectively), and Role (17/22, 77% and 15/22, 68%, respectively).

Conclusions: Many patients with cancer are using social media to share their experiences with immunotherapy. Collecting and analyzing their spontaneous narratives is helpful to capture and understand their HRQoL in a real-world setting. New measures of HRQoL are needed to provide a more in-depth evaluation of Symptoms, Emotional state, and Role among patients with cancer treated with immunotherapy.


Introduction
Health-related quality of life (HRQoL) is a complex subjective concept pertaining to multiple domains including physical, emotional, social, professional, and functional well-being [1,2]. The number of cancer cases continues to rise across Europe, with approximately 8% of people currently living with cancer [3] and with the side effects of cancer treatment (eg, hair loss, pain, fatigue, nausea) [4]. Patients and their families also often experience psychological distress as a result of a cancer diagnosis and its treatment, including stress, anxiety, and depression [5]. Therefore, it is particularly important to understand and quantify HRQoL among people living with cancer or caring for someone with cancer. Several cancer-related HRQoL self-administered questionnaires, such as the European Organization for Research and Treatment of Cancer Quality of Life Questionnaire Core 30 (EORTC QLQ-C30) [6] and the Functional Assessment of Cancer Therapy-General (FACT-G) [7], have been used and validated in oncology patient populations, but are limited in that they must be completed at predefined time points.
Recently, the internet has provided an opportunity for introducing new sources of clinically relevant information related to patients' perception of illness and its burden [8]. Specifically, patients and their relatives are increasingly using forums, blogs, and social media to obtain health-related information and support [9]. Indeed, the information generated online represents an alternative way to understand patients' and relatives' health state compared with self-administered questionnaires. These patient-generated health data are produced spontaneously (and are thus not limited to medical consultations, for instance) and mostly anonymously, and may better correspond to patients' and relatives' feelings than closed-ended questions.
Moreover, text mining techniques applied to analyze social media data can be used with relative ease [10] and have opened up new opportunities to bridge the gap between qualitative and quantitative data [11].
In parallel to the development of new strategies for collecting information on HRQoL of patients with cancer, treatment with immunotherapy has emerged as an innovative curative approach that, instead of destroying cancer cells directly, stimulates the immune system, making it capable of identifying and selectively attacking cancer cells. The body develops an internal defense mechanism, which can lead to reduced side effects and improved HRQoL [12]. Recent studies have assessed HRQoL of patients undergoing immunotherapy using standard questionnaires, but only in clinical trial settings [13-17]. In addition to prolonged survival, HRQoL results showed that immune checkpoint inhibitors (ICIs), such as nivolumab, maintained or improved baseline HRQoL levels in patients with advanced melanoma [18], advanced renal cell carcinoma [19], or advanced squamous non-small-cell lung cancer [20]. However, HRQoL of patients treated with ICI remains largely unknown under real-world conditions, and it is not known whether the EORTC QLQ-C30 and the FACT-G questionnaires capture the full range of HRQoL domains and experiences relevant to these specific patients. The use of social media data offers a novel opportunity to explore this.
This study assessed and compared conceptual coverage of ICI-treated patients' HRQoL between standard questionnaires and users' ICI-related experiences described in social media. We hypothesized that, given the evolving and dynamic nature of the HRQoL concept, new HRQoL subdomains would emerge from social media posts going beyond the coverage of existing questionnaires, especially in a population of patients treated with new drugs such as ICI.

Study Design and Population
This was a retrospective study using a text mining approach to retrieve information from social media posts written by French language internet users between January 1, 2011, and August 31, 2018. The start date of January 2011 was selected because it corresponds to the date of marketing authorization for selling the first available immuno-oncology treatment in France. Included posts (comprising forum posts or comments on videos or photos) had to mention past or current patients' experience with ICI (ie, ipilimumab, nivolumab, pembrolizumab, atezolizumab, or those available through early access schemes or clinical trials, such as durvalumab, tremelimumab, and avelumab). Posts referring to treatments other than ICI were excluded. Posts could be authored by patients themselves or by their relatives, here interchangeably referred to as patients. Posts were retrieved from the following social media: 12 generic French medical web forums; 4 cancer-specialized French medical web forums; and 3 generic social media (Facebook, YouTube, and Instagram). These social media were screened from the Detec't database [21]. Description of forums is provided in Multimedia Appendix 1. All posts had to be publicly available and include at least one of the predetermined keywords with their synonyms (see the list in Multimedia Appendix 2). Ambiguous posts or duplicates (ie, similar posts posted by the same user in different social media) were excluded through a manual review.
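The inclusion rules above (publication window and presence of at least one predefined keyword) can be illustrated with a minimal Python sketch. It is not the study's extraction tool: the drug names come from the text, whereas the accent-free keyword "immunotherapie", the post structure, and the helper names `normalize`, `matched_keywords`, and `is_eligible` are assumptions made for illustration (the actual keyword list is in Multimedia Appendix 2).

```python
import re
import unicodedata
from datetime import date

# Drug names listed in the study; "immunotherapie" is an ASSUMED French keyword.
KEYWORDS = [
    "ipilimumab", "nivolumab", "pembrolizumab", "atezolizumab",
    "durvalumab", "tremelimumab", "avelumab", "immunotherapie",
]
START, END = date(2011, 1, 1), date(2018, 8, 31)  # study window

KEYWORD_RE = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b")


def normalize(text):
    """Lowercase and strip accents so 'Immunothérapie' matches 'immunotherapie'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


def matched_keywords(text):
    """Return the sorted set of predefined keywords found in a post."""
    return sorted(set(KEYWORD_RE.findall(normalize(text))))


def is_eligible(post):
    """A post is kept if it falls in the study window and mentions a keyword."""
    in_window = START <= post["date"] <= END
    return in_window and bool(matched_keywords(post["text"]))


# Hypothetical posts, invented for this example only.
posts = [
    {"date": date(2017, 5, 2), "text": "Début de l'immunothérapie par Nivolumab hier."},
    {"date": date(2010, 3, 1), "text": "Traitement par ipilimumab en essai."},
    {"date": date(2016, 9, 9), "text": "Chimiothérapie terminée la semaine dernière."},
]
eligible = [p for p in posts if is_eligible(p)]
print(len(eligible), matched_keywords(posts[0]["text"]))
```

Keyword matching of this kind only preselects candidate posts; in the study, ambiguous posts and duplicates were still excluded through manual review.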
according to the HTML structure of each forum. All posts containing one of the predefined keywords were automatically retrieved from discussion threads and deidentified (signature and quote withdrawal). The deidentification of posts was performed using an in-house algorithm based on regular expressions to automatically identify specific sequences of characters (proper names, phone numbers, postal codes, mail addresses, etc). For specialized French medical web forums, posts containing predefined keywords were identified using a Google operator searching these keywords in selected forum websites (Site: +URL). Identified discussion threads were manually explored, and posts containing one of the predefined keywords were manually retrieved and deidentified. Finally, for the 3 generic social media, posts were identified and retrieved by manually searching the predefined keywords using the social media embedded search fields. These posts were also manually deidentified. Data from these 3 collection methods were then grouped into a single data set (the analysis corpus), which went through several steps of cleaning (preprocessing by removing French accents, unnecessary spaces, and punctuation and lowercasing all words; removal of stop words; and stemming, based on Porter's algorithm [23]) and formatting (transformation of the corpus into a matrix; creation of tokens and measure of their frequency; exclusion of rarely used, ambiguous, and misspelled words; and document-term matrix weighting; Figure 1).
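The deidentification and cleaning steps described above can be sketched as follows, using only the Python standard library. This is an illustrative approximation, not the in-house algorithm: the three regular expressions, the tiny stop word list, and the function names are assumptions. Stemming and matrix weighting are deliberately left out of the sketch; a French stemmer (for instance, the Snowball stemmer shipped with NLTK) could be applied to the tokens returned by `clean`.

```python
import re
import unicodedata
from collections import Counter

# ASSUMED patterns for illustration; the study's own rules are not reproduced here.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
PHONE_RE = re.compile(r"\b0\d(?:[ .]?\d{2}){4}\b")  # French 10-digit numbers
POSTCODE_RE = re.compile(r"\b\d{5}\b")              # French postal codes

# Tiny ASSUMED stop word list (accent-free, since accents are stripped first).
STOP_WORDS = {"le", "la", "les", "de", "des", "du", "un", "une", "et", "a", "j", "je"}


def deidentify(text):
    """Withdraw e-mail addresses, phone numbers, and postal codes."""
    for pattern in (EMAIL_RE, PHONE_RE, POSTCODE_RE):
        text = pattern.sub(" ", text)
    return text


def clean(text):
    """Strip accents, lowercase, drop punctuation and stop words; return tokens."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c)).lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return [t for t in text.split() if t not in STOP_WORDS]


def document_term_matrix(docs):
    """Return (vocabulary, matrix of raw term counts), one row per document."""
    counts = [Counter(clean(deidentify(d))) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[t] for t in vocab] for c in counts]
    return vocab, matrix


# Hypothetical posts, invented for this example only.
docs = [
    "Fatigue et fièvre après la 2e perfusion. Contact: marie.dupont@example.fr, 06 12 34 56 78",
    "Fatigue importante, j'habite à 75011 Paris.",
]
vocab, matrix = document_term_matrix(docs)
print(vocab)
print(matrix)
```

Identifiers are removed before cleaning because punctuation stripping would otherwise break e-mail addresses and phone numbers into ordinary-looking tokens that the patterns could no longer recognize.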

Study Variables
The analysis corpus contained information on data source (name of the forum or social media), post characteristics (URL of the page or discussion where the post was published; date of the publication; pseudonym or alias of the user; keyword associated with the post leading to its extraction), patient characteristics (age, gender, type of cancer), and user status among "patient," "relative," and "unspecified." Regarding content of retrieved posts, for each patient "associated HRQoL domains" and "associated subdomains" were collected manually. "Associated HRQoL domains" were grouped considering the classification of domains provided in existing measures (ie, the EORTC QLQ-C30 and the FACT-G). For each domain, "associated subdomains" mentioned by patients were also collected even if not included in standard measures in order to allow new subdomains to emerge.

Comparison With HRQoL Standard Questionnaires
The EORTC QLQ-C30 [6] and the FACT-G [7] are among the most widely used questionnaires to capture HRQoL of patients with cancer in research and clinical settings [24,25]. Compared with other questionnaires, they are not limited to a specific cancer type [26,27] and cover the highest number of HRQoL domains across all cancer-specific questionnaires [28]. Both questionnaires were first developed in 1993 [29] and are validated in the French language [30,31].
The EORTC QLQ-C30 is a 30-item questionnaire composed of multi-item scales and single items, and comprises 5 functional scales (physical activity, role, emotional state, cognitive state, and social state), 3 symptoms scales (fatigue, nausea and vomiting, and pain), and a global health status and HRQoL scale. The remaining single items assess additional symptoms commonly reported by patients with cancer: dyspnea, insomnia, lack of appetite, constipation, diarrhea, and financial difficulties. The FACT-G is a 27-item questionnaire divided into 4 well-being subscales: physical well-being, social/family well-being, emotional well-being, and functional well-being [32]. Coverage of domains identified in social media and related subdomains measured by the EORTC QLQ-C30 and the FACT-G was manually assessed independently by 2 operators and compared through a concept mapping approach [33].

Data Analysis
Each post corresponded to a statistical unit. Frequentist analysis was performed on extracted posts to characterize the whole analysis corpus through the following indicators: number of posts, number of patients, occurrence of HRQoL domain(s), keywords (including the number of extracted posts), data source, and users' characteristics. A Venn diagram was generated through the CRAN package "VennDiagram." The list of identified subdomains, including the number of patients and posts, was presented in a descriptive format using Microsoft Excel. For each subdomain, coverage by one or several items of the 2 questionnaires was assessed. For each domain, coverage rates were calculated by dividing the sum of subdomain occurrences covered by questionnaires by the total number of occurrences in the social media. The diagram indicating the coverage rates of each HRQoL domain retrieved from social media posts through the standard questionnaires EORTC QLQ-C30 and FACT-G was generated through Microsoft Excel.
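As a worked illustration of the coverage rate defined above (covered subdomain occurrences divided by total occurrences in social media), the following Python sketch recomputes the rates for 3 domains from the counts reported in the Results. The function name `coverage_rate` and the data layout are assumptions made for this example; the study itself used Microsoft Excel.

```python
def coverage_rate(covered, total):
    """Percentage of social media occurrences covered by a questionnaire."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= covered <= total:
        raise ValueError("covered must lie between 0 and total")
    return 100.0 * covered / total


# (covered by EORTC QLQ-C30, covered by FACT-G, total occurrences),
# taken from the counts reported in the Results section.
occurrences = {
    "Symptoms": (68, 72, 123),
    "Emotional state": (7, 24, 49),
    "Role": (17, 15, 22),
}

for domain, (qlq, fact, total) in occurrences.items():
    print(
        f"{domain}: QLQ-C30 {coverage_rate(qlq, total):.1f}%, "
        f"FACT-G {coverage_rate(fact, total):.1f}%"
    )
```

Because the denominator counts occurrences rather than distinct subdomains, a frequently mentioned uncovered subdomain lowers a domain's coverage rate more than a rarely mentioned one.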

Description of the Population and Posts
The final analysis corpus included 267 social media posts meeting the inclusion criteria, with a maximum of 11 posts from 1 patient and a median of 2 posts per patient. Through the manual extraction, we identified 150 patients (posters) who described their personal experience with ICI (89/150, 59.3%) or that of their relative (61/150, 40.7%). A majority of patients were women (82/150, 54.7%), and gender was undetermined for only 8/150 patients (5.3%). The type of related cancer was identified for 123/150 patients (82.0%): the most frequent cancers were lung cancer and melanoma (Table 1). The majority of posts were retrieved from 1 cancer-specific patient forum (La ligue contre le cancer, 78 posts by 43 patients) and 1 generic medical forum (Doctissimo, 76 posts by 43 patients). The most frequently identified keyword was "immunotherapy" (72/150, 48.0%) followed by "nivolumab" (31/150, 20.7%) and "ipilimumab" (20/150, 13.3%).

[...] (12/22, 55%). Only 13 patients mentioned Physical activity, and the most reported subdomains pertaining to this larger domain were minimal or no physical activity/maintained activity (5/13, 38%) and difficulty walking/eating (5/13, 38%). Professional situation was mentioned by 9 patients, with the most frequent subdomain being sick leave (6/9, 67%). Two subdomains each were mentioned for Cognitive state and Social state.

"Hello, we started immunotherapy 3 weeks ago, we didn't combine it with [Drug], his health is deteriorating more and more. Yesterday he was re-hospitalized...I don't think his condition could get any worse...I strongly hope he gets better, that he can eat and move..."

Social state: "On the first scan, increased lung metastases...but no new ones, so that's something. (...) We won't do much for the holidays. Until recently we used to go to my in-laws, there were a lot of us, but since my mother-in-law died and since I'm no longer in Olympic shape, we haven't been moving around!" (subdomain: No longer participates in family parties)

HRQoL: health-related quality of life.

Coverage of HRQoL Domains in Social Media by the QLQ-C30 and FACT-G Questionnaires
As shown in Table 3, Global health was entirely covered by both the QLQ-C30 and the FACT-G. Physical activity, Professional situation, Cognitive state, and Social state were also fully covered by the QLQ-C30. Of these 4 domains, the FACT-G fully covered only Professional situation; it covered 46% (6/13) of Physical activity and neither Cognitive state nor Social state. For Symptoms, the EORTC QLQ-C30 covered a majority (68/123, 55.3%) of the subdomain occurrences identified in social media posts, as did the FACT-G (72/123, 58.5%). Coverage was lower for the Emotional state domain: 14% (7/49) by the EORTC QLQ-C30 and 49% (24/49) by the FACT-G. Finally, the EORTC QLQ-C30 covered 77% (17/22) of subdomain occurrences for Role, whereas the FACT-G covered 68% (15/22). Specific subdomains that were not covered by both the EORTC QLQ-C30 and the FACT-G were fever, cough, rash/itch, headache, thyroid disorders, heavy legs, and hair loss for the Symptoms domain; exhaustion, distress, psychological disorders, stable health, emotional exhaustion due to side effects, and isolation for the Emotional state domain; and time constraints regarding medical care and ability to drive again for the Role domain (Figure 3).

Principal Findings
This novel approach of using social media identified that some of the content posted by patients with cancer and caregivers with experience of ICI overlaps with concepts captured in the 2 most frequently used HRQoL questionnaires. However, the main findings also included the fact that there are a large number of concepts which are not captured in these 2 HRQoL questionnaires. These results confirmed our hypothesis by underlining the emergence of new subdomains of HRQoL in patients treated with immunotherapy. In particular, we observed that retrieved social media posts frequently addressed specific subdomains of the HRQoL domains of Symptoms, Emotional state, and Role, which are not fully covered by the EORTC QLQ-C30 and the FACT-G. Reasons for not including these subdomains in standard questionnaires are varied. First, questionnaires such as the EORTC QLQ-C30 and the FACT-G are designed to be short, effective, and time-saving, thus reducing burden on patients completing these questionnaires. As a consequence, they are limited and do not cover the whole range of the issues impacting patients' HRQoL. Second, these questionnaires were developed more than 25 years ago, which may explain why the domains might not fully capture the impact of recently developed therapies, such as ICI. The concept of HRQoL is evolving and several users have already raised the problem of "partial covering" [33], which relates to the complexity of measuring this broad ranging concept through a robust methodology [34]. Third, subdomains retrieved by our study might be specific to immunotherapy and, therefore, not measured by generic standard questionnaires. Finally, accurately quantifying an individual's HRQoL is, per se, a debated question, because standardized questionnaires might restrict a patient's choice and limit their spontaneity, thus not using a patient-centered approach [35].
We observed a remarkable increase in the occurrences of the keyword "immunotherapy," which likely results from the growing availability of ICI, the steadily increasing use of ICI in the French health care system, and, in parallel, the growing proportion of social media users.

Comparison With the Literature on HRQoL and Immunotherapy
The conceptual and psychometric measurement properties of the EORTC QLQ-C30 and the FACT-G have not yet been systematically examined in ICI-treated patient populations and the results of this study cast some doubt on the content validity of these measures in ICI-treated patients. Indeed, content validity is the extent to which an instrument measures the important aspects of concepts most significant and relevant to a patient's condition and its treatment [36]. Because there are subdomains which do not appear in the 2 questionnaires that participants completed, we can assume that the 2 questionnaires lack content validity for this specific patient population. This does underline the need for new or adapted patient-reported outcomes in patients treated with immunotherapy.
Existing studies using these questionnaires in patients with cancer treated with immunotherapy are still often limited to research settings [17,18,37]. For instance, the work by Long and colleagues [18] has demonstrated that the use of nivolumab maintained baseline HRQoL levels to provide long-term quality of survival benefit among 418 patients with advanced melanoma. Cella and colleagues [19] have confirmed the association between nivolumab treatment and HRQoL improvement using the FACT-G among 847 patients with advanced renal cell carcinoma.
Although HRQoL has already been assessed with standard questionnaires in a number of clinical trials of immunotherapy, covered domains were pre-established and limited. Instead, in our study various new spontaneous subdomains emerged, such as fever, time constraints of treatment, difficulty in driving, and isolation. Furthermore, certain subdomains covered by the questionnaires might be inadequately designed for patients treated with ICI. For example, in the FACT-G, hope is collected in a negative way (ie, "I am losing hope in the fight against my illness"), whereas social media users mainly referred to hope in an optimistic way (ie, "regaining hope").

Limitations of the Study
This study was not without limitations. First, selection bias was a major limitation because analyses were restricted to selected data sources and available contents. The population under study was composed of social media users who might not necessarily reflect the characteristics of all patients with cancer receiving immunotherapy. However, because social media are increasingly used by patients [38], especially in France [39], retrieved posts should pertain to an important section of the French population. We also collected data from relatives who provided immunotherapy-related experiences of patients with cancer who were not active on social media. However, given the small number of patients and relatives in our sample, we were not able to distinguish their posts within the analysis corpus. Our results should then be interpreted considering this further limitation. HRQoL self-assessed by patients might be different from the evaluation provided by relatives, as shown in previous research [40]. Similarly, the limited size of our analysis corpus did not allow an analysis per type of cancer. Melanoma, for instance, has long been treated with immunotherapy [41], which means that HRQoL of patients with melanoma might be different from HRQoL related to other cancers.
Second, an extraction bias is also possible because we only considered posts containing predefined keywords. If users expressed their experiences with immunotherapy and the consequent impact on HRQoL by using other nonspecific words, their posts were not included in the final analysis corpus. To mitigate this bias, the set of keywords was made as comprehensive as possible.
Third, because this study was based on secondary use of data published in social media, it was impossible to get additional data and information from patients (only identified by their pseudonym). The analysis was therefore restricted to what users mentioned, which can lead to missing data or incomplete capture of the patients' full experience. In particular, subdomains emerging from social media were spontaneously addressed by users, unlike the predefined items of standard questionnaires. The fact that some items of the EORTC QLQ-C30 or the FACT-G were not mentioned in social media posts does not mean that patients were not concerned by them. For this reason, our conclusions should be considered with a certain degree of caution.
Fourth, because immunotherapy is a fairly recent therapeutic approach, few posts could be identified. A larger analysis corpus is needed to obtain more robust results and to validate our initial findings. As demonstrated in this study, the number of posts concerning immunotherapy is increasing year after year and new studies will benefit from this expanding analysis corpus. In particular, posts should be compared across countries where the EORTC QLQ-C30 and the FACT-G are usually administered to capture HRQoL. Exploration of potential cross-country differences in subdomains mentioned in social media would be noteworthy.
Fifth, social media represent an ideal place where patients can freely and spontaneously discuss their experiences with their therapy, thus providing valuable information on their HRQoL. However, this observation should be interpreted cautiously, because social media data may include a higher frequency of erroneous information, and patients posting on social media forums may not be representative of the wider patient population [10].
Finally, biases related to semantic analyses must be considered. Given the low number of posts within our analysis corpus, we were obliged to retrieve and code the mentions manually and could not apply automated analysis, for example, using topic modeling.

Implications and Future Research
We were able to include users' subjective narratives in the evaluation of the impact of ICI on patients' HRQoL. The results of our study suggest that commonly used measures such as the EORTC QLQ-C30 and the FACT-G may require updating to improve their coverage of HRQoL domains and their applicability under real-world conditions. The challenge in measuring HRQoL lies in its uniqueness to individuals [35], and questionnaires such as the EORTC QLQ-C30 and the FACT-G might not take account of this by imposing standardized models of HRQoL. For this reason, as already demonstrated in studies concerning diseases other than cancer [42-46], posts in online forums and social media should be integrated in the assessment of patients' HRQoL, because they can help either detect adverse events or characterize patient experience in a more individualized and spontaneous way.
In summary, this study suggests that specific HRQoL domains related to patients treated with ICI should be explored further, to potentially enrich existing standard questionnaires with new items that are more relevant for these patients in their daily confrontation with disease and treatment.

Conclusion
Patients with cancer and their relatives are using social media to share their experiences with immunotherapy and its impact on HRQoL, particularly with regard to Global health and Symptoms. Emotional state and Role are also increasingly referenced in online forums and social media. Collecting and analyzing these spontaneous narratives can help capture how immunotherapy affects patients' HRQoL in a more individualized way, thus providing information on more facets of life that are important for patients. While standard questionnaires can provide objective scores, which are easily interpretable from a clinical and research point of view, mining social media posts might better inform health care professionals and patients of the impact of immunotherapy on patients' HRQoL under real-world conditions. Future research is required to corroborate our findings and to propose new individualized measures covering HRQoL in more depth than existing standard questionnaires.