This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
Patient-reported outcome (PRO) measures describe natural history, manage disease, and measure the effects of interventions in trials. Patients themselves increasingly use Web-based PRO tools to track their progress, share their data, and even self-experiment. However, existing PROs have limitations: they were designed for paper (not screens), are long and burdensome, are negatively framed, carry onerous licensing restrictions, and are either too generic or too specific.
This study aimed to develop and validate the core items of a modular, patient-centric, PRO system (
Thrive was developed in 4 phases, largely consistent with Food and Drug Administration guidance regarding PRO development. First, preliminary core items (common across multiple conditions:
Cognitive interviews helped to remove confusing or redundant items. Empirical testing of subscales revealed good internal consistency (Cronbach alpha=.712-.879), test-retest reliability (absolute intraclass correlations=.749-.912), and convergent validity with legacy PRO scales (eg, Pearson r=.5-.75 between Thrive subscales and PHQ-9 total). The finalized instrument consists of a 19-item core including 5 multi-item subscales:
Thrive appears to be a useful approach for capturing important domains for patients with chronic conditions. This
Patient-reported outcomes (PROs) are reports of health status that come directly from the patient and are typically captured via a questionnaire that has been developed with clearly defined methods, provides proof of validation, and has instructions for use [
In addition to their use in trials, a subset of (mostly specialist) clinics deploy PROs during routine clinical practice to help monitor patient symptoms and functioning and to assist with decision making. The incorporation of PROs into electronic medical records is likely to accelerate this trend [
Whereas other medical tools such as continuous glucose monitors were once the preserve of specialist clinics to check on patient compliance, today people with diabetes themselves are using these tools and integrating them into self-coded apps and jury-rigged mechanisms to develop their own
That was part of the motivation behind the development of the online community PatientsLikeMe, founded in 2005. One feature of the site allows people living with amyotrophic lateral sclerosis (ALS) to access a patient-reported version of the clinician-reported outcome (ClinRO) used in clinical research to characterize patient function, the ALS functional rating scale revised (ALSFRS-R [
However, as PatientsLikeMe expanded to other conditions such as multiple sclerosis (MS), Parkinson disease (PD), HIV, mood disorders, fibromyalgia, epilepsy, autism spectrum disorder, and organ transplants, it became clear that the state of PRO development was highly uneven across these conditions. While some PROs focused on symptoms and pathological elements of disease, others focused on the impact of the condition, treatment side effects, or broader concepts such as HRQoL. As standards on the quality of PRO development (such as the Food and Drug Administration’s (FDA’s) guidance for industry on PRO development in labeling [
Adapting what we felt were the best approaches from the PRO field, we sought to develop a
Methodologically, we sought to conform (to the extent possible) with the FDA’s Guidance for Industry for PRO development [
Developing a conceptual framework and the preliminary item pool through literature review and expert input
Cognitive debriefing of draft items with participants
Revising these items and framework accordingly
Collecting data and evaluating psychometric properties (such as rating scale functioning, reliability, convergent validity, ability to detect change, and bias)
Modifying the instrument based on results of the empirical evaluation
Collecting data and analyzing psychometric properties of the revised instrument
Finalizing the instrument and scoring
Issues with the patient-reported outcome status quo that the team identified as affecting patients.
Issues for patients | Example in existing PROsa | Implications | Proposed solution | Implementation in Thrive |
PROs ignore comorbidity | For example, SF-36b does not contain important domains for a specific chronic condition, whereas condition-specific instruments are unclear on how user should dissociate primary condition from comorbidities | Typical PatientsLikeMe user has a median of 3 moderate-serious medical conditions; fielding additional PROs for each condition dramatically increases burden and redundancy | Core Thrive items asked of all users; curated set of additional symptoms, abilities, and thriving items fielded according to reported conditions | Core Thrive item asks separately about impact of each condition and comorbidity independently, for example, “Parkinson’s impact=a lot” but “Eczema=not at all” |
No personalization for the individual | Redundant questions, for example, pregnancy in males. At best, there are instructions to skip irrelevant questions (eg, “If no, skip to 12”) | Patients wade through the same clumsy skip logic instructions (or irrelevant questions) over and over again | Let patients specify once that something is not relevant and remember that in the future | Option of “Stop asking me this” checks why patient wants to skip and asks if we can assume the last answer given will continue being the same |
Large number of questions | For example, autism treatment evaluation checklist contains 78 items | Takes a long time to complete (approximately 10 seconds per item) and may cause drop-off | Ask as few questions as possible | Review of literature and patient-submitted data to identify most common issues |
Long question stems and responses | Parkinson disease rating scale requires reading 1456 words | Difficult to read on mobile screens, may require scrolling, risks biasing answers | Use brief, active voice items and consistent response scales rather than longer text-anchored responses | Items are Likert-style unipolar responses |
Negative framing | For example, Beck Depression Inventory: “(0) I don't feel disappointed in myself (1) I am disappointed in myself (2) I am disgusted with myself (3) I hate myself” | Fails to identify, for example, users who feel good about themselves; ignores islands of resilience and important self-expression for users; not appealing to use repeatedly | Frame items in a positive or at least neutral way when possible | Abilities stem asks, “how well could you” and Thriving stem asks, “how often could you” |
Variable or unclear recall periods | Recall periods may be missing, “past week” vs “past 7 days”, or very long, for example, past 12 months or “since you were diagnosed” | Different user needs require different recall periods | Codify and test different response periods flexibly, that is, “In the past <recall period> how well could you <activity>?” | Initial validation study developed with “last month” recall period but future work will test other recall periods |
Potentially sexist items | For example, fibromyalgia impact questionnaire focuses on disease preventing patient from doing shopping, laundry, and housework | Risks offending users. Also ignores modern options such as home grocery delivery | Avoid making assumptions about how people live their lives with or without illness | Provide general role function items, for example, “responsibilities” or “personal needs” rather than specific chores |
Anachronistic items | For example, adolescent systemizing spectrum quotient asks about “programming a video recorder” | Unclear how users will interpret such items; potential for user frustration | Focus on personally defined impact of condition rather than task completion | Use |
Confusing scores and directionality across condition-specific PROs | For example, scores such as the ALSFRS-R have an arbitrary range 0-48, Unified Parkinson’s Disease Rating Scale is 0-199; sometimes higher is worse, sometimes lower | Difficult for patients to understand meaning; conveys false sense of an interval or ratio level scale | Use a score based on a more relatable frame of reference, for example, 0-10 | 10-point scales are more familiar |
aPRO: patient-reported outcome.
bSF-36: short-form 36 questionnaire.
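The 0-10 frame of reference proposed above amounts to a direction-aware min-max rescaling of each instrument's raw range. A minimal sketch of the idea (a hypothetical helper, not part of Thrive's published scoring):

```python
def rescale_to_0_10(score, scale_min, scale_max, higher_is_better=True):
    """Map a raw instrument score onto a 0-10 band.

    Flips direction when lower raw scores mean better health,
    so 10 always means 'best' regardless of the source instrument.
    """
    fraction = (score - scale_min) / (scale_max - scale_min)
    if not higher_is_better:
        fraction = 1.0 - fraction
    return round(10 * fraction, 1)

# ALSFRS-R runs 0-48, higher = better function
print(rescale_to_0_10(36, 0, 48))  # 7.5
# Unified Parkinson's Disease Rating Scale runs 0-199, higher = worse
print(rescale_to_0_10(199, 0, 199, higher_is_better=False))  # 0.0
```

Presenting every domain on the same familiar band sidesteps the "sometimes higher is worse" confusion described above.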
Issues with the patient-reported outcome status quo that the team identified as affecting professionals.
Issues for professionals | Example in existing PROsa | Implications | Proposed solution | Implementation in Thrive |
Incomplete documentation | Most instruments lack detailed instructions for missing data | Unclear how to score, where more validation work is needed, whether items contain bias | Digitize and share item-level response characteristics through data repositories | Work in progress |
Onerous licensing restrictions | For example, license-holders of | Risk of litigation restricts innovation. Digital health practitioners may need to adapt licensed instruments to their own needs without wanting to revalidate the entire instrument | All PROs should be licensed under | All Thrive items and supporting documentation are licensed under Creative Commons ShareAlike 4.0 |
aPRO: patient-reported outcome.
Each phase of the instrument development and validation study is presented in temporal sequence below (
Participants were recruited from the membership of PatientsLikeMe.com, an online community for patients living with chronic illness. Potential members are made aware of the site through a variety of channels including Web-based advertising, nonprofit partners, word of mouth, and search. Members join the site to find other patients like them, track their conditions over time, and benefit from the shared experiences of other members [
On request for ethical review, this research was exempted from further review by the New England Independent Review Board as a minimal risk study (WO 1-2559-1).
A literature search was conducted to guide the development of a preliminary conceptual model and item generation. Consistent with widely regarded conceptual models [
Overview of validation process, adapted from the Food and Drug Administration (2009) guidance for industry. PLM: PatientsLikeMe.
Each of these aspects was considered when developing the initial item pool to ensure that the final Thrive core items adequately captured HRQoL. In particular, we were influenced by the Patient-Reported Outcomes Measurement Information System (PROMIS) Domain Framework [
A PRO instrument consists of instructions, items (which incorporate a recall period), and the items’ response options. Given the focus on chronic health conditions, we settled on a
Cognitive interviews were conducted to gather qualitative feedback regarding the preliminary items and to establish content validity. A total of 2 interviewers trained in cognitive interviewing procedures completed the interviews individually with participants over the phone. Interviews were not audio-recorded and lasted approximately 90 min. Retrospective probing was used to enhance realism [
As one of the main objectives was to create a system that would replace the legacy PROs on the PatientsLikeMe website, we sought to have the items reviewed by a diverse group of patients living with chronic health conditions who were representative of our most populated communities. Members of PatientsLikeMe who met the following study inclusion criteria were invited to participate:
Reported a primary condition of ALS, PD, multiple sclerosis (MS), major depressive disorder (MDD), generalized anxiety disorder (GAD), or posttraumatic stress disorder (PTSD)
Aged 18 years or older
Primarily resided in the United States
Following cognitive interviews, the draft core Thrive items were programmed in PatientsLikeMe’s research survey tool (RST) and administered along with validated comparison measures (PHQ-9 and the Medical Outcomes Study SF-20) to patients with chronic medical conditions (
During both rounds, participants were asked to complete assessments at 3 timepoints:
(Administration 1) Thrive + comparator measures, baseline
(Administration 2) Thrive only: 3 days after Administration 1, for evaluating stability
(Administration 3) Thrive + comparator measures: 30 days after Administration 1, for evaluating ability to detect change over time
The PHQ-9 is a 9-item self-report measure of depression based on the Diagnostic and Statistical Manual, Fourth Edition diagnostic criteria [
SF-20 is a brief self-report health survey that captures 6 health concepts: physical functioning, role functioning, social functioning, health perceptions, pain, and mental health [
Inspired by the Guy’s Neurological Disability Scale [
The ALSFRS-R is one of the most widely used instruments to capture ALS disease progression [
The PatientsLikeMe-QoL is intended to capture HRQoL related to physical function, mental distress, and social functioning over the past 30 days. This instrument has exhibited high internal consistency and convergent validity [
The target N for each patient group at administration 2 (3-day retest) was 100. The sample size of 100 was derived from a power analysis to detect a significant difference between an intraclass correlation coefficient of .80 (within the acceptable level) and .69 (below the acceptable level), assuming 80% power. Specifically, a sample size of 100 would detect whether the CI of the reliability coefficient included values below the accepted reliability threshold (Rxx=.70) 80% of the time. Notably, because of difficulties with achieving a sufficient sample size during Round 2, results were not evaluated separately by patient group.
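For reference, the absolute-agreement intraclass correlation used in the test-retest analyses can be computed from a two-way ANOVA decomposition (ICC(A,1): two-way random effects, absolute agreement, single rating). A minimal sketch, not the authors' analysis code:

```python
import numpy as np

def icc_absolute(scores):
    """ICC(A,1): two-way random effects, absolute agreement, single rating.

    scores: (n_subjects, k_timepoints) array, e.g. baseline and 3-day retest.
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()
    # Two-way ANOVA sums of squares: subjects (rows), timepoints (columns), error
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols
    msr = ss_rows / (n - 1)            # between-subjects mean square
    msc = ss_cols / (k - 1)            # between-timepoints mean square
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + (k / n) * (msc - mse))
```

Perfect test-retest agreement yields an ICC of 1; responses carrying no between-subject variance yield 0.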
Adult (18 years or older) PatientsLikeMe members primarily residing in the United States who reported a primary condition of ALS, MS, PD, MDD, GAD, PTSD, or systemic lupus erythematosus (SLE) were sent an invitation to participate through the PatientsLikeMe platform. The following information is reported in accordance with the Checklist for Reporting Results of Internet E-Surveys (CHERRIES) checklist [
On March 23, 2017, 20,941 PatientsLikeMe members fitting the inclusion criteria mentioned above were invited to the Round 1, baseline survey; this survey remained open until April 10, 2017. Participants who did not complete this survey were sent 1 reminder message 3 days after the invitation. Those who completed the survey were automatically sent an invitation to administration 2 three days after completion of administration 1. Administration 2 was open for the same time period as the baseline survey. Those who completed the Round 1 baseline survey were invited to a 30-day retest (administration 3) on May 2, 2017, which remained open until May 10, 2017.
For the second round of the surveys, 12,460 participants were sent invitations on June 15, 2017, to the Round 2 baseline survey, which remained open until July 6, 2017. Reminders and the 3-day test/retest invitation were sent in a manner identical to that of Round 1; Round 2-administration 2 was also open from June 15, 2017, to July 6, 2017. Those who completed the Round 2 baseline were invited to a 30-day retest on July 25, 2017, which remained open until August 10, 2017. All numbers pertaining to Round 1 and Round 2 are reported in the results section.
Psychometric validation is an iterative process that is driven by both theoretical and empirical support; therefore, the Thrive research team provided input and feedback during each step of the validation process. Thrive was evaluated using both classical and modern test theory approaches, including evaluation of rating scale functioning, dimensionality, person-to-item targeting, bias (gender [male, female], race [white, nonwhite], condition [neurodegenerative, autoimmune relapsing, psychiatric]), internal consistency reliability, test-retest reliability, convergent validity, and ability to detect change using longitudinal data. The primary purpose of the first round of testing was to explore item functioning and to make revisions as necessary before the second round. Analytic procedures for the second round of testing were largely consistent with those utilized in Round 1. Readers are referred to Bond and Fox [
Twelve participants completed the first round of cognitive interviews. Participants (75% [9/12] female) reported primary diagnoses of MS (33% [4/12]), fibromyalgia (17% [2/12]), GAD (8% [1/12]), MDD (8% [1/12]), ALS (8% [1/12]), bipolar disorder (8% [1/12]), and SLE (8% [1/12]). As cognitive interviews were being conducted, the interviewers regularly met together and with the research team to discuss participant feedback with the goal of identifying recurring themes. Participants identified several items that had redundant content, were too vague and caused confusion, or that they felt were not important for purposes of monitoring their health. Several items were removed or revised based on participants’ suggested rewordings to increase clarity, response options were modified to enhance consistency or reduce confusion, and the recall period was made consistent across items. For example, when probed about a coping question (“How well could you cope over the last month?”), participants expressed confusion (eg, “Cope with what?”) and felt that one’s ability to cope and deal with life stressors was already covered by other items. Similarly, response options of several items were modified for consistency and to reduce confusion. For example, the question wording “How well could you see yourself as a worthwhile person over the last month?” was changed to “Over the last month, how often did you see yourself as a worthwhile person?”
A few respondents wanted to express more detail about pain or sleep, which were issues of particular concern for them. As this core instrument is meant to be applicable to all PatientsLikeMe members, the research team decided to revisit further detail on those issues as future modular additions to the instrument.
A second round of cognitive interviewing was conducted to evaluate the revised content. A total of 2 participants (1 male) completed the second round of cognitive interviews. These participants reported primary diagnoses of bipolar disorder and SLE. Participants provided relatively similar and positive feedback about the items. This feedback was communicated back to the research team and minor revisions to the survey were made.
Consolidated Standards of Reporting Trials (CONSORT) flow diagrams are presented in
Demographic and clinical characteristics of this sample are presented in
The purpose of Round 1 was to explore item functioning and to make revisions as necessary before the second round. A summary of results from Round 1 can be found in
Of the 12,460 participants who were sent an invitation to participate, 887 responded to the invitation by clicking on the survey link, and 79.4% of these participants (n=704) completed the Round 2 baseline survey; 239 completed the 3-day retest and 51 completed the 30-day retest. Demographic and clinical characteristics of this sample are presented in
Results are presented by scale below and are summarized in
Empirical testing of subscales revealed good internal consistency (Cronbach alpha=.712-.879) and test-retest reliability (absolute intraclass correlations=.749-.912). Cronbach alpha for the Sleep subscale was lower (Cronbach alpha=.712), probably owing to the lower count of items.
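Cronbach alpha, reported here for each multi-item subscale, is computed from the item variances and the variance of the summed total score. A minimal sketch (not the authors' analysis code):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach alpha for an (n_respondents, k_items) response matrix."""
    x = np.asarray(items, dtype=float)
    k = x.shape[1]
    item_variances = x.var(axis=0, ddof=1)      # variance of each item
    total_variance = x.sum(axis=1).var(ddof=1)  # variance of the sum score
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
```

Values in the .7-.9 range, as found here, are conventionally read as acceptable-to-good internal consistency; a lower alpha for the 2-item Sleep subscale is expected because, all else being equal, alpha rises with item count.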
Convergent validity varied by domain. Correlations were highest between the
Analysis of longitudinal residualized change scores over 30 days found significant, but attenuated, patterns of correlation similar to the results of the convergent validity analysis. The strongest relationship (Pearson r=.496) was between the 2-item Sleep scale (Fall asleep and Stay asleep) and the PHQ-9 sleep item.
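Residualized change scores are the residuals from regressing each follow-up score on its baseline, removing the part of the follow-up that is predictable from where a patient started. A minimal sketch of the computation (not the authors' analysis code):

```python
import numpy as np

def residualized_change(baseline, followup):
    """Residuals of an OLS regression of follow-up scores on baseline scores.

    By construction the result is uncorrelated with baseline, so correlating
    two instruments' residualized changes compares 'who changed', not
    'who started high'.
    """
    b = np.asarray(baseline, dtype=float)
    f = np.asarray(followup, dtype=float)
    slope, intercept = np.polyfit(b, f, 1)  # coefficients, highest degree first
    return f - (slope * b + intercept)
```

Correlating two instruments' residualized change vectors (eg, with np.corrcoef) then yields the attenuated correlations reported here.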
Absolute agreement of responses across the 3-day test-retest period (n=239) suggested adequate stability (
Absolute agreement of responses to the Impact of Primary Condition item across the 3-day test-retest period was adequate (
Round 1 participant demographics.
Variable | Baseline | 3-day test-retest | 30-day retest | |
Participants (n) | 2002 | 924 | 717 | |
Age (years), mean (SD) | 54.9 (11.6) | 56.2 (10.7) | 56.0 (11.3) | |
Conditions, median (range) | 2 (1-58) | 2 (1-53) | 2 (1-58) | |
Gender, n (%) | | | |
Male | 600 (30.0) | 290 (31.5) | 245 (34.2) |
Female | 1399 (70.0) | 632 (68.5) | 471 (65.8) | |
Ethnicity, n (%) | | | |
Hispanic | 77 (4.0) | 31 (3.5) | 26 (3.8) |
Non-Hispanic | 1831 (96.0) | 861 (96.5) | 665 (96.2) | |
Race, n (%) | | | |
Asian | 7 (0.4) | 1 (0.1) | 0 (0.0) |
Black or African American | 86 (4.4) | 29 (3.2) | 23 (3.3) | |
Hawaiian | 3 (0.2) | 2 (0.2) | 2 (0.3) | |
Native American | 25 (1.3) | 10 (1.1) | 7 (1.0) | |
White | 1740 (89.6) | 821 (91.0) | 633 (90.3) | |
Mixed | 82 (4.2) | 39 (4.3) | 36 (5.1) | |
Education, n (%) | | | |
8th grade or less | 3 (0.2) | 0 (0.0) | 1 (0.1) |
Some high school | 14 (0.8) | 8 (0.9) | 3 (0.4) | |
High school graduate | 175 (10.1) | 83 (9.6) | 66 (9.6) | |
Some college | 658 (38.1) | 305 (35.3) | 242 (35.4) | |
College | 498 (28.9) | 254 (29.4) | 202 (29.5) | |
Postgraduate | 378 (21.9) | 215 (24.8) | 170 (24.9) |
aPercentage does not include missing cases.
Round 2 participant demographics.
Variable | Baseline | 3-day test-retest | 30-day retest | |
Participants (n) | 704 | 239 | 51 | |
Age (years), mean (SD) | 54.5 (11.8) | 54.8 (12.1) | 53.7 (12.7) | |
Conditions, median (range) | 1 (1-35) | 1 (1-27) | 1 (1-18) | |
Gender, n (%) | | | |
Male | 189 (26.9) | 61 (25.6) | 15 (29) |
Female | 514 (73.1) | 177 (74.4) | 36 (70) | |
Ethnicity, n (%) | | | |
Hispanic | 26 (3.9) | 7 (3.0) | 1 (2) |
Non-Hispanic | 640 (96.1) | 226 (97.0) | 47 (97) | |
Race, n (%) | | | |
Asian | 3 (0.4) | 1 (0.4) | 1 (2) |
Black or African American | 53 (7.8) | 13 (5.5) | 3 (6) | |
Hawaiian | 0 (0.0) | 0 (0.0) | 0 (0) | |
Native American | 6 (0.9) | 2 (0.8) | 0 (0) | |
White | 586 (86.3) | 214 (90.7) | 39 (81) | |
Mixed | 31 (4.6) | 6 (2.5) | 5 (10) | |
Education, n (%) | | | |
8th grade or less | 1 (0.2) | 0 (0.0) | 0 (0) |
Some high school | 6 (1.0) | 3 (1.4) | 0 (0) | |
High school graduate | 81 (13.8) | 18 (8.6) | 7 (17) | |
Some college | 225 (38.5) | 94 (45.0) | 15 (37) | |
College | 160 (27.4) | 55 (26.3) | 12 (30) | |
Postgraduate | 112 (19.1) | 39 (18.7) | 6 (15) |
aPercentage does not include missing cases.
Reliability estimates for surviving Thrive scales.
Thrive scale (number of items) | Internal consistency reliability (Cronbach alpha; n=704) | Test-retest reliability (n=239): absolute ICCa | P value |
Overall Health (1) | —b | .749 | <.001 |
Impact of Primary Condition (1) | — | .763 | <.001 |
Core Symptoms (5) | .815 | .909 | <.001 |
Mobility (1) | — | .898 | <.001 |
Sleep (2) | .712 | .833 | <.001 |
Abilities (5) | .853 | .912 | <.001 |
Thriving (4) | .879 | .889 | <.001 |
aICC: intraclass correlation coefficient.
bNot applicable.
Convergent validity (Pearson correlations between Thrive scales and comparator instruments at baseline).
Thrive scale item | PHQa-9 (n=704) | SFb-20 (n=704) General Health Item | SF-20 (n=704) Mental Health | SF-20 (n=704) Physical Functioning | SF-20 (n=704) Role Functioning | SF-20 (n=704) Health Perception | MSRSc (n=255) | PLM-QoLd (n=64) Physical | PLM-QoL (n=64) Mental | PLM-QoL (n=64) Social | ALSFRS-Re (n=60) |
Cells show Pearson r; P value.
Overall Health (1 item) | —f | .813; <.001 | — | — | — | — | — | — | — | — | — |
Impact of Primary Condition (1 item) | .463; <.001 | — | −.445; <.001 | −.439; <.001 | −.443; <.001 | −.518; <.001 | .452; <.001 | −.573; <.001 | −.492; <.001 | −.477; <.001 | −.477; <.001 |
Core Symptoms (5 items) | .750; <.001 | — | −.759; <.001 | −.390; <.001 | −.392; <.001 | −.644; <.001 | .574; <.001 | −.698; <.001 | −.775; <.001 | −.675; <.001 | −.148; .26 |
Mobility (1 item) | — | — | — | .415; <.001 | — | — | −.471; <.001 | .687; <.001 | — | — | .423; <.001 |
Sleep (2 items) | −.562; <.001 | — | — | — | — | — | — | — | — | — | — |
Abilities (5 items) | −.744; <.001 | — | .708; <.001 | .478; <.001 | .520; <.001 | .671; <.001 | −.687; <.001 | .791; <.001 | .770; <.001 | .809; <.001 | .450; <.001 |
Thriving (4 items) | −.743; <.001 | — | .780; <.001 | .342; <.001 | .378; <.001 | .626; <.001 | −.453; <.001 | .639; <.001 | .806; <.001 | .736; <.001 | .132; .32 |
aPHQ: Patient Health Questionnaire.
bSF: Short-Form General Health Survey.
cMSRS: multiple sclerosis rating scale.
dQoL: quality of life.
eALSFRS-R: amyotrophic lateral sclerosis functional rating scale-revised.
fNot applicable.
Ability to detect change (Pearson correlations between Thrive and comparator instruments’ residualized change scores in longitudinal data, N=51).
Variable | PHQa-9, total | PHQ-9, sleep item | SFb-20, general health item | SF-20, mental health | SF-20, physical functioning | SF-20, role functioning | SF-20, health perception | |||||||||||
r | P value | r | P value | r | P value | r | P value | r | P value | r | P value | r | P value |
Overall health | —c | — | — | — | .311 | .03 | — | — | — | — | — | — | — | — | ||||
Impact of primary condition | .404 | .003 | — | — | — | — | .352 | .011 | .091 | .53 | .099 | .49 | .276 | .05 | ||||
Core symptoms | .475 | <.001 | — | — | — | — | .485 | <.001 | .217 | .13 | .145 | .31 | .510 | <.001 | ||||
Mobility | — | — | — | — | — | — | — | — | .269 | .06 | — | — | — | — | ||||
Sleep | — | — | .496 | <.001 | — | — | — | — | — | — | — | — | — | — | ||||
Abilities | .190 | .18 | — | — | — | — | .125 | .384 | −.005 | .97 | .330 | .02 | .219 | .12 | ||||
Thriving | .356 | .01 | — | — | — | — | .389 | .005 | .027 | .85 | .058 | .69 | .041 | .78 |
aPHQ: Patient Health Questionnaire.
bSF: Short-Form General Health Survey.
cNot applicable.
Final core Thrive items.
Scale name (number of items) and item | Item content | Response options |
Overall health | Over the last month, how has your health been? | 5=Excellent; 4=Very good; 3=Good; 2=Fair; 1=Poor | |
Condition impact | Over the last month, how much has your [primary condition] affected your life? | 0=Not at all; 1=A little; 2=Some; 3=A lot |
Pain | Please rate the severity of any pain over the past month | 0=None; 1=Mild; 2=Moderate; 3=Severe | |
Depressed mood | Please rate the severity of any depressed mood over the past month | 0=None; 1=Mild; 2=Moderate; 3=Severe | |
Anxious mood | Please rate the severity of any anxious mood over the past month | 0=None; 1=Mild; 2=Moderate; 3=Severe | |
Fatigue | Please rate the severity of any fatigue over the past month | 0=None; 1=Mild; 2=Moderate; 3=Severe | |
Stress | Please rate the severity of any stress over the past month | 0=None; 1=Mild; 2=Moderate; 3=Severe | |
Walk | Over the last month, how well could you walk without support (such as a brace, cane, or walker)? | 4=Extremely well; 3=Very well; 2=Fairly well; 1=Poorly; 0=Not at all | |
Fall asleep | Over the last month, how well could you fall asleep when you wanted to? | 4=Extremely well; 3=Very well; 2=Fairly well; 1=Poorly; 0=Not at all | |
Stay asleep | Over the last month, how well could you sleep through the night? | 4=Extremely well; 3=Very well; 2=Fairly well; 1=Poorly; 0=Not at all | |
Think | Over the last month, how well could you think, concentrate, and remember things? | 4=Extremely well; 3=Very well; 2=Fairly well; 1=Poorly; 0=Not at all | |
Emotions | Over the last month, how well could you control your emotions? | 4=Extremely well; 3=Very well; 2=Fairly well; 1=Poorly; 0=Not at all | |
Personal needs | Over the last month, how well could you take care of your personal needs? | 4=Extremely well; 3=Very well; 2=Fairly well; 1=Poorly; 0=Not at all | |
Responsibilities | Over the last month, how well could you meet your responsibilities at work, school, or home? | 4=Extremely well; 3=Very well; 2=Fairly well; 1=Poorly; 0=Not at all | |
Social | Over the last month, how well could you participate in your favorite social and leisure activities? | 4=Extremely well; 3=Very well; 2=Fairly well; 1=Poorly; 0=Not at all | |
Good | Over the last month, how often did you feel good about yourself? | 3=All of the time; 2=Most of the time; 1=Some of the time; 0=None of the time | |
Meaning | Over the last month, how often did you find meaning in your life? | 3=All of the time; 2=Most of the time; 1=Some of the time; 0=None of the time | |
Connect | Over the last month, how often did you feel connected to others? | 3=All of the time; 2=Most of the time; 1=Some of the time; 0=None of the time | |
Wanted | Over the last month, how often did you feel able to live the life you wanted? | 3=All of the time; 2=Most of the time; 1=Some of the time; 0=None of the time |
A chi-square test demonstrated that the partial credit model (PCM [
Andrich thresholds were ordered, providing evidence that the items’ rating scales were functioning as expected [
Absolute agreement of responses to the Walk item across the 3-day test-retest period was good (
The PCM did not evidence significantly better fit than the RSM, so the RSM was used to evaluate rating scale functioning. Assumptions of the model were met, and results suggested that the rating scale was performing as expected. The items did not show evidence of DIF for gender, race, or condition (autoimmune relapsing, psychiatric, or neurodegenerative). Internal consistency was acceptable, and stability was good (
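The PCM-versus-RSM comparisons in this section are likelihood-ratio chi-square tests between nested models (the RSM constrains all items to share one threshold structure; the PCM frees it per item). A minimal sketch, assuming log-likelihoods obtained from whichever IRT package fit the two models; the values shown are hypothetical:

```python
from scipy.stats import chi2

def lr_test(loglik_restricted, loglik_full, df_diff):
    """Likelihood-ratio test for nested models (e.g., RSM vs PCM).

    df_diff is the difference in the number of free parameters
    (the extra per-item thresholds estimated by the PCM).
    """
    statistic = 2.0 * (loglik_full - loglik_restricted)
    p_value = chi2.sf(statistic, df_diff)
    return statistic, p_value

# Hypothetical fit results: PCM (full) vs RSM (restricted)
stat, p = lr_test(loglik_restricted=-5230.4, loglik_full=-5198.7, df_diff=12)
```

A small P value favors the less constrained PCM; otherwise the simpler RSM is retained, as described above.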
Of the Abilities items, 1 (“Over the last month, how well could you live the life you wanted to live?”) was removed because of conceptual redundancy with another item (“Over the last month, how often did you feel able to live the life you wanted?”).
A global chi-square fit test demonstrated that the PCM fit significantly better than the RSM (
The items’ rating scales were functioning as expected, and examination of the person-to-item map suggests adequate coverage. The items did not evidence DIF for gender or race. However, results suggested the presence of DIF for the Think item between the neurodegenerative group and the autoimmune group, whereby this item was easier to endorse for patients with neurodegenerative conditions. Internal consistency was good, and stability was excellent (
A chi-square test demonstrated that the PCM fit significantly better than the RSM (
Next, for purposes of reducing the scale length, the research team utilized theoretical (review of item content) and empirical (person-to-item map, interitem correlations) rationale to identify items for removal. As a result, 4 additional Thriving items were removed (“Over the last month, how often did you feel confident that you could handle your life?,” “Over the last month, how often did you see yourself as a worthwhile person?,” “Over the last month, how often did you feel effective?,” and “Over the last month, how often did you feel you were thriving?”). Removing these items did not result in substantial loss of reliability (from a person reliability coefficient of .92 to a person reliability coefficient of .86). The remaining 4 items evidenced good person-to-item coverage and did not evidence DIF for gender, race, or condition.
Internal consistency and stability were good (
Scores for the multi-item scales (Core Symptoms, Sleep, Abilities, and Thriving) are calculated by taking the average of the items. Whether or not scores are calculated when data are missing depends on how the instrument is being used. For example, PatientsLikeMe members can complete Thrive on a monthly basis to track their functioning, and composites for the Thrive domains can be calculated with missing data so long as 80% of items are completed for each domain. Of course, calculating a score with missing items can increase measurement error. Therefore, whenever possible, patients should be encouraged to answer as many items as they feel comfortable answering.
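The scoring rule described above can be sketched as follows (a minimal illustration, not PatientsLikeMe's production code; unanswered items are represented as None):

```python
def domain_score(responses, min_complete=0.8):
    """Mean of answered items for one multi-item Thrive domain.

    Returns None when fewer than min_complete (80% by default) of the
    domain's items were answered, per the rule described above.
    """
    answered = [r for r in responses if r is not None]
    if len(answered) / len(responses) < min_complete:
        return None
    return sum(answered) / len(answered)

# 5-item Core Symptoms domain with one skipped item: 4/5 = 80% answered
print(domain_score([3, 2, 4, None, 3]))     # 3.0
# Only 3/5 items answered: below the threshold, no score
print(domain_score([3, None, None, 2, 1]))  # None
```

Averaging (rather than summing) keeps the score on the items' own response metric even when an item is skipped.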
PROs have the potential to move the locus of control in health care from institutions and professionals to patients themselves by enabling digital health tools that track and predict outcomes, alert their health team, support shared decision making, enable learning from their peer group, underpin systematic self-experimentation, and let them continually participate in research [
Following established best practice for instrument development [
During our interviews, patients consistently described
Health incorporates disease but is bigger. Health is the ability to enjoy life with minimal impact from your conditions. It’s feeling good about life and who you are. Thriving is even more than health...it’s looking forward to each day with desire...and feeling that life is good.
After reviewing the items, most participants interviewed agreed that the Thrive Core items regarding meaning, connectedness to others, self-esteem, and coping were best at reflecting what thriving meant to them.
Thrive contains a number of features designed to make it appealing for use in digital health. Using consistent items across multiple conditions supports patients with multiple comorbidities. For example, a patient living with both PD and MDD only needs to complete information about shared domains (such as ability to sleep) once. By contrast, in our previous PRO model, a patient would have been asked to complete not only a Parkinson-specific measure (the PDRS) but also a mood-specific measure (the mood map) and a generic HR-QoL measure (PLM-QoL), with a number of additional symptoms. The burden of this battery of instruments (100 items with 3 different recall periods, 5 different response scales, and some 3252 words to read) is dramatically reduced by Thrive (19 core items plus 22 condition-specific questions [41 total] in 924 words across consistent response scales and recall periods). Question stems and response options are short and consistent, making them well suited to mobile displays. When deployed on PatientsLikeMe, users have the option to respond “stop asking me this” for each item, which may be particularly useful for members with quadriplegia whose condition will not improve, those who feel emotionally
This study was subject to a number of limitations. Although the overall number of participants recruited was relatively large, it was a convenience sample drawn from users of an online health community, only 9.5% of those invited completed the study, and participants were more likely to be female and well educated. There was significant attrition in both rounds of the 3-day retest and the 30-day follow-up, which limited our ability to detect minimally important differences. Our sample was also limited to English-speaking participants residing in the United States with a handful of chronic health conditions. All of this limits generalizability to other populations and warrants further testing. A larger, prospective, longitudinal study over a longer time course would have been preferable to establish minimally important differences and sensitivity to change. Although Thrive will be deployed with multiple items relating to both the
The number of cognitive interviews conducted was a
All participant data were self-reported rather than being independently validated, though previous studies suggest a high degree of agreement between patient self-report of diagnosis and confirmation via, for example, insurance claims [
Analysis of comparative validity suggests that although there are moderate to strong correlations with overlapping domains from other instruments, it is unlikely that the Core items represent complete
Work is already in progress to describe the development of condition-specific item banks that can be interspersed with the Thrive Core Items (
Sample additional items for 2 conditions based on health care professional review. MS: multiple sclerosis; PTSD: posttraumatic stress disorder.
Work with partners may also involve translation into other languages (such as Mandarin Chinese) and deployment through mobile messaging platforms (such as WeChat) as part of wellness apps. Finally, future work will consider the role of treatment side effects and treatment burden as key aspects of thriving despite illness [
Validation is a continuous and iterative process. This study, describing the development and testing of the Thrive Core Set items, is the first step on a path that includes replacing all the PROs on PatientsLikeMe, testing against putative biomarkers of disease progression, and deployment on third-party digital health platforms. We hope Thrive will be a key resource in the digitization of human health to improve longevity and well-being for all.
Consolidated Standards of Reporting Trials flow diagrams.
Detailed round 1 psychometric analysis.
amyotrophic lateral sclerosis
ALS functional rating scale revised
clinician-reported outcome
differential item functioning
generalized anxiety disorder
health-related quality of life
major depressive disorder
multiple sclerosis
multiple sclerosis rating scale
principal component analysis
partial credit model
Parkinson disease
Parkinson’s disease rating scale
Patient Health Questionnaire
patient-reported outcome
posttraumatic stress disorder
rating scale model
Short-Form General Health Survey
systemic lupus erythematosus
The authors are grateful to the many participants who took part in their validation efforts, and to colleagues at PatientsLikeMe for their support of this work. The PatientsLikeMe Research Team has received research funding (including conference support and consulting fees) from Abbvie, Accorda, Actelion, Alexion, Amgen, AstraZeneca, Avanir, Biogen, Boehringer Ingelheim, Celgene, EMD, Genentech, Genzyme, Janssen, Johnson & Johnson, Merck, Neuraltus, Novartis, Otsuka, Permobil, Pfizer, Sanofi, Shire, Takeda, Teva, and UCB. The PatientsLikeMe R&D team has received research grant funding from Kaiser Permanente, the Robert Wood Johnson Foundation, Sage Bionetworks, The AKU Society, and the University of Maryland. PW has received speaker fees from Bayer and honoraria from Roche, ARISLA, AMIA, IMI, PSI, and the BMJ.
PW, MH, and JH are employees of PatientsLikeMe and hold stock options in the company. KG, SM, and RB conducted this work as paid consultants to PatientsLikeMe. PW is an associate editor at the