Predictors of Internet Health Information–Seeking Behaviors Among Young Adults Living With HIV Across the United States: Longitudinal Observational Study

Background: Consistent with young adults’ penchant for digital communication, young adults living with HIV use digital communication media to seek out health information. Understanding the types of health information sought online and the characteristics of these information-seeking young adults is vital when designing digital health interventions for them.
Objective: This study aims to describe characteristics of young adults living with HIV who seek health information through the internet. Results will be relevant to digital health interventions and patient education.
Methods: Young adults with HIV (aged 18-34 years) self-reported internet use during an evaluation of digital HIV care interventions across 10 demonstration projects in the United States (N=716). Lasso (least absolute shrinkage and selection operator) models were used to select characteristics that predicted whether participants reported seeking general health and sexual and reproductive health (SRH) information on the internet during the past 6 months.
Results: Almost a third (211/716, 29.5%) and a fifth (155/716, 21.6%) of participants reported searching for general health and SRH information, respectively; 26.7% (36/135) of transgender young adults with HIV searched for gender-affirming care topics. Areas under the curve (>0.70) indicated success in building models to predict internet health information seeking. Consistent with prior studies, higher education and income predicted health information seeking. Higher self-reported antiretroviral therapy adherence, substance use, and not reporting transgender gender identity also predicted health information seeking. Reporting a sexual orientation other than gay, lesbian, bisexual, or straight predicted SRH information seeking.
Comulada et al. Journal of Medical Internet Research 2020;22(11):e18309. https://www.jmir.org/2020/11/e18309


Introduction
Interventions for young adults living with HIV (YALH) increasingly capitalize on the popularity and integration of digital communication media into daily routines. Social media use has saturated the information landscape in the United States, with near-ubiquitous social media platform use among those younger than 30 years [1]. Consistent with their heavy use of digital forms of communication, youth and young adults use digital communication media to seek out personalized and pertinent health information [2]. Though an income-based digital divide persists [3], reliance on electronic health information is common among youth from marginalized populations, including those living with HIV [4] and unstably housed [5]. In fact, almost half (47%) of runaway and homeless youth sought information about HIV or other STIs, and 40% sought information about sex or sexuality from online sources [5].
While the penchant for digital communication among youth and young adults can be harnessed to develop digital health interventions (eg, interventions that provide health information through online sources, social media, and text messages), these strategies are challenged by the growing levels of health misinformation available from digital sources [6]. Younger people often need guidance engaging with accurate health information designed for them. Understanding the types of health information sought online and the characteristics of these information-seeking young adults, especially vulnerable or stigmatized populations, is vital when designing digital health interventions to reach these communities.
Prior studies on internet health information seeking have focused on the general population and patient populations not living with HIV, mostly adults. These studies demonstrated that certain characteristics, including greater socioeconomic stability [7,8], more internet experience [9], female gender [8], less perceived social support [10], better health care provider relationships, and greater health engagement are associated with greater online health information seeking [11]. Behavioral characteristics, including poorer mental or physical health and alcohol and tobacco use also correlate with internet health information seeking [7,8,12]. Importantly, Mitchell et al [13] found that sexual minority youth are more likely to use the internet to seek sexual health information than their heterosexual counterparts.
In one notable addition to the literature, Calvert et al [14] evaluated adults living with HIV and found that greater socioeconomic stability was associated with greater engagement with online health information seeking. Studies focused specifically on how YALH seek health information are needed because the stigma associated with HIV is a known barrier to care [15]. Given this stigma and the intersectionality of multiple marginalized identities (racial/ethnic minority, gender identity, sexual orientation) among many YALH, digital spaces may represent opportunities for safe exploration and information seeking [16]. Furthermore, a robust examination of predictive models of internet health-seeking behaviors will provide valuable information for tailoring digital interventions to YALH.
To address this goal, analyses for this paper applied machine learning (ML) methods to data from a digital health intervention initiative for YALH to identify salient characteristics that predict internet health-seeking behaviors. The comprehensive model of information seeking [17] and correlates of online health seeking from prior studies provided a framework for selecting candidate predictors. An individual's sociodemographic characteristics were conceptualized as preceding, and even influencing, where people seek information [18]. Information seeking refers to intentional efforts made by individuals to satisfy their information needs or goals [18], such as HIV-related health care needs. As a secondary aim, the study evaluated individual predictors selected by ML methods and compared these findings with extant literature.

Participants
Data used for this analysis were collected as part of an initiative funded by the Health Resources and Services Administration to evaluate digital health interventions targeting young people living with HIV (aged 13 to 34 years) across 10 demonstration sites in the United States. Digital interventions were developed by each site and varied in content and delivery format, which included automated text messaging, mobile apps, and social media. The sites were located in Chicago, Illinois; Cleveland, Ohio; Corpus Christi, Texas; Hershey, Pennsylvania; Los Angeles, California; New York, New York; Philadelphia, Pennsylvania; San Francisco, California; St Louis, Missouri; and Winston-Salem, North Carolina. Recruitment took place through community and university clinics, health departments, a hospital system, and a community research site. Eligibility for study enrollment required young people to have a confirmed HIV diagnosis, be between the ages of 13 and 34 years, be capable of filling out audio computer-assisted self-interview (ACASI) assessments administered in English or Spanish, and meet at least one of the following criteria based on the US Department of Health and Human Services (HHS) common core indicators for monitoring HHS-funded HIV care services: (1) newly diagnosed with HIV within the last year upon enrollment, (2) not newly diagnosed and not currently engaged in HIV care, (3) never linked to HIV medical care, regardless of the duration of HIV infection, or (4) not virally suppressed, defined as having a viral load of 200 copies/mL or greater. Demonstration sites had additional eligibility criteria, such as being a patient at the site's clinic or owning a smartphone, if required by their digital health intervention. Participants from all genders, races and ethnicities, and sexual orientations were included in the initiative. Details on the initiative and the intervention typology across sites are described in Medich et al [19]. Figure 1 shows the process used to select participants for analysis. Analyses in this paper incorporated predictors measured at baseline and outcomes measured at 6 months post enrollment (N=720 participants).
Participants with missing baseline or 6-month assessments were excluded. Missing data occurred from errors saving electronic assessment files or missing assessments (eg, due to attrition after the baseline assessment). There were only 4 participants younger than 18 years old, making it difficult to model health-seeking behaviors in this group. Moreover, the younger participants represented a different patient population in terms of clinical practice. Therefore, they were excluded, and the final analytical sample contained 716 YALH.

Procedure
All data collection procedures for the cross-site evaluation were approved by the institutional review board at the University of California, Los Angeles (UCLA; No. 15-001625), the institution that was responsible for collecting and evaluating data across the sites. At enrollment, each site screened, consented, and administered a baseline ACASI assessment to participants using Questionnaire Design Studio software (Nova Research Company). Sites also collected participants' medical chart data, either by hand abstraction or from administrative records associated with the receipt of Ryan White HIV/AIDS Program funds. ACASI assessments were administered and medical chart data were obtained by sites every 6 months over the 18-month follow-up period. Sites submitted deidentified ACASI and medical chart data to the UCLA evaluation center through a web-based secure portal.

Measures
Measures treated as predictors were assessed at baseline. After baseline assessment began, measures that better captured evolving trends in technology usage among lesbian, gay, bisexual, transgender, and queer or questioning youth than baseline measures were developed and added to the 6-month follow-up assessment. These measures are treated as outcomes in the analyses.

Sociodemographic Characteristics
Age was calculated from the self-reported month and year of birth. Participants were asked to specify the race with which they identified and indicate whether they were Hispanic or Latinx. They designated their current gender identity with categories for male, female, transgender man, transgender woman, genderqueer or nonconforming, or other gender identity. Participants were also asked to categorize their sexual orientation as straight, lesbian or gay, bisexual, queer, other, or don't know/not sure; responses indicating "other" varied and included pansexual, nonsexual, and refusals to answer. Participants specified whether they were currently in school and the highest level of education they had completed. They were also asked to report monthly income "from all sources combined" and their current employment status (eg, full-time, part-time, student, or disabled). Housing stability was assessed by asking participants to indicate which type of place they stayed in the most in the past week (eg, a house or homeless shelter).

Region
Most transgender women were recruited in Los Angeles, since the Los Angeles site intervention targeted transgender women. Collinearity that would have resulted by including site and gender identity as predictors was addressed by replacing site with a predictor based on Census Bureau regions for the United States. Categories were created for the West (Los Angeles and San Francisco, California), Midwest (Chicago, Illinois; Cleveland, Ohio; and St Louis, Missouri), South (Corpus Christi, Texas, and Winston-Salem, North Carolina), and Northeast regions (Hershey and Philadelphia, Pennsylvania, and New York, New York).
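The recoding described above can be sketched as a simple lookup. The following Python snippet is illustrative only: the original analysis was carried out in R, and the names SITE_REGION and region_for_site are hypothetical. The city-to-region assignments are taken directly from the text.

```python
# Site-to-region recoding as listed in the text.
# SITE_REGION and region_for_site are hypothetical names, not the authors' code.
SITE_REGION = {
    "Los Angeles": "West",
    "San Francisco": "West",
    "Chicago": "Midwest",
    "Cleveland": "Midwest",
    "St Louis": "Midwest",
    "Corpus Christi": "South",
    "Winston-Salem": "South",
    "Hershey": "Northeast",
    "Philadelphia": "Northeast",
    "New York": "Northeast",
}


def region_for_site(site: str) -> str:
    """Return the regional category used in place of the site predictor."""
    return SITE_REGION[site]
```

Replacing 10 site categories with 4 regional categories means a single site's intervention population, such as transgender women in Los Angeles, no longer maps one-to-one onto a level of the predictor.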

Health Insurance
Participants were asked what type of health insurance they had. Insurance status was dichotomized as being insured versus not being insured or not knowing one's insurance status.

Time Since HIV Diagnosis
Time since HIV diagnosis was calculated as the number of years between the self-reported HIV diagnosis date and the baseline assessment date.
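A minimal sketch of this calculation in Python follows. The text does not state whether fractional or whole years were used, so the fractional result and the 365.25-day average year are assumptions, and the function name is hypothetical.

```python
from datetime import date


def years_since_diagnosis(diagnosis: date, baseline: date) -> float:
    """Elapsed years between diagnosis and baseline assessment.

    Assumption: fractional years based on an average year of 365.25 days.
    """
    if baseline < diagnosis:
        raise ValueError("baseline assessment precedes diagnosis date")
    return (baseline - diagnosis).days / 365.25
```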

Viral Load
Viral load data were obtained via abstraction from patient medical records. Viral load was categorized as suppressed (at less than 200 copies/mL), unsuppressed, or missing. A missing data category was included because sites were unable to obtain medical chart data on all participants.
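The three-level coding can be sketched as follows. This is an illustrative Python snippet, not the authors' code; the function name is hypothetical, and the 200 copies/mL cutoff comes from the text.

```python
from typing import Optional

SUPPRESSION_THRESHOLD = 200  # copies/mL, as defined in the text


def viral_load_category(copies_per_ml: Optional[float]) -> str:
    """Code viral load as suppressed, unsuppressed, or missing."""
    if copies_per_ml is None:
        # Chart data could not be obtained for this participant.
        return "missing"
    if copies_per_ml < SUPPRESSION_THRESHOLD:
        return "suppressed"
    return "unsuppressed"
```

Keeping "missing" as its own category lets participants without chart data stay in the model instead of being dropped.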

Doctor's Office Visitation
Similar to viral load, HIV-related ambulatory care visit attendance was obtained from medical record data. Attendance was categorized as having had an HIV medical visit in the past 6 months, not having had an HIV visit in the past 6 months, or missing.

Provider Empathy
The consultation and relational empathy measure was used to assess participants' perceptions of health care provider empathy (10 items; Cronbach α=.98) [23].

Substance Use
Participants were asked to indicate any nonprescribed substances they used but did not inject in the past 6 months from a checklist (ie, recent use). Both proper names and street names of substances were presented in the checklist, such as methamphetamine and "Tina." Indicator variables were created to denote use (1) or nonuse (0) for alcohol, tobacco, marijuana, and other substances, such as synthetic marijuana, methamphetamine, cocaine, heroin, and painkillers. Other substances were not modeled separately due to self-reported rates of use that were less than 10%, except for methamphetamines (118/720, 16.4%), inhalants (113/720, 15.7%), and powder cocaine (88/720, 12.2%). Participants were asked about lifetime and recent injection drug use, excluding prescribed medications.
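The indicator coding can be sketched as below. This Python snippet is an assumption-laden illustration: the function name is hypothetical, and because the text does not fully specify which substances received their own indicator, the list of separately modeled substances is passed in as a parameter.

```python
def substance_indicators(checked, separate):
    """Build 0/1 use indicators from a participant's checklist responses.

    checked: substances the participant marked as used in the past 6 months.
    separate: substances given their own indicator; anything else that was
    checked is collapsed into a single "other" indicator (an assumption).
    """
    checked_set = {item.strip().lower() for item in checked}
    separate_set = {name.strip().lower() for name in separate}
    indicators = {name: int(name in checked_set) for name in sorted(separate_set)}
    indicators["other"] = int(bool(checked_set - separate_set))
    return indicators
```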

Perceived Confidence in Receiving Social Support From Family and Friends
Perceived social support availability from family and friends was assessed through 3 social support items from the coping self-efficacy scale (3 items; Cronbach α=.83) [24], in which participants were asked about confidence in receiving support from family and friends on a scale from 0 (not confident at all) to 10 (very confident).

HIV Status Disclosure
HIV status disclosure was dichotomized as disclosure to one or more individuals or to none based on the participant response to having ever told anyone that they have HIV. If they had disclosed their status, they were asked to indicate types of individuals to whom they disclosed their HIV status (eg, partners and family members).

HIV-Related Stigma
HIV-related stigma was assessed through the revised HIV stigma scale (10 items; Cronbach α=.89) [25]. Using a scale from 1 (strongly disagree) to 4 (strongly agree), respondents were asked to rate their agreement with statements about experiencing HIV stigma.

General Physical and Mental Health
General physical and mental health quality of life was assessed with 4 questions from the 12-item Short-Form Health Survey [26]. Participants were asked if they "felt calm and peaceful," had "a lot of energy," or "felt downhearted and blue" over the past 4 weeks. The 3 items were summed to create a mental health measure (Cronbach α=.66). Participants were also asked how often their physical health or emotional problems interfered with social activities.

Outcomes Measured at 6 Months Post Enrollment
Participants were asked what types of digital media and communication tools they used and what types of information were sought and discussed. For this analysis, the focus centered on questions that queried the types of information that were sought through the internet. Sexual health information (eg, practicing safer sex and HIV information) discussed or sought through text messaging, email, private messaging, and social networking applications is also presented to describe the sample. Two binary outcome measures were created for (1) having looked up general health (GH) information on the internet in the past 6 months and (2) having looked up sexual and reproductive health (SRH) information on the internet in the past 6 months. Transgender health information seeking (eg, gender-affirming hormone information) was also assessed, but rates were too low to analyze using ML models.

Statistical Analysis
All analyses were conducted using R software (version 3.5.3; R Project for Statistical Computing) [28]. Data were randomly split into training (537/716, 75.0%) and testing data sets (179/716, 25.0%). An ML approach was chosen to meet the aims of the paper: to build a predictive model and to evaluate individual predictors selected by the model. In this vein, we used lasso (least absolute shrinkage and selection operator) regression as the ML approach because it fits a model to all candidate predictors and shrinks regression coefficients to zero for predictors that do not adequately contribute to error minimization. In other words, lasso regression provides a distinguishable subset of predictors, in contrast to ridge regression, which shrinks regression coefficients toward zero without setting them exactly to zero, or to other ML approaches that provide less interpretable parameter estimates, such as random forest algorithms. The glmnet R package [29] was used to fit lasso logistic regression models to the training data set using 10-fold cross-validation to select predictors for seeking general health information and SRH information via the internet.
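The contrast between lasso and ridge can be illustrated with a simplified case. For squared-error loss and a single standardized predictor, the lasso estimate is the soft-thresholded unpenalized coefficient, whereas the ridge estimate is a proportional shrinkage of it. The Python sketch below shows only this textbook case under those assumptions; it is not the penalized logistic fit performed with glmnet, and the function names are hypothetical.

```python
def soft_threshold(z: float, lam: float) -> float:
    """Lasso update for one standardized predictor under squared-error loss.

    z: unpenalized coefficient; lam: penalty strength (lam >= 0).
    Coefficients with |z| <= lam are set exactly to zero.
    """
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def ridge_shrink(z: float, lam: float) -> float:
    """Ridge counterpart: shrinks toward zero but never reaches it for z != 0."""
    return z / (1.0 + lam)
```

With a penalty of 0.5, a weak coefficient of 0.3 is dropped by the lasso rule (0.0) but merely reduced by the ridge rule (0.2), which is why lasso yields a distinguishable subset of predictors.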
Accuracy of the lasso models was gauged by using parameter estimates to predict internet health information seeking in the test data and comparing predictions to observed outcome values. Receiver operating characteristic (ROC) curves were plotted to evaluate the sensitivity and specificity of predictions over a range of probability thresholds. Areas under the ROC curve (AUCs) are presented to gauge the accuracy of predictions. An AUC of 0.50 indicates a model that performs no better than chance, and an AUC of 1.00 indicates perfect discrimination.
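One way to make the AUC concrete is its rank interpretation: it equals the probability that a randomly chosen participant who sought information receives a higher predicted probability than a randomly chosen participant who did not, with ties counted as one half. The Python sketch below computes the AUC this way; it is an illustration with a hypothetical function name, not the routine used in the original R analysis.

```python
def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    y_true: observed outcomes coded 1 (sought information) or 0 (did not).
    scores: predicted probabilities in the same order.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))
```

Identical scores for everyone give 0.50, the chance level, and perfectly separated scores give 1.00.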
Traditional logistic regressions were fit to the training data using predictors selected by lasso models to aid interpretation of associations between predictors and internet health seeking. Odds ratios (ORs) are reported. Statistical significance levels are not presented due to difficulties interpreting regression coefficient P values for subsets of predictors selected using ML algorithms.
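For readers less familiar with ORs, the Python sketch below shows how an OR relates to a logistic regression coefficient and how it changes when the reference group is swapped. The function names are hypothetical, and the value 0.49 is used only as an example magnitude.

```python
import math


def odds_ratio(beta: float) -> float:
    """Convert a logistic regression coefficient (log odds) to an odds ratio."""
    return math.exp(beta)


def flip_reference(or_value: float) -> float:
    """Odds ratio for the same contrast with the reference group swapped."""
    return 1.0 / or_value


# A coefficient of log(0.49) corresponds to an OR of 0.49; viewed from the
# other group's perspective, the same association is an OR of about 2.04.
example_or = odds_ratio(math.log(0.49))
example_flipped = flip_reference(0.49)
```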

Sample Characteristics
Tables 1 and 2 show variables that were evaluated as candidate predictors of seeking health information on the internet across the 10 demonstration sites (N=716). Two-thirds of the participants were aged 25 to 34 years (483/716, 67.5%). Half of the participants reported a non-Latinx African American racial/ethnic identity (362/716, 50.6%); 27.9% (200/716) reported Latinx ethnicity. Most participants reported male gender (506/716, 70.7%). Nearly one-fifth (130/716, 18.2%) of participants identified as transgender women, and 5 of the 716 participants identified as transgender men (0.7%). Approximately half of the participants identified as gay or lesbian (393/716, 54.9%), and half reported having no more than a high school education and access to stable housing (368/716, 51.4% and 363/716, 50.7%, respectively). The median monthly income was US $800 (IQR US $200 to $1500).
Figure 2 shows the ROC curves for lasso-based predictions of GH and SRH information seeking in the test data. Curves above the 45° line indicate a degree of predictability beyond chance. AUCs for lasso models fit to GH information-seeking and SRH information-seeking outcomes are 0.76 and 0.73, respectively. To aid interpretation, we describe the accuracy of the model for a probability threshold of 0.50, where we classified participants as having searched the internet for health information if the predicted probability was greater than 0.50. A total of 32.4% (58/179) of the participants in the test data sought GH information on the internet. We correctly classified 16 of these 58 participants as having sought GH information and correctly classified 112 of the 121 participants who did not seek GH information. Based on a 0.50 threshold, the accuracy of the GH model was (16 + 112) / 179 = 71.5%. Using the same formula, the accuracy of the SRH model at the 0.50 threshold was 70%.
Tables 3 and 4 show ORs from logistic models fit to internet GH information-seeking and SRH information-seeking outcomes.
Covariates are predictors selected from each lasso model. Mostly consistent with our hypotheses, having a high school degree or less was associated with lower odds of seeking GH and SRH information on the internet relative to having a higher degree (OR 0.49 and 0.68, respectively). Reporting high monthly income was associated with higher odds of seeking SRH information on the internet relative to no, low, or unreported monthly income. In a contradictory fashion, reporting low monthly income was associated with lower odds of seeking SRH information on the internet relative to no or unreported monthly income (OR 0.59). Participants reporting recent use of alcohol, tobacco, and marijuana had higher odds of seeking GH and SRH information (OR 1.29-1.70). Self-reported high ART adherence was associated with higher odds of seeking GH and SRH information versus low adherence or not being on ART (OR 1.62 and 1.51, respectively). White race was associated with higher odds of seeking SRH information versus other racial/ethnic groups (OR 1.68). Male gender identity was associated with higher odds of seeking GH and SRH information (OR 1.19 and 1.26, respectively) and transgender gender identity was associated with lower odds of seeking GH and SRH information (OR 0.41 and 0.50, respectively) relative to other gender identities. The odds of seeking SRH information online were approximately 2.5 times as high for those reporting "other" as their sexual orientation (ie, excluding those identifying as gay, lesbian, bisexual, or straight) (OR 2.48).
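The threshold-based accuracy reported above for the GH model can be reproduced from the stated counts. The Python sketch below also derives sensitivity and specificity from those same counts; these two quantities are simple arithmetic on the reported numbers and are not reported in the source, and the function name is hypothetical.

```python
def classification_summary(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# GH model at the 0.50 threshold, using the counts given in the text:
# 58 of 179 test participants sought GH information, 16 of them correctly
# classified; 121 did not, 112 of them correctly classified.
gh = classification_summary(tp=16, fn=58 - 16, tn=112, fp=121 - 112)
```

Under these counts the accuracy is 128/179 (71.5%), while the derived sensitivity is 16/58 (about 27.6%) and the derived specificity is 112/121 (about 92.6%), so at this threshold most of the accuracy comes from correctly identifying participants who did not seek GH information.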

Principal Findings
This study is among the first to report internet health information-seeking behaviors among YALH. We found that a substantial minority of YALH used the internet to find GH (211/716, 29.5%) and SRH information (155/716, 21.6%). Rates of technology use and health information seeking in this population were similar to those previously reported for predominantly racial/ethnic minority samples of homeless youth, who may face many similar challenges [30].
Patterns of seeking health information were associated with several demographic factors. As reported in the general population [8], YALH in this sample with higher socioeconomic status (ie, education, income) were more likely to go online to seek information regarding both GH and SRH. Interestingly, reporting a sexual orientation of "other" as opposed to gay, straight, or bisexual was also associated with increased SRH information seeking. This may reflect that adolescents and young adults who are exploring their sexuality may feel more comfortable finding health information online [31] as opposed to seeking health information from a person (eg, provider) due to perceived or enacted stigma in the health care setting [32].
Also consistent with general population findings [2], health-related information seeking in the sample was most likely to be directed toward GH topics, like diet and exercise. This focus likely stems from progress in HIV treatment and care [33] and highlights the importance of providers focusing on holistic health. Fewer YALH searched for SRH information, most commonly to explore STI symptoms, testing, and treatment. Among transgender individuals (mostly transgender women), just over a quarter (36/135, 26.7%) searched for information about hormones, surgery, or other procedures. This is particularly important given poor access to gender-affirming services experienced by this population [34] and underscores the need for integration of gender-affirming care with HIV prevention and treatment services.

Limitations
Several study limitations should be noted. We attempted to engage young HIV-positive individuals who were struggling with adherence and engagement in care. However, this sample did not include those who are disengaged or lost to care, possibly due to syndemic health issues. This group may have very different internet health information-seeking patterns. It is also important to acknowledge that our sample was recruited to participate in digital HIV interventions, which suggests that a higher proportion of participants seek health information on the internet than in the general population of YALH. Further, while youth were recruited from 10 sites across the United States, regional differences in service options and HIV-related stigma may differentially affect YALH (eg, youth living in more rural areas). Region was not retained as a predictor in the final model, but regional differences may not have been adequately captured by study site locations or the regional predictor variable that we created.

Conclusions
Despite these limitations, this is one of the first studies to address internet health information-seeking behaviors among a marginalized group of youth living with a chronic disease. High rates of internet use among YALH and nearly one-third of participants seeking GH information online have important implications for clinicians and health educators working with YALH and other marginalized populations. Health care providers should receive training in how to engage in open discussions with patients about their technology use, the SRH topics they search for, and ways of ensuring the information being accessed online is reputable. These direct discussions may help reduce stigma and be particularly useful in supporting transitional age youth. Transitioning from pediatric to adult HIV care is commonly associated with poor retention in care [35]; leveraging eHealth literacy support represents an opportunity to improve care outcomes during this period.
While measures related to eHealth literacy (ie, YEHS and MTUAS subscales) were not retained in models, interventions to build transactional eHealth literacy skills (ie, skills to locate and understand, exchange, evaluate, and apply health information [36]) among YALH may still strengthen their engagement in care and increase access to high-quality health information via trusted communication channels (eg, governmental organizations), even through social networking platforms [37,38]. Though user-generated health information content shared on social networking platforms may not be as accurate or trustworthy as scientific or governmental sources, there is value in social networking platforms and tools regarding reach and engagement. Widening disparities in the quality of health information online, particularly on popular social media, may compromise the adoption of these platforms by trusted creators of online health information. For example, though the CDC maintains accounts on legacy social media networks (eg, Instagram), newer technologies can quickly lure younger adults away from carefully crafted messaging. To meet this growing need for trustworthy health information online that serves YALH, creators of digital health information need to be innovative in developing strategies for meeting YALH where they are online.