Published in Vol 18, No 6 (2016): Jun

Completeness and Reliability of Location Data Collected on the Web: Assessing the Quality of Self-Reported Locations in an Internet Sample of Men Who Have Sex With Men

Original Paper

1Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, GA, United States

2Laney Graduate School, Emory University, Atlanta, GA, United States

3Department of Behavioral Science and Health Education, Rollins School of Public Health, Emory University, Atlanta, GA, United States

Corresponding Author:

Adam S Vaughan, M.P.H., M.S.

Department of Epidemiology

Rollins School of Public Health

Emory University

1518 Clifton Road NE

Atlanta, GA, 30322

United States

Phone: 1 404 727 2038

Fax: 1 404 727 8661


Background: Place is critical to our understanding of human immunodeficiency virus (HIV) infections among men who have sex with men (MSM) in the United States. However, within the scientific literature, place is almost always represented by residential location, suggesting a fundamental assumption of equivalency between neighborhood of residence, place of risk, and place of prevention. Yet the locations of behaviors among MSM show significant spatial variation, and theory has posited the importance of nonresidential contextual exposures. This focus on residential locations has been at least partially necessitated by the difficulties in collecting the detailed geolocated data required to explore nonresidential locations.

Objective: Using a Web-based map tool to collect locations, which may be relevant to the daily lives and health behaviors of MSM, this study examines the completeness and reliability of the collected data.

Methods: MSM were recruited on the Web and completed a Web-based survey. Within this survey, men used a map tool embedded within a question to indicate their homes and multiple nonresidential locations, including those representing work, sex, socialization, physician, and others. We assessed data quality by examining data completeness and reliability. We used logistic regression to identify demographic, contextual, and location-specific predictors of answering all eligible map questions and answering specific map questions. We assessed data reliability by comparing selected locations with other participant-reported data.

Results: Of 247 men completing the survey, 167 (67.6%) answered the entire set of eligible map questions. Most participants (>80%) answered specific map questions, with sex locations being the least reported (80.6%). Participants with no college education were less likely than those with a college education to answer all map questions (prevalence ratio, 0.4; 95% CI, 0.2-0.8). Participants who reported sex at their partner’s home were less likely to indicate the location of that sex (prevalence ratio, 0.8; 95% CI, 0.7-1.0). Overall, 83% of participants placed their home’s location within the boundaries of their reported residential ZIP code. Of locations having a specific text description, the median distance between the participant-selected location and the location determined using the specific text description was 0.29 miles (25th and 75th percentiles, 0.06-0.88).

Conclusions: Using this Web-based map tool, this Web-based sample of MSM was generally willing and able to provide accurate data regarding both home and nonresidential locations. This tool provides a mechanism to collect data that can be used in more nuanced studies of place and sexual risk and preventive behaviors of MSM.

J Med Internet Res 2016;18(6):e142



Place, or the context simultaneously experienced and defined by individuals [1], is critical to our understanding of human immunodeficiency virus (HIV) among men who have sex with men (MSM) in the United States. Through surveillance data, place fundamentally shapes our understanding of the epidemiology of the epidemic [2]. As a contextual exposure, place represents both a foundational environment in which HIV-related behaviors occur and a potential modifier of the pathway between other contextual exposures and HIV-related outcomes [3,4]. However, within the public health literature, place is almost always defined as a residential location [5-13], suggesting a fundamental assumption of equivalency between place of residence, place of sexual risk, and place of prevention. US national HIV case surveillance data make the same assumption, most often reporting data based on residence at the time of diagnosis [2].

Despite this implicit assumption, HIV-related sexual risk and prevention behaviors of MSM do not necessarily occur within the residential neighborhood [14-19]. Social ecologic theory acknowledges the importance of nonresidential locations (such as the broader urban environment and gay venues) in determining these behaviors [3,4,20-23]. For example, the availability of HIV testing services and venues where MSM gather may be influenced by broader social characteristics and norms. Access to these services and venues may then influence the formation of sexual networks and promote or inhibit individual-level behaviors, such as regular HIV testing and unprotected sex [3,4,23,24]. Consequently, using only residential neighborhood as a proxy for the many levels of sociocontextual factors may miss critical health-related exposures. To address this potential misclassification, the concept of activity spaces, which represent the collection of locations to which an individual has been exposed, has recently been introduced into the HIV literature [14,15,25].

Measuring activity spaces requires collecting large amounts of detailed geographic data. Prior studies have used global positioning systems (GPS) [26-31] or interviewer-assisted means to establish specific locations and, ultimately, to measure activity spaces [14,15]. Although these methods provide a precise and comprehensive set of locations, they have limitations. Collecting locations with GPS requires processing large amounts of data and a large investment in purchasing and maintaining the GPS devices. Interviewer-assisted methods require a large time and budget commitment, limiting the number of potential participants in a study.

To begin to address these limitations, our research group recently developed a Web-based tool that allows participants to select locations using a Google Maps question embedded within a Web-based survey [32]. Given the potentially sensitive nature of these data, participants may be more comfortable reporting such data in an anonymous Web-based survey [33]. In validation of this Web-based tool using home and health care provider locations among a cohort of HIV-positive Atlanta-area MSM, approximately 84% of participants indicated these locations using the map-based tool [32]. Among participants recruited on the Web, 50% of locations entered using the map-based tool were found to be within 0.3 miles of the true location (interquartile range, 0.1-1.1 miles). However, this previous study collected data for a limited number of locations from a population defined by a single geographic area (Atlanta, Georgia) and health status (HIV positive). Because research participation may differ by demographic and health-related factors, these results may not be generalizable to a broader population of MSM [34-37].

Therefore, given the need to gather detailed spatial data for HIV-related behaviors among MSM, to overcome current challenges in its collection, and to expand on prior validation efforts, this study examines the quality of spatial data collected using a Web-based map tool. Specifically, using a Web-based map tool to collect both residential and relevant nonresidential locations (eg, sex locations, HIV testing, work, socialization), this study examines the completeness and reliability of data collected from MSM living in a wide range of geographic locations and independent of HIV status.


Participants were recruited using Facebook banner ads, a method that has been shown to yield samples with similar risk behaviors and demographics (excepting race) as venue-based methods of recruiting MSM [37]. Ads were targeted to users based on geography and interests. A $3 donation to a charity the participant selected from a predefined list was provided as incentive.

Eligible participants were required to be male at birth, to be aged 18 years or older, to be able to read and write English, to report at least one male sex partner in the past 6 months, and to reside in Georgia, Texas, or Wisconsin. These 3 states vary in their underlying HIV epidemiology, demographics, and contextual factors, which could be associated with willingness to answer our map questions, and this variation allowed us to draw conclusions based on a diverse convenience sample of MSM. This population also expands on the population used in the prior validation of this tool [32]. Participants who met eligibility criteria completed a Web-based consent form.

Collection of Place-Based Data

Consenting participants completed a Web-based survey that included demographic and behavioral questions and an item on residential ZIP code at the time of data collection.

In addition to these questions, participants were asked to use a map-based tool (Figure 1) [32] to drop a pin onto a Google map to indicate the following specific locations that may be relevant to the daily lives and health-related behaviors of MSM: home; work or school location, if the participant reported working at least part time or being a student; locations of up to 3 sexual encounters in the past 6 months; up to 2 socialization locations; location of the last HIV test, within the past year; location of the last test for another sexually transmitted infection, within the past year; primary care physician, if the participant reported having a primary care physician; pharmacy, if the participant reported having a regular pharmacy; and location where he received free condoms, if the participant reported picking up free condoms in the past 6 months.

For each location of interest, participants could choose to not answer the map question and were asked to indicate why they chose not to answer. These reasons were then categorized as either unable or unwilling to answer the question. Answer options indicating that a participant was unable to select the location were the following: “I can’t remember where this location is,” “I’m not sure where that place is on a map,” “I’m not comfortable using the map to select locations,” “This place is in a different city.” Answer options indicating that a participant was unwilling to select the location were “Didn’t feel comfortable giving that information,” “Worried about a loss of privacy,” “Worried about what friends, family, or coworkers would think.”
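The categorization of nonresponse reasons described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the study's code: the answer options are quoted from the survey text (with straight apostrophes), while the set names, the function name, and the handling of participants who select reasons from both groups or give none are assumptions chosen to mirror the column headings of Table 2.

```python
# Answer options quoted from the survey; names and layout are hypothetical.
UNABLE = {
    "I can't remember where this location is",
    "I'm not sure where that place is on a map",
    "I'm not comfortable using the map to select locations",
    "This place is in a different city",
}
UNWILLING = {
    "Didn't feel comfortable giving that information",
    "Worried about a loss of privacy",
    "Worried about what friends, family, or coworkers would think",
}


def classify_nonresponse(reasons):
    """Map the reasons selected for one skipped map question to a category."""
    selected = set(reasons)
    unable = bool(selected & UNABLE)        # any "unable" option chosen
    unwilling = bool(selected & UNWILLING)  # any "unwilling" option chosen
    if unable and unwilling:
        return "both unwilling and unable"
    if unable:
        return "unable"
    if unwilling:
        return "unwilling"
    return "no reason given"
```

For example, a participant who selects only "Worried about a loss of privacy" would be classified as unwilling, and one who skips the question without selecting any reason would fall under no reason given.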

Participants were also allowed to indicate that a location was the same as another previously reported location (eg, report sex at home). In these cases, participants were not required to select the location a second time or to indicate a reason for not selecting the location. Willingness to use the map-based tool to answer the second location was assumed the same as that of the previously reported location.

For many types of locations, participants needed to report engaging in a qualifying behavior to be eligible to answer the corresponding map-based question. For example, participants needed to report having a regular physician before being presented with the map to identify physician location. As a result, the number of participants eligible to answer each location question varied.

In addition, for each location, participants entered a name that was used to reference that location throughout the survey. This name was entered by participants and could be generic (eg, home, work, bar) or specific (eg, Dr. Smith, Walgreens).

Figure 1. Sample of Google Maps question embedded within the Web-based survey.

Primary Outcome Definitions

This analysis uses 2 different primary outcomes: answering the entire set of map questions and answering specific map questions. A participant was considered to have answered the entire set of map questions if he used the map-based tool to indicate all locations for which he was eligible to answer. More granularly, the second outcome required participants to indicate specific eligible locations (eg, home, socialization, sex) using the map-based tool.
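As a minimal sketch of the first outcome definition, assuming each participant's eligibility and responses are stored as sets of location-type labels (the function names and data layout are hypothetical, not taken from the study):

```python
def answered_all_eligible(eligible, answered):
    """True if every location type the participant was eligible for was answered.

    eligible: set of location types the participant qualified to answer
    answered: set of location types indicated with the map-based tool
    """
    return set(eligible) <= set(answered)


def completeness_category(eligible, answered):
    """Classify a participant as answering 'all', 'some', or 'none' of his eligible questions."""
    eligible = set(eligible)
    done = eligible & set(answered)  # ignore answers outside the eligible set
    if not done:
        return "none"
    if done == eligible:
        return "all"
    return "some"
```

Because eligibility varies by participant, the comparison is always made against that participant's own eligible set rather than against the full list of location types.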

Covariate Definitions

The covariates of interest in this study represent demographic variables, contextual factors related to residential location, and factors specific to given location types. All these factors could potentially be associated with an individual being unwilling or unable to answer the location-based questions.

Age was categorized into 3 groups with breaks at ages 25 and 50 years, in accordance with age group definitions used in the Centers for Disease Control and Prevention reporting of HIV surveillance data [38,39]. Due to a limited number of nonwhite participants, self-reported race was categorized as white or nonwhite. Education was categorized as high school diploma or less, any college, or college degree. HIV status was self-reported. State was defined as the state where the participant reported currently living. Each participant was asked to indicate his primary mode of transportation, and this was dichotomized into primarily using a car and primarily using other, noncar transportation.

Residential poverty and residential urbanicity were defined based on the reported residential ZIP code. Poverty was defined using ZIP code tabulation areas (ZCTAs) from the US Census Bureau’s 2009-2013 five-year American Community Survey estimates and categorized as low poverty (<20% poverty), high poverty (20% to <40% poverty), or concentrated poverty (≥40% poverty), based on federal poverty definitions [40]. Urbanicity was defined using the 2013 National Center for Health Statistics Urban-Rural Classification Scheme for Counties [41], with the 2 most rural categories combined.

For each sex location, participants reported the type of location (eg, sex partner’s home). Participants also reported any condomless anal intercourse (CAI) at last sex at each reported sex location.
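The residential poverty categories defined above can be expressed as a small function. This sketch assumes the three categories are mutually exclusive, so that high poverty covers 20% up to but not including 40%; the function and argument names are hypothetical.

```python
def poverty_category(percent_in_poverty):
    """Assign a ZCTA to a poverty category from its percent of residents in poverty."""
    if percent_in_poverty >= 40:
        return "concentrated"  # >=40% poverty
    if percent_in_poverty >= 20:
        return "high"          # assumed to mean 20% to <40% poverty
    return "low"               # <20% poverty
```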

Statistical Analysis


After calculation of descriptive statistics for the covariates of interest, this analysis had 3 parts. We first examined factors associated with answering the entire set of map questions. Second, in an item-specific analysis, we examined factors associated with answering specific map questions (eg, home, sex locations). Finally, we examined the reliability of the reported locations.

Response to the Entire Set of Map Questions

Data regarding answering the entire set of map questions for which participants were eligible were first summarized by the covariates of interest. In bivariate analyses, we compared completeness across the levels of each covariate using chi-square and Fisher exact tests.

We then performed multivariable analyses to examine associations between the given covariates and answering all eligible map questions. Predictive margins methods were used with logistic regression to estimate adjusted prevalence ratios (PRs) for answering all map questions [42,43]. This method permitted direct estimation of the adjusted PR, rather than an estimated prevalence odds ratio. Because we expected most men to respond to these questions (ie, the outcome is not rare) [32], the prevalence odds ratio estimated using logistic regression would be farther from the null than the true PR and, consequently, direct estimation of the PR is preferred [44]. This method also avoids statistical convergence issues that may occur when estimating PRs using other methods, such as log-binomial regression [45]. This model included the following possible predictors: age, race, poverty category for the residential ZIP code, residential urbanicity, state, education, HIV status, HIV test within the past year, and primary mode of transportation.
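To illustrate why the prevalence odds ratio is a poor stand-in for the PR when the outcome is common, the following sketch compares the two measures using the crude completion percentages reported in the Results (31.8% among participants with a high school diploma or less and 72.1% among those with a college degree). It is a crude, unadjusted calculation for illustration only and does not reproduce the predictive margins models; the function names are hypothetical.

```python
def prevalence_ratio(p_group, p_reference):
    """Ratio of two prevalences."""
    return p_group / p_reference


def prevalence_odds_ratio(p_group, p_reference):
    """Ratio of the odds corresponding to two prevalences."""
    def odds(p):
        return p / (1 - p)
    return odds(p_group) / odds(p_reference)


# Crude proportions answering all eligible map questions, from the Results.
p_high_school_or_less = 0.318
p_college_degree = 0.721

pr = prevalence_ratio(p_high_school_or_less, p_college_degree)
por = prevalence_odds_ratio(p_high_school_or_less, p_college_degree)
print(f"crude PR = {pr:.2f}, crude prevalence odds ratio = {por:.2f}")
```

With these inputs the crude PR is about 0.44, whereas the crude prevalence odds ratio is about 0.18. Reversing the reference group gives about 2.27 and 5.54, respectively, so in either direction the odds ratio lies farther from 1 than the PR.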

Response to Specific Map Questions

Data regarding answering specific map questions (ie, locations of home, sex, socialization, work or school, last HIV test, last sexually transmitted infection test, primary care physician, pharmacy, and free condoms) were first summarized by the covariates of interest. In bivariate analyses, we compared completeness in answering each type of map question across the levels of each covariate using chi-square and Fisher exact tests. Proportions of the reasons for nonresponse were calculated.

We again used predictive margins methods with logistic regression to examine associations between the covariates of interest and answering specific map questions. Nine models were created, one for each location type. Each model included the following possible predictors of prevalence of response: age, race, residential poverty, residential urbanicity, state, education, HIV status, HIV test within the past year, and primary mode of transportation. The model for sex locations also included CAI and sex at the partner’s home. The model for reporting an HIV test location was restricted to HIV-negative participants. Each participant entered up to 2 socialization locations and up to 3 sex locations. Consequently, models for these 2 types of locations accounted for within-participant correlation using marginal models with exchangeable correlation structure.

Data Reliability

Data reliability was assessed using 2 methods. First, agreement between a reported ZIP code and residential location was determined. Other address information was not collected in this study. To measure this agreement, each residential location identified using the map tool was geocoded to a ZCTA, the US Census Bureau’s representation of ZIP codes. Agreement between the geocoded ZCTA and the participant self-reported ZIP code was then defined as an exact match between the 2 values.
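The agreement measure described above reduces to an exact string comparison once each pin has been assigned to a ZCTA. The sketch below assumes that assignment has already been done by a separate geocoding step; the function name and input layout are hypothetical, and codes are compared as strings so that leading zeros are preserved.

```python
def zip_agreement(pairs):
    """Count exact matches between geocoded ZCTAs and self-reported ZIP codes.

    pairs: iterable of (geocoded_zcta, reported_zip) strings, one per participant
    Returns (number matching, number compared, proportion matching).
    """
    pairs = list(pairs)
    if not pairs:
        return 0, 0, 0.0
    matches = sum(1 for zcta, reported in pairs if zcta.strip() == reported.strip())
    return matches, len(pairs), matches / len(pairs)
```

For example, `zip_agreement([("30322", "30322"), ("30307", "30322")])` would count one match out of two hypothetical participants.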

In addition, reliability was assessed using distances between the reported location and the name of the reported location. In this study, we asked men to identify locations for which they may not readily know the addresses and, consequently, for which a formal validation was not possible within this study. Therefore, for each location, participants entered text to help them identify the location in additional questions about that location. Using this text and the type of location, a Google Maps search was completed around the location selected using the map tool. If this search was informative, the distance between the reported point and the actual point was recorded. If the participant-entered text was generic (eg, doctor), rather than a specific name (eg, Dr Smith), then the driving distance between the selected location and the nearest location matching that description was recorded. Distances were summarized by those matched by a generic name, those matched by a specific name, and those matched using only a geographic location.
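A distance comparison of this kind can be sketched as follows. Two assumptions should be noted: the sketch computes straight-line (great-circle) distance with the haversine formula, which is a stand-in and not the driving distance recorded for generic names in this study, and it summarizes distances with the Python standard library's default quantile method, which may differ slightly from other software. All names are hypothetical.

```python
import math
import statistics

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an assumed constant


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def summarize(distances):
    """Return (25th percentile, median, 75th percentile) for two or more distances."""
    q1, median, q3 = statistics.quantiles(distances, n=4)
    return q1, median, q3
```

Each participant-selected pin would be paired with the coordinates of the location found from the text description, the distance computed for each pair, and the resulting list passed to `summarize`.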

Analysis Software

Data management was performed using SAS, v9.4 (SAS Institute, Cary, NC, USA). Geocoding and spatial data manipulation were completed in R, v3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) [46]. Predictive margins models were performed using SAS-callable SUDAAN, v11.0.1 (Research Triangle Institute, Research Triangle Park, NC, USA).


This study was approved by the Emory University Institutional Review Board (protocol #IRB00074519).

Sample Characteristics and Question Completeness

Of 105,815 men presented with the Facebook ad, 3058 men (2.9%) clicked on the ad to enter the eligibility screening. Of these, 624 men (20.4%) were eligible, of whom 341 men (11.1% of those screened, 54.6% of those eligible) consented to participate in the study. Of those who consented, 247 men (72.4%) completed the survey and are included in this analysis. Our sample represented a wide range of ages, urbanicity, and poverty levels (Table 1). Our sample was highly educated and largely white.

Table 1. Sample characteristics (N=247).

Covariate                                Number (%)
Age (years)
    18-25                                66 (26.7)
    26-50                                103 (41.7)
    51 and older                         78 (31.6)
Race
    White                                202 (81.8)
    Nonwhite                             45 (18.2)
Reported HIV^a positive                  36 (14.6)
HIV^a test within the past year^b        119 (56.4)
Education
    High school or less                  22 (8.9)
    Some college                         89 (36.0)
    College degree                       136 (55.1)
State
    Georgia                              76 (30.8)
    Texas                                134 (54.3)
    Wisconsin                            37 (15.0)
Primary mode of transportation
    Car                                  227 (91.9)
    Other                                20 (8.1)
Residential poverty
    Low                                  157 (63.6)
    High                                 71 (28.7)
    Concentrated                         19 (7.7)
Residential urbanicity
    Urban core                           108 (43.7)
    Suburban                             48 (19.4)
    Medium metro                         41 (16.6)
    Small metro                          31 (12.6)
    Nonmetropolitan                      19 (7.7)

^a HIV: human immunodeficiency virus.

^b Among participants who do not report being HIV positive.

Response to the Entire Set of Map Questions

Of included participants, 167 (67.6%) answered all map questions for which they were eligible. Nine participants (3.6%) answered none of the map questions for which they were eligible. Of the remaining participants, 71 (28.7%) answered at least one, but not all, map questions.

In unadjusted analyses (Figure 2), only lower education was significantly associated with lower completion of all map questions (P<.001), with 31.8% of participants with a high school diploma or less answering all questions, compared with 70.0% of participants with some college and 72.1% of participants with a college degree. This finding was confirmed in adjusted analyses (Figure 3), with participants with no college education being less than half as likely as those with a college education to answer all eligible map questions (PR, 0.4; 95% CI, 0.2-0.8). No other covariate was significantly associated with answering all eligible map questions in unadjusted or adjusted analyses.

Figure 2. Crude percent of participants answering all eligible map questions and specific map questions. Percentages are the proportion of individuals within the given covariate level eligible to answer the map question who completed the given map question. Statistically significant differences are indicated in black filled circles.
Figure 3. Adjusted prevalence ratios (aPR) and 95% CI for answering all eligible map questions and specific map questions by each covariate. aPRs are adjusted by all other covariates. Statistically significant aPRs are indicated with black filled circles. The scale of the y axis is logarithmic and differs across location types to better visualize the CI.

Response to Specific Map Questions

In item-specific analyses, most (>80%) of those eligible answered each individual map question (Table 2). Sex locations were the least likely to be answered (80.6%). For most locations, participants who chose to not answer the map-based question were generally unwilling to answer, rather than unable to answer (Table 2). However, for sex locations and HIV testing locations, the proportion of participants who were unable to answer was similar to the proportion who were unwilling to answer.

Table 2. Ability and willingness to answer specific map-based questions.

Location           Total eligible (%)   Answered (%)   Unable (%)   Unwilling (%)   Both unwilling and unable (%)   No reason given (%)
Home               247 (100)            227 (91.9)     2 (0.8)      15 (6.1)        3 (1.2)                         0 (0)
Work or school     209 (84.6)           185 (88.5)     2 (1.0)      21 (10.0)       0 (0)                           1 (0.5)
Socialization^a    474 (96.0)           430 (90.7)     6 (1.3)      33 (7.0)        3 (0.6)                         2 (0.4)
Sex^b              396 (53.4)           319 (80.6)     30 (7.6)     36 (9.1)        3 (0.8)                         8 (2.0)
HIV^c test^d       119 (56.4)           103 (86.6)     9 (7.6)      7 (5.9)         0 (0)                           0 (0)
STI^c test         120 (48.6)           103 (85.8)     7 (5.8)      11 (11.2)       1 (1.0)                         3 (3.1)
Physician          178 (72.1)           161 (90.4)     5 (2.8)      7 (13.7)        0 (0)                           3 (5.9)
Pharmacy           183 (74.1)           169 (92.3)     3 (1.6)      7 (5.9)         0 (0)                           0 (0)
Free condoms       78 (31.6)            64 (82.1)      3 (3.8)      9 (7.5)         1 (0.8)                         0 (0)

^a Participants reported up to 2 socialization locations.

^b Participants reported up to 3 sex locations.

^c HIV: human immunodeficiency virus; STI: sexually transmitted infection.

^d Among participants who do not report being HIV positive.

In unadjusted analyses, less than a college education was associated with not reporting home location (P=.003) and sex locations (P=.05) (Figure 2). Nonwhite race was significantly associated with not reporting physician locations (P=.01). Sex at the partner’s home was significantly associated with not reporting the sex location (P=.001). No other bivariate associations were statistically significant.

In adjusted analyses, only 4 covariates were significantly associated with answering specific map questions (Figure 3). Nonwhite participants were less likely than white participants to locate a pharmacy (PR, 0.8; 95% CI, 0.7-1.0). Participants living in Georgia were more likely than participants living in Wisconsin to locate a primary care physician (PR, 1.3; 95% CI, 1.0-1.6). Participants reporting sex at their partner’s home were less likely to indicate the sex location (PR, 0.8; 95% CI, 0.7-1.0). Similarly, participants with less than a college education were less likely to indicate a sex location than participants with a college degree (PR, 0.7; 95% CI, 0.5-1.0).

No other model-based associations between the covariates and answering specific map questions were statistically significant. For example, participants who reported CAI were no more likely to report sex locations (PR, 1.0; 95% CI, 0.9-1.1).

Data Accuracy

Of the 226 participants whose map-based home location could be assigned to a ZCTA, 187 (83%) placed the home location within the boundaries of the reported residential ZIP code. Of the 39 participants (17%) who placed a home location outside of the boundaries of the reported residential ZIP code, 29 placed the home location in an adjacent ZIP code, 2 reported post office box or institutional ZIP codes with a correct pin drop, and 8 placed the home location in a nonadjacent ZIP code. Reliability of residential location did not vary with urbanicity (P=.15).

Of the 1176 unique locations reported by the participants, the combination of the location type and the participant’s text description permitted 575 locations (49%) to be identified. Of these, 278 text descriptions (48%) were a specific name (eg, Walgreens), 61 (11%) were a geographic area (eg, downtown, San Antonio), and 236 (41%) were a generic name (eg, doctor, pharmacy, hospital). Of the 61 locations identified as a geographic area, 53 (87%) were placed in the correct geographic area. The remaining locations could not be identified because the name had meaning only to the participant (eg, home, work, guy 2’s place, RLD).

Of all locations having a specific text description, the median distance between the participant-selected location and the location determined using the specific text description was 0.29 miles (interquartile range, 0.06-0.88). Of all locations having a generic text description, the median distance between the selected location and the location determined using the generic text description was 0.29 miles (interquartile range, 0.08-0.64). When stratified by location type, median distances between the selected location and the location determined using the text descriptions were generally less than one-third of a mile (Table 3). Although home and work have the highest median distances, very few of these locations could be identified based on the participant’s text description.

Table 3. Distance in miles between selected location and location determined using any text description.

Location type    Number of locations    Median    25th-75th percentiles
HIV^a test       19                     0.22      0.01-0.52
STD^a test       24                     0.22      0.13-0.49

^a HIV: human immunodeficiency virus; STD: sexually transmitted disease.

Principal Findings

In this paper, we examined the feasibility of collecting location-based data using a Web-based, map-based tool among an online convenience sample of MSM. Overall, participants were willing and able to use this tool to accurately indicate the requested locations, suggesting that this method of data collection is feasible and yields complete, good-quality data. In addition, for most locations, men who chose to not use the map tool were not significantly different from men who did use the tool with respect to demographic factors and HIV-related behaviors. The notable exception to this finding is that men were 20% less likely to report a sex location if that location was a partner’s home, which may reflect both confidentiality concerns and uncertainty about the exact location.

The lack of significant associations between the examined covariates and using the map tool has critical implications for the use and subsequent interpretation of these data. Analyses relying on these locations in similar Web-based populations will have minimal bias resulting from nonresponse to these questions, with respect to the covariates measured in this study, although bias may exist due to nonparticipation. A first key exception to this finding was the observed educational gradient in which participants with no college education were less likely to provide all requested locations and sex locations. Missing data among these individuals may especially be a concern in Web-based research, where MSM of color are more difficult to recruit [47].

A second key exception is the potential for bias in analyses using sex location when sex occurs at the partner’s home (although a large majority still provided this location). Therefore, these missing data may bias analyses where either having sex at the partner’s home or education is associated with both the exposure and outcome [48]. This finding may be critical for confounding by education because lower levels of education are frequently associated with locations and with poorer health outcomes.

Men who did not provide the requested locations were generally unwilling, rather than unable, to provide the locations. Even in an anonymous Web-based survey, privacy remained a concern among a small fraction of participants. Although most participants responded to these map questions, privacy concerns for these few individuals must be considered in the implementation and interpretation of future surveys. Providing participants with the opportunity to learn more about their data’s security and reinforcing the acceptability of reporting approximate locations (eg, the nearest intersection) may help to assuage these concerns.

Similarly, participants’ inability to provide these locations could also be addressed within the Web-based survey. This inability may stem from a lack of geographic knowledge or uncertainty in locations. Incorporating text search boxes to search for a given street name or emphasizing the acceptability of identifying an intersection or other landmark could potentially address this limitation. This recommendation could also reduce the observed educational gradient in responding to these questions.

As with all participant-reported data, reliability is an important concern. Despite asking about numerous locations for which participants may not readily know an address, we found good agreement between the reported locations and other reported characteristics of those locations. These results are similar to the results of a prior validation of this tool for home and treatment locations among HIV-positive MSM [32]. Participants generally placed home locations within the correct ZIP code and placed other types of locations near the probable true location. This finding suggests that, although precise measures should be used with caution, within-person and between-person relative measures are likely appropriate.

Our findings with respect to answering specific questions contrast with past studies of broader Web-based survey participation. These studies found differential participation in Web-based surveys by demographic and health-related factors. Nonurban MSM have participated in Web-based surveys more than their urban counterparts [34,35]. In addition, individuals with a given medical condition are more likely to participate in research about that condition [36], suggesting that HIV-negative men could have been less likely to provide the requested data compared with men living with HIV.

Compared with previous validation studies [32], this analysis has expanded both the population and types of locations for which valid Web-based map data may be collected. We included MSM, independent of HIV status, from urban, suburban, and rural locations, not only large urban areas that are the typical geographic focus of much HIV research. We also included a wide variety of nonresidential locations that may be contextually important to the health of MSM.

As this study verified that these nonresidential location data can be collected from online samples of MSM, these locations may now be used to describe the activity spaces of MSM and to explore associations between nonresidential places and HIV-related behaviors among MSM. This Web-based tool will permit these location data to be collected using relatively low-resource methods that preserve participants’ anonymity. The results of future analyses may allow us to better consider how differing contexts are associated with HIV risk and prevention. National surveillance data, which are based on residential locations, may be interpreted differently depending on the spatial variation in HIV-related behaviors. In addition, future analyses may permit interventions and policy to be geographically targeted using the locations of relevant behaviors, rather than residential locations.


Despite the breadth of data being collected, this study does have limitations. First, the generalizability of this study may be limited. Our online convenience sample is likely not representative of MSM in Texas, Wisconsin, and Georgia. Our sample is less racially diverse, younger, and more educated compared with the general populations in these states. In addition, despite the breadth of HIV epidemiology, demographic, and contextual factors represented by these states, these MSM may not be representative of MSM across the United States. However, prior studies using venue-time–based sampling of MSM reported demographics similar to this study and to the Internet samples of MSM [37,47,49].

This analysis produced fully adjusted measures of association for a large number of outcomes and their potential predictors. Consequently, some of these measures may be statistically significant due to type 1 error.
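The scale of this concern can be sketched with simple arithmetic. The Python example below computes, for a given number of tests, the expected number of chance (false-positive) findings, the probability of at least one such finding, and a Bonferroni-adjusted threshold. The number of tests used (60) is a hypothetical value, not the count from this analysis, and the family-wise calculation assumes independent tests, which is unlikely to hold exactly for correlated outcomes.

```python
def expected_false_positives(num_tests, alpha=0.05):
    """Expected number of chance findings if every null hypothesis is true."""
    return num_tests * alpha


def familywise_error_rate(num_tests, alpha=0.05):
    """Probability of at least one chance finding, assuming independent tests."""
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_alpha(num_tests, alpha=0.05):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / num_tests


num_tests = 60  # hypothetical count, for illustration only
print(f"Expected chance findings: {expected_false_positives(num_tests):.1f}")
print(f"Family-wise error rate:   {familywise_error_rate(num_tests):.3f}")
print(f"Bonferroni threshold:     {bonferroni_alpha(num_tests):.5f}")
```

The Bonferroni threshold is shown only as one familiar and conservative option; the analysis reported here did not apply a correction for multiple comparisons.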

This analysis also used participant-reported ZIP codes as the basis for poverty and urbanicity measures. The use of areas to represent contextual variables may lead to misclassification, especially when using ZCTAs to represent ZIP codes [50,51]. The degree of this misclassification may be less in more urban areas [52-55], although this was not true in our predominantly urban sample. However, ZIP codes are a geographic measure that is readily accessible to participants and are therefore useful despite their limitations.

This study was also unable to validate all locations using a physical address. With our study’s expansion to locations that include where individuals socialized and had sex, validation becomes more difficult, as participants may not readily know the addresses of these nonresidential locations. Consequently, data reliability could be assessed only with the methods described above. In addition, the text descriptions of these places were useful for only half of locations, limiting conclusions regarding the reliability of the remaining half. It is possible that restricting validation to the half of locations with usable descriptions favorably biased the calculated accuracy. Additional validation of geographic reliability may be the subject of future work.


Using a Web-based map tool, MSM participants were generally willing and able to indicate all requested locations. Critically, although most MSM reported sex locations, these locations were reported less frequently than all other locations. Consequently, within this Web-based setting and MSM population (and with careful consideration of the potential biases associated with Web-based research in this population), this method of data collection is feasible and yields highly complete, good-quality location data.


This work is made possible by the generosity and willingness of our study participants. This work was supported by the following: MAC AIDS Fund, Emory University Center for AIDS Research (P30AI050409), National Institute of Mental Health (F31MH107343-01).

Conflicts of Interest

None declared.

  1. Cummins S, Curtis S, Diez-Roux AV, Macintyre S. Understanding and representing 'place' in health research: a relational approach. Soc Sci Med 2007 Nov;65(9):1825-1838. [CrossRef] [Medline]
  2. Centers for Disease Control and Prevention. Atlanta, GA; 2015 Feb. HIV surveillance report: Diagnoses of HIV infection in the United States and dependent areas, 2013 [accessed 2016-05-24]
  3. Buot MG, Docena JP, Ratemo BK, Bittner MJ, Burlew JT, Nuritdinov AR, et al. Beyond race and place: distal sociological determinants of HIV disparities. PLoS One 2014;9(4):e91711 [FREE Full text] [CrossRef] [Medline]
  4. Frye V, Latka MH, Koblin B, Halkitis PN, Putnam S, Galea S, et al. The urban environment and sexual risk behavior among men who have sex with men. J Urban Health 2006 Mar;83(2):308-324 [FREE Full text] [CrossRef] [Medline]
  5. Hixson BA, Omer SB, del Rio C, Frew PM. Spatial clustering of HIV prevalence in Atlanta, Georgia and population characteristics associated with case concentrations. J Urban Health 2011 Feb;88(1):129-141 [FREE Full text] [CrossRef] [Medline]
  6. Vaughan AS, Rosenberg ES, Sullivan PS. Spatial relationships between gay stigma, poverty, and HIV infection among black and white men who have sex with men in Atlanta. AIDS Res Hum Retroviruses 2014;30(8):740-741. [Medline]
  7. Laraque F, Mavronicolas HA, Robertson MM, Gortakowski HW, Terzian AS. Disparities in community viral load among HIV-infected persons in New York City. AIDS 2013 Aug 24;27(13):2129-2139. [CrossRef] [Medline]
  8. Krieger N, Chen JT, Waterman PD, Rehkopf DH, Subramanian SV. Painting a truer picture of US socioeconomic and racial/ethnic health inequalities: the Public Health Disparities Geocoding Project. Am J Public Health 2005 Feb;95(2):312-323. [CrossRef] [Medline]
  9. Bowleg L, Neilands TB, Tabb LP, Burkholder GJ, Malebranche DJ, Tschann JM. Neighborhood context and Black heterosexual men's sexual HIV risk behaviors. AIDS Behav 2014 Nov;18(11):2207-2218. [CrossRef] [Medline]
  10. Jeffries WL, Marks G, Lauby J, Murrill CS, Millett GA. Homophobia is associated with sexual behavior that increases risk of acquiring and transmitting HIV infection among black men who have sex with men. AIDS Behav 2013 May;17(4):1442-1453. [CrossRef] [Medline]
  11. Ayala G, Bingham T, Kim J, Wheeler DP, Millett GA. Modeling the impact of social discrimination and financial hardship on the sexual risk of HIV among Latino and Black men who have sex with men. Am J Public Health 2012 May;102(S2):S242-S249. [CrossRef] [Medline]
  12. Carpiano RM, Kelly BC, Easterbrook A, Parsons JT. Community and drug use among gay men: the role of neighborhoods and networks. J Health Soc Behav 2011 Mar;52(1):74-90. [CrossRef] [Medline]
  13. Buttram ME, Kurtz SP. Risk and protective factors associated with gay neighborhood residence. Am J Mens Health 2013;7(2):110-118 [FREE Full text] [CrossRef] [Medline]
  14. Koblin BA, Egan JE, Rundle A, Quinn J, Tieu H, Cerdá M, et al. Methods to measure the impact of home, social, and sexual neighborhoods of urban gay, bisexual, and other men who have sex with men. PLoS One 2013;8(10):e75878 [FREE Full text] [CrossRef] [Medline]
  15. Duncan DT, Kapadia F, Halkitis PN. Examination of spatial polygamy among young gay, bisexual, and other men who have sex with men in New York City: the P18 cohort study. Int J Environ Res Public Health 2014;11(9):8962-8983 [FREE Full text] [CrossRef] [Medline]
  16. Verghese A, Berk SL, Sarubbi F. Urbs in rure: human immunodeficiency virus infection in rural Tennessee. J Infect Dis 1989 Dec;160(6):1051-1055. [Medline]
  17. Lansky A, Nakashima AK, Diaz T, Fann SA, Conti L, Herr M, et al. Human immunodeficiency virus infection in rural areas and small cities of the southeast: contributions of migration and behavior. J Rural Health 2000;16(1):20-30. [Medline]
  18. Tobin KE, Latkin CA, Curriero FC. An examination of places where African American men who have sex with men (MSM) use drugs/drink alcohol: a focus on social and spatial characteristics. Int J Drug Policy 2014 May;25(3):591-597 [FREE Full text] [CrossRef] [Medline]
  19. Oster AM, Wejnert C, Mena LA, Elmore K, Fisher H, Heffelfinger JD. Network analysis among HIV-infected young black men who have sex with men demonstrates high connectedness around few venues. Sex Transm Dis 2013 Mar;40(3):206-212. [CrossRef] [Medline]
  20. Krieger N. Theories for social epidemiology in the 21st century: an ecosocial perspective. Int J Epidemiol 2001 Aug;30(4):668-677 [FREE Full text] [Medline]
  21. Adimora AA, Schoenbach VJ. Social context, sexual networks, and racial disparities in rates of sexually transmitted infections. J Infect Dis 2005 Feb 1;191(S1):S115-S122 [FREE Full text] [CrossRef] [Medline]
  22. Latkin CA, German D, Vlahov D, Galea S. Neighborhoods and HIV: a social ecological approach to prevention and care. Am Psychol 2013;68(4):210-224 [FREE Full text] [CrossRef] [Medline]
  23. Baral S, Logie CH, Grosso A, Wirtz AL, Beyrer C. Modified social ecological model: a tool to guide the assessment of the risks and risk contexts of HIV epidemics. BMC Public Health 2013;13(1):482 [FREE Full text] [CrossRef] [Medline]
  24. Rothenberg R. Maintenance of endemicity in urban environments: a hypothesis linking risk, network structure and geography. Sex Transm Infect 2007 Feb;83(1):10-15 [FREE Full text] [CrossRef] [Medline]
  25. Matthews SA, Yang T. Spatial Polygamy and Contextual Exposures (SPACEs): Promoting Activity Space Approaches in Research on Place and Health. Am Behav Sci 2013 Aug 1;57(8):1057-1081 [FREE Full text] [CrossRef] [Medline]
  26. Zenk SN, Schulz AJ, Matthews SA, Odoms-Young A, Wilbur J, Wegrzyn L, et al. Activity space environment and dietary and physical activity behaviors: a pilot study. Health Place 2011 Sep;17(5):1150-1161 [FREE Full text] [CrossRef] [Medline]
  27. Boruff BJ, Nathan A, Nijënstein S. Using GPS technology to (re)-examine operational definitions of 'neighbourhood' in place-based health research. Int J Health Geogr 2012;11:22 [FREE Full text] [CrossRef] [Medline]
  28. Chaix B, Méline J, Duncan S, Merrien C, Karusisi N, Perchoux C, et al. GPS tracking in neighborhood and health studies: a step forward for environmental exposure assessment, a step backward for causal inference? Health Place 2013 May;21:46-51. [CrossRef] [Medline]
  29. Rainham D, McDowell I, Krewski D, Sawada M. Conceptualizing the healthscape: contributions of time geography, location technologies and spatial ecology to place and health research. Soc Sci Med 2010 Mar;70(5):668-676. [CrossRef] [Medline]
  30. Delaney KP, Kramer MR, Waller LA, Flanders WD, Sullivan PS. Using a geolocation social networking application to calculate the population density of sex-seeking gay men for research and prevention services. J Med Internet Res 2014;16(11):e249 [FREE Full text] [CrossRef] [Medline]
  31. Duncan DT, Kapadia F, Regan SD, Goedel WC, Levy MD, Barton SC, et al. Feasibility and Acceptability of Global Positioning System (GPS) Methods to Study the Spatial Contexts of Substance Use and Sexual Risk Behaviors among Young Men Who Have Sex with Men in New York City: A P18 Cohort Sub-Study. PLoS One 2016 Feb;11(2):e0147520 [FREE Full text] [CrossRef] [Medline]
  32. Dasgupta S, Vaughan AS, Kramer MR, Sanchez TH, Sullivan PS. Use of a Google Map Tool Embedded in an Internet Survey Instrument: Is it a Valid and Reliable Alternative to Geocoded Address Data? JMIR Res Protoc 2014;3(2):e24 [FREE Full text] [CrossRef] [Medline]
  33. van Gelder MHJ, Bretveld RW, Roeleveld N. Web-based questionnaires: the future in epidemiology? Am J Epidemiol 2010 Dec 1;172(11):1292-1298 [FREE Full text] [CrossRef] [Medline]
  34. Sanchez TH, Sineath RC, Kahle EM, Tregear SJ, Sullivan PS. The annual American Men's Internet Survey of behaviors of men who have sex with men in the United States: Protocol and key indicators report 2013. JMIR Public Health Surveill 2015;1(1):e3. [CrossRef]
  35. Horvath KJ, Rosser BRS, Remafedi G. Sexual risk taking among young internet-using men who have sex with men. Am J Public Health 2008 Jun;98(6):1059-1067. [CrossRef] [Medline]
  36. Galea S, Tracy M. Participation rates in epidemiologic studies. Ann Epidemiol 2007 Sep;17(9):643-653. [CrossRef] [Medline]
  37. Hernandez-Romieu AC, Sullivan PS, Sanchez TH, Kelley CF, Peterson JL, del Rio C, et al. The comparability of men who have sex with men recruited from venue-time-space sampling and facebook: a cohort study. JMIR Res Protoc 2014;3(3):e37 [FREE Full text] [CrossRef] [Medline]
  38. Centers for Disease Control and Prevention. 2015. HIV Among Youth [accessed 2016-02-26]
  39. Centers for Disease Control and Prevention. 2015. HIV Among People Aged 50 and Older [accessed 2016-02-26]
  40. U.S. Census Bureau. 1995. Poverty Areas [accessed 2016-02-26]
  41. Ingram DD, Franco SJ. 2013 NCHS Urban-Rural Classification Scheme for Counties. Vital Health Stat 2 2014 Apr(166):1-73 [FREE Full text] [Medline]
  42. Graubard BI, Korn EL. Predictive margins with survey data. Biometrics 1999 Jun;55(2):652-659. [Medline]
  43. Bieler GS, Brown GG, Williams RL, Brogan DJ. Estimating model-adjusted risks, risk differences, and risk ratios from complex survey data. Am J Epidemiol 2010 Mar 1;171(5):618-623 [FREE Full text] [CrossRef] [Medline]
  44. McNutt L, Wu C, Xue X, Hafner JP. Estimating the relative risk in cohort studies and clinical trials of common outcomes. Am J Epidemiol 2003 May 15;157(10):940-943 [FREE Full text] [Medline]
  45. Williamson T, Eliasziw M, Fick GH. Log-binomial models: exploring failed convergence. Emerg Themes Epidemiol 2013;10(1):14 [FREE Full text] [CrossRef] [Medline]
  46. R-project. Vienna, Austria: R Foundation for Statistical Computing; 2015. A Language and Environment for Statistical Computing [accessed 2016-05-24]
  47. Sullivan PS, Khosropour CM, Luisi N, Amsden M, Coggia T, Wingood GM, et al. Bias in online recruitment and retention of racial and ethnic minority men who have sex with men. J Med Internet Res 2011;13(2):e38 [FREE Full text] [CrossRef] [Medline]
  48. Westreich D. Berkson's bias, selection bias, and missing data. Epidemiology 2012 Jan;23(1):159-164 [FREE Full text] [CrossRef] [Medline]
  49. Sullivan PS, Peterson J, Rosenberg ES, Kelley CF, Cooper H, Vaughan A, et al. Understanding racial HIV/STI disparities in black and white men who have sex with men: a multilevel approach. PLoS One 2014;9(3):e90514 [FREE Full text] [CrossRef] [Medline]
  50. Krieger N, Waterman P, Chen JT, Soobader M, Subramanian SV, Carson R. Zip code caveat: bias due to spatiotemporal mismatches between zip codes and US census-defined geographic areas--the Public Health Disparities Geocoding Project. Am J Public Health 2002 Jul;92(7):1100-1102. [Medline]
  51. Duncan DT, Kawachi I, Subramanian SV, Aldstadt J, Melly SJ, Williams DR. Examination of how neighborhood definition influences measurements of youths' access to tobacco retailers: a methodological note on spatial misclassification. Am J Epidemiol 2014 Feb 1;179(3):373-381 [FREE Full text] [CrossRef] [Medline]
  52. Healy MA, Gilliland JA. Quantifying the magnitude of environmental exposure misclassification when using imprecise address proxies in public health research. Spat Spatiotemporal Epidemiol 2012 Apr;3(1):55-67. [CrossRef] [Medline]
  53. Boone JE, Gordon-Larsen P, Stewart JD, Popkin BM. Validation of a GIS facilities database: quantification and implications of error. Ann Epidemiol 2008 May;18(5):371-377 [FREE Full text] [CrossRef] [Medline]
  54. Bonner MR, Han D, Nie J, Rogerson P, Vena JE, Freudenheim JL. Positional accuracy of geocoded addresses in epidemiologic research. Epidemiology 2003 Jul;14(4):408-412. [CrossRef] [Medline]
  55. Vieira VM, Howard GJ, Gallagher LG, Fletcher T. Geocoding rural addresses in a community contaminated by PFOA: a comparison of methods. Environ Health 2010;9:18 [FREE Full text] [CrossRef] [Medline]

Abbreviations

CAI: condomless anal intercourse
GPS: global positioning systems
HIV: human immunodeficiency virus
MSM: men who have sex with men
POR: prevalence odds ratio
PR: prevalence ratio
STI: sexually transmitted infection
ZCTA: zip code tabulation area

Edited by G Eysenbach; submitted 26.02.16; peer-reviewed by N Lachowsky, L Salazar, S Stahlman, J Grey; comments to author 21.04.16; revised version received 22.04.16; accepted 10.05.16; published 09.06.16


©Adam S Vaughan, Michael R. Kramer, Hannah LF Cooper, Eli S. Rosenberg, Patrick S Sullivan. Originally published in the Journal of Medical Internet Research, 09.06.2016.

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication, as well as this copyright and license information must be included.