Evaluating different recruitment methods in a longitudinal survey: Findings from the pan-European PASTA project (Preprint)

Sufficient sample sizes and minimal sample bias are core requirements in empirical data analyses. Combining opportunistic recruitment with an online survey and data collection platform yields new benefits compared to traditional recruitment approaches. Objective: To report on the success of different recruitment methods with regard to participants' characteristics, participation behavior, recruitment rates, and representativeness of the sample. People who are more interested in the research topic are more willing to participate and more likely to stay in the study than those who are selected randomly and may not have a strong connection to the research topic. Whereas direct face-to-face contacts were very effective with respect to the number of recruited participants, recruiting people through social media was not only effective but also very time-efficient. The collected data come from one of the largest longitudinal samples recruited with a common strategy across different European cities.


Introduction
Recruiting participants has become increasingly challenging in the face of privacy concerns, an increasing number of surveys, expectations of rewards for survey completion, and the effort necessary to achieve an unbiased, representative sample [1]. Sufficient sample sizes and minimal sample bias are core requirements in empirical data collection in order to answer research questions properly [2]. Meeting these requirements is especially challenging for surveys with a high response burden, such as longitudinal studies [3]. Traditional recruitment methods, such as mailed invitations based on samples drawn from population registries, are costly and increasingly yield sample biases due to declining response rates and selectivity effects [4-7], e.g. due to the increasingly exclusive use of mobile phones or email rather than traditional mail or landline phones. Opportunistic approaches such as recruitment via social media promise cost savings [4,7-12] and better coverage of groups that are hard to reach with traditional recruitment methods, such as parents of adolescents [8], adolescents themselves [10], people with special conditions [13,14], smokers [15] or low-income people [16]. Social media such as Facebook and Twitter can potentially have a strong snowballing effect [17]: given their intensive use and continuing growth (around 1.94 billion monthly active Facebook users worldwide [26]) and the ease with which information can be shared across networks [17], they can reach a large number of people in a very short time. Combining opportunistic recruitment with an online survey and data collection platform offers additional benefits, such as real-time monitoring of recruitment progress, which enables ongoing optimization of recruitment activities [4]. Poor response rates can further be improved by offering rewards for participation, such as financial compensation [18].
One major drawback of opportunistic recruitment methods is the concern of sample bias, as the population reached does not necessarily represent each group of the total population equally well [10]. Specific types of social media might, for example, be used preferentially by younger people [3,7,11] rather than by the elderly, in which case older people have little chance of being sampled at all. Other, more traditional methods such as flyers have the problem that respondents lack direct and convenient access to the survey, which may result in a lower recruitment rate [19]. The PASTA project (Physical Activity through Sustainable Transport Approaches) [20,21] used a combination of different opportunistic recruitment methods in order to exploit their strengths and minimize their weaknesses. The PASTA study collected data in a longitudinal web-based survey with a cohort design to study the effects of active mobility (such as walking and cycling) on overall physical activity and health, crash risks and exposure to traffic-related air pollution. Data were collected in seven European cities: Antwerp, Barcelona, London, Oerebro, Rome, Vienna and Zurich. The target population for the PASTA survey was the entire adult population in each of the seven case study cities, with the aim of oversampling participants who use a bicycle for their daily trips. The objective of this paper is to report on the success of these different methods with regard to participants' characteristics, participation behavior, recruitment rates, and representativeness of the sample.
More specifically, we aim (i) to describe participant characteristics in the seven European cities in terms of number of recruited people, gender, age, education level and employment status, (ii) to show how participants found out about the survey, (iii) to illustrate participation behavior by reporting the number of filled-in questionnaires and the attrition and withdrawal rates, (iv) to present the effectiveness (number of predicted participants) and time-efficiency of different recruitment approaches, and (v) to compare our sample with the general population. Finally, we present our conclusions for the use of these recruitment methods in future research on comparable topics.

Study Overview
The PASTA project used a longitudinal web-based survey that was online between November 2014 and December 2016. After a first questionnaire, which collected baseline information, participants received a follow-up questionnaire every 13 days to collect prospective data on travel behavior, levels of physical activity and traffic safety incidents [20]. During this period, participants in the seven European cities were recruited on a rolling basis using different opportunistic approaches. In order to reach a sufficient number of adult participants, a standardized guide on recruitment strategy was developed for all cities. This included 1) press releases and editorials; 2) common promotional materials following the same visual identity guidelines; 3) direct targeting of local stakeholders and community groups to distribute survey information through their communication channels (such as newsletters, intranets and webpages); 4) extensive use of social media, e.g. Facebook and Twitter (Figure 1); and 5) incentives for participation, i.e. participants entered a prize draw if they completed a questionnaire. Their chance of winning increased with each additional completed questionnaire, except for participants in Sweden (Oerebro), where the lottery was not allowed. The dissemination materials (e.g. flyers) were translated into the local languages. Within this framework, cities had the flexibility to run local initiatives and targeted city-specific recruitment, such as promoting the project at social and cultural events. Furthermore, the city of Oerebro applied an additional random sampling approach, contacting people aged 18-74 years by mail or phone. To ensure high data quality, several measures were put in place to reduce attrition, such as a user-friendly, custom-made survey platform and automatic reminder e-mails. Furthermore, participants were able to log in to the platform at any time.
There they received an overview of their completed and open questionnaires and were able to complete unfinished ones. They were also given the opportunity to withdraw actively from the survey if they no longer wanted to participate. In addition to the participants' user interface, the platform featured a researchers' user interface and a dashboard for real-time monitoring of recruitment and survey data collection. A user engagement strategy was also developed, including regular contact with the respondents, PASTA branding, regular posting on social media, and keeping the PASTA website up to date.
Participants were asked in the baseline questionnaire how they found out about the survey. They could choose between several options, ranging from word of mouth to large-scale advertising campaigns. At the same time, all city partners kept records of their local recruitment activities to measure invested effort, including the date, category, description and invested time of each recruitment activity. The activities were classified into the following categories: (i) collaboration with local administrations or organizations (e.g. survey link on webpages, newsletters, intranet), (ii) handout of flyers at specific locations or events, (iii) display of posters at specific locations, (iv) use of mailing lists, (v) advertisement in online or print media, (vi) articles in online media, print media or magazines, (vii) oral presentations for recruitment purposes, (viii) radio or TV spots, (ix) Facebook, (x) Twitter, (xi) street recruitment and (xii) random sampling. Ethics approval was obtained from the local ethics committees in the countries where the work was conducted and sent to the European Commission before the start of the survey.
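The activity log described above is essentially a list of typed records. A minimal Python sketch of such a log, assuming hypothetical field names (`day`, `category`, `description`, `hours`) and hypothetical example entries; it only illustrates how invested hours could be totalled per category, which is the denominator needed later for time-efficiency.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Category(Enum):
    """The twelve activity categories (i) to (xii) listed above."""
    COLLABORATION = "collaboration with local administrations or organizations"
    FLYERS = "handout of flyers"
    POSTERS = "display of posters"
    MAILING_LISTS = "mailing lists"
    ADVERTISEMENT = "advertisement in online or print media"
    ARTICLES = "articles in online media, print media or magazines"
    PRESENTATIONS = "oral presentations"
    RADIO_TV = "radio or TV spots"
    FACEBOOK = "Facebook"
    TWITTER = "Twitter"
    STREET = "street recruitment"
    RANDOM_SAMPLING = "random sampling"


@dataclass(frozen=True)
class ActivityRecord:
    """One row of a city partner's log (field names are hypothetical)."""
    day: date
    category: Category
    description: str
    hours: float  # invested working time


def hours_by_category(log):
    """Sum the invested working hours per category."""
    totals = defaultdict(float)
    for record in log:
        totals[record.category] += record.hours
    return dict(totals)


# Hypothetical log entries, for illustration only.
log = [
    ActivityRecord(date(2015, 3, 2), Category.FACEBOOK, "post on project page", 1.5),
    ActivityRecord(date(2015, 3, 9), Category.FLYERS, "flyers at a cycling event", 6.0),
    ActivityRecord(date(2015, 3, 16), Category.FACEBOOK, "follow-up post", 0.5),
]
totals = hours_by_category(log)
```

Categories without any logged activity simply do not appear in the totals.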

Statistical Analysis
Standard descriptive statistics were used to summarize participant characteristics, overall and stratified by city, gender, age, education level and employment status. To assess participation behavior, we compared the number of filled-in questionnaires across sociodemographic characteristics using the non-parametric Kruskal-Wallis rank sum test. Each significant result (P < 0.05) was followed by Dunn's test to identify which groups within a variable differed significantly from each other. The sociodemographic characteristics of the sample (age and gender) were compared with census data for each city using Pearson's chi-squared tests and effect size calculations.
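A minimal, dependency-free Python sketch of the two rank-based tests, assuming the standard tie-corrected formulas. The Bonferroni adjustment of Dunn's p-values is an assumption, since the text does not name the correction used, and the group data are hypothetical. To obtain a p-value, H is referred to a chi-squared distribution with k - 1 degrees of freedom (k groups). For real analyses, `scipy.stats.kruskal` and the `posthoc_dunn` function of the scikit-posthocs package provide tested implementations.

```python
import math
from itertools import combinations


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span.

    Also returns the size of every tie group, needed for the tie corrections.
    """
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def kruskal_dunn(groups):
    """Tie-corrected Kruskal-Wallis H and Dunn's pairwise z-tests.

    groups maps a label (e.g. a city) to a list of values (e.g. filled-in
    questionnaires per participant). Returns (H, df, pairwise), where pairwise
    maps (label_a, label_b) to (z, Bonferroni-adjusted two-sided p).
    """
    labels = sorted(groups)
    pooled = [v for g in labels for v in groups[g]]
    n = len(pooled)
    ranks, tie_sizes = average_ranks(pooled)
    tie_sum = sum(t ** 3 - t for t in tie_sizes)
    if tie_sum == n ** 3 - n:
        raise ValueError("all values are identical; the test is undefined")

    # Mean rank per group: the pooled list is laid out group by group.
    mean_rank, size, start = {}, {}, 0
    for g in labels:
        size[g] = len(groups[g])
        mean_rank[g] = sum(ranks[start:start + size[g]]) / size[g]
        start += size[g]

    h = 12 / (n * (n + 1)) * sum(size[g] * mean_rank[g] ** 2 for g in labels) - 3 * (n + 1)
    h /= 1 - tie_sum / (n ** 3 - n)

    # Dunn's test: difference of mean ranks over a tie-adjusted standard error.
    n_pairs = len(labels) * (len(labels) - 1) // 2
    var_base = n * (n + 1) / 12 - tie_sum / (12 * (n - 1))
    pairwise = {}
    for a, b in combinations(labels, 2):
        z = (mean_rank[a] - mean_rank[b]) / math.sqrt(var_base * (1 / size[a] + 1 / size[b]))
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal p-value
        pairwise[(a, b)] = (z, min(1.0, p * n_pairs))
    return h, len(labels) - 1, pairwise


# Hypothetical questionnaire counts per participant in three groups.
h, df, pairs = kruskal_dunn({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})
```

For these fully separated toy groups H = 7.2 with 2 degrees of freedom, and only the A versus C comparison stays significant after adjustment.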
To estimate the effectiveness (number of participants who started the baseline survey) of different recruitment approaches, we developed a recruits prediction model in the form of a nonlinear least squares model for the cities that provided the most comprehensive and detailed information on their local recruitment activities (Antwerp, Barcelona and Vienna). The model assumes that each recruitment activity generates an effect on the number of participants that resembles the density function of a log-normal distribution, i.e. a steep increase followed by a flat decrease:

$$r_i = \sum_{c=1}^{C} \sum_{a=1}^{A} e_c \, i_{ca} \, \frac{1}{d\,\sigma\sqrt{2\pi}} \exp\left(-\frac{(\ln d - \mu)^2}{2\sigma^2}\right)$$

where $r_i$ denotes the predicted number of responses on day $i$; the index $c = 1, 2, \ldots, C$ refers to the categories of recruitment activities (such as Facebook, Twitter, flyers, etc.); the index $a = 1, 2, \ldots, A$ refers to the particular activities of category $c$; $e_c$ is the intrinsic effectiveness of an activity of category $c$; $i_{ca}$ is the intensity of a particular activity $a$ of category $c$; $d$ is the number of days elapsed since the start of the activity; and $\mu$ and $\sigma$ are the location and dispersion parameters of the log-normal density function. The shape of the density function was assumed to be the same for all recruitment categories, i.e. only one pair of parameters $\mu$ and $\sigma$ was estimated for all categories. The intensity was assumed to vary as follows: (i) each category has its intrinsic (baseline) effectiveness $e_c$; (ii) within a given category, the intensity varies according to the invested effort $i_{ca}$ (indicated by the reported number of working hours); and (iii) in some cases with strong peaks of recruited people that could unambiguously be assigned to particular activities with exceptional success, the intensity parameter was manually increased to capture the success of that activity adequately.
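Once $\mu$ and $\sigma$ are fixed, the prediction model is linear in the effectiveness parameters $e_c$. The Python sketch below exploits this: it searches a coarse grid of $(\mu, \sigma)$ values and, for each pair, solves for the $e_c$ by ordinary least squares, keeping the pair with the smallest sum of squared residuals. This is a swapped-in fitting strategy for illustration, not the study's estimation code; a general nonlinear least squares routine (for example `scipy.optimize.least_squares`) would normally be used instead. The product form $e_c \cdot i_{ca} \cdot f(d)$, a zero effect on and before an activity's start day, the names `Activity`, `design` and `fit`, and all example values are assumptions, and the manual intensity boost for exceptionally successful activities is omitted.

```python
import math
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Activity:
    category: str     # hypothetical label such as "facebook"
    start_day: int    # day index on which the activity started
    intensity: float  # i_ca, e.g. reported working hours


def lognormal_pdf(d, mu, sigma):
    """Log-normal density in d; taken as zero on and before the start day."""
    if d <= 0:
        return 0.0
    z = (math.log(d) - mu) / sigma
    return math.exp(-0.5 * z * z) / (d * sigma * math.sqrt(2 * math.pi))


def design(n_days, activities, categories, mu, sigma):
    """x[i][c] = sum over activities a of category c of i_ca * f(d; mu, sigma)."""
    x = [[0.0] * len(categories) for _ in range(n_days)]
    for act in activities:
        c = categories.index(act.category)
        for i in range(n_days):
            x[i][c] += act.intensity * lognormal_pdf(i - act.start_day, mu, sigma)
    return x


def solve_least_squares(x, y):
    """Solve the normal equations (X^T X) e = X^T y by Gauss-Jordan elimination."""
    k = len(x[0])
    m = [[sum(row[p] * row[q] for row in x) for q in range(k)]
         + [sum(row[p] * yi for row, yi in zip(x, y))] for p in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[p][k] / m[p][p] for p in range(k)]


def fit(observed, activities, mu_grid, sigma_grid):
    """Minimise the sum of squared residuals over a (mu, sigma) grid.

    For fixed mu and sigma the model is linear in e_c, so e_c is solved exactly.
    """
    categories = sorted({a.category for a in activities})
    best = None
    for mu, sigma in product(mu_grid, sigma_grid):
        x = design(len(observed), activities, categories, mu, sigma)
        e = solve_least_squares(x, observed)
        ssr = sum((y - sum(ec * xc for ec, xc in zip(e, row))) ** 2
                  for y, row in zip(observed, x))
        if best is None or ssr < best[0]:
            best = (ssr, mu, sigma, dict(zip(categories, e)))
    return best


# Synthetic check: simulate 60 days from known parameters, then refit them.
acts = [Activity("facebook", 0, 2.0), Activity("flyer", 5, 8.0), Activity("facebook", 20, 1.0)]
true_e = {"facebook": 40.0, "flyer": 3.0}
cats = sorted(true_e)
daily = [sum(true_e[c] * v for c, v in zip(cats, row)) for row in design(60, acts, cats, 1.0, 0.8)]
ssr, mu_hat, sigma_hat, e_hat = fit(daily, acts, [0.5, 1.0, 1.5], [0.4, 0.8, 1.2])
```

Because the synthetic series is generated from one of the grid points, the fit recovers $\mu = 1.0$, $\sigma = 0.8$ and both $e_c$ values up to rounding error; with real, noisy counts a finer grid or a proper NLS optimizer would be required.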
To estimate the time-efficiency of different recruitment categories, we divided the number of predicted participants by the number of invested hours for each recruitment category. All
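The time-efficiency measure is a simple ratio. A minimal Python sketch, assuming per-category totals of predicted participants and invested hours are already available; the category names and numbers below are hypothetical, not figures from the study.

```python
def time_efficiency(predicted, hours):
    """Predicted participants per invested working hour, per category.

    Categories without logged hours are skipped because the ratio is undefined.
    """
    return {c: predicted[c] / hours[c] for c in predicted if hours.get(c, 0) > 0}


# Hypothetical inputs, not figures from the study.
predicted = {"facebook": 450.0, "street recruitment": 300.0, "posters": 20.0}
hours = {"facebook": 12.0, "street recruitment": 250.0}
efficiency = time_efficiency(predicted, hours)
ranking = sorted(efficiency, key=efficiency.get, reverse=True)
```

Here Facebook yields 37.5 participants per hour and street recruitment 1.2, which illustrates how a category can be effective in absolute numbers yet inefficient per hour invested.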

Sociodemographic characteristics of participants
A total of 10,691 participants were recruited over a period of 27 months in the seven European cities, ranging from 1,356 individuals in Zurich to 1,844 in Rome. In all cities except Rome, more women than men were recruited; the average age was 41.9 ± 0.19 years for men and 40.0 ± 0.17 years for women. Most participants were highly educated, with 72.5 % holding a university degree and 26.0 % a secondary education. In terms of employment, 60.6 % were full-time employees, followed by 16.8 % part-time employees, 13.8 % students, and 8.9 % people with home duties, retired or unemployed (Table 1).

How participants found out about the survey
Table 2 shows that the three main sources of information about the survey were workplaces or employers (…) media announcements, and respondents in Barcelona, Oerebro and Vienna through outreach promotion. In terms of gender, men were more likely to be recruited via outreach promotion (19.8 %) or social media (19.1 %), whereas women were most likely to be recruited via their workplace (24.3 %). Participants aged 30-60 were best reached via their workplace, whereas participants aged 20-29 and over 60 were reached most often through outreach activities. Students and participants without employment were also best reached through outreach activities. For respondents without a school-leaving qualification, recruitment via their workplace or outreach activities was likewise the most successful approach.

Participation rates and behavior
A total of 12,825 people registered for the survey; however, 2,134 never started the baseline questionnaire (attrition rate of 16.6 %). Of the remaining 10,691 participants who started the baseline questionnaire, 8,567 completed it (additional attrition rate of 19.9 %). Attrition between registering, starting and completing the baseline questionnaire varied between the cities, with the lowest rates in Antwerp and the highest in Oerebro and London (Figure 3). The number of filled-in questionnaires per participant varied significantly between the cities (P < .001), with the highest number in Zurich (11.0 ± 0.33) and the lowest in Oerebro (4.8 ± 0.17). In almost all cities, women filled in fewer questionnaires than men (7.7 ± 0.1 compared to 8.6 ± 0.2), and younger people and students tended to fill in fewer questionnaires than people aged 30-80, employees and people with home duties (P < .001; Table 3). In total, 12.2 % of participants withdrew from the survey, with the highest share in Oerebro (22.2 %) and the lowest in Rome (4.4 %).
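The attrition percentages follow directly from the reported counts. A short Python sketch reproducing them from the figures above; the third value is the overall share of registered people who never completed the baseline questionnaire, roughly one third. The function name `attrition` is illustrative.

```python
def attrition(registered, started, completed):
    """Attrition percentages between registration and baseline completion."""
    return {
        "registered_not_started": 100 * (registered - started) / registered,
        "started_not_completed": 100 * (started - completed) / started,
        "registered_not_completed": 100 * (registered - completed) / registered,
    }


rates = attrition(registered=12_825, started=10_691, completed=8_567)
rounded = {k: round(v, 1) for k, v in rates.items()}
# rounded == {"registered_not_started": 16.6, "started_not_completed": 19.9,
#             "registered_not_completed": 33.2}
```

Note that the second rate uses the people who started the questionnaire as its denominator, so the two stage-wise rates cannot simply be added to obtain the overall rate.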

Effectiveness and Efficiency of different recruitment approaches
Table 4 gives an overview of the effective recruitment categories only, i.e. those activities that recruited participants according to the model (see Section 2.2). In Antwerp these were Facebook, mailing lists, collaboration with local administrations and organizations, flyers and posters, radio spots and online advertisement; in Barcelona, Facebook, mailing lists, street recruitment and print media; and in Vienna, Facebook, online media, mailing lists, collaboration with local administrations and organizations, flyers and posters, and street recruitment. One of the most effective approaches in all three cities was Facebook, with more than 400 predicted participants in Antwerp and Barcelona and over 500 in Vienna (p < .001; Table 4). In Antwerp, most people were reached through different mailing lists (over 900 participants), and in Barcelona through a range of street recruitment activities (over 1,000 participants). In Vienna, collaborations with local organizations (such as the local bike sharing provider) were especially effective in reaching a high number of predicted participants (e.g. the peak in Figure 4c). Taking invested working hours into account, one of the most time-efficient categories in all three cities was Facebook, with more than 30 participants per invested working hour. Mailing lists and online advertisement were also efficient in Antwerp, reaching approximately 60 participants per invested working hour, whereas street recruitment in Barcelona reached only about 1 participant per working hour.

(a) Columns 4 and 5 show the t- and p-values of the average parameter of the respective recruitment category; parameters of exceptionally successful activities (where available for a given category) are not shown (these parameters are always highly significant, e.g. the peak in Fig. 4c).
(b) R² is not defined for nonlinear models, so the model does not provide one. We defined a pseudo-R² value because the model is reasonably close to a linear model and the sample size is large enough.
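Since the text does not state how the pseudo-R² was computed, the Python sketch below assumes the common definition 1 - SS_res / SS_tot applied to observed and predicted daily counts of new participants. The function name and the counts are hypothetical.

```python
def pseudo_r2(observed, predicted):
    """1 - SS_res / SS_tot on the observed scale (one common pseudo-R^2 choice)."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equally long series with at least two points")
    mean = sum(observed) / len(observed)
    ss_tot = sum((y - mean) ** 2 for y in observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1 - ss_res / ss_tot


# Hypothetical daily counts and model predictions.
observed = [0, 12, 30, 18, 9, 4]
predicted = [1, 10, 28, 20, 8, 5]
r2 = pseudo_r2(observed, predicted)
```

Unlike the linear-regression R², this quantity is not guaranteed to lie between 0 and 1 for a nonlinear fit; it can become negative when the model predicts worse than the mean.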

Representativeness of the sample
Compared with the cities' census data, study participants in all cities except Rome were broadly representative in terms of gender distribution (p > .05). This was mainly the case if participants had learned about the survey through the news, word of mouth (friends, neighbors or relatives) or social media (Table 5). The main difference was that our sample was on average younger than the general population, with a large deviation from census data in the 60+ age class (see Tables a-g in the Appendix).
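A minimal Python sketch of the gender comparison with census data, assuming a Pearson goodness-of-fit test of the sample counts against census proportions and Cohen's w as the effect size (the text does not name the effect size measure used). The counts and census shares are hypothetical. The chi-squared tail probability uses the closed forms for integer degrees of freedom, so no statistics library is needed.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of the chi-squared distribution for integer df >= 1.

    Uses the closed forms for even and odd degrees of freedom
    (Abramowitz and Stegun 26.4.4 and 26.4.5).
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term = total = 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    chi = math.sqrt(x)
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)  # standard normal pdf at chi
    term = total = 0.0
    for r in range(1, (df - 1) // 2 + 1):
        term = chi if r == 1 else term * x / (2 * r - 1)
        total += term
    return math.erfc(chi / math.sqrt(2)) + 2 * density * total


def chi_square_gof(observed, census_shares):
    """Pearson goodness-of-fit of sample counts against census proportions.

    Returns (chi2, df, p, w), with Cohen's w = sqrt(chi2 / n) as effect size.
    """
    if abs(sum(census_shares) - 1) > 1e-9:
        raise ValueError("census shares must sum to 1")
    n = sum(observed)
    chi2 = sum((o - p * n) ** 2 / (p * n) for o, p in zip(observed, census_shares))
    df = len(observed) - 1
    return chi2, df, chi2_sf(chi2, df), math.sqrt(chi2 / n)


# Hypothetical counts (women, men) against a hypothetical 50/50 census split.
chi2, df, p, w = chi_square_gof([520, 480], [0.5, 0.5])
```

With large samples even tiny deviations become statistically significant, which is why an effect size such as w is reported alongside the p-value; the same function applies to age classes by passing one count and one census share per class.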

Principal results and participation behavior
The main source of information about the survey was through workplaces or employers.
Collaborations with different organizations that forwarded survey information to their employees (e.g. through their intranet or regular newsletters) were fundamental for raising awareness and driving recruitment. Outreach promotion by project members (e.g. direct face-to-face recruitment at different events or in public places) and the use of social media channels (Facebook and Twitter) were the next most informative activities. This was especially the case in Rome, where social media promotion was based on an account of the city council with many followers.
One third of the people who registered for the survey did not complete the baseline questionnaire, with the highest attrition rate in Oerebro and the lowest in Antwerp. In addition, participants from Oerebro filled in the lowest number of questionnaires per participant, and the share of people who actively deregistered from the survey was highest in the Swedish city. One explanation may lie in Oerebro's different approach, as the city also recruited participants through random sampling. The findings suggest that people who are more interested in the topic (in this case, active mobility research) are more willing to participate in a survey and more likely to stay in the study than those who are selected randomly and may not have a strong connection to the research topic [4-7]. Therefore, a (costly) random selection may still end up producing the very bias it was meant to avoid. However, poor response rates can also be improved by offering rewards for participation [18]. The high attrition and deregistration rates in Oerebro may also be partly due to the fact that participation in Oerebro was not rewarded.

Comparative Effectiveness and Representativeness of Recruitment Methods
Whereas direct face-to-face contacts (e.g. street recruitment) were very effective in terms of the number of recruited participants, recruiting people through social media (mainly Facebook) was not only effective but also very time-efficient. Similar results were observed by others [4,10,11] who applied web-based sampling in their research. Given the near ubiquity of the Internet, it has become easier for people to engage in surveys [22], as online participation can overcome barriers such as physical distance, transportation and limited time [23]. Nevertheless, effectiveness and time-efficiency must be balanced against how representative the resulting sample is of the target population. Compared to the general population, study participants in almost all cities were broadly representative in terms of gender distribution. In particular, reaching people through news, word of mouth or social media was the most successful option for recruiting a gender-balanced sample that matched the city's population. There was, however, an age bias across the applied strategies compared to the city census data. Although the cities applied different strategies to recruit older people as well (e.g. by visiting seniors who had recently completed computer courses), most strategies attracted a higher proportion of younger people. Several studies, however, report successfully recruiting samples that reflect the demographic spread of the general population using opportunistic sampling approaches [9,11,17]. Participants in our sample were also highly educated (72.5 % held a university degree), which is common in survey research [9,24,25]. Nevertheless, this study found that targeting people without a school-leaving qualification through their workplaces or outreach activities was most promising, i.e. some recruitment activities are better suited than others to attract hard-to-reach groups.

Limitations & Strengths
While this study represents a comprehensive examination of different recruitment approaches in a longitudinal Europe-wide online survey, there were some limitations that could not be addressed by the current research design. Firstly, the sociodemographic characteristics of participants recruited by the discussed methods may differ for topics other than active mobility and its health aspects. The study population in our sample is highly educated and younger than the general population. This may be because our recruitment partly targeted cyclists, and the subject may be of particular interest to people with a higher education. Secondly, participating in an online survey requires access to the Internet, which could also explain the high proportion of young participants in our study. Thirdly, the sample was limited to the adult population; the strategies used may be more or less effective in recruiting children or adolescents.
These limitations are offset by several strengths. Firstly, we were able to recruit one of the largest longitudinal samples in different European cities with a common recruitment strategy. Secondly, we were able to shed new light on the effectiveness and time-efficiency of different recruitment approaches. We now have a large and very detailed database on response behavior per recruitment method for seven different cities in Europe. We observed that a mixed recruitment approach was very effective in reaching a high participation rate. Because of its size and composition, the resulting database allows us to answer research questions on the effects of active mobility on people's health, crash risks and exposure to traffic-related air pollution (e.g. for cyclists). Overall, the mixed-method approach has thus been successful.