Finding Collaborators: Toward Interactive Discovery Tools for Research Network Systems

doi:10.2196/jmir.3444

Original Paper

¹Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, United States

²Center for Biomedical Informatics, Regenstrief Institute, Indianapolis, IN, United States

³School of Medicine, Indiana University, Indianapolis, IN, United States

⁴Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, United States

Corresponding Author:

Charles D Borromeo, MS

Department of Biomedical Informatics

University of Pittsburgh

5607 Baum Blvd

Pittsburgh, PA, 15206

United States

Phone: 1 4126487898

Fax: 1 4126245310

Email: chb69@pitt.edu

Background: Research networking systems hold great promise for helping biomedical scientists identify collaborators with the expertise needed to build interdisciplinary teams. Although efforts to date have focused primarily on collecting and aggregating information, less attention has been paid to the design of end-user tools for using these collections to identify collaborators. To be effective, collaborator search tools must provide researchers with easy access to information relevant to their collaboration needs.

Objective: The aim was to study user requirements and preferences for research networking system collaborator search tools and to design and evaluate a functional prototype.

Methods: Paper prototypes exploring possible interface designs were presented to 18 participants in semistructured interviews aimed at eliciting collaborator search needs. Interview data were coded and analyzed to identify recurrent themes and related software requirements. Analysis results and elements from paper prototypes were used to design a Web-based prototype using the D3 JavaScript library and VIVO data. Preliminary usability studies asked 20 participants to use the tool and to provide feedback through semistructured interviews and completion of the System Usability Scale (SUS).

Results: Initial interviews identified consensus regarding several novel requirements for collaborator search tools, including chronological display of publication and research funding information, the need for conjunctive keyword searches, and tools for tracking candidate collaborators. Participant responses were positive (SUS score: mean 76.4%, SD 13.9). Opportunities for improving the interface design were identified.

Conclusions: Interactive, timeline-based displays that support comparison of researcher productivity in funding and publication have the potential to effectively support searching for collaborators. Further refinement and longitudinal studies may be needed to better understand the implications of collaborator search tools for researcher workflows.

J Med Internet Res 2014;16(11):e244


Keywords

translational medical research; cooperative behavior; interprofessional relations; interdisciplinary studies; information systems; information services

Building collaborative research teams is a critical challenge for biomedical scientists. Interdisciplinary research teams provide a breadth of expertise [1,2], shared workload [2,3], and greater advocacy for breakthroughs [2], often resulting in more frequent citations [4]. However, identifying appropriate collaborators is often difficult, particularly for junior investigators who lack extensive personal networks [5]. Research networking systems (RNSs) that model researcher activity, expertise, and collaborations have been developed to facilitate collaborator searches [6-9], particularly via federated search tools that provide preliminary demonstrations of cross-institution search facilities [10]. Emerging reports of RNS usage provide preliminary evidence of search and navigation patterns extracted from usage logs with deployed RNSs [11,12], but relatively little insight into how search tools should be designed to support the process of collaborator searches. The goal of this study was to conduct an iterative design and qualitative inquiry process to better understand scientists’ needs and workflows, and how they might best be supported by software tools. These efforts led to the development of a functional prototype collaborator search tool, which was evaluated in a preliminary usability study.

Identifying collaborators is a time-consuming process that does not scale well [6]. Researchers seeking collaborators often want to find new collaborators through existing contacts, who can provide useful feedback on the suitability of potential collaborators for their colleagues [6]. Although this approach might be effective for senior scientists with well-established personal contacts, junior researchers often lack personal contacts with potential collaborators [6]. Geographic separation is also a potential concern for evaluating potential collaborators, particularly given experience demonstrating the importance of physical proximity for research groups [13,14].

Identifying appropriate collaborators for team and translational science was one of the key motivations for the emergence of RNSs. As social networks for scientists, RNSs organize researchers’ interests, publications, funding, and collaborators in navigable formats designed to publicize research activity and support discovery of needed expertise. An assortment of commercial and academic RNSs provides a range of functionality, such as Digital Vita’s ability to populate National Institutes of Health (NIH) biosketches from RNS data [9]. Academic RNSs are typically deployed separately at individual research institutions [7], with localized navigation and search tools. Currently, one of the most prominent is the VIVO system [8], which provides a detailed semantic metadata model for describing researchers. Other notable tools include Harvard’s Profiles [15] and commercial tools, such as SciVal [16] and ResearchGate [17].

Concerns about the limitations of restricting searches to single institutions have led to the development of broader search tools. Direct2Experts uses a standard application programming interface convention to provide a federated multi-institution search interface [10]. Although Direct2Experts returns result counts that allow comparison across institutions, results are presented in their native form as provided by each institution. This lack of common formats limits opportunities for comparison and contrast. The VIVO platform’s use of semantic Resource Description Framework (RDF) markup and linked open data provides the possibility of cross-institutional searches, but this functionality is not well supported in current interfaces. The VIVO Searchlight browser plugin [18] demonstrates a possible approach to increasing the utility of RNS data by supporting links to individual VIVO profiles from multiple institutions through commonly used Web resources, such as PubMed entries [19,20].

Preliminary reports from institutional RNSs provide some insight into usage patterns and user goals. An analysis of 5 months of log data from an RNS at Columbia University found differences in usage patterns across user classes, with faculty performing more keyword searches than administrators [12]. A similar log-based analysis at the University of California, San Francisco, found that search engines were the source of almost 75% of initial visits, the number of return visitors increased over time, and that return visitors accessed a higher number of pages/visit compared to first-time users [11].

Relatively little attention has been paid to understanding how information tools might best support the process of searching for collaborators. Techniques such as contextual design [21] and scenario-based design [22] that rely on task modeling and work observation might be used to develop models of researcher goals, needs, and workflows, but the nature of collaborator search complicates these matters. As an occasional ad hoc task that generally lacks focused support from software tools, collaborator search is not well suited for direct observation. This problem is particularly acute for RNS use. Given the incomplete penetrance of RNSs [7] and a perceived lack of “critical mass” of participation at institutions where RNSs have been deployed [23], ongoing use of these tools by researchers may be somewhat limited.

Preliminary investigations of user needs have identified some recurring themes in information needs and workflows. Schleyer et al [9] conducted retrospective interviews aimed at identifying researcher requirements for collaboration search tools, identifying themes such as compatibility of personal styles, rich communication needs including details beyond publications, high-quality data, and the importance of personal networks for the identification of collaborators. Bhavnani et al [24] conducted a qualitative study of researcher needs for tools for both collaboration identification and resource discovery, identifying the need for federated information, facilities for managing large volumes of information, and “humanized computing” tools that would favor user-controlled tools over algorithmic approaches that might use opaque processes to identify suggested resources. These suggestions are consistent with the observation from Boland et al [12] that different classes of RNS users may have different goals and workflows.

The goal of this study was to move beyond these descriptions of broad classes of user needs to explore specific features and designs, and to use these investigations to develop further understanding of user goals and preferences. Specifically, paper prototypes were used to elicit comments from researchers regarding their preferences for interactive collaborator search tools. Qualitative analysis of responses to these prototypes was used to identify recurring requirements. These requirements informed the design of a functional prototype collaboration search tool, which was developed to provide preliminary evaluation of the feasibility and usability of interactive collaboration search tools. Results from these inquiries provided preliminary validation of the tool design while identifying areas of concern that might need to be addressed in subsequent redesigns.

Summary

This study used a combination of prototyping, qualitative inquiry, and software development. Initial designs of paper prototypes were based on findings from earlier studies [9]. Semistructured interviews with potential users [25] provided qualitative feedback, including reactions to the paper prototypes. These responses were analyzed to identify specific requirements, which were used to drive the design and implementation of a functional prototype. This prototype was evaluated through a second set of qualitative interviews with potential users (Figure 1).

Paper Prototypes and Requirements Analysis

Overview

The goal of the first inquiry was to explore user requirements for collaboration search tools. Pilot studies presented a conundrum: potential users were likely to be unfamiliar with the notion of collaborator search tools because of the relatively low adoption rates of RNSs. To effectively elicit participant input, we developed 2 paper prototypes illustrating hypothetical interfaces for collaborator search tools. We use the term “paper” here to informally refer to low-fidelity, nonfunctional prototypes. Using multiple prototypes provided the freedom to consider designs that covered a variety of perspectives on relevant information and to present participants with a range of options that might elicit more detailed feedback [26].

Prototype 1: Personal Contacts Search

Researchers often seek new collaborators through existing contacts [6]. The first prototype explored the possibility of using prior contacts from an external source such as an email contact list to begin a collaborator search (Figure 2). These contacts would then be matched to publications and author information found within an RNS.

Use of this tool begins with importing email contacts. Users then use keyword searches to explore topics of interest. These keyword searches leverage RNS publication and grant data, identifying possible collaborators who have relevant publications. Potential matches are listed in rows on the screen. Information about each candidate is arranged in chronological order along a horizontal timeline. Publications are marked with color codes to indicate individuals who are on the imported contact list, geographically close (within 10 miles of the user), and/or marked as of interest for further follow-up. Checkbox filter selections can be used to filter out items based on any of the color-coded categories.

For candidates not found on the user’s contact list, coauthorship information can be used to identify current contacts who might have coauthored papers with them (Figure 3).

Figure 2. Personal contacts search: contact import screen. This screen allows users to import existing contacts from an external source (eg, email) and these contacts are then matched against publication data from the RNS.

Figure 3. Personal contacts search results. This screen shows a list of collaborators who have published on the topic “genomics”. Their publications are color-coded: red indicates institutions within 10 miles of the user, blue means the author was on the user’s contact list, green means the user has marked the collaborator for future contact, and gray is the default color. Circles can be coded with blended colors to indicate multiple categories. Thus, purple indicates nearby (red) authors on the contact list (blue). The circles are sized to indicate the number of citations. The user has selected a paper and their relationship with the author is displayed.
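
The color and size coding described in the Figure 3 caption can be illustrated with a short sketch. This is not the authors' implementation: the paper prototype was nonfunctional, and the RGB values, the channel-wise averaging used for blending, and the square-root radius scaling (with its `min_radius` and `scale` parameters) are all assumptions chosen only to reproduce the behavior the caption describes (eg, red plus blue yields purple).

```python
import math

# Category colors as named in the Figure 3 caption; the exact RGB values are assumed.
CATEGORY_COLORS = {
    "nearby": (255, 0, 0),   # red: institution within 10 miles of the user
    "contact": (0, 0, 255),  # blue: author is on the imported contact list
    "marked": (0, 255, 0),   # green: marked for future contact
}
DEFAULT_COLOR = (128, 128, 128)  # gray: no category applies


def glyph_color(categories):
    """Blend the colors of all applicable categories.

    Blending by channel-wise integer averaging is an assumption; the
    caption only states that blended colors indicate multiple categories.
    """
    colors = [CATEGORY_COLORS[c] for c in categories]
    if not colors:
        return DEFAULT_COLOR
    return tuple(sum(channel) // len(colors) for channel in zip(*colors))


def glyph_radius(citations, min_radius=3.0, scale=1.5):
    """Size a circle by citation count.

    Square-root scaling (assumed) keeps circle area roughly proportional
    to the number of citations.
    """
    return min_radius + scale * math.sqrt(citations)
```

Under these assumptions, a publication by a nearby author who is also on the contact list would be drawn in a purple tone, and an uncategorized publication would remain gray.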

Prototype 2: Collaborator Attribute Search

Seniority can play a major role in collaborator search: junior researchers often seek junior collaborators, perhaps because more senior researchers often decline collaboration requests [5,24]. The second prototype uses a 2-step approach to support the use of seniority in identifying candidates. The use of this tool begins with the identification of a potential collaborator in an RNS, perhaps through browsing lists of participants. The data for this individual is used to formulate a “profile” for subsequent searches, quantifying different aspects of a researcher’s history (eg, overall number of publications, grant funding) into measures that will be used for subsequent comparisons against other candidates. The user can then search the system for a topic of interest based on research keywords similar to those used in the first prototype.

Similar profiles are then computed for each candidate returned by the topic search and compared to the selected profile. The candidates who are most similar to the selected profile are shown on the screen. Thus, initial selection of a profile of a junior researcher might bias subsequent results to favor other junior researchers (Figure 4).
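
The profile comparison described above can be sketched as a simple nearest-profile ranking. The text does not specify how similarity was to be computed, so everything below is hypothetical: the `Profile` fields, the scaling of each attribute by its maximum observed value, and the use of Euclidean distance are assumptions made only to illustrate the idea.

```python
import math
from dataclasses import dataclass


@dataclass
class Profile:
    """Hypothetical researcher profile; field names are illustrative."""
    name: str
    publications: int
    years_active: int
    funding_usd: float


def _features(profile):
    return (profile.publications, profile.years_active, profile.funding_usd)


def rank_by_similarity(selected, candidates):
    """Order candidates from most to least similar to the selected profile.

    Each attribute is divided by its maximum across all profiles so that
    large-valued attributes (eg, funding) do not dominate the distance.
    """
    everyone = [selected, *candidates]
    maxima = [max(_features(p)[i] for p in everyone) or 1 for i in range(3)]

    def distance(profile):
        return math.sqrt(sum(
            ((a - b) / m) ** 2
            for a, b, m in zip(_features(profile), _features(selected), maxima)
        ))

    return sorted(candidates, key=distance)
```

With a junior researcher as the selected profile, candidates with few publications and few active years would sort ahead of long-established investigators, which is the bias toward junior researchers that the prototype is intended to produce.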

Search results are shown in a table containing researchers’ names, institutions, total number of publications, number of publications matching the search term, the number of years of active publication (a proxy for seniority), an estimate of total research funding (based on grant information), and keywords summarizing their primary research interests. Interactive double-thumb sliders provide the ability to set upper and lower bounds on the attributes in the table (Figure 5) with histograms on the slider providing a display of the distribution of the given values across the currently active candidate profiles [27].
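
The behavior of the double-thumb sliders and their histograms can be approximated as follows. This is a minimal sketch rather than a description of the prototype: the row dictionaries, the attribute names, and the equal-width binning are assumptions.

```python
def filter_by_ranges(rows, bounds):
    """Keep rows whose values fall inside every (low, high) bound, inclusive.

    `bounds` maps an attribute name to the positions of the two slider thumbs.
    """
    return [
        row for row in rows
        if all(low <= row[attr] <= high for attr, (low, high) in bounds.items())
    ]


def histogram(values, bin_count, low, high):
    """Count values into `bin_count` equal-width bins spanning [low, high].

    Requires high > low. Values outside the range are ignored; a value
    equal to `high` is placed in the last bin.
    """
    counts = [0] * bin_count
    width = (high - low) / bin_count
    for value in values:
        if low <= value <= high:
            index = min(int((value - low) / width), bin_count - 1)
            counts[index] += 1
    return counts
```

For example, setting the publication slider to 5-50 and the years slider to 1-20 would retain only candidates satisfying both bounds, and the histogram would be recomputed over that active set.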

This prototype also differs from the first (Figure 3) in terms of both information provided and the representation of that information. Where the first prototype provides chronologically oriented feedback in graphical form along with contact-based information and geographic hints, the second provides tabular aggregate data. The collaborator attribute search prototype also provides affiliation information and additional matching keywords not available in the personal contacts search design. A summary of key features of the 2 prototypes is given in Table 1.

These prototypes were used to elicit feedback from potential users, including both general preferences for collaborator search tools and responses to specific design features. Participant sessions consisted of a structured interview and unstructured discussion of the prototypes. The structured interview included questions concerning demographics, social networking applications usage, and workflows for finding collaborators (interview questions are given in Multimedia Appendix 1). Participants were asked to respond to all questions that they felt were applicable to their work. The interviewer then described and presented each of the prototypes to the participants, using several screens that simulated possible uses of each system. Participants were asked to identify features of the prototypes that they thought would be particularly useful, to note features that appeared to be less worthwhile, and to describe new features that they might like to see added. Finally, they were asked to provide overall impressions, considering both of the prototypes. Each participant saw both of the prototypes, with the order of presentation varied between participants.

Sessions were conducted online using the WebEx Web conferencing tool [28], which was used to present the prototype screens to the participants and to record the screenshots and audio from the sessions. Descriptive statistics were used to characterize participant background, education, and collaborator search behavior. Audio and screen capture recordings of the sessions were analyzed and coded using an open-coding approach [25,29]. Specifically, 1 author (CB) reviewed the audio recordings using descriptive codes to classify participant comments, such as reactions to the prototypes, statements about collaboration finding practices, and preferences/requirements for collaboration finding software. Initial codes were chosen based on content of the interactions and eventually categorized as patterns emerged. Higher-level themes identified during this process formed the basis for categorizing requirements for the functional prototypes. A second author (HH) reviewed all codes and categorization. This study was classified as exempt by the University of Pittsburgh Institutional Review Board, Study #PRO12060527.

Table 1. Feature comparison of both Phase 1 prototypes.

Functionality	Prototype 1: personal contacts search	Prototype 2: collaborator attribute search
Search mechanism	Keyword search and link to imported contacts	Browse/search for initial profile, keyword search identifies researchers with similar profiles
Display	Timeline with color-coded glyphs for publications	Tabular grids with aggregate displays of publications, grant funding, institutions, and other keywords
Controls	None	Interactive controls for selecting similarity values for publications, grants, and other values

Figure 4. Collaborator attribute search: selecting a profile. A selected profile (“John Logan”) forms the basis for a similarity search (“juvenile diabetes”) that constrains the candidates returned by subsequent keyword queries. Selecting the profile of a junior researcher might bias results of subsequent searches toward junior researchers.

Figure 5. Collaborator attribute search with dynamic filters. The sliders on the publication counts and years are double-sided, allowing researchers to restrict the criteria in either direction. The sliders can adjust publication counts, publication years, funding, and number of publications related to the chosen topic (juvenile diabetes). Histograms on the sliders display distribution of the possible values across items in the currently active set [27].

Functional Prototype Development

Although paper prototypes can provide useful formative feedback for workflow and interface designs, static representations may fail to convey the dynamic nature of interactive tools. A functional prototype was implemented to provide a working example of a tool designed to satisfy the requirements derived from the initial qualitative inquiries. A Virtuoso Open-Source Edition triple store [30] was used to store RDF-formatted VIVO [8] data from the University of Florida and Weill Cornell Medical College. Data from the triple store was retrieved through SPARQL Protocol and RDF Query Language (SPARQL) [31] queries. The Web-based prototype was developed using the D3 library [32], which uses scalable vector graphics and JavaScript to create interactive data visualizations. JavaScript code developed for the prototype issued SPARQL queries against the Virtuoso triple store, passing the results to the D3 library for visualization. The system architecture is illustrated in Figure 6.
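
The retrieval path described above (a SPARQL query sent to the triple store, with results handed to a visualization layer) can be sketched as follows. The prototype itself used JavaScript and its actual queries are not given in the text, so this Python sketch is illustrative only. The endpoint URL is an assumption (a local Virtuoso installation), and the query simply lists people by label rather than searching publications or grants. The request follows the SPARQL 1.1 Protocol (a `query` parameter on a GET request), and the parser expects the standard SPARQL JSON results format.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint for a local Virtuoso instance; adjust for a real deployment.
ENDPOINT = "http://localhost:8890/sparql"

# Illustrative query only: people and their labels, not the prototype's query.
PEOPLE_QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?person ?name WHERE {
  ?person a foaf:Person ;
          rdfs:label ?name .
}
LIMIT 25
"""


def build_request(query, endpoint=ENDPOINT):
    """Build a SPARQL Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_bindings(payload):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in payload["results"]["bindings"]
    ]
```

In use, the request would be passed to `urllib.request.urlopen`, the response body decoded with `json.load`, and the flattened rows passed on to whatever code draws the timeline.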

Functional Prototype Evaluation

Evaluation of the functional prototype involved asking participants to use the tool to conduct collaborator search tasks. Each participant session began with a series of questions similar to those used in Phase 1 (see Multimedia Appendix 2 for all questions). The participant then completed 2 collaborator search tasks, 1 using the prototype and the other using their choice of online search engines and repositories such as PubMed. Because the alternative online tools did not provide a directly comparable experience, they were used only to provide contrast to the prototype tool and we do not discuss these interactions here. One task asked participants to find collaborators familiar with Alzheimer’s disease; the other specified researchers in Parkinson’s disease. These topics were chosen to be fairly broad to avoid dependence on user expertise and to minimize risk of bias associated with participant familiarity with the research field. The order of both tasks and tools was varied across participants.

After completing the tasks, participants were interviewed regarding their impressions of the prototype. Participant responses to the tool were evaluated using the System Usability Scale (SUS) [33,34]. Additional questions asked participants to rate key features of the prototype on a Likert scale (1-5, 5 being best). Interviews were conducted via WebEx and demographic and search behavior data were analyzed as in the earlier phase of the study.
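
For readers unfamiliar with the SUS, the standard scoring procedure can be sketched as follows: each of the 10 items is answered on a 1-5 scale, odd-numbered items contribute the response minus 1, even-numbered items contribute 5 minus the response, and the sum is multiplied by 2.5 to give a score from 0 to 100. The function name and input format below are illustrative and are not taken from the study.

```python
def sus_score(responses):
    """Compute a System Usability Scale score (0-100) for one participant.

    `responses` holds the 10 item ratings (integers 1-5) in questionnaire order.
    """
    if len(responses) != 10 or any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("SUS requires ten responses, each an integer from 1 to 5")
    total = 0
    for item_number, rating in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += rating - 1   # odd items are positively worded
        else:
            total += 5 - rating   # even items are negatively worded
    return total * 2.5
```

Per-participant scores computed this way can then be summarized with a mean and SD across participants.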

Paper Prototypes and Requirements Analysis