Communicating With Patients About Software for Enhancing Privacy in Secondary Database Research Involving Record Linkage: Delphi Study

Background
There is substantial prior research on patients' perspectives on the use of health information for research. Numerous communication barriers (eg, waivers of informed consent and knowledge gaps) challenge transparency between researchers and data participants in secondary database research. Individual concerns and misconceptions undermine patients' trust in researchers despite efforts to protect data. Technical software used to protect research data can further complicate the public's understanding of research. For example, MiNDFIRL (Minimum Necessary Disclosure For Interactive Record Linkage) is a prototype software tool that can be used to enhance the confidentiality of data sets by restricting disclosures of identifying information during the record linkage process. However, software such as MiNDFIRL, which is used to protect data, must overcome the aforementioned communication barriers. One proposed solution is the creation of an interactive web-based frequently asked questions (FAQ) template that can be adapted and used to communicate research issues to data subjects.
Objective
This study aims to improve communication with patients and transparency about how complex software, such as MiNDFIRL, is used to enhance privacy in secondary database studies, in order to maintain the public's trust in researchers.
Methods
A Delphi technique with 3 survey rounds was used to develop the FAQ document to communicate privacy issues related to a generic secondary database study using the MiNDFIRL software. The Delphi panel consisted of 38 patients with chronic health conditions. We revised the FAQ between Delphi rounds and provided participants with a summary of the feedback. We adopted a conservative consensus threshold of less than 10% negative feedback per FAQ section.
Results
We developed consensus language for 21 of the 24 FAQ sections. Participant feedback revealed differences in preferences (eg, brevity vs comprehensiveness). We adapted the final FAQ into an interactive web-based format that 94% (31/33) of the participants found helpful or very helpful. The template FAQ and MiNDFIRL source code are available on GitHub. The results indicate the following patient communication considerations: patients have diverse preferences; tone is important but challenging; and patients want information on security, identifiers, and the final disposition of information.
Conclusions
The findings of this study provide insights into what research-related information is useful to patients and how researchers can communicate such information. These findings align with the current understanding of health literacy and its challenges. Communication is essential to transparency and ethical data use, yet it is exceedingly challenging. Developing FAQ template language to accompany complex software may enable researchers to provide greater transparency when informed consent is not possible.


Dear participant,
We thank you once again for your effort and feedback during the previous two rounds of our Delphi study. We are now moving to the third and final round of the study.
The following pages contain follow-up questions based on your Round 2 feedback, questions related to FAQ revisions, and a new MINDFIRL demonstration video, which we redesigned and adjusted based on your comments. In addition, we would like to gather your opinion on the FAQ document as an interactive website.
You will also find a link to a document that summarizes your votes, feedback, and comments in Round 2. To download the summary document, please click here: Link for summary document
We look forward to your responses, and we once again thank you for your time.

Part 1: Revisions to Selected FAQs.
In both Round 1 and Round 2, we received feedback supporting more detail and feedback requesting that we shorten the FAQs. A number of participants thought that the FAQ below is too long.
Please tell us if you think that the FAQ would be better without paragraphs (a) and (b).
What pieces of information about me will the researchers see?
We need different information for different steps of the research process. We need identifying information only to do patient matching (see section 2.1). When we use your health-related data to learn more about science or medicine, we need only non-identifying information. We will use a software program called MINDFIRL (Minimum Necessary Disclosure For Interactive Record Linkage) to keep identifying information separate from non-identifying information and to ensure that no one can access identifying information and non-identifying information together at the same time (for MINDFIRL, see sections 2.2 & 2.3). In many cases, patient matching is done entirely with linkage software, like MINDFIRL, and a human may never see your identifying information because the computer matches patients automatically. For other records, where the computer is not sure, researchers must make the matches manually.
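For researchers adapting this FAQ, the separation described above can be sketched in a few lines of Python. This is a hypothetical illustration, not MINDFIRL's code: the field list, the `split_record` function, and the use of a random study code are assumptions. Each record is split into an identifying part and a health-data part that share only a random code, so neither part on its own connects a person to their health data.

```python
import secrets

# Hypothetical identifier list; a real study defines its own.
IDENTIFYING_FIELDS = {"name", "date_of_birth", "gender", "marital_status"}

def split_record(record):
    """Split one record into identifying and non-identifying parts.

    Both parts carry the same random study code, so they can be rejoined
    only by someone who holds both tables.
    """
    code = secrets.token_hex(8)  # random code, not derived from identity
    identifying = {"code": code}
    health = {"code": code}
    for field, value in record.items():
        if field in IDENTIFYING_FIELDS:
            identifying[field] = value
        else:
            health[field] = value
    return identifying, health

record = {
    "name": "Pamela Smith",
    "date_of_birth": "1980-04-02",
    "gender": "F",
    "marital_status": "married",
    "diagnosis": "type 2 diabetes",
    "hba1c": 7.1,
}
ident, health = split_record(record)
print(sorted(ident))   # ['code', 'date_of_birth', 'gender', 'marital_status', 'name']
print(sorted(health))  # ['code', 'diagnosis', 'hba1c']
```

In a real linkage workflow, records matched to the same person would then be given one shared code; that step is omitted here.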
a. Who will be able to see the identifying information?
The researchers who will be doing the patient matching will have access to identifying information. Information such as your name, date of birth, marital status, and gender helps distinguish you from other people. Our researchers need to access identifying information to match patient records.
We are using the MINDFIRL software (see section 2.2) to protect identifying information and prevent unnecessary privacy loss during this process. First, MINDFIRL separates identifying information from non-identifying information. This means that no one can access the identifying information AND the health-related data at the same time.
Second, MINDFIRL tells researchers when two records have the same identifying information without showing details.In these cases, our researchers might not need to see any specific identifying information to make a match.
MINDFIRL also tells researchers when records are highly similar without showing details. MINDFIRL shows identifying information only on an 'as needed' basis. For example, a researcher might want to see some details to know if a difference is important (e.g., to tell twins apart). This means that MINDFIRL can help catch common matching problems, such as nicknames (e.g., Pam v. Pamela) or typos (e.g., John v. Jonh), without showing the rest of your identifying information. To see how MINDFIRL works with a specific example, see Section 2.3.
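The idea of "compare without showing, reveal only as needed" can be illustrated with a minimal Python sketch. MINDFIRL's own comparison and disclosure logic is not described in this FAQ, so everything here is an assumption: the function names, the Levenshtein edit distance as the similarity measure, and the 0.4 threshold are hypothetical choices. By default the researcher sees only a per-field status ("same", "similar", or "different"). A field's values are disclosed only through an explicit request for that one field.

```python
# Hypothetical sketch; not MINDFIRL's actual implementation.

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]

def compare_field(a, b, threshold: float = 0.4) -> str:
    """Return 'same', 'similar', or 'different' without exposing the values."""
    a, b = str(a).strip().lower(), str(b).strip().lower()
    if a == b:
        return "same"
    similarity = 1 - edit_distance(a, b) / max(len(a), len(b))
    return "similar" if similarity >= threshold else "different"

def compare_records(r1: dict, r2: dict) -> dict:
    """Per-field status only; this is all the researcher sees by default."""
    return {f: compare_field(r1[f], r2[f]) for f in r1 if f in r2}

def reveal(r1: dict, r2: dict, field: str) -> tuple:
    """Disclose one field's values, only when a researcher asks for it."""
    return r1[field], r2[field]

a = {"first_name": "Pam", "last_name": "Smith", "dob": "1980-04-02"}
b = {"first_name": "Pamela", "last_name": "Smith", "dob": "1980-04-02"}

print(compare_records(a, b))  # {'first_name': 'similar', 'last_name': 'same', 'dob': 'same'}
# Only the field the researcher needs to judge is disclosed:
print(reveal(a, b, "first_name"))  # ('Pam', 'Pamela')
```

With these assumptions, the typo example from the FAQ (John v. Jonh) is also classed as "similar". The last name and date of birth are never displayed.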
b. Who will be able to see the non-identifying information or health-related study data?
The researchers who are doing the main research will be able to see the non-identifying, health-related data. The main research takes place after the patient-matching process and after all identifying information has been removed from the data using the MINDFIRL software. This means that no one can access the identifying information AND the health-related data at the same time. We will code your non-identifying information to protect your identity. This allows us to use your information to make scientific or medical discoveries without knowing which information belongs to you.
We received a number of comments relating to the FAQ "Does MINDFIRL reduce the quality of matched records?" Some respondents thought that the answer was too complicated for lay people. Other commenters provided suggestions for describing the study results. Some people thought the FAQ was better before the revisions.
Below are some questions relating to this FAQ.
Please help us choose the best way to communicate these study results to patients.
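For researchers adapting this FAQ, note that the answer choices that follow express a single result in different ways. The short Python sketch below checks the arithmetic that links them. It assumes that the rounded figure of 7 pieces of identifying information per 100 is exact, because the underlying study counts are not given in this document.

```python
# Assumes the rounded 7-in-100 figure from the answer choices is exact.
full_view = 100      # pieces of identifying information available to the full-view group
mindfirl_view = 7    # pieces the MINDFIRL group needed per 100

share_seen = mindfirl_view / full_view   # fraction seen      -> "only saw 7%"
reduction = 1 - share_seen               # fraction avoided   -> "saw 93% less"
times_less = full_view / mindfirl_view   # ratio, about 14.3  -> "over 14 times less"

print(f"saw {share_seen:.0%}, i.e. {reduction:.0%} less, or {times_less:.1f}x less")
# saw 7%, i.e. 93% less, or 14.3x less
```

Because 100 / 7 is about 14.3, "over 14 times less" means seeing roughly one-fourteenth as much information. That is the same fact as "7%", expressed as a ratio instead of a percentage.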

Which do you prefer?
The people who used the MINDFIRL prototype saw 93% less identifying information.
The people who used the MINDFIRL prototype saw only 7% of the identifying information.
The MINDFIRL group needed 7 pieces of information for every 100 pieces available to the other group.
The people who used MINDFIRL saw over 14 times less identifying information.
None of these

Please choose which approach to this question you prefer. Please note that we will update the study details based on the results of the question above.
OPTION A: Does MINDFIRL reduce the quality of matched records? Early evaluations of MINDFIRL show that it helps researchers make high-quality matches. People who used a prototype version of MINDFIRL were just as accurate as people who saw all of the identifying information. However, the people who used the MINDFIRL prototype saw 93% less identifying information. This means that people using MINDFIRL were just as good at patient matching as people who saw everyone's identifying information, even though people using MINDFIRL saw far less identifying information.
OPTION B: Does MINDFIRL reduce the quality of matched records? No. One study showed that people who used an early version of MINDFIRL were just as accurate as people who saw 100% of the identifying information. However, the people who used MINDFIRL saw only 7% of the identifying information.

In Round 2, we asked some questions about how you think about risk in database research. A number of participants thought that many of these questions were difficult to understand. Based on this feedback, we reworded the questions and have provided some additional background information that we hope will be helpful.
Below we provide background information on 1) "minimal risk" research and 2) database research. Please read and consider this information when answering the questions below.
The law that regulates research has some flexibility depending on how risky a study is. When the risks of research are high, the law requires a lengthy review process before the research can begin. If the risks are low, the law allows a quicker review process. Low-risk studies are sometimes called "minimal risk" research. Only research with "minimal risk" can go through this quick review process. The law defines "minimal risk" as the types of risk that we experience in daily life.

"Minimal Risk" Research
Here are some examples of "minimal risk" research:
• An identifiable survey or interview about software (in daily life, people regularly share opinions about products or experiences, and sharing opinions about software is usually not risky)
• Data collected through observation of public behavior
• Research involving simple blood tests (in daily life, patients often get shots, injections, or routine blood work, and these tests are rarely risky or harmful)
• Research requiring healthy participants to do moderate exercise (exercise is an activity experienced in daily life)
• Research involving standard psychological assessments (these types of tests are not generally considered harmful)
Some examples of high-risk research might include:
• A survey about past criminal behavior (research participants could face jail or prison time if their answers were discovered)
• Research testing a new cancer drug on patients (the drug could have unintended harmful effects)
• Research where some or all of the participants could be vulnerable to coercion or undue pressure, such as children, prisoners, individuals with impaired decision-making capacity, or other disadvantaged persons
It is important to note that "minimal risk" does not mean "risk-free." Some everyday activities, like driving a car, are quite risky. As long as the risks are not greater than the types of risks experienced in daily life, the research is considered "minimal risk." "Minimal risk" research must still be approved by an Institutional Review Board (IRB). An IRB is a committee that reviews research plans to make sure the research is legal and ethical. The IRB is responsible for protecting the rights, welfare, and well-being of research participants. However, minimal risk research can be reviewed and approved through a more streamlined and efficient process, and the IRB can always require specific protections for "minimal risk" research. Without this legal flexibility, low-risk research would experience significant delays, and research costs would increase.

Database Research
In the questions below, we use the phrase "database research" to refer to research where: 1) The researchers use information that already exists, and 2) The information might have been collected for some other, non-research, purpose (e.g., medical records), and 3) The researchers have no contact with the individuals in the data, and 4) It would be very difficult, if not impossible, for researchers to contact the individuals in the data to get informed consent for the research.
Here are two examples of database research:
• Researchers using insurance data (e.g., data from Blue Cross Blue Shield) to understand how medical costs have changed for diabetes patients over five years.
• Researchers using hospital discharge data to understand why patients might be coming back to the hospital within 30 days (e.g., living alone, age, insurance status, and diagnosis).
In each of these examples, researchers are using information that has already been collected by insurance companies or hospitals for a main purpose that is not research (i.e., treatment or billing). The researchers did not have contact with the individuals in the data, and it would be difficult, if not impossible, for the researchers to contact each person to obtain informed consent.
For each sentence below, please consider how much you agree with the given statement about database research or risk level based on your understanding of the information above.

Response scale: Strongly agree / Agree / Neither agree nor disagree / Disagree / Strongly disagree
Q4.1 The types of risks that a person faces in database research are not meaningfully different from the other risks a person might experience in daily life.
Q4.2 In today's technology-driven world, data is everywhere, so people experience privacy and confidentiality risks daily because businesses, tech companies, and governments use and link our data for different purposes all the time.
Q4.3 Database research that involves linking different datasets using identifiable information is "minimal risk" research because the risks are not greater than the types of risks experienced in daily life.
Q4.4 Even though MINDFIRL helps limit access to identifying information when linking records, all database research studies have unavoidable risks, such as a computer hacker breaching the computer system with the data.
Q4.5 The unavoidable risk of a hacker breaching a research computer system is not more than the daily risk of hackers breaching computer systems with private data owned by businesses or governments.
Q4.6 The use of the MINDFIRL software will further reduce the research risks of patient matching to the minimum necessary.
Q4.7 Do you believe that the risks of a database study (with identifiable information) are greater than "minimal risk" for regular health data (i.e., excluding sensitive information such as HIV, mental health, or substance abuse information)?
Q4.8 Do you believe that the risks of a database study (with identifiable information) are greater than "minimal risk" for sensitive health data, such as HIV, mental health, or substance abuse information?

Part 4: Revisions to Selected FAQ, continued.
We received a lot of feedback relating to the FAQ "What difference is my data going to make?" Based on this feedback, we wrote two different versions of this FAQ.
What difference is my data going to make?
OPTION 1: Your information should help us make discoveries from this research in two ways. First, it is easier to make discoveries or find patterns if there are more records in the dataset. Second, it is important for our data to be representative of the population. If people like you are not included in the research, then what we learn will not be useful to you or others in similar situations. In other words, without your information, our findings will not be representative of you.
OPTION 2: In short, your information could help make any discoveries from this research more helpful to people like you. In research, we use information about a group of people, called a "sample," to understand things about a larger group, called a "population." If our data is too different from the population, then we cannot learn very much from the research. If people like you are not included in the research, then what we learn will not be useful to you or others like you. For example, if young adults are excluded from all studies about exercise, it will be difficult to know the types and varieties of exercise that are most beneficial to young adults. [For researchers using this FAQ: consider swapping this example with one that relates to the present research.] In other words, without your information, it will be harder for us to understand how this research relates to people like you.
Please tell us which version of the FAQ you prefer.
Please tell us if you think that the FAQ would be better without paragraphs (a) and (b).
• Yes, the FAQ is better without paragraphs (a) and (b)
• No, the FAQ is better with paragraphs (a) and (b)
Do you have any other thoughts or comments about the FAQ above? (If you have no comments, please type N/A.)

Part 2: Revisions to Selected FAQ, continued.
Please let us know if you have any comments related to the previous question. (If you have no comments, please type N/A.)

Part 2.1
Which option do you prefer?
Do you have any other thoughts or comments about the FAQ above? (If you have no comments, please type N/A.)

Part 3: Thoughts About Risk in Database Research