Currently accepted at: Journal of Medical Internet Research

Date Submitted: Oct 21, 2019
Open Peer Review Period: Oct 21, 2019 - Dec 3, 2019
Date Accepted: Feb 21, 2020

This paper has been accepted and is currently in production.

It will appear shortly on 10.2196/16757


Evaluating privacy-preserving record linkage within a public health surveillance system that uses de-identified records

  • Long Nguyen
  • Mark Stoove
  • Douglas Boyle
  • Denton Callander
  • Hamish McManus
  • Jason Asselin
  • Rebecca Guy
  • Basil Donovan
  • Margaret Hellard
  • Carol El-Hayek



The Australian Collaboration for Coordinated Enhanced Sentinel Surveillance (ACCESS) was established to monitor national testing and test outcomes for blood-borne viruses (BBVs) and sexually transmissible infections (STIs) in key populations. ACCESS extracts anonymous data from sentinel health services, which include general practice, sexual health, and infectious disease clinics, as well as public and private laboratories that conduct a large volume of BBV/STI testing. An important attribute of ACCESS is its ability to accurately link individual-level records within and between the participating sites, as this enables the system to produce reliable epidemiological measures.


GRHANITE® software is used in ACCESS to extract and link anonymous data from participating clinics and laboratories. Irreversible hashed linkage keys are generated from the patient-identifying data captured in the patient electronic medical records (EMRs) at each site. The linkage algorithms use probabilistic linkage principles to account for the variability and completeness of the underlying patient identifiers, producing up to four linkage key types per EMR. Errors in the linkage process can arise from imperfect or missing identifiers, which affects the system's integrity. It is therefore important to evaluate both the quality of the linkages created and the outcome of the linkage for ongoing public health surveillance.
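As a rough illustration of how irreversible hashed linkage keys can tolerate missing identifiers, the following minimal Python sketch derives up to four keyed hashes per record. It is not the GRHANITE algorithm, which is not described here: the field names, the four field combinations in `KEY_TYPES`, the use of HMAC-SHA256, and the simple "any shared key matches" rule are all assumptions made for illustration, and the rule is a deterministic simplification of the probabilistic principles mentioned above.

```python
import hashlib
import hmac

# Hypothetical key definitions: each key type combines a different subset of
# identifiers. These combinations are illustrative, not GRHANITE's actual ones.
KEY_TYPES = {
    "k1": ("surname", "given_name", "dob", "medicare"),
    "k2": ("surname", "given_name", "dob", "postcode"),
    "k3": ("surname", "dob", "medicare"),
    "k4": ("given_name", "dob", "postcode"),
}


def normalise(value):
    """Lower-case and keep only letters and digits, to absorb formatting noise."""
    return "".join(ch for ch in value.lower() if ch.isalnum())


def linkage_keys(record, secret):
    """Return a dict of key type -> hex digest for every key whose fields are all present."""
    keys = {}
    for name, fields in KEY_TYPES.items():
        values = [normalise(record.get(field) or "") for field in fields]
        if not all(values):
            continue  # a missing identifier means this key type cannot be built
        message = "|".join(values).encode("utf-8")
        # Keyed hash: the digest cannot be reversed to recover the identifiers.
        keys[name] = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return keys


def records_link(keys_a, keys_b):
    """Simplified rule (an assumption): two records link if any shared key type matches."""
    return any(keys_a[k] == keys_b[k] for k in keys_a.keys() & keys_b.keys())
```

Only the digests would leave the site in a design like this, so records can be compared without exposing names or Medicare numbers. A record missing one identifier still yields the key types that do not depend on it, which is why several key types per record are useful.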


Although ACCESS data are de-identified, we created two gold-standard datasets in which true match status could be identified, and compared them against the record linkage results arising from different approaches of the GRHANITE Linkage Tool. We report sensitivity, specificity, and positive and negative predictive values where possible, and estimate specificity by comparing the history of HIV and hepatitis C antibody results for linked EMRs.
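The four accuracy measures named above can be computed from a pairwise comparison of gold-standard and linked records. The sketch below is a minimal illustration under stated assumptions, not the authors' analysis code: the function names and the representation of each dataset as a mapping from record ID to person or cluster ID are hypothetical.

```python
from itertools import combinations


def confusion_counts(true_ids, linked_ids):
    """Count pairwise outcomes.

    true_ids maps record ID -> true person ID (gold standard);
    linked_ids maps the same record IDs -> cluster ID assigned by the linkage.
    """
    tp = fp = fn = tn = 0
    for a, b in combinations(sorted(true_ids), 2):
        same_true = true_ids[a] == true_ids[b]
        same_link = linked_ids[a] == linked_ids[b]
        if same_true and same_link:
            tp += 1  # true match, correctly linked
        elif same_link:
            fp += 1  # different people, wrongly linked
        elif same_true:
            fn += 1  # same person, link missed
        else:
            tn += 1  # different people, correctly kept apart
    return tp, fp, fn, tn


def linkage_accuracy(tp, fp, fn, tn):
    """Return the four measures; a measure is None when its denominator is zero."""
    def ratio(num, den):
        return num / den if den else None

    return {
        "sensitivity": ratio(tp, tp + fn),  # share of true matches that were linked
        "specificity": ratio(tn, tn + fp),  # share of true non-matches kept apart
        "ppv": ratio(tp, tp + fp),          # share of links that are true matches
        "npv": ratio(tn, tn + fn),          # share of non-links that are true non-matches
    }
```

Comparing every pair is quadratic in the number of records, so this form suits a toy example; for datasets of tens of thousands of records, candidate pairs would normally be restricted first.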


Sensitivity and specificity were both 100% when the GRHANITE Linkage Tool was applied to a small gold-standard dataset of 3,700 clinical medical records. Records in this dataset had a very high level of data completeness, with name, date of birth, postcode, and Medicare number available for use in record linkage. In a larger gold-standard dataset containing 86,538 medical records across clinics and pathology services, with a lower level of data completeness, sensitivity was over 94% and estimated specificity over 90% in 4 of the 6 record linkage approaches.


Our findings suggest that the accuracy of record linkage using the GRHANITE Linkage Tool is high, and that ACCESS data linked with the tool can be used to make reliable population-based epidemiological assessments, including of disease incidence and prevalence.


Please cite as:

Nguyen L, Stoove M, Boyle D, Callander D, McManus H, Asselin J, Guy R, Donovan B, Hellard M, El-Hayek C

Evaluating privacy-preserving record linkage within a public health surveillance system that uses de-identified records

Journal of Medical Internet Research. 21/02/2020:16757 (forthcoming/in press)

DOI: 10.2196/16757



© The authors. All rights reserved. This is a privileged document currently under peer-review/community review (or an accepted/rejected manuscript). Authors have provided JMIR Publications with an exclusive license to publish this preprint on its website for review and ahead-of-print citation purposes only. While the final peer-reviewed paper may be licensed under a CC-BY license on publication, at this stage authors and publisher expressly prohibit redistribution of this draft paper other than for review purposes.