This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
There are many benefits to open datasets. However, privacy concerns have hampered the widespread creation of open health data. There is a dearth of documented methods and case studies for the creation of public-use health data. We describe a new methodology for creating a longitudinal public health dataset in the context of the Heritage Health Prize (HHP). The HHP is a global data mining competition to predict, by using claims data, the number of days patients will be hospitalized in a subsequent year. The winner will be the team or individual with the most accurate model past a threshold accuracy, and will receive a US $3 million cash prize. HHP began on April 4, 2011, and ends on April 3, 2013.
To de-identify the claims data used in the HHP competition and ensure that it meets the requirements in the US Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule.
We defined a threshold risk consistent with the HIPAA Privacy Rule Safe Harbor standard for disclosing the competition dataset. Three plausible re-identification attacks that can be executed on these data were identified. For each attack the re-identification probability was evaluated. If it was deemed too high then a new de-identification algorithm was applied to reduce the risk to an acceptable level. We performed an actual evaluation of re-identification risk using simulated attacks and matching experiments to confirm the results of the de-identification and to test sensitivity to assumptions. The main metric used to evaluate re-identification risk was the probability that a record in the HHP data can be re-identified given an attempted attack.
An evaluation of the de-identified dataset estimated that the probability of re-identifying an individual was .0084, below the .05 probability threshold specified for the competition. The risk was robust to violations of our initial assumptions.
It was possible to ensure that the probability of re-identification for a large longitudinal dataset was acceptably low when it was released for a global user community in support of an analytics competition. This is an example of, and methodology for, achieving open data principles for longitudinal health data.
Creating open data is considered an important goal in the research community. Open data is said to ensure accountability in research by allowing others access to researchers’ data and methods [
Although there is some evidence that sharing raw research data increases the citation rate of research papers [
There is a dearth of articles documenting methods for the creation of open health data that specifically address these privacy concerns. We provide a case study of de-identifying a health dataset for public release in the context of the Heritage Health Prize (HHP).
In April 2011 the Heritage Provider Network (HPN), a health maintenance organization based in California, launched the largest public health analytics competition to date: the HHP [
The public disclosure of health data for the purposes of attracting data analysts from around the globe to solve complex problems or to bring rapid advances to a field is not new.
In the United States there is no legislative requirement to obtain patient consent to disclose health information if the data are deemed de-identified. The Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule provides some definitions and standards for the de-identification of health data. Therefore, a credible claim must be made that the data are indeed de-identified according to one of those standards to allow their disclosure for the HHP without obtaining patient consent.
Recent examples of public releases of health data for the purpose of competitions.
Competition | Objective |
Predict HIV Progression [ | Finding markers in the human immunodeficiency virus DNA sequence that predict a change in the severity of the infection |
INFORMS data mining contest [ | Predicting hospitalization outcomes of transfer and death |
Practice Fusion medical research data [ | Developing an application to manage patients with a focus on chronic diseases |
We describe how the HHP data were de-identified for the competition to (1) make the data publicly available, and (2) meet the requirements of the HIPAA Privacy Rule. Only one previous study explained the methods for de-identifying public-use health data files; however, it considered risks to Canadian patients and did not involve longitudinal data [
The contributions of this work are (1) a description of how we measured re-identification risk for the public release of a large health dataset in the United States (which can be a useful example for other open government and open data initiatives and programs in the United States [
The competition data consist of 3 years’ worth of demographic and claims data. For year 1 and year 2, the number of days of hospitalization in the subsequent year is also included. The claims data represent the predictors, and the number of days of hospitalization is the outcome. These data are used for training prediction models. Entrants use the year-3 claims data to predict the number of days of hospitalization for year 4, and the competition will be judged on the accuracy of that year-4 prediction. Therefore, entrants download the data for years 1–3, to predict days of hospitalization for year 4.
Managing the re-identification risk for the competition dataset consists of a combination of technical and legal measures. These measures are described in the following section.
All records that share the same quasi-identifier values are called an equivalence class.
Two kinds of disclosure are of concern. The first occurs when an adversary can assign an identity to a record in the disclosed dataset. For example, if the adversary is able to determine that record number 7 belongs to patient Alice Smith, then this is called identity disclosure. The second occurs when an adversary learns something sensitive about a patient from the data without necessarily assigning an identity to a specific record; this is called attribute disclosure.
All known re-identification attacks of personal information that have actually occurred have been identity disclosures [
The claims dataset consists of two tables that include the fields shown in
The quasi-identifiers included in the dataset (indicated in
Description of the fields in the patients data table.
Field | Description |
MemberID | Unique identifier for the patient |
Agea | Age in years at the time of the first claim in year 1 |
Sexa | Patient’s sex |
DaysInHospital Y2a | Total number of days the patient was hospitalized in year 2 |
DaysInHospital Y3a | Total number of days the patient was hospitalized in year 3 |
a Quasi-identifier.
Description of the fields for the claims data table.
Field | Description |
MemberID | Unique identifier for the patient |
ProviderID | Unique identifier for the responsible provider giving care |
Vendor | Unique identifier for the vendor providing the service |
PCP | Unique identifier for the primary care provider |
Year | Indicator of claim year (year 1, year 2, or year 3) |
Specialtya | Specialty of provider |
PlaceOfServicea | Place of service |
CPTCodea | CPTb code: these codes provide a means to accurately describe medical, surgical, and diagnostic services, are used for processing claims and for medical review, and are the national coding standard under HIPAAc |
LOSa | Length of stay in hospital |
DSFCa | Number of days since first claim computed from the first claim for that patient for each year |
PayDelay | Number of days of delay between date of service and date of payment of the claim |
Diagnosisa | ICD-9-CMd code |
a Quasi-identifier.
b Current Procedural Terminology [
c Health Insurance Portability and Accountability Act.
d International Classification of Diseases, 9th revision, clinical modification.
We preprocessed the data to apply some basic de-identification steps before assessing any quantitative re-identification risk.
The MemberID, ProviderID, Vendor, and PCP fields were converted to irreversible pseudonyms [
Quantitative values that are considered uncommonly high are often limited to an upper bound, a procedure called top-coding.
A commonly used heuristic for top-coding is to have a cut-off at the 99.5th percentile [
While it is not likely that an adversary would know the exact number of claims that an individual patient would have, it is plausible for an adversary to know whether an individual patient has had an abnormally large number of claims. For example, a patient may have 300 claims a year and be the only one in the population with more than 200 claims. Adversaries who know that their 50-year-old neighbor has had an unusually high number of hospital procedures could correctly guess that this extreme outlier is their neighbor.
We therefore truncated the number of claims per patient at the 95th percentile. To decide which claims to truncate we assigned each claim a score, and deleted claims with the highest scores from the dataset. A description of the scoring method is provided in
The truncation of claims was different from the censoring method that has been described in previous research for diagnosis codes [
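The truncation step can be sketched as follows. This is a minimal illustration that assumes each claim already carries a numeric score; the actual scoring method used for the HHP is described in the appendix and is not reproduced here.

```python
import math

def truncate_claims(claims, percentile=95):
    """Truncate each patient's claim list at the given percentile of
    claims-per-patient, dropping the highest-scored claims first.
    `claims` maps a member ID to a list of (score, claim) pairs; the
    scoring method itself is assumed, not the one used for the HHP."""
    counts = sorted(len(v) for v in claims.values())
    # nearest-rank percentile of claims-per-patient, used as the cap
    cap = counts[max(0, math.ceil(percentile / 100 * len(counts)) - 1)]
    truncated = {}
    for member, member_claims in claims.items():
        # keep the `cap` lowest-scored claims for each patient
        truncated[member] = sorted(member_claims)[:cap]
    return truncated
```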
Patients who were considered to be high risk were removed from the dataset to avoid the chance that their disease, condition, or procedure could be inferred from patterns in the data. These patients had
As
Providers could have patterns of treatment that make them stand out. An adversarial analysis by an independent party of a prerelease version of the HHP dataset noted how information about providers could potentially be used to predict the hospitals where procedures were performed (A Narayanan, unpublished data, 2011). Knowledge of the treating hospital would increase the risk of re-identification for the patients.
These patterns of treatment consisted of 4 quasi-identifiers: the place of service, specialty, CPT code, and diagnosis code. For example, a provider could be the only one with a particular specialty in a specific place of service who performed procedures on patients with a particular diagnosis. In cases where it was estimated from the HHP data that there were fewer than 20 providers with the same pattern in the HPN system, the provider ID was suppressed for those records. The choice of 20 is justified below in the section outlining thresholds. The estimation method used is described elsewhere [
To understand the type of de-identification required to protect patients, we first had to determine the threats that could exist for the duration of the competition. The following are the key facts and assumptions of the threat modeling used:
Fact: The dataset that was being released for the HHP consisted of a small sample of all HPN patients.
Fact: All entrants in the competition had to sign (or click through) an agreement saying that they would not attempt to re-identify patients in the dataset, contact patients, or link the HHP data with other datasets that would add demographic, socioeconomic, or clinical data about the patients (where such data could make the risk of re-identification much higher).
Assumption: It would not be possible for an adversary to know whether the record for a particular patient was in the HHP dataset. If an adversary made a guess, it would be equal to the sampling fraction. Most patients would themselves not know whether they were members of HPN, and therefore the most realistic sampling fraction to use would be from the population of counties in California covered by HPN. However, to err on the conservative side, we assumed that an adversary would know whether a patient was a member of HPN in our calculations of re-identification risk.
Assumption: An adversary would have background information about only a subset of the claims of a patient in the dataset. For example, if a patient had 100 claims, we did not deem it plausible for the adversary to know the exact information in all of those 100 claims and to use that information for re-identification purposes. Rather, we assumed the adversary would have information about only a subset of these claims. This has previously been referred to as the power of the adversary.
These facts and assumptions shaped how we conceptualized re-identification risk and which kinds of attacks we considered plausible for this dataset.
We examined plausible attacks on the data as described below, and for each one we will discuss how we measured and managed the re-identification risks.
One important distinction to make at the outset pertains to subcontractors (eg, insurers, laboratories, or pharmacists) and employees of HPN, versus the entrants. Subcontractors process patient data during the regular provision of care and will have a large amount of information about the patients in the competition that can potentially be used for re-identification. However, HPN has contracts with these subcontractors and there are already mechanisms in place to enforce these agreements. In such a case, reliance on existing legal methods to protect against re-identification by subcontractors was deemed sufficient.
On the other hand, entrants in the competition could come from many countries in the world. Even though entrants had to agree to a certain set of rules, enforcement of the rules globally poses a practical challenge.
Therefore, we assumed that an adversary would be one of the entrants who has obtained the HHP data (1) by registering for the competition, or (2) through a data leak (deliberate, accidental, or malicious) from a legitimate entrant. Furthermore, it would not be prudent to assume that the adversary would adhere to conditions on other public or semipublic databases to which they have gained access. In such a case, we needed technical methods that provide stronger guarantees that the probability of re-identification is low.
Under this attack, the adversary would be an individual who (1) would be trying to re-identify a target individual who was an HPN patient (a specific individual, such as a neighbor or a famous person) or any individual who was known to the adversary to be an HPN patient (an arbitrary individual selected at random), (2) would not know whether the target individual was in the dataset, and (3) would have some basic background information about the target patient in terms of the patient’s demographics and information about some of the patient’s claims.
The adversary could be a patient’s neighbor, coworker, relative, or ex-spouse, or the target individual could be a famous person whose basic demographics and perhaps some of whose treatment information would be publicly known. There are known examples of this kind of attack. In one case a researcher re-identified the insurance claim transactions of the Governor of Massachusetts [
Under this type of attack, the risk metric would be the probability that an individual can be correctly re-identified. The probability of an individual being re-identified using this attack is the reciprocal of the equivalence class size in the HPN member population (from which the competition dataset is derived) [
For any patient in an equivalence class
Equations describing how re-identification risk was measured.
In California it is possible to obtain the voter registration list [
It has been shown that managing the risk in equation 2 also manages marketer risk [
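The risk metric for this attack, the reciprocal of the equivalence class size, can be illustrated with a short sketch, assuming records are simple dictionaries keyed by quasi-identifier name:

```python
from collections import Counter

def max_reid_probability(records, quasi_identifiers):
    """Maximum probability of re-identification under attack 1: the
    reciprocal of the smallest equivalence-class size, where an
    equivalence class is the set of records sharing the same values
    on all quasi-identifiers."""
    classes = Counter(
        tuple(r[q] for q in quasi_identifiers) for r in records
    )
    return 1.0 / min(classes.values())
```

A record that is unique on its quasi-identifiers (an equivalence class of size 1) yields the worst-case probability of 1.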
In the United States, 48 states collect data on inpatients [
An adversary could potentially match the competition dataset with the SID data to discover something new about the individuals in the dataset. For example, if an individual were able to match the HHP records with the SID records, then the adversary could discover the exact month and year of birth of patients and their detailed diagnosis codes and procedures, even if we generalized them in the HHP data release (since these fields are included in the SID). Furthermore, the SID contains race information, which could be added to the HHP dataset after matching. This would provide more detailed information than was disclosed in the HHP dataset and would therefore raise the re-identification risk for any correctly matched patients.
Note that not all patients in the HHP dataset were hospitalized. Some may, for example, have been seen in an outpatient clinic. Therefore, by definition only a subset of the HHP dataset could be matched with the SID.
For this attack, the re-identification risk metric would be the proportion of individuals that could be matched between the HHP and the SID datasets. This can be measured using the marketer risk metric [
Since the SID covers all hospital discharges in California, the equivalence class sizes for hospitalized patients in the HPN population were equal to or smaller than the SID equivalence classes for those patients. This means that if we managed the risk in equation 2 for attack 1, we would also manage the risk for attack 3.
Based on the above analysis of the various possible attacks, if the re-identification risk from attack 1 could be managed, then the risks from all of the other attacks would also be managed. Below we describe the algorithm used in this study to manage the risk from attack 1. Additionally, during the empirical evaluation component of our study, we measured the re-identification risks from attacks 1 to 3 to confirm that the re-identification risks for all three attacks were acceptably low.
We used an automated algorithm to de-identify the dataset through generalization. Our base automated de-identification algorithm was OLA [
We will provide a brief overview of how LOLA (longitudinal optimal lattice anonymization) works and its parameters, and then explain how we modified these parameters for the de-identification of the longitudinal HHP dataset.
LOLA has two inputs. The first is the
In our case we defined
For example, if we had set
Note that in practice more sophisticated methods for estimating
A key step in LOLA is generalization. Generalization reduces the precision in the data. As a simple example, a patient’s date of birth can be generalized to the month and year of birth, to the year of birth, or to a 5-year interval. Allowable generalizations are specified in generalization hierarchies. Let us consider an example dataset with only 3 quasi-identifiers: date of birth (d), gender (g), and date of visit (p).
All of the possible generalizations can be expressed in the form of a lattice as shown in
After efficiently evaluating the nodes in the lattice, LOLA identifies the candidate nodes that meet criterion 1 above. Out of the candidate nodes, LOLA then chooses the node with the smallest information loss among the candidate nodes, and this meets criterion 2. Information loss is measured in terms of a general entropy metric, which was found to have properties superior to those of other commonly used metrics in the literature [
The three domain generalization hierarchies for the 3 quasi-identifiers: date of birth (d), gender (g), and visit date (p).
A lattice showing the possible generalizations of the 3 quasi-identifiers: date of birth (d), gender (g), and visit date (p).
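Enumerating the nodes of such a lattice can be sketched as follows. The hierarchy depths here are illustrative for the 3-quasi-identifier example, not the depths used for the HHP quasi-identifiers:

```python
from itertools import product

def lattice_nodes(depths):
    """Enumerate every node of the generalization lattice: one
    generalization level per quasi-identifier, from 0 (original
    precision) up to the maximum depth of its hierarchy."""
    names = sorted(depths)
    for levels in product(*(range(depths[n] + 1) for n in names)):
        yield dict(zip(names, levels))

# Hypothetical depths: date of birth (d) generalizes through 3 levels,
# gender (g) through 1, visit date (p) through 2.
nodes = list(lattice_nodes({"d": 3, "g": 1, "p": 2}))
```

An algorithm such as OLA searches this lattice efficiently rather than evaluating every node exhaustively.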
In the United States, the HIPAA Privacy Rule Safe Harbor de-identification standard was conceptualized using population uniqueness of individuals as the measure of risk, as documented in the responses to comments by the Department of Health and Human Services [
Risk exposure only comes from the records that have an unacceptably high probability of re-identification. In our case, the
A uniqueness threshold would be considered quite high by most standards (for example, see [
This means that, if the probability was equal to or lower than .05, then the data would be acceptable for release. To ensure a risk level that low we needed to ensure that
If we revisit our definition of
To retain the same level of maximum risk exposure as Safe Harbor with our proposed .05 probability threshold, we could accept only 0.8% of the records to have a probability that was higher than the .05 threshold for the same value of
Therefore, if the condition in equation 4 (
As
Each claim had up to 4 diagnosis codes. These were converted into 2 values. The ICD-9-CM diagnosis codes were generalized into 45 primary condition groups, which have been determined to be good predictors of mortality [
Description of the generalization hierarchies for the quasi-identifiers.
Quasi-identifier | Description |
Age | Years → 5-year interval; 80+ → 10-year interval; 80+ → 20-year interval; 80+ |
Sex | no change |
DaysInHospital Y2/Y3 | Days → days to 2 weeks; >2 weeks → days to 1 week; 1–2 weeks; >2 weeks |
Specialty | Original specialty → grouped specialty (see |
PlaceOfService | Original place of service → grouped place of service (see |
CPTCodea | Original CPT code → grouped CPT code |
LOSb | Days → days up to 6 days, weeks afterward → days up to 6 days; (1–2] weeks; (2–4] weeks; (4–8] weeks; (8–12] weeks; (12–26] weeks; 26+ weeks → <1 week; (1–2] weeks; (2–4] weeks; (4–8] weeks; (8–12] weeks; (12–26] weeks; 26+ weeks → <4 weeks; (4–8] weeks; (8–12] weeks; (12–26] weeks; 26+ weeks |
DSFCc | Days → weeks → 2 weeks → months |
Diagnosis | ICD-9-CMd code → primary condition group (see |
a Current Procedural Terminology.
b Length of stay in hospital.
c Days since first claim.
d International Classification of Diseases, 9th revision, clinical modification.
The
Previous research that considered the power of the adversary always assumed that the power is fixed for all patients [
Also, it is likely that certain pieces of background information are more easily knowable than others by an adversary, making it necessary to treat the quasi-identifiers separately when it comes to computing the power of an adversary. For example, it would be easier to know a diagnosis value for patients with chronic conditions whose diagnoses keep repeating across claims. In such a case, if the adversary knew the information in 1 claim, then it would be easier to predict the information in other claims, increasing the amount of background knowledge that the adversary can have. In this case the diversity of values on a quasi-identifier across a patient’s claims becomes an important consideration. Therefore, we expect the power of an adversary to decrease monotonically with the diversity of values on the quasi-identifiers.
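As a purely illustrative sketch (this is not the paper's formula, which is given in its technical appendix), one monotone-decreasing mapping from value diversity to adversary power could look like:

```python
import math

def adversary_power(values, p_max):
    """Hypothetical, illustrative mapping only: power decreases
    monotonically with the diversity of a patient's values on a
    quasi-identifier, capped at p_max. The formula actually used for
    the HHP is described in the paper's technical appendix."""
    diversity = len(set(values)) / len(values)  # in (0, 1]
    # low diversity (repeating values, eg, chronic diagnoses) means the
    # adversary can infer more claims from knowing one of them
    return min(p_max, max(1, math.ceil(p_max * (1 - diversity) + 1)))
```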
As
We defined the power for a particular individual in the data and for a particular quasi-identifier as
We made two assumptions about the knowledge of the adversary: (1) the adversary would not know which values on the quasi-identifiers were in the same claim (the inexact knowledge assumption), and (2) the adversary would not know the order of the claims (the inexact order assumption).
As noted earlier, the LOLA algorithm performs an efficient search through the lattice. During this search it needs to evaluate the percentage of records that are high risk for some of the nodes in the lattice. This is called
It would have been computationally very expensive for us to evaluate all combinations of
Therefore, we used a hierarchical bootstrapping approach [
After the de-identification of the dataset using LOLA, we wanted to empirically evaluate whether the risks from the three plausible attacks were appropriately managed. Hence, we performed an empirical evaluation.
To evaluate the actual probability of re-identification under this attack, we developed a separate attack program that would simulate exactly what an adversary would do. This program was developed by an independent programmer not involved in the development and application of LOLA described above.
The simulated attack assumed that an adversary would choose a patient from the HPN population at random. The adversary would not know whether that individual was in the HHP dataset, and hence this would introduce some uncertainty. If the individual was in the dataset then we computed the appropriate
The purpose of the simulated attack was to mimic what an adversary would do. We assumed that the adversary had background information about Alice. Alice may be the adversary’s neighbor or a famous person. She could also be someone the adversary selected at random from all HPN members.
The simulation dataset had two levels. Level 1 was the basic patient demographics as in
We also needed to create two versions of the de-identified dataset. Version D1 of the dataset had all of the claims for each patient. Version D2 of the dataset was the one with truncated claims. It is version D2 of the dataset that was released for the competition, but we needed D1 for the simulation. The level of generalization in the two datasets was exactly the same, the only difference being in the truncation of claims.
The following process was repeated 10,000 times:
We drew a sample from a binomial distribution with a probability of α. This reflects the probability that an individual that the adversary knew about was in the dataset. If the value drawn was 1, then we could continue; otherwise, we would go to the next iteration (and the current iteration was considered a failed match).
Then we chose a target individual from the D1 dataset at random.
We chose at random a subset of the target individual's claims, with the size of the subset determined by the power of the adversary; this subset constituted the adversary's background knowledge.
We matched that background information to the records in D2. This produced a matching equivalence class.
One of the records was selected in the matching equivalence class at random.
If the selected record was the correct patient then that was a successful match; otherwise, it was considered a failure.
Across the 10,000 iterations we computed the proportion of times that a correct match was found. This was the re-identification probability for the dataset taking into account the uncertainty due to the fact that we had a sample and due to the adversary not knowing which claims were truncated.
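The simulation loop above can be sketched as follows. This is a simplified illustration: it matches whole claims rather than per-quasi-identifier values, so it does not reproduce the inexact-knowledge handling of the actual attack program.

```python
import random

def simulate_attack(d1, d2, alpha, power, iterations=10_000, seed=1):
    """Monte Carlo estimate of the attack-1 re-identification
    probability. d1 maps member ID -> full claim list, d2 maps
    member ID -> released (truncated) claim list. alpha is the
    probability that the adversary's target is in the dataset."""
    rng = random.Random(seed)
    members = list(d1)
    successes = 0
    for _ in range(iterations):
        # Step 1: is the target in the dataset at all?
        if rng.random() >= alpha:
            continue  # counts as a failed match
        # Step 2: pick a target at random from the full dataset
        target = rng.choice(members)
        # Step 3: background knowledge = `power` of the target's claims
        known = rng.sample(d1[target], min(power, len(d1[target])))
        # Step 4: matching equivalence class in the released data
        matches = [m for m, cs in d2.items()
                   if all(k in cs for k in known)]
        # Steps 5-6: pick one matching record; success if it is the target
        if matches and rng.choice(matches) == target:
            successes += 1
    return successes / iterations
```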
To compute marketer risk [
We estimated the proportion of HHP records that could be correctly matched with the SID on the quasi-identifiers using the closed-form marketer risk calculation described in
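One common closed-form for marketer risk is the expected proportion of sample records correctly matched when each is linked to a random population record in its equivalence class; whether this is exactly the calculation in the cited appendix is an assumption.

```python
from collections import Counter

def marketer_risk(sample_keys, population_keys):
    """Expected proportion of sample records correctly matched:
    (1/n) * sum over sample equivalence classes j of f_j / F_j, where
    f_j and F_j are the sample and population class sizes and n is the
    sample size. Keys are tuples of quasi-identifier values."""
    f = Counter(sample_keys)
    F = Counter(population_keys)
    n = sum(f.values())
    return sum(fj / F[key] for key, fj in f.items()) / n
```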
We also analyzed sensitivity for the assumptions we made under attack 1. We explored three relaxations to the assumptions:
The maximum power of the adversary was increased; we evaluated powers of 5, 10, and 15 claims. This relaxes the assumption about how much background knowledge an adversary could plausibly have.
For 1 claim the adversary knew all of the quasi-identifiers for that claim. For example, say that we had only 2 quasi-identifiers, LOS and Diagnosis. Then we would assume that the adversary knew the LOS and Diagnosis values for the same claim. This relaxes the inexact knowledge assumption.
The adversary knew the order of 1 pair of quasi-identifier values. For example, the adversary would know that diagnosis A preceded diagnosis B. This would apply only in cases where the power for the quasi-identifier was greater than 1. We would apply this for a pair of claims for each quasi-identifier. This relaxes the inexact order assumption.
With these three types of sensitivity analyses we believed we covered plausible scenarios in which the adversary would have extensive knowledge about the individuals in the competition dataset.
The final claims dataset consisted of information from 113,000 patients, with 2,668,990 claims. The median number of claims per person was 11 and the maximum 136. Only 9556 patients had some of their claims truncated during the de-identification.
Making the conservative assumption that the 0.8% of individuals with a probability of re-identification higher than our threshold of .05 would each be re-identified with probability 1, we would expect at most 5.8% of the patients to be re-identified based on our de-identification parameters.
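The 5.8% figure follows from a worst-case decomposition: the 0.8% of above-threshold records are assumed re-identified with certainty, and the remaining records at the .05 threshold:

```latex
0.008 \times 1 + 0.992 \times 0.05 = 0.008 + 0.0496 \approx 0.058
```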
After applying the LOLA algorithm to determine the optimal generalizations, we obtained the final results presented in
The risk calculation for attack 2 was that an expected proportion of 0.0005% of the HHP dataset could be correctly re-identified by matching with the appropriate counties in the California voter registration list. Furthermore, there are restrictions on the use of the California voter registration list that would prohibit such re-identification attempts [
The results for attack 3 are shown in
Final generalizations in the dataset.
Quasi-identifier | Generalization |
Age | 10-year interval; 80+ |
Sex | No change |
DaysInHospital Y2 | Days to 2 weeks; >2 weeks in year 2 |
DaysInHospital Y3 | Days to 2 weeks; >2 weeks in year 3 |
Specialty | Grouped specialty (see |
PlaceOfService | Grouped place of service (see |
CPTCodea | Grouped CPT code (see |
LOSb | Days up to 6 days; (1–2] weeks; (2–4] weeks; (4–8] weeks; (8–12] weeks; (12–26] weeks; 26+ weeks |
DSFCc | 4 weeks |
Diagnosis | Primary condition group (see |
a Current Procedural Terminology.
b Length of stay in hospital.
c Days since first claim.
Estimated proportion of all records in the Heritage Health Prize dataset that would be correctly matched against the State Inpatient Database.
Age | LOSa | Sex | Number of visits | PCGb | CPTc | Year 1 | Year 2 | Year 3 | All years |
X | X | X | X | 0.001612 | 0.001478 | 0.001515 | 0.005141 | ||
X | X | X | X | 0.007105 | 0.005684 | 0.005965 | 0.009735 | ||
X | X | X | X | 0.013334 | 0.010156 | 0.010928 | 0.013579 | ||
X | X | X | X | X | 0.017272 | 0.012702 | 0.013797 | 0.015991 |
a Length of stay in hospital.
b Primary Condition Group.
c Current Procedural Terminology.
Percentage of total records correctly matched under simulated attack with different assumptions about the number of claims (power).
Power of adversary | |||
Assumption | 5 | 10 | 15 |
Original adversary assumptions | 0.84% | 0.94% | 1.17% |
Multiple quasi-identifiers in the same claim | 3.67% | 3.72% | 3.87% |
Ordered claims | 0.96% | 1.0% | 1.2% |
The detailed re-identification risk assessment on the HHP dataset allowed the disclosure of comprehensive longitudinal claims information on a large number of individuals while being able to make strong statements about the ability to re-identify these individuals. The de-identification we performed ensured that the risk was acceptable under different types of attacks, even to the extent that we allowed for some of our initial assumptions to be incorrect. In particular, we were able to ensure that the risk exposure was at or below the current risk exposure under the HIPAA Safe Harbor de-identification standard.
Ensuring the utility of the dataset is an important requirement in any de-identification effort. If no team is able to meet the prediction performance threshold to win the grand prize, then this may be because the threshold was too ambitious or because the de-identification itself made achieving that threshold difficult. An evaluation of the accuracy of the models before and after de-identification would be a useful exercise to help inform future competitions and fine-tune de-identification methods.
As our literature review in
Alternative ways for grouping the diagnosis and procedure codes could have been used. For example, we could have clustered the codes based on the average number of days of hospitalization. This would potentially have retained some important relationships in the data. Furthermore, it would ideally be necessary to perform this clustering using all of the quasi-identifiers to ensure that the multivariate relationships are retained. The practical challenge with such an approach was that many patients had zero days in hospital (for example, they were outpatients). This would then have resulted in coarser groupings than those we included with our analysis. Appropriate grouping of such nominal variables is an important area of future research to address constraints imposed by real datasets.
We did not consider the real possibility that there were errors in the background knowledge of the adversary. If errors exist then the match percentages would be lower than those we presented in our results.
Our analysis did not address risks from attribute disclosure. As noted earlier, there are no known attribute disclosure attacks on health data, and the HIPAA Privacy Rule does not require the management of attribute disclosure. This makes it difficult to determine what acceptable risk standards for attribute disclosure might be. Nevertheless, it would be appropriate to develop acceptable standards for managing attribute disclosure for future data releases.
Technical description of the de-identification methods.
AHRQ: Agency for Healthcare Research and Quality
CPT: Current Procedural Terminology
HHP: Heritage Health Prize
HIPAA: Health Insurance Portability and Accountability Act
HPN: Heritage Provider Network
ICD-9-CM: International Classification of Diseases, 9th revision, clinical modification
LOLA: longitudinal optimal lattice anonymization
LOS: length of stay
OLA: optimal lattice anonymization
PCG: primary condition group
SID: State Inpatient Database
We wish to thank Dr Jim King and Dr Carole Gentile for their advice on aspects of the work presented here. This work was funded by the Heritage Provider Network as part of the preparations for the Heritage Health Prize.
None declared.