This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
The importance of blockchain-based architectures for personal health record (PHR) lies in the fact that they are conceived and developed to allow patients to control and at least partly collect their health data. Ideally, these systems should give the respective owner full control of such data. In spite of this importance, most works focus on describing how blockchain models can be used in a PHR scenario rather than on whether these models are in fact feasible and robust enough to support a large number of users.
To achieve a consistent, reproducible, and comparable PHR system, we build a novel ledger-oriented architecture out of a permissioned distributed network, providing patients with a way to securely collect, store, share, and manage their health data. We also emphasize the importance of suitable ledgers and smart contracts to operate the blockchain network and discuss the need to standardize evaluation metrics to compare related (net)works.
We adopted the Hyperledger Fabric platform to implement our blockchain-based architecture design and the Hyperledger Caliper framework to provide a detailed assessment of our system: first, under workload, ranging from 100 to 2500 simultaneous record submissions, and second, increasing the network size from 3 to 13 peers. In both experiments, we used throughput and average latency as the primary metrics. We also created a health database, a cryptographic unit, and a server to complement the blockchain network.
With a 3-peer network, smart contracts that write on the ledger have throughputs, measured in transactions per second (tps), in an order of magnitude close to 10^2 tps, while those that only read have rates close to 10^3 tps. Smart contracts that write also have latencies, measured in seconds, in an order of magnitude close to 10^1 seconds, while those that only read have delays close to 10^0 seconds. In particular, smart contracts that retrieve, list, and view history have throughputs varying, respectively, from 1100 tps to 1300 tps, 650 tps to 750 tps, and 850 tps to 950 tps, impacting the overall system response if they are equally requested under the same workload. When the network size varies under an equal fixed load, in turn, writing throughputs go from 10^2 tps to 10^1 tps and latencies go from 10^1 seconds to 10^2 seconds, while reading ones maintain similar values.
To the best of our knowledge, we are the first to evaluate, using Hyperledger Caliper, the performance of a PHR blockchain architecture and the first to evaluate each smart contract separately. Nevertheless, blockchain systems achieve performances far below those of traditional distributed databases, indicating that the assessment of blockchain solutions for PHR is a major concern to be addressed before putting them into production.
Two closely related concepts have been drawing the attention of the biomedical and health informatics community: electronic health record (EHR) and health information exchange (HIE). The former, broadly speaking, covers all the repositories of digital data concerning retrospective, concurrent, and prospective information for ongoing support for patient health care [
Despite having been presented separately, an EHR repository and an HIE protocol can in fact be incorporated into the same system. In general, they comprise systems to store, retrieve, and share health data and, invariably, lead to interoperability, scalability, reliability, privacy, and security issues regarding those data. Interoperability can reduce or even eliminate manual administrative tasks, avoid duplicate clinical services, and facilitate access to relevant information, thereby decreasing cost and waste and improving coordinated and unplanned care [
In particular, privacy and security relating to EHRs have been especially important issues because health data are undoubtedly sensitive. To have privacy, patients must have their personal information protected by civil rights, that is, used and disclosed only with their consent. In this sense, health care providers and regulators must be authorized before they can examine such information. Furthermore, to be safe, patients must be protected from unauthorized access, modification, and deletion of their stored data. In general, lack of security can result in data theft and leakage [
In view thereof, the aforesaid community has already provided an increasing number of blockchain uses: a decentralized record management to handle electronic medical records [
There are several contributions proposing blockchain-based architecture designs to address existing problems with EHR. However, most of them have targeted electronic medical records and electronic patient records, and only few approached PHR [
Roehrs et al [
Liang et al [
Uddin et al [
Using an Ethereum-based blockchain network [
Roehrs et al [
Through an Ethereum-based blockchain architecture, Lee et al [
Alongside the preceding papers, our work builds a blockchain-based architecture out of a permissioned distributed network in order to supply a PHR system for patients to securely collect, store, share, and manage their health data. Despite the similarities, it brings a novel ledger-oriented architecture model using Hyperledger Fabric, emphasizing the importance of suitable ledgers and smart contracts to operate the overall blockchain. In addition, it provides a detailed assessment of a 3-peer network, in terms of throughput and latency, under workload ranging from 100 to 2500 simultaneous record submissions, and analyzes, for a fixed load, the impact of increasing the network from 3 to 13 peers. Finally, our work discusses the need to standardize evaluation metrics to facilitate the comparison between related works.
Blockchain is a distributed, tamper-resistant, and continuously growing ledger for recording desirable assets and transactions in cryptographically chained blocks. It results from a protocol to add data blocks, using public-key cryptography and hash functions, and from a protocol to validate them, using a consensus algorithm on a peer-to-peer network [
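The chaining of blocks by hash functions can be illustrated with a minimal sketch. It is not the Hyperledger Fabric block format: the field names, the use of SHA-256 over a JSON body, and the all-zero genesis hash are assumptions made for illustration, and signatures and consensus are left out entirely.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # assumed placeholder for the first block's predecessor


def block_hash(index, prev_hash, payload):
    """Hash the block body; any change to it changes the digest."""
    body = json.dumps({"index": index, "prev": prev_hash, "payload": payload},
                      sort_keys=True)
    return hashlib.sha256(body.encode()).hexdigest()


def append_block(chain, payload):
    """Append a block that stores the hash of its predecessor."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV
    index = len(chain)
    chain.append({"index": index, "prev": prev_hash, "payload": payload,
                  "hash": block_hash(index, prev_hash, payload)})


def is_valid(chain):
    """Recompute every hash and check each link to the previous block."""
    prev_hash = GENESIS_PREV
    for block in chain:
        if block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["index"], block["prev"], block["payload"]):
            return False
        prev_hash = block["hash"]
    return True


chain = []
for payload in ("register patient", "add record", "share record"):
    append_block(chain, payload)
```

Modifying the payload of any stored block makes `is_valid(chain)` return `False`, which is the tamper-evidence property the text refers to.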
Blockchain networks can be arranged either into a permissionless or a permissioned mechanism for selecting participants, to ensure the honest majority assumption, that is, the conjecture that the majority of the peers will be honest and run the consensus protocol correctly [
Smart contracts, in turn, are prespecified rules that allow a blockchain to be conducted in a consensual manner by all network participants. In practice, these rules represent transactions, which automatically operate digital assets and can be constructively used to state a bylaw among parties with common goals, attaining a decentralized autonomous organization [
As already suggested in the introduction, Ethereum and Hyperledger Fabric have been the main open-source platforms used to develop blockchain frameworks into EHR and HIE [
Hosted by the Linux Foundation, Hyperledger Fabric, in turn, is a decentralized operating system to create permissioned networks. It allows smart contracts (chaincodes) and distributed applications to be written in Go, Java, and Node. Using an ordering service implementation based on a crash-tolerance consensus [
As already mentioned, we opt for the latter platform to implement our permissioned network. Most of the existing platforms, including Ethereum, implement a traditional active replication for the consensus protocol, which first orders and broadcasts transactions to all peers and second waits for each peer to perform such transactions sequentially (order-execute paradigm), limiting performance and requiring an additional mechanism to prevent denial-of-service attacks from untrusted codes [
Using Hyperledger Fabric release 2.2, our blockchain network is structured with N peer nodes (P1, P2, …, PN), with N greater than or equal to 3, and an ordering service node. The peer nodes are the basic elements of the network because they store ledgers (L) and smart contracts (S) [
The peers are associated with their respective client nodes (CL1, CL2, …, CLN)—the elements outside the network that allow an application to be connected to the blockchain, that is, an external application accesses ledgers and smart contracts via client-peer connection. By means of a software development kit [
The peers get assigned to the consortium—the government, health organizations, civil society institutions, and hospitals in our example—by their respective certificate authorities (CA1, CA2, …, CAN), the elements that generate public and private key infrastructure to issue identities via digital certificates [
Lastly, the ordering service node mediates the interaction between peers during a transaction submission and ensures a consistent ledger after performing the consensus protocol. In Hyperledger Fabric, the endorsement policy occurs as a result of a 3-phase process: (1) proposal, (2) ordering and packing, and (3) validation and commit. Roughly speaking, in the first phase, a client node submits a transaction proposal, which is distributed to the endorsement peers and is independently executed by them, returning a set of endorsed responses—inconsistent responses can be already detected and discarded, finishing the workflow early. In the second phase, the ordering service node collects these responses and packages them into blocks, preparing for the next step. In the third phase, the ordering service node finally distributes the blocks to the peers, which in turn validate them to verify the endorsement phase and, only after that, commit to the ledger—failed transactions terminate the workflow without writing on the blockchain [
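The 3-phase process described above can be sketched as a toy simulation. This is not the Fabric API: the `Peer` class, the `faulty` flag, and the rule that every peer must endorse are hypothetical simplifications, and the ordering service is reduced to a plain list of pending transactions.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    name: str
    ledger: list = field(default_factory=list)
    faulty: bool = False  # hypothetical switch to force an inconsistent response

    def endorse(self, proposal):
        # Phase 1: execute the proposal without writing to the ledger.
        write_set = {"key": proposal["key"], "value": proposal["value"]}
        if self.faulty:
            write_set["value"] = "corrupted"
        return write_set

    def validate_and_commit(self, block, required):
        # Phase 3: commit only transactions carrying enough endorsements.
        for tx in block:
            if tx["endorsements"] >= required:
                self.ledger.append(tx["write_set"])


def propose(peers, pending, proposal):
    """Phase 1: collect endorsements; stop early if they disagree."""
    responses = [peer.endorse(proposal) for peer in peers]
    if any(r != responses[0] for r in responses):
        return False
    pending.append({"write_set": responses[0], "endorsements": len(responses)})
    return True


def order_and_deliver(peers, pending):
    """Phases 2 and 3: package pending transactions into a block and deliver it."""
    block = list(pending)
    pending.clear()
    for peer in peers:
        peer.validate_and_commit(block, required=len(peers))
    return block


peers = [Peer("P1"), Peer("P2"), Peer("P3")]
pending = []
if propose(peers, pending, {"key": "example-key", "value": "example-metadata"}):
    order_and_deliver(peers, pending)
```

After this run every peer holds the same single entry in its ledger; setting `faulty = True` on one peer makes `propose` return `False` before anything reaches the ordering step.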
Turning the analysis to the ledgers and smart contracts, our approach considers 3 classes: (1) for personally identifiable information (PII), (2) for health record information (HRI), and (3) for record sharing information (RSI) (
PII is designed to store basic form data filled in by the user at the moment of registration in the system. There are smart contracts to add, update, retrieve, and view history, which are used, respectively, to write a new record, rectify a registration error, perform a system login, and recover an updating log. To add a PII, the user needs to register with a password, which is converted into a hash value for security, and thus receives a unique identifier (PII ID). Once registered, the PII ID is only recovered from a login, that is, an identity number or email and the correct password hash. All other smart contracts, including those from HRI and RSI, are only able to write and read the ledger by means of a PII ID as the prefix of a composite key. In this way, each user accesses only her/his own data.
HRI, in turn, is designed to store metadata from a health document, together with a hash value and a database ID, for reasons to be explained later in the text. Similar to the PII, there are smart contracts to add, update, retrieve (in this case, to recover a single record), and view history, and a further one to list all records for a user.
Finally, RSI is designed to store HIE logs in order to track every time a copy of a health document leaves the repository, either for downloading or sharing. There are smart contracts to add, retrieve, and list. To keep HIE logs unchanged, we opt for not creating a smart contract to update them and, hence, none to view history.
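The role of the PII ID as a composite-key prefix can be sketched with an in-memory stand-in for the PII and HRI ledgers. This is not chaincode: the class `ToyLedger`, its method names, the `:` key separator, and the use of unsalted SHA-256 for the password hash are all assumptions made for illustration (the text does not name the hash function, and a production system would normally use a salted key-derivation function).

```python
import hashlib
import uuid


class ToyLedger:
    """In-memory stand-in for the PII and HRI ledgers (hypothetical, not Fabric)."""

    def __init__(self):
        self.pii = {}  # pii_id -> {"login": ..., "password_hash": ...}
        self.hri = {}  # "pii_id:record_id" -> list of states (oldest first)

    @staticmethod
    def hash_password(password):
        # Assumption: plain SHA-256, chosen only to keep the sketch short.
        return hashlib.sha256(password.encode()).hexdigest()

    def add_pii(self, login, password):
        pii_id = uuid.uuid4().hex
        self.pii[pii_id] = {"login": login,
                            "password_hash": self.hash_password(password)}
        return pii_id

    def retrieve_pii(self, login, password):
        """Login: the only path that yields a PII ID after registration."""
        digest = self.hash_password(password)
        for pii_id, entry in self.pii.items():
            if entry["login"] == login and entry["password_hash"] == digest:
                return pii_id
        return None

    def add_hri(self, pii_id, record_id, metadata):
        # New states are appended, so past states are never deleted.
        self.hri.setdefault(f"{pii_id}:{record_id}", []).append(metadata)

    def retrieve_hri(self, pii_id, record_id):
        return self.hri[f"{pii_id}:{record_id}"][-1]

    def history_hri(self, pii_id, record_id):
        return list(self.hri[f"{pii_id}:{record_id}"])

    def list_hri(self, pii_id):
        """Return the current state of every record under this PII ID prefix."""
        prefix = f"{pii_id}:"
        return {key: states[-1] for key, states in self.hri.items()
                if key.startswith(prefix)}


ledger = ToyLedger()
alice = ledger.add_pii("alice@example.org", "correct horse")
ledger.add_hri(alice, "r1", {"title": "blood test"})
```

Because every HRI key starts with a PII ID and the sketch deliberately has no method to list PII entries, a caller without the correct login and password has no prefix with which to query another user's records.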
Although smart contracts to list HRI and RSI are necessary, for the sake of security, PHR systems do not need one to list PII. Such a smart contract would allow an administrator to list users and associate them with their respective HRI and RSI. To prevent this situation and actually grant users the exclusive right of ownership over their health data, the PII ID is only retrieved with the correct password hash. Because the PII ID is a required index prefix to use HRI and RSI smart contracts, the absence of a PII listing function represents an additional security element directly configured in the operation rules of the system. Note that these settings are not just programming practices. Because smart contracts state the logic of the blockchain network, a set of present-day security practices can evolve into formal rules in the near future. Indeed, using smart contracts is a great opportunity to create a bylaw or business logic for PHR, defining what is and is not permitted regarding the access to patient information.
Although there are several smart contracts, they consist of 2 basic network operations: writing and reading. The former is used to invoke either the creation of a new state on the ledger or the modification of an existing one, without deleting past states. Smart contracts to add and update fall into this type. To perform writing, a client node needs to start an endorsement policy and reach consensus, a process that involves all peers. The latter operation, in turn, is used to query the current state and history of a ledger. Smart contracts to retrieve, list, and view history fall into this other type. To perform reading, a client node just connects to its associated peer and thus queries the stored ledger, independently of the other peers. Similar to the client-peer connection resources, by means of another software development kit [
Design of our blockchain network, considering N endorsement peers and their respective clients and certificate authorities. Each channel is associated with a specific set of ledgers and smart contracts, respectively named as personally identifiable information, health record information, and record sharing information. Ideally, each triple peer-client-certificate authority must be under the responsibility of a different organization or institution. HRI: health record information; PII: personally identifiable information; RSI: record sharing information; P: peer; S: smart contract; L: ledger; CL: client; CA: certificate authority; C: channel.
Design of the ledgers and their respective smart contracts. They fall into 3 classes: personally identifiable information, health record information, and record sharing information. HRI: health record information; MIME: Multipurpose Internet Mail Extensions; PII: personally identifiable information; RSI: record sharing information.
Although blockchain technology provides security tools against record tampering, it is still not suitable for storing a large volume of data, despite the efforts made to meet this requirement [
As a further safeguard, the data and metadata are encrypted. When a user registers in our system, she/he automatically receives a key that a cryptographic unit uses to encrypt information entering the system and to decrypt information leaving it. Each user obtains her/his own key and can decrypt only her/his own data. Because our health database is configured to store documents smaller than or equal to 100 MB, we opt for using the advanced encryption standard (AES), a symmetric key block encryption algorithm recommended by the National Institute of Standards and Technology. The AES handles a block size of 128 bits and key sizes of 128, 192, and 256 bits. The AES can also be used with 5 modes of operation, that is, electronic codebook, cipher block chaining (CBC), cipher feedback, output feedback, and counter. Except for electronic codebook, these modes prevent identical ciphertexts from being generated from blocks containing the same data, a breach that helps a malicious opponent accumulate enough plaintext-ciphertext pairs and thus find the key by exhaustion in a feasible time. In particular, CBC requires an initialization vector, which takes an exclusive-OR operation with the first plaintext block and, if randomly generated, provides different ciphertexts from the same data [
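The effect of a random initialization vector in CBC can be shown with a short sketch. Important assumption: the block cipher below is NOT the AES. It is a toy 4-round Feistel construction over SHA-256, used only because the Python standard library has no AES, and it must not be used to protect real data. Only the chaining logic (exclusive-OR with the previous ciphertext block, random IV prepended to the output, PKCS#7-style padding) is the point of the example.

```python
import hashlib
import os

BLOCK = 16  # bytes, matching the 128-bit block size of the AES


def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def _round(key, i, half):
    # Round function of the toy Feistel cipher (stand-in, not AES).
    return hashlib.sha256(key + bytes([i]) + half).digest()[:BLOCK // 2]


def toy_encrypt_block(key, block):
    left, right = block[:8], block[8:]
    for i in range(4):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right


def toy_decrypt_block(key, block):
    left, right = block[:8], block[8:]
    for i in reversed(range(4)):
        left, right = _xor(right, _round(key, i, left)), left
    return left + right


def pad(data):
    n = BLOCK - len(data) % BLOCK
    return data + bytes([n]) * n


def unpad(data):
    return data[:-data[-1]]


def cbc_encrypt(key, plaintext):
    """XOR each plaintext block with the previous ciphertext block (IV first)."""
    iv = os.urandom(BLOCK)
    prev, out = iv, [iv]
    padded = pad(plaintext)
    for i in range(0, len(padded), BLOCK):
        prev = toy_encrypt_block(key, _xor(padded[i:i + BLOCK], prev))
        out.append(prev)
    return b"".join(out)


def cbc_decrypt(key, data):
    prev, out = data[:BLOCK], []
    for i in range(BLOCK, len(data), BLOCK):
        block = data[i:i + BLOCK]
        out.append(_xor(toy_decrypt_block(key, block), prev))
        prev = block
    return unpad(b"".join(out))


key = os.urandom(32)
message = b"A" * 32  # two identical plaintext blocks
first = cbc_encrypt(key, message)
second = cbc_encrypt(key, message)
```

Here `first` and `second` differ because each call draws a fresh IV, and the two ciphertext blocks inside `first` differ even though the plaintext blocks are identical, while `cbc_decrypt(key, first)` still returns the original message.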
As a final module, we build a server infrastructure out of a Node framework to host the blockchain clients and, thereby, provide blockchain resources for external applications. Through a control unit, and performing specific calls for each smart contract as well as for each database operation, this server supports the registration and access of users, the inclusion, updating, and retrieval of health documents, and the creation of links to download and share these documents, always with the consent and supervision of the respective user. Roughly speaking, this server executes 3 basic steps: (1) it receives requests from external applications, (2) according to each request, it accesses the corresponding network and database resources, and (3) it returns consistent responses to those applications. Because the server works as an intermediate system between blockchain network, health database, and external applications, it conveniently accommodates the cryptographic unit. In this way, sensitive information is encrypted as soon as it enters the system and only decrypted when leaving it.
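One way the server could combine the on-chain hash value and database ID with the off-chain document is sketched below. The function names, the dictionaries standing in for the health database and the HRI ledger, and the choice of hashing the stored ciphertext with SHA-256 are assumptions for illustration; the text does not specify these details.

```python
import hashlib


def store_document(database, ledger, pii_id, record_id, ciphertext):
    """Save the document off-chain and its fingerprint on the ledger."""
    db_id = f"doc-{len(database)}"  # hypothetical database ID scheme
    database[db_id] = ciphertext
    ledger[f"{pii_id}:{record_id}"] = {
        "db_id": db_id,
        "sha256": hashlib.sha256(ciphertext).hexdigest(),
    }
    return db_id


def fetch_document(database, ledger, pii_id, record_id):
    """Return the document only if it still matches the hash on the ledger."""
    entry = ledger.get(f"{pii_id}:{record_id}")
    if entry is None:
        return None
    data = database.get(entry["db_id"])
    if data is None or hashlib.sha256(data).hexdigest() != entry["sha256"]:
        return None  # data and metadata are inconsistent
    return data


database, ledger = {}, {}
doc_id = store_document(database, ledger, "pii-1", "r1", b"encrypted bytes")
```

If the stored bytes are altered in the database after registration, `fetch_document` returns `None` instead of the document, which mirrors the rule that only consistent responses are returned to external applications.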
Sketch of the overall system, exhibiting the interconnections between server, health database, and blockchain network, in order to provide personal health record resources for external applications. HRI: health record information; PII: personally identifiable information; RSI: record sharing information.
Flow of information during the query or record request of a health document. The server only returns a successful response if data and metadata are consistent. The flow can be interrupted earlier owing to lack of consensus.
To evaluate our blockchain-based architecture design, we use Hyperledger Caliper—a benchmark tool released by the Hyperledger community for measuring the performance of blockchain systems and producing reports containing metrics commonly accepted, such as throughput and latency. Caliper supports Ethereum and Hyperledger Fabric, allowing computer scientists and engineers to compare EHR proposals developed from the 2 main platforms at present. It is capable of generating a workload for a system under test (SUT) and continuously monitoring responses from this SUT [
To run an experiment, Caliper requires a benchmark file, a network file, and workload modules. The first one presents custom configurations to run the benchmark, such as the number of workers to perform a workload, the round settings, the number of submissions, the round length in seconds, the rate at which transactions are sent to the blockchain, among others. The second one presents the layout of the SUT—basically, the addresses and identities of the nodes and the channels and smart contracts to be used during the test. Lastly, workload modules are Node functions exported to simulate client nodes sending requests to the SUT, that is, in each round, a different workload module can be used to generate and submit transactions to the SUT, according to the configurations in the benchmark and network files. Therefore, Caliper can emulate many clients injecting workloads in a blockchain network [
As already mentioned, 2 basic metrics to assess blockchain performance are throughput and latency. The former, usually given in transactions per second (tps), represents the total number of valid transactions reached in a period of time [
The latter, in turn, usually given in seconds, represents the time taken for a transaction to conclude and return a response [
With a 3-peer network, our first benchmark is set to run a workload, from 100 to 2500 simultaneous submissions of health metadata, with steps of 100, on each smart contract of the PII, HRI, and RSI templates. We limit our test to 2500 loads because Hyperledger Fabric is configured by default to perform a maximum of 2500 concurrent requests. Writing scenarios are configured to use 5 workers, each submitting 10,000 transactions at the same time, for a total of 50,000. Reading scenarios are configured to use the same 5 workers in parallel but to randomly request records during 600 seconds of continuous operation. The rate controller is kept in a fixed-load mode, starting at 50 tps and 500 tps, for writing and reading transactions, respectively, and growing to reach maximum rates. Because PII, HRI, and RSI are designed to store ciphertexts only, in our test, all simulated submissions of health metadata are randomly generated as strings of fixed length for each smart contract field. An empty blockchain network is raised in each load test to guarantee equal conditions. Our test environment consists of a machine having an Intel Xeon E-2246G processor (12 MB cache, 3.60 GHz, 6 cores, 12 threads), an NVIDIA Quadro P1000 graphic adapter, and a random access memory of 16 GB, running the Ubuntu 18.04.5 LTS 64-bit operating system.
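The 2 metrics and the workload sweep can be made concrete with a small sketch. It does not call Caliper: `TxResult` and both functions are hypothetical, and the throughput window (first submission to last commit of successful transactions) is one common definition, assumed here rather than taken from the benchmark tool.

```python
from dataclasses import dataclass


@dataclass
class TxResult:
    submitted_at: float  # seconds since the start of the round
    committed_at: float  # seconds since the start of the round
    success: bool


def throughput_tps(results):
    """Successful transactions divided by the time window they span."""
    ok = [r for r in results if r.success]
    if not ok:
        return 0.0
    window = max(r.committed_at for r in ok) - min(r.submitted_at for r in ok)
    return len(ok) / window if window > 0 else float("inf")


def average_latency_s(results):
    """Mean time between submission and commit of successful transactions."""
    ok = [r for r in results if r.success]
    if not ok:
        return 0.0
    return sum(r.committed_at - r.submitted_at for r in ok) / len(ok)


# Workload levels of the first benchmark: 100 to 2500 in steps of 100.
WORKLOADS = list(range(100, 2501, 100))

sample = [
    TxResult(0.0, 1.0, True),
    TxResult(0.5, 2.0, True),
    TxResult(1.0, 2.0, True),
    TxResult(1.0, 3.0, False),  # failed transactions are not counted
]
```

For `sample`, 3 successful transactions span 2 seconds, so `throughput_tps(sample)` is 1.5 tps, and `WORKLOADS` contains the 25 load levels used in the sweep.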
Even though the throughputs of reading transactions present a similar order of magnitude, they differ significantly from one another. Smart contracts to retrieve, list, and view history have throughputs varying, respectively, from 1100 tps to 1300 tps, from 650 tps to 750 tps, and from 850 tps to 950 tps. Their latencies, in turn, grow linearly at similar but slightly different rates. These 2 pieces of evidence suggest that reading transactions can impact the overall system response if they are equally requested. An external application under a real situation has to consider the smallest of these values as the upper limit to avoid overload.
With a fixed load of 2000 submissions, our second benchmark is set to increase the network size from 3 to 13 peers, with steps of 2, and perform, for each case, the writing and reading scenarios of the previous experimental protocol. We limit the largest network to 13 peers because, in our test environment, Hyperledger Fabric performs very poorly beyond this value, resulting in many transaction failures.
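Two numbers from the text above lend themselves to a quick check: the request-rate ceiling implied by the slowest reading contract and the network sizes of the second benchmark. The dictionary and function names below are hypothetical, and taking the lower bound of each reported range is a conservative reading assumed for this sketch.

```python
# Reported throughput ranges (tps) for the reading smart contracts.
READ_THROUGHPUT_TPS = {
    "retrieve": (1100, 1300),
    "list": (650, 750),
    "view_history": (850, 950),
}


def safe_read_rate(ranges):
    """Smallest lower bound across contracts, used as the request ceiling."""
    return min(low for low, _high in ranges.values())


# Network sizes of the second benchmark: 3 to 13 peers in steps of 2.
NETWORK_SIZES = list(range(3, 14, 2))

ceiling = safe_read_rate(READ_THROUGHPUT_TPS)
```

With the reported ranges, `ceiling` is 650 tps (set by the list contract), and `NETWORK_SIZES` enumerates the 6 configurations tested.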
As a final comment when observing throughputs and average latencies in
Throughput (measured in transactions per second) and average latency (measured in seconds) of all smart contracts under workload, ranging from 100 to 2500 concurrent submissions of health metadata, with steps of 100. HRI: health record information; PII: personally identifiable information; RSI: record sharing information; tps: transactions per second.
Throughput (measured in transactions per second) and average latency (measured in seconds) of all smart contracts, by considering a network increase from 3 to 13 endorsement peers, with steps of 2. HRI: health record information; PII: personally identifiable information; RSI: record sharing information; tps: transactions per second.
The results of this study are comparable to those reported previously in the literature [
In practice, most works focus more on describing how blockchain models can be used in a PHR scenario than on whether these models are in fact feasible to support a large number of users. Because the health industry can easily cover tens or even hundreds of millions of patients in a single country, we think the assessment of blockchain solutions for PHR is a major concern to be addressed before putting them into production. In view thereof, there is a latent need to standardize evaluation metrics to facilitate the comparison between related works. We think that throughput and average latency are suitable metrics for this purpose as well as Hyperledger Caliper and BLOCKBENCH [
Toward a consistent, reproducible, and comparable PHR evaluation, and with regard to throughput and latency, we are the first to evaluate with Hyperledger Caliper the performance of a PHR blockchain architecture. Because Caliper is the official benchmark to assess blockchain networks built out of Fabric, we believe that our results bring important insights into the limits and advantages of using Fabric to design PHR repositories. Moreover, Caliper can be adapted to assess Ethereum-based systems, facilitating the comparison between architectures created with the 2 main open-source platforms at the present time. To the best of our knowledge, we are also the first to evaluate each smart contract separately. Previous works considered smart contracts as falling only into writing and reading transactions and have just identified dissimilarities between these 2 types. However, we reveal that, especially in relation to reading ones, throughput and latency can have significant differences, impacting the overall system response if these transactions are equally requested under the same workload.
Specifically in relation to our proposal, as a first implementation, the blockchain network, the health database, and the server are allocated through virtual machines on a single physical device, only simulating a decentralized system, which represents a limitation of our work. Furthermore, because we are primarily interested in the blockchain architecture, the health database and the server are incorporated in the model but they are not actually tested considering an external application under a real situation, which represents an additional limitation. We leave these improvements for future work because we believe that our current results already provide important advice to the biomedical and health informatics community.
In conclusion, the importance of blockchain-based architectures for PHR lies in the fact that they are conceived and developed to allow a patient to control and at least partly collect health data, as well as to share health information on her/his own. Ideally, these systems should provide the full control of such data for the respective owner [
AES: advanced encryption standard
CBC: cipher block chaining
EHR: electronic health record
HIE: health information exchange
HRI: health record information
PHR: personal health record
PII: personally identifiable information
RSI: record sharing information
SUT: system under test
tps: transactions per second
We thank Foxconn for the financial support and Instituto do Coração, Hospital das Clínicas, Faculdade de Medicina, Universidade de São Paulo for the research infrastructure.
None declared.