This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
Open data is information made freely available to third parties in structured formats without restrictive licensing conditions, permitting commercial and noncommercial organizations to innovate. In the context of National Health Service (NHS) data, this is intended to improve patient outcomes and efficiency. EBM DataLab is a research group with a focus on online tools that turn our research findings into actionable monthly outputs. We regularly import and process more than 15 different NHS open datasets to deliver OpenPrescribing.net, one of the highest-impact use cases for NHS England’s open data, with over 15,000 unique users each month. In this paper, we have described the many breaches of best practices around NHS open data that we have encountered. Examples include datasets that repeatedly change location without warning or forwarding; datasets that are needlessly placed behind a “CAPTCHA” and so cannot be downloaded automatically; longitudinal datasets that change their structure without warning or documentation; near-duplicate datasets with unexplained differences; datasets that are impossible to locate, and thus may or may not exist; poor or absent documentation; and withholding of data for dubious reasons. We propose new open ways of working that will support better analytics for all users of the NHS. These include better curation, better documentation, and systems for better dialogue with technical teams.
Open data is briefly defined as data that anyone can access, use, modify, and share; more technical definitions are available from various sources [
The UK government has long recognized that simply publishing data is, in itself, not sufficient to meet these criteria and also not sufficient to drive change and innovation. The 2012 Open Data White Paper set out 14 information principles declaring that data should be easy to find, available without registration, and accompanied by meaningful descriptive text, alongside various other more technical recommendations [
The National Health Service (NHS) in England has long agreed that transparency can lead to better outcomes for patients and taxpayers [
Our group develops and maintains OpenPrescribing.net, an online and publicly accessible tool that helps users explore highly granular NHS primary care prescribing open data. It is widely used, with over 15,000 unique users each month. Its users are predominantly from within the NHS, but industry and patient groups are also well represented. In England, the planning and commissioning of health care services for each local area is carried out by Clinical Commissioning Groups (CCGs), which, alongside NHS England, commission primary care services from individual general practitioner (GP) practices. GPs have considerable freedom in prescribing behavior, with the costs of prescriptions usually being borne by CCGs. The transfer of money from CCGs to pharmacies (and other organizations) that dispense prescriptions to patients is mediated by the NHS Business Services Authority (NHSBSA), which processes all prescribing transactions to determine correct payments. The NHSBSA is, therefore, also responsible for converting data submitted from pharmacies into a standard format. Although collected for an economic purpose, this very high-quality dataset provides a unique opportunity to find ways to improve the quality, safety, and cost-effectiveness of prescribing at all individual GP practices across England. Our tools support complex bespoke data queries alongside numerous predefined standard measures for safety, cost, and effectiveness. In total, 92.1% (176/191) of CCGs are signed up to monthly alerts, which automatically identify high-priority action items. We have published peer-reviewed research showing that prescribing is substantially improved in practices and CCGs where OpenPrescribing.net data are accessed [
OpenPrescribing.net is built on top of data that are theoretically publicly accessible. We have repeatedly encountered time-consuming barriers to accessing and processing these data. In this paper, we have described some of these barriers and made recommendations on how the NHS could share data more effectively.
The views set out below are informed by our technical work building OpenPrescribing.net but also by our broader background. The DataLab at the University of Oxford is a mixed team of software engineers, clinicians, academics, and analysts turning NHS data into tools and services to directly improve patient care. We aim to pool skills and combine best practices from software engineering and academia, producing open source software, open prototypes, and open workbooks. On GitHub, under open licenses, we have shared 44,000 lines of code in 34 public repositories with over 5000 commits; 850 Python files; 105 Structured Query Language (SQL) files containing 4600 lines of SQL; 140 Jupyter notebooks; and over 1000 GitHub issues, each containing detailed descriptions of specific problems we have encountered and their technical solutions. Many of us have also worked previously in organizations that promote open access to knowledge. As a concrete illustration of our experience of working with NHS open data: at least 8 different datasets must be located, downloaded, converted, normalized, interpreted, combined, and then processed to create even 1, apparently simple, mapped insight on OpenPrescribing.net: “over the past 5 years, NHS North Cumbria spent £63,000 on Linaclotide.”
In the following sections, we describe a range of barriers we have encountered in accessing NHS open data. For each problem domain, we describe the datasets we aim to access, the barriers encountered, and some suggested solutions that would make the data usable and impactful.
Each month, we download and process prescribing data for NHS England. The best practice [
For example, consistently locating the data is difficult, both initially and on an ongoing basis with each new month of data. The first challenge for a consumer of the data is picking a dataset to use. A total of 2 very similar datasets are published by 2 different organizations: NHS Digital and the NHSBSA. The NHS Digital dataset is published on the first Friday of the third month after data collection, whereas the NHSBSA dataset is usually available 6 weeks following data collection. Neither of these datasets references the other in its documentation, and we have found no single location that identifies them as complementary sources.
Until 2017, we used NHS Digital’s version (known as practice level prescribing data), simply because this is the easiest to find. For 2 years we retrieved the data from NHS Digital’s data repository [
The version that we have used since 2017 is known as Practice Detailed Prescribing Information (PDPI) and is published by NHSBSA on their Information Services portal [
Second, although documentation is provided for PDPI, the documentation is incomplete: it refers to fields that do not exist, and does not refer to 15 fields that do exist [
CAPTCHA form for National Health Service Business Services Authority Practice Detailed Prescribing Information dataset.
No publicly available data should be protected behind a CAPTCHA.
Each dataset should have every field documented.
Every resource should have a consistent location (URL or machine-readable data index) for finding current data.
Internal reorganizations should not result in these URLs being deleted; if they are superseded, old URLs should be kept and set up to redirect to new locations.
When there is a plan to relocate or change datasets, this should be advertised and documented well in advance.
It should be easy to find all current prescribing data resources and to pick the most appropriate one. For example, there could be a single place listing all current prescribing data resources.
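The “machine-readable data index” recommended above need not be complex. The sketch below shows how a consumer could locate the latest release from a small JSON index; the URL, field names, and structure are all hypothetical, not an existing NHS schema.

```python
import json

# Hypothetical machine-readable index of the kind proposed above.
# The URL and every field name are illustrative, not a real NHSBSA schema.
INDEX = """
{
  "dataset": "practice-detailed-prescribing-information",
  "releases": [
    {"period": "2018-10", "url": "https://example.nhs.uk/pdpi/2018-10.csv"},
    {"period": "2018-11", "url": "https://example.nhs.uk/pdpi/2018-11.csv"}
  ]
}
"""

def latest_release(index_json):
    """Return (period, url) of the most recent release listed in the index."""
    index = json.loads(index_json)
    release = max(index["releases"], key=lambda r: r["period"])
    return release["period"], release["url"]

period, url = latest_release(INDEX)
print(period, url)  # 2018-11 https://example.nhs.uk/pdpi/2018-11.csv
```

With such an index at a stable URL, monthly imports could be fully automated with no screen-scraping, no CAPTCHA, and no breakage when files move.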
Each prescription is identified by a “BNF Code”: this is typically 15 characters long and uniquely identifies a presentation of a drug. For example, the code for Tramadol HCl 300 mg tablets is 040702040AAAMAM. To make prescribing data useful for analysts, all the British National Formulary (BNF) codes must be converted to human-readable BNF names. Data to support this are published by NHSBSA (behind another CAPTCHA) on the Information Services portal [
The coding scheme is based on the BNF’s old classification, which the BNF’s publishers no longer maintain. Therefore, the NHS’ altered version is properly known as the “Pseudo BNF Classification” [
The fact that some BNF codes change over time makes time-based analysis of data difficult. For example, a user searching for Linaclotide, using its current BNF code, will find no prescribing before 2014. This is because the drug was moved from BNF section
British National Formulary Code data labeled available in November 2018.
As no public documentation mentions that such changes are possible, we first inferred this was the case following user enquiries about apparently disappearing drugs. Following direct enquiries, we now obtain a spreadsheet detailing the changes every January by emailing NHSBSA directly and apply this to the imported prescribing data. By comparing the pseudo BNF code lists each month, we have inferred that codes also sometimes change mid-year but have not yet obtained access to these individual changes on a monthly basis [
Published, open data should never be protected behind a CAPTCHA.
The fact that the British National Formulary (BNF) scheme changes regularly should be documented.
There should be a single, obvious channel for data consumers to query possible issues in the data.
BNF code changes should be published monthly as a mapping.
Each data release should be clearly labeled on its index page, so users know when a new version has been released; there should be a way for users to subscribe to be notified of new releases.
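Were code changes published as a simple old-to-new mapping, applying them to historic prescribing data would be trivial. The sketch below assumes a hypothetical mapping table; the spreadsheet we obtain from NHSBSA by email has a different layout, and the codes shown are invented.

```python
# Hypothetical mapping of retired pseudo-BNF codes to their current
# equivalents. The real data we receive is a spreadsheet in a different
# format, and these 15-character codes are invented for illustration.
CODE_CHANGES = {
    "0109020AAAAAAAA": "0106070AAAAAAAA",  # illustrative section move
}

def normalise_code(bnf_code, changes=CODE_CHANGES):
    """Map a historic code to its current form, following chained changes."""
    seen = set()
    while bnf_code in changes and bnf_code not in seen:
        seen.add(bnf_code)
        bnf_code = changes[bnf_code]
    return bnf_code

# Historic prescribing rows as (code, items); remap codes before analysis
# so that time-based queries for a drug span its whole history.
rows = [("0109020AAAAAAAA", 12), ("0403030AAAAAAAA", 5)]
remapped = [(normalise_code(code), items) for code, items in rows]
```

This is essentially what we do each January with the emailed spreadsheet; a published monthly mapping would let every data consumer do the same without private correspondence.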
To include patient list size in our analyses, and show practice names and addresses, we looked up extra information in a dataset provided by NHS Digital. The format of this dataset has not changed since 2015; however, we have encountered regular problems with its location changing, which has prevented us from fully automating this monthly process.
Until 2018, our procedure was to automate a search for the phrase “
Once practice data were obtained, we encountered difficulties with data quality. In general, the data provided by NHSBSA are of a high standard. However, there is no documentation for several known recurring errors and no way to report and correct them systematically.
For example, it is important to know whether an institution is a standard GP practice or a different kind of institution (eg, a homeless service or a drop-in center). However, in the data provided, there is a small but significant number of obvious errors in coding, such as classification of care homes [
Finally, these problems are compounded by list size data that regularly appear to contain fictional values. Sometimes we identify practices that have prescribing at improbable levels, far exceeding their total number of patients [
All data should be published in a predictable format and location.
“Nominal” values should not be used: missing values should be clearly coded as “missing.”
Where there are systemic issues with data quality, these should be documented.
There should be a clearly documented and centralized system for reporting and correcting errors in the data.
Data stewards should take responsibility for collecting error reports and aim to correct them.
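A reporting-and-correction system of the kind recommended above would be fed by automated checks. The sketch below illustrates how the two data quality problems described in this section might be flagged; the thresholds, field layout, and practice codes are illustrative, not those used on OpenPrescribing.net.

```python
def flag_suspect_practices(practices, min_list_size=100, max_items_per_patient=50):
    """Return practices whose data look implausible.

    `practices` maps practice code -> (list_size, items_prescribed_in_month).
    Both thresholds are illustrative, not our production values.
    """
    flagged = []
    for code, (list_size, items) in practices.items():
        if list_size < min_list_size:
            # Likely a placeholder or "nominal" value rather than real data.
            flagged.append((code, "implausibly small list"))
        elif items > list_size * max_items_per_patient:
            # Prescribing far exceeds the registered population.
            flagged.append((code, "prescribing exceeds plausible volume"))
    return flagged

example = {
    "A81001": (8000, 12000),   # plausible
    "B82002": (1, 9000),       # "nominal" list size of 1
    "C83003": (200, 500000),   # prescribing far beyond list size
}
print(flag_suspect_practices(example))
```

Checks like this are cheap to run; the missing piece is a documented channel through which their output could be reported back to data stewards and corrected at source.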
To analyze data at a CCG level, we must aggregate the per-practice data. The source data provide a CCG for each row, so this is straightforward for contemporary data.
We show CCG boundaries on maps in various places in OpenPrescribing.net: an example in
Prescribing of pericyazine as a proportion of all antipsychotics across all Clinical Commissioning Groups in England, as displayed on OpenPrescribing.net.
Our per-practice data provide a practice’s CCG membership for the current month. However, historic analysis is complicated by the fact that practices often change CCGs, CCG boundaries sometimes change, and CCGs often merge. In 2017, for example, the boundary between NHS Cumbria and NHS Morecambe Bay changed. We were able to infer from the data that 32 practices moved to Morecambe Bay as a result [
The problem that a practice may move between CCGs is addressed in the OpenPrescribing.net software by projecting the practice’s current CCG membership back in time: for example, a prescription dispensed in 2012 is allocated to whatever CCG that practice currently belongs to. This works well in most cases but becomes complicated when a practice has closed. In the case of Cumbria in April 2017, 5 practices had closed before the boundary change; these are, therefore, not reflected in current CCG membership data. This leaves the problem of which CCG to attribute them to: their patients have not disappeared, just moved, but it is impossible to find out or infer where they were moved to because the information about what happens to a practice’s patient list on closure is not available as data.
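The attribution strategy described above can be sketched as follows: every historic row is allocated to the practice’s current CCG, and rows for closed practices, which are absent from current membership data, fall into an unattributable bucket. Codes and figures are illustrative.

```python
# Current CCG membership: practice code -> CCG code. Closed practices
# simply do not appear, which is exactly the problem described above.
# All codes and item counts here are illustrative.
CURRENT_CCG = {"A81001": "00K", "A81002": "00K", "A81003": "01H"}

def aggregate_by_ccg(prescribing_rows):
    """Sum items per CCG, projecting current membership back in time.

    `prescribing_rows` is an iterable of (practice_code, items).
    Rows for practices with no current CCG fall into "UNKNOWN".
    """
    totals = {}
    for practice, items in prescribing_rows:
        ccg = CURRENT_CCG.get(practice, "UNKNOWN")
        totals[ccg] = totals.get(ccg, 0) + items
    return totals

rows_2012 = [("A81001", 10), ("A81002", 20), ("A81003", 5), ("A81999", 7)]
print(aggregate_by_ccg(rows_2012))  # {'00K': 30, '01H': 5, 'UNKNOWN': 7}
```

Published practice closure and transfer data would let the "UNKNOWN" bucket be reassigned correctly instead of silently distorting historic CCG totals.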
Our own research has established that when a practice closes (or merges), it must fill out at least two nearly identical forms to notify the prescription pricing division of NHSBSA [
Map files should be published in a single, easily found, permanent location to a regular schedule.
They should be published alongside (or indicate the location of) files supporting mapping to standard National Health Service (NHS) clinical commissioning group (CCG) codes as used in prescribing data.
Their format should stay constant over time where possible.
Practice merger and closure data should be published, showing where and when practice lists have transferred.
Even if this is not possible, the problem of tracking historic prescribing behavior via practice codes should be clearly documented.
We are unclear as to the value for the NHS of a system that requires CCGs to notify NHS England of practice changes but then leaves the information on paper.
A single row of prescribing data includes a column denoting the quantity of the item dispensed. For example, in the case of paracetamol tablets, a “quantity” of 25 means that 25
The quantity of a drug dispensed is measured in units depending on the formulation of the product, which is given in the drug name. Where the formulation is tablet, capsule, ampoule, vial etc. the quantity will be the number of tablets, capsules, ampoules, vials etc. Where the formulation is a liquid, the quantity will be the number of millilitres. Where the formulation is a solid form (eg. Cream, gel, ointment), the quantity will be the number of grammes.
However, this definition is not sufficiently precise for use in statistical analyses. For example, it is not obvious if a foam should be classified as a liquid or a solid. Further extensive investigation uncovered the existence of a “standard quantity unit” field for every product, which defines the property precisely. However, it can be found only in one place, the monthly prescription cost analysis spreadsheet [
Even when the standard quantity unit for a presentation is known, the definition of quantity sometimes varies, for example, between “dose” and “pack.” During development of our price-per-unit tool, we found a number of products where the highest price was orders of magnitude outside the normal range [
An NHSBSA glossary has this to say on the matter: “Where a product is packed in a 'special container'...
It is not clear from this statement if variation in the meaning of “quantity” for a single presentation is intentional, or accidental. We raised specific examples with NHSBSA, and this led to some of these figures being corrected retrospectively, but in other cases, we were told “work is under way to review this and agree a way forward.”
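The screening that surfaced these anomalies amounts to an outlier check on price per unit across practices for a single presentation. A minimal sketch, with illustrative figures and a hypothetical threshold rather than our production logic:

```python
def price_outliers(rows, factor=10):
    """Flag practices whose price per unit exceeds `factor` times the median.

    `rows` is a list of (practice_code, quantity, cost) for one presentation.
    The factor of 10 is illustrative, not our production threshold.
    """
    ppu = [(practice, cost / quantity) for practice, quantity, cost in rows]
    prices = sorted(p for _, p in ppu)
    median = prices[len(prices) // 2]
    return [practice for practice, p in ppu if p > median * factor]

rows = [
    ("A81001", 1000, 50.0),   # 0.05 per unit
    ("A81002", 2000, 110.0),  # 0.055 per unit
    ("A81003", 500, 26.0),    # 0.052 per unit
    ("A81004", 1, 30.0),      # 30.0 per unit: "pack" recorded as quantity 1?
]
print(price_outliers(rows))  # ['A81004']
```

Where “quantity” means “pack” for some rows and “dose” for others, this check lights up immediately; documenting the intended meaning per presentation would make such anomalies interpretable rather than mysterious.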
Errors in data are inevitable and to be expected. Overall, the NHSBSA dataset is remarkably free of errors. However, as analysts we need to understand where errors are and, if they are systematic, where and how often we can expect them to appear. The detailed investigative analysis required to understand these data delayed the launch of our price-per-unit savings tool by several months. This kind of delay has real-world effects; published peer-reviewed data show that the tool saves CCGs millions of pounds a year [
By default, all prescribing data used internally at National Health Service Business Services Authority should be made available and described in 1 place.
All data should be accompanied by clear, user-focused documentation about the meaning of each field.
Where there are known problems with the data, these should be documented clearly and transparently.
During 2017, we set out to conduct a simple, low-cost randomized trial: we notified GPs of cost-saving and quality improvement opportunities in their prescribing and are currently measuring the impact of this notification on their behavior. The intervention was split between 3 methods of communication: letter, email, and fax. We assumed there must be at least one central NHS database of practices’ email addresses; for example, NHS England emails a monthly GP practice bulletin to GP practices. We knew there might be problems making this public, but we were also surprised by how difficult it was to find out if the database existed at all.
First, we checked WhatDoTheyKnow, a publicly accessible archive of requests made under the Freedom of Information (FOI) Act, for any past FOI requests for GP practice contact information. We found NHS England had refused a similar request for practice information, stating that “NHS England does not hold information in relation to your request” [
However, the request was refused under 2 of the allowed exemptions in the FOI Act. The first was section 40 (an exemption relating to personal information). They argued it would be unfair to staff, who had signed up for one purpose, to be contacted for another purpose. The second was section 43 (an exemption relating to commercial interests). This is apparently because some of the GP email addresses had been purchased by NHS England from a third party under a license that forbids the NHS to share the information.
Having failed with one database we knew to exist, we made requests to every public body that might hold a database of GP email addresses. We preemptively included an argument that section 40 should not apply, as these are work email addresses. All were refused, with similar arguments to those from NHS England, or invoking section 21 (claiming the information was already available, which is incorrect), or stating that they did not hold the information. The responses are summarized in
Summary of responses to Freedom of Information requests for general practitioner’s email addresses.
Body | Reasons for not supplying the data |
Department of Health and Social Care | Information not held [ |
NHSa England (new request) | S21 [ |
NHS England (follow-up) | S40 [ |
Medicines and Healthcare Products Regulatory Agency | S21, S40, S43 (their own commercial interests) [ |
NHS Business Services Authority | S21, S40, S43 [ |
NHS Digital | S21 and information not held [ |
aNHS: National Health Service.
In our view, many of the responses gave the impression of an organization actively seeking ways to refuse releasing this information. NHSBSA argued that providing email addresses would damage commercial interests because it decreases security: “The e-mail addresses could be used by cyber criminals to target practices, CCGs etc. If such an attack was successful it could result in financial loss and/or loss of patient data.” This strikes us as an extremely unrealistic concern. The notion that hiding information intrinsically increases security has long been debunked in the security research community, where it is known disparagingly as “security through obscurity”; in any case, the email addresses are all available through commercial data providers. Furthermore, most GP practices would expect to be contactable through email by their patients.
We were eventually able to run the randomized controlled trial (RCT) but only at greatly increased cost. We sent FOI requests to all 201 CCGs [
In summary, the reasons given for not supplying email addresses were inconsistent and sometimes hard to fathom. We believe the section 40 exemption (that it is unfair to individuals to release this data) is overused: given these are work addresses for managers, there would be a strong case for their release, based on current FOI guidelines. At the very least, given that 33% (66/201) of CCGs were willing to provide email addresses, the exemption is very unevenly interpreted and applied. We also understand that the vast majority of practices have a generic nhs.net inbox, which would certainly be exempt from section 40, but a list of even these email addresses is apparently unavailable.
Ultimately, it should not be difficult for researchers, health professionals, or indeed the public to have a way to contact any GP practice by email; and until this is the case, it should not be difficult (as it currently is) for a data consumer to establish definitively that there is no such resource. The problems we had assembling these data delayed the start of our RCT by several months. This delay indirectly affects care, as there is currently limited research available about how information is best disseminated through the NHS. We also note that the Secretary of State for Health and Social Care has prominently promoted the principle of using emails first, rather than letters, to communicate in the NHS. This is made harder if NHS organizations themselves are failing to make email addresses easily available, or actively blocking access.
There should be a contact database for general practitioner practice managers, including email addresses, which is available to the public. Currently the choice to make an email address public is taken by practices alone.
In the meantime, to save time and effort on the part of users of the data and NHS bodies, the fact that it is currently unavailable should be clearly documented in a single place, with an explanation, and suggestions for alternative sources.
For prescriptions written in primary care in England, the NHS reimburses community pharmacies for the medicines they purchase. The reimbursement price for generic medicines is set monthly by the Department of Health and Social Care (DHSC) in the NHS
Tariff prices and projected cost impact of price changes for Levetiracetam as on OpenPrescribing.net.
The tool was relatively easy to build. However, we could only build it after a large amount of difficult research and manual data editing. First, the data must be combined from spreadsheets found on 2 totally different websites, although both ultimately originate from DHSC. Second, each spreadsheet refers to the information in a different way, and they both provide separate files of data each month whose formats change over time. Finally, they are archived inconsistently, which makes it hard to locate historic data.
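The combination step itself can be sketched simply, once the data have been located and the drug names manually normalized so that they match across sources. In this sketch, a price concession, where one has been granted, overrides the Drug Tariff price for that month; all names and prices are illustrative.

```python
# Two monthly exports that both ultimately originate from DHSC, keyed here
# by drug name because no shared identifier is published. Names and prices
# are illustrative; in practice the names need manual normalization first.
tariff = {
    "Levetiracetam 250mg tablets": 2.14,
    "Tramadol 50mg capsules": 1.06,
}
concessions = {
    "Levetiracetam 250mg tablets": 4.50,  # concession granted this month
}

def effective_prices(tariff, concessions):
    """Combine the two sources: a concession, where one exists,
    replaces the Drug Tariff price for that month."""
    return {name: concessions.get(name, price) for name, price in tariff.items()}

print(effective_prices(tariff, concessions))
```

A shared identifier across the two exports would reduce this join to a single line and remove the need for fragile name matching.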
Finding the most recent data is relatively easy. Drug Tariff data are provided by the NHSBSA in a single location, which provides monthly spreadsheets for the last 2 years of the Drug Tariff [
To find earlier datasets for previous years, we turned to the NHSBSA FOI archive [
When archived data that are part of an already published time series are made available, the data should be published alongside the time series, not left in Freedom of Information requests.
For relatively small datasets, the export process should ideally produce a single file of all the data each month. This avoids the problem of formats changing through time.
All data should be provided with identifiers.
All shared data should be indexed or indexable.
It is surprising that the price concession data are not already combined with Drug Tariff data somewhere in the National Health Service.
Price concessions should be mentioned in documentation wherever Drug Tariffs are mentioned.
As illustrated, although several NHS datasets are “open” by the narrowest definition, the NHS commonly breaches the principles of the Open Data White Paper and barely meets other best practice criteria such as the Berners-Lee 5-Star scale for open data. Collectively, the barriers described in this paper represent a substantial block to the development of new data-driven tools aiming to improve the quality, safety, and efficiency of NHS care. These barriers can be broadly divided into 4 areas: problems accessing the data, problems understanding the data, problems processing the data, and problems communicating with the NHS about the data. These imply 4 solutions: better curation, better documentation, better change management, and better dialogue with users. Here we summarize the barriers and offer some concrete suggestions of how the situation could be improved.
As documented in previous sections, there is a very substantial problem with curation of information in the NHS. Datasets are collected and shared at considerable expense but are then commonly undiscoverable or poorly indexed; they move location unpredictably, and often an interested user cannot establish whether a given dataset exists at all. The NHS England Data Catalogue [
Proactively, the NHS should invest in manually curating the data it already shares. This would entail detailed strategic input from experts in information management and librarianship; here we offer some brief principles. First, this curation should be done by people, with individuals or teams owning a particular topic area. Second, these teams must include domain experts already working within the relevant NHS organizations who understand the data. Third, instead of separate silos of data arranged by NHS organization, there should be a single location with links to all NHS data; and there should be confidence that all the data relevant to the topic are indexed in that one place as per best practice and government guidance [
Reactive curation also offers substantial benefits but at much lower cost. In short, where users are actively working on datasets, and they report to the NHS that something is missing, out of date, or poorly documented, then these errors, omissions, and shortcomings should be addressed and corrected. In short, there should be a simple means for users to report errors in the existing catalogs and for these errors to be corrected.
We can see no reason why any NHS resource should be behind a CAPTCHA, but if this is unavoidable, the reasons for this choice should be robustly documented and forewarning given in the catalog.
In the previous sections we have documented numerous cases where NHS datasets are hard to interpret because of poor documentation and where the NHS has not been reactive to questions around poor or absent documentation. Documentation is challenging and time-consuming. However, good documentation brings clear thinking: an organization that cannot produce or share documentation on the data it holds is unlikely to be working effectively with that data internally.
We think there is room for the “proactive and reactive” model described above. Proactively, every dataset should be accompanied by documentation that explains its context (how and why it is used in the NHS), its provenance, the meaning of each field, how often it is updated, and any known issues with the data. At minimum, datasets that are regularly downloaded, used, or enquired about should be prioritized for this best practice. Reactively, the NHS should respond to queries, and there should be easy routes for users to give feedback on ambiguities or errors in the documentation. However, the NHS should also work more strategically with end users of the data, as this is where the true value of that dataset is often realized: documentation should ideally be developed in the open, in collaboration with data consumers, to ensure it is current and relevant.
As documented in previous sections, there are substantial problems with NHS datasets changing structure and format over time, often without those changes being documented. Often these changes are trivial: the names of the columns in a 2-way table, or their order. However, every time the format of a dataset changes there is a material consequence for every end user: the pipelines for importing and processing data will break, the fault must be discovered, and developers must work around it. Commonly, there seems to be no technical reason for the changes we have seen in NHS datasets: it is likely that these changes simply reflect carelessness, or a lack of interest and knowledge about how the data are being used and processed by end users.
In an ideal world, data formats would never change; however, occasional changes are inevitable. Therefore, clear communication of changes is vital. We suggest that every dataset should be accompanied by a change log in its catalog entry. This change log should describe the nature of each format change. Crucially, there should also be clear documentation of the reasons why the change has happened, as this is likely to act as an informal feedback system, prompting NHS staff to think through whether the change is really necessary. There should be a way for consumers of the data to subscribe to updates and receive advance notice of these changes and to provide feedback where changes have happened without documentation, prompting the change log to be updated.
A related issue is stability of data structures over time when working with older datasets. Users often want to automatically retrieve and process not only current data but historic data. In doing so, they hit 2 problems: finding all historic files and resolving the format differences between them. To aid discovery, the naming conventions for data (eg, “title, date”) should be documented in the same way as the data structure itself, and remain stable over time, to support automated retrieval. Where the formats of shared datasets must change, but the NHS holds historic data internally in a consistent format, then for all but the largest datasets, we suggest a bulk export of all historic data in the most current format should ideally be provided. Again, following the principle of transparency, where bulk exports are not possible, this should be mentioned and explained in the documentation.
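With a documented, stable naming convention, retrieving a whole time series would reduce to generating file names for each month, as in this sketch (the URL pattern is entirely hypothetical):

```python
def monthly_urls(base, first_year, first_month, last_year, last_month):
    """Generate one URL per month in a range, assuming a stable,
    documented "title, date" naming convention (the pattern is invented)."""
    urls = []
    year, month = first_year, first_month
    while (year, month) <= (last_year, last_month):
        urls.append(f"{base}/pdpi-{year:04d}-{month:02d}.csv")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return urls

urls = monthly_urls("https://example.nhs.uk/data", 2017, 11, 2018, 2)
print(urls)
# ['https://example.nhs.uk/data/pdpi-2017-11.csv',
#  'https://example.nhs.uk/data/pdpi-2017-12.csv',
#  'https://example.nhs.uk/data/pdpi-2018-01.csv',
#  'https://example.nhs.uk/data/pdpi-2018-02.csv']
```

This is the level of automation that unpredictable file names and shifting locations currently make impossible.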
We have returned repeatedly to the importance of better communication between data producers and consumers, if only as a means to reactively prioritize work around curation and documentation. In our view, this 2-way communication is vital: producers should be able to notify consumers of important changes in the data, and consumers should be able to notify producers of bugs in the data or ask questions. It is important to note here that good dialogue cannot be driven by a positive attitude alone: we have had many very positive interactions with single individuals in various NHS organizations who have been very helpful, but individuals can change jobs, or go on holiday, and finding the right person to talk to often relies on personal networks or sheer determination.
A strategic approach to dialogue requires good systems and formal structures. In short, the NHS needs a single place for users to ask questions about data, with the answers recorded and searchable in the public domain. This should be well publicized, open, public, and linked to liberally from across the NHS online estate. Given the NHS’ general commitment to transparency [
None of this requires custom software and could all be provided through standard, widely used, free, open services such as GitHub and GitLab. End users could contribute bug reports to the documentation; they could ask questions through the bug tracking systems; everyone could see everyone else’s input, helping raise standards and awareness; and the data producers could rely on built-in notification features to push feeds of updates to the end users. The most important part of solving this problem is not software but staff expertise and time. By reducing the friction between the 2 sides of the data exchange, better uses of data will emerge.
For clarity, this is not a “blue skies” or challenging suggestion. This is a standard way of working outside of the NHS, and it is how our own team works: we document every step of our problem solving publicly, in our closed and open “issues” on GitHub, which now number over 1000 [
Releasing data under open licenses was the starting point for open data and the open government movement, in which the United Kingdom has been a global leader. However, in our experience, the implementation of these open principles in the NHS has been absent or flawed, with poor documentation, poor curation, and poor dialogue presenting substantial barriers to innovation. We have chosen to spend time documenting these issues at length; many third parties confronted with similar barriers will either give up, concluding that a service is impractical, or quietly expend substantial resources and effort on workarounds. This in turn will increase the cost of delivery, block innovation, and divert resources that should be spent on producing better services for clinicians, commissioners, and patients.
There is currently substantial appetite for better use of data and software in the NHS [
British National Formulary
clinical commissioning group
Department of Health and Social Care
Freedom of Information
general practitioner
National Institute for Health Research
National Health Service
National Health Service Business Services Authority
Office for National Statistics
Primary Care Support England
Practice Detailed Prescribing Information
randomized controlled trial
Systematized Nomenclature of Medicine
Structured Query Language
The barriers described in this paper, and the work to overcome them, represent the work of the authors’ whole technical and research team in the DataLab over the preceding 4 years. The authors are grateful for the hard work put in by all, including the following: Richard Croker, Anna Powell-Smith, Dave Evans, Lisa French, Peter Inglesby, Alex Walker, Helen Curtis, and Brian MacKenna. They are also grateful for comments on an earlier draft from Jeni Tennison of the Open Data Institute, London, United Kingdom. Finally, the authors are grateful to those individuals from various NHS arm’s length bodies who have helped their team understand and overcome some of the barriers documented. There was no specific funding for this paper. The authors’ work on health care analytics is supported by the National Institute for Health Research (NIHR) Biomedical Research Centre, Oxford; a Health Foundation grant (award reference number 7599); and an NIHR School of Primary Care Research grant (award reference number 327). Funders had no role in the study design; the collection, analysis, and interpretation of data; the writing of the report; or the decision to submit the paper for publication.
BG and SB conceived the paper. SB wrote the first draft. Both authors revised and approved the final manuscript. BG supervised the project and is guarantor.
All authors have completed the standard ICMJE uniform disclosure form and declare the following: BG has received research funding from the Laura and John Arnold Foundation, the Wellcome Trust, the Oxford Biomedical Research Centre, the NHS NIHR School of Primary Care Research, the Health Foundation, and the World Health Organization; he also receives personal income from speaking and writing for lay audiences on the misuse of science and is Chair of the Health Tech Advisory Board, reporting to the Secretary of State for Health and Social Care. SB is employed on BG’s grants.