<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.0 20040830//EN" "http://dtd.nlm.nih.gov/publishing/2.0/journalpublishing.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" article-type="article-commentary" dtd-version="2.0">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">JMIR</journal-id>
      <journal-id journal-id-type="nlm-ta">J Med Internet Res</journal-id>
      <journal-title>Journal of Medical Internet Research</journal-title>
      <issn pub-type="epub">1438-8871</issn>
      <publisher>
        <publisher-name>JMIR Publications</publisher-name>
        <publisher-loc>Toronto, Canada</publisher-loc>
      </publisher>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="publisher-id">v25i1e46700</article-id>
      <article-id pub-id-type="pmid">36995757</article-id>
      <article-id pub-id-type="doi">10.2196/46700</article-id>
      <article-categories>
        <subj-group subj-group-type="heading">
          <subject>Commentary</subject>
        </subj-group>
        <subj-group subj-group-type="article-type">
          <subject>Commentary</subject>
        </subj-group>
      </article-categories>
      <title-group>
        <article-title>mHealth Systems Need a Privacy-by-Design Approach: Commentary on “Federated Machine Learning, Privacy-Enhancing Technologies, and Data Protection Laws in Medical Research: Scoping Review”</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="editor">
          <name>
            <surname>Leung</surname>
            <given-names>Tiffany</given-names>
          </name>
        </contrib>
      </contrib-group>
      <contrib-group>
        <contrib id="contrib1" contrib-type="author" corresp="yes">
          <name name-style="western">
            <surname>Tewari</surname>
            <given-names>Ambuj</given-names>
          </name>
          <degrees>PhD</degrees>
          <xref rid="aff1" ref-type="aff">1</xref>
          <address>
            <institution>Department of Statistics</institution>
            <institution>University of Michigan</institution>
            <addr-line>1085 S University Ave</addr-line>
            <addr-line>Ann Arbor, MI, 48109-1107</addr-line>
            <country>United States</country>
            <phone>1 734 615 0928</phone>
            <email>tewaria@umich.edu</email>
          </address>
          <xref rid="aff2" ref-type="aff">2</xref>
          <ext-link ext-link-type="orcid">https://orcid.org/0000-0001-6969-7844</ext-link>
        </contrib>
      </contrib-group>
      <aff id="aff1">
        <label>1</label>
        <institution>Department of Statistics</institution>
        <institution>University of Michigan</institution>
        <addr-line>Ann Arbor, MI</addr-line>
        <country>United States</country>
      </aff>
      <aff id="aff2">
        <label>2</label>
        <institution>Department of Electrical Engineering and Computer Science</institution>
        <institution>University of Michigan</institution>
        <addr-line>Ann Arbor, MI</addr-line>
        <country>United States</country>
      </aff>
      <author-notes>
        <corresp>Corresponding Author: Ambuj Tewari <email>tewaria@umich.edu</email></corresp>
      </author-notes>
      <pub-date pub-type="collection">
        <year>2023</year>
      </pub-date>
      <pub-date pub-type="epub">
        <day>30</day>
        <month>3</month>
        <year>2023</year>
      </pub-date>
      <volume>25</volume>
      <elocation-id>e46700</elocation-id>
      <history>
        <date date-type="received">
          <day>21</day>
          <month>2</month>
          <year>2023</year>
        </date>
        <date date-type="accepted">
          <day>22</day>
          <month>2</month>
          <year>2023</year>
        </date>
      </history>
      <copyright-statement>©Ambuj Tewari. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 30.03.2023.</copyright-statement>
      <copyright-year>2023</copyright-year>
      <license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by/4.0/">
        <p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.</p>
      </license>
      <self-uri xlink:href="https://www.jmir.org/2023/1/e46700" xlink:type="simple"/>
      <related-article related-article-type="commentary-article" id="v25i1e41588" ext-link-type="doi" xlink:href="10.2196/41588" vol="25" page="e41588" xlink:type="simple">http://www.jmir.org/2023/1/e41588/</related-article>
      <abstract>
        <p>Brauneck and colleagues have combined technical and legal perspectives in their timely and valuable paper “Federated Machine Learning, Privacy-Enhancing Technologies, and Data Protection Laws in Medical Research: Scoping Review.” Researchers who design mobile health (mHealth) systems must adopt the same privacy-by-design approach that privacy regulations (eg, the General Data Protection Regulation) require. Doing so successfully will mean overcoming implementation challenges in privacy-enhancing technologies such as differential privacy. We will also have to pay close attention to emerging technologies such as private synthetic data generation.</p>
      </abstract>
      <kwd-group>
        <kwd>mHealth</kwd>
        <kwd>differential privacy</kwd>
        <kwd>private synthetic data</kwd>
        <kwd>federated learning</kwd>
        <kwd>data protection regulation</kwd>
        <kwd>data protection by design</kwd>
        <kwd>privacy protection</kwd>
        <kwd>General Data Protection Regulation</kwd>
        <kwd>GDPR compliance</kwd>
        <kwd>privacy-preserving technologies</kwd>
        <kwd>secure multiparty computation</kwd>
        <kwd>multiparty computation</kwd>
        <kwd>machine learning</kwd>
        <kwd>privacy</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec sec-type="introduction">
      <title>Introduction</title>
      <p>Brauneck et al [<xref ref-type="bibr" rid="ref1">1</xref>] should be congratulated for reviewing privacy-enhancing technologies (PETs) from a legal standpoint. The right to privacy is a fundamental human right whose importance in the current digital age cannot be overstated. Protecting it will require the cooperation of scholars and experts from many disciplines. It is therefore heartening to see legal experts joining hands with technical experts in a thoughtful discussion of how the European Union’s General Data Protection Regulation (GDPR) relates to commonly used PETs, including federated learning (FL), differential privacy (DP), and secure multiparty computation (SMPC).</p>
      <p>The GDPR recognizes that privacy should be a primary consideration when designing systems that process personal data, as set out in its Article 25 on “data protection by design and by default.” Privacy should not be added as an afterthought once a system has already been built. Researchers in the health sciences, especially in mobile health (mHealth), are beginning to adopt a “privacy-by-design” mindset. My own group at the University of Michigan and my clinical collaborators have begun to study privacy seriously in the context of mHealth [<xref ref-type="bibr" rid="ref2">2</xref>,<xref ref-type="bibr" rid="ref3">3</xref>], but much remains to be done.</p>
    </sec>
    <sec>
      <title>Differential Privacy</title>
      <p>Brauneck et al [<xref ref-type="bibr" rid="ref1">1</xref>] correctly point out that FL alone does not sufficiently protect user privacy. This limitation is well known: the paper that originally proposed FL already noted that FL would have to be supplemented with technologies such as DP and SMPC to achieve adequate privacy protection. In this commentary, I focus primarily on DP. Because I am not a legal expert, my comments are necessarily made from a technical perspective.</p>
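      <p>A minimal Python sketch of a single FL client update makes the point concrete. It assumes a linear model with squared loss trained on one hypothetical patient record; the feature values, model weights, and the function name local_gradient are illustrative and are not drawn from Brauneck et al [<xref ref-type="bibr" rid="ref1">1</xref>]. Because the gradient with respect to the weights equals the residual times the input, while the gradient with respect to the bias equals the residual alone, a server that receives the raw update can recover the record by simple division.</p>
      <preformat preformat-type="computer-code">
```python
import numpy as np


def local_gradient(w, b, x, y):
    """Gradient of the squared loss 0.5 * (w . x + b - y) ** 2 for ONE example."""
    residual = w @ x + b - y
    grad_w = residual * x  # d loss / d w: the residual times the input
    grad_b = residual      # d loss / d b: the residual alone
    return grad_w, grad_b


# Hypothetical standardized record held on one phone (e.g., steps, sleep,
# resting heart rate, mood score) and its outcome label.
x_private = np.array([0.8, -1.2, 0.3, 2.0])
y_private = 1.0

# Hypothetical global model broadcast by the FL server in some round.
w = np.array([0.1, 0.2, -0.1, 0.05])
b = 0.0

# In plain FL, the client sends this update to the server unmodified.
grad_w, grad_b = local_gradient(w, b, x_private, y_private)

# An honest-but-curious server inverts it (valid whenever the residual is nonzero).
x_reconstructed = grad_w / grad_b
print(x_reconstructed)                          # recovers the private features
print(np.allclose(x_reconstructed, x_private))  # True
```
      </preformat>
      <p>Averaging over many local examples blurs this exact inversion, but the sketch shows why unprotected updates can expose raw data. Adding calibrated noise to updates (DP) or revealing only an aggregate of many clients’ updates (SMPC-based secure aggregation) is meant to limit this kind of exposure.</p>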
      <p>DP and its variants have emerged as leading PETs. DP has been adopted by technology companies such as Apple and Google, and the US Census Bureau chose it for the disclosure avoidance system of the 2020 US Census. Proposals to revisit foundational statistical theory in light of privacy constraints have also formulated those constraints using DP [<xref ref-type="bibr" rid="ref4">4</xref>].</p>
      <p>DP has some clear strengths: it is a rigorous formalism with desirable theoretical properties, such as graceful composition across multiple analyses and immunity to postprocessing, and it enjoys increasing software support. However, the epsilon parameter in DP is hard to interpret in the context of specific applications. Its mathematical meaning is precise (for pure DP, changing any one individual’s record can change the probability of any outcome by at most a factor of e<sup>epsilon</sup>), but it is often very hard to choose a value of epsilon that strikes a careful balance between privacy and statistical utility. Researchers have proposed building an “Epsilon Registry” to help the community make sensible implementation choices [<xref ref-type="bibr" rid="ref5">5</xref>]. More community efforts, especially from the medical informatics community, will be needed to realize the potential of DP.</p>
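      <p>The privacy-utility trade-off governed by epsilon can be sketched in a few lines of Python using the Laplace mechanism, one standard way to achieve epsilon-DP. The sketch assumes that neighboring data sets differ by replacing one record, that the number of records n is public, and that values can be clipped to a known range; the step-count data, the range of 0 to 20 thousand steps, and the function names laplace_mean and expected_abs_error are hypothetical. Under these assumptions, the noise scale, and therefore the expected absolute error of the released mean, is (upper - lower)/(n × epsilon).</p>
      <preformat preformat-type="computer-code">
```python
import numpy as np


def laplace_mean(values, lower, upper, epsilon, rng):
    """Release the mean of `values` under epsilon-DP with the Laplace mechanism.

    Assumes neighbors differ by replacing one record and n is public, so after
    clipping to [lower, upper] the mean has sensitivity (upper - lower) / n.
    """
    values = np.clip(np.asarray(values, dtype=float), lower, upper)
    sensitivity = (upper - lower) / len(values)
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return values.mean() + noise


def expected_abs_error(lower, upper, n, epsilon):
    """Mean absolute error of Laplace noise equals its scale parameter."""
    return (upper - lower) / (n * epsilon)


rng = np.random.default_rng(seed=0)
# Hypothetical daily step counts (in thousands) for 200 app users.
steps = rng.normal(loc=7.0, scale=2.5, size=200)

for epsilon in [0.1, 1.0, 10.0]:
    release = laplace_mean(steps, 0.0, 20.0, epsilon, rng)
    print(
        f"epsilon={epsilon:5.1f}  "
        f"max probability ratio e^epsilon={np.exp(epsilon):10.2f}  "
        f"expected |error|={expected_abs_error(0.0, 20.0, len(steps), epsilon):.3f}  "
        f"release={release:.3f}"
    )
```
      </preformat>
      <p>Moving from epsilon=0.1 to epsilon=10 shrinks the expected error from 1.0 to 0.01 thousand steps, but the guarantee then allows one person’s record to shift output probabilities by a factor of about 22,000 (e<sup>10</sup>). Whether that is acceptable depends on the application, which is exactly why choosing epsilon is hard.</p>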
      <p>It is also worth noting that the recent DP literature is complemented by an older statistical literature on statistical disclosure control [<xref ref-type="bibr" rid="ref6">6</xref>]. A one-size-fits-all solution for every data protection scenario is unlikely to emerge, so system designers need a broad understanding of the available tools. Moreover, old and new tools alike need to be examined from a legal perspective, just as Brauneck et al [<xref ref-type="bibr" rid="ref1">1</xref>] have done for FL, DP, and SMPC. This is challenging because both technology and the law are changing. Hopefully, PETs and privacy laws will coevolve so that society can benefit from the ongoing data revolution without threatening the fundamental human right to privacy.</p>
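      <p>For contrast with DP, one classic statistical disclosure control rule is small-cell suppression: before a frequency table is published, any cell describing fewer than a threshold number of people is withheld. The Python sketch below assumes a threshold of 5 (chosen only for illustration) and uses hypothetical mHealth app records; the function name tabulate_with_suppression is likewise hypothetical.</p>
      <preformat preformat-type="computer-code">
```python
from collections import Counter


def tabulate_with_suppression(records, keys, threshold=5):
    """Cross-tabulate records on `keys`, withholding small cells.

    Any cell with fewer than `threshold` individuals is replaced by None
    (primary suppression). Complementary suppression is not implemented.
    """
    counts = Counter(tuple(record[k] for k in keys) for record in records)
    return {cell: (n if n >= threshold else None) for cell, n in counts.items()}


# Hypothetical records from an mHealth app (age band, self-reported condition).
records = (
    [{"age_band": "18-39", "condition": "diabetes"}] * 12
    + [{"age_band": "40-64", "condition": "diabetes"}] * 30
    + [{"age_band": "65+", "condition": "diabetes"}] * 3
    + [{"age_band": "18-39", "condition": "depression"}] * 25
)

table = tabulate_with_suppression(records, ["age_band", "condition"])
for cell, n in sorted(table.items()):
    print(cell, "suppressed" if n is None else n)
```
      </preformat>
      <p>Unlike DP, this rule carries no formal guarantee: primary suppression alone can sometimes be undone by subtracting published cells from published totals, which is why disclosure control practice adds complementary suppression, a step this sketch omits.</p>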
    </sec>
    <sec>
      <title>Private Synthetic Data</title>
      <p>Brauneck et al [<xref ref-type="bibr" rid="ref1">1</xref>] do not discuss private synthetic data generation as a PET, but I believe that private synthetic data has tremendous potential for enabling data-driven innovation in health care without sacrificing privacy. The use case the authors considered is one in which a data processing workflow (eg, FL) must be modified to satisfy DP. A different use case is one in which we simply publish synthetic data that “is similar” to the original sensitive data but protects user privacy (eg, in the DP sense). Downstream data analysts then do not have to modify their workflows; they can work with the synthetic data just as they would with the original data.</p>
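      <p>For a single categorical attribute, a minimal Python sketch of the “publish synthetic data” use case adds Laplace noise to a histogram of the sensitive data and then samples synthetic records from the noisy histogram. This perturbed-histogram approach is one simple DP synthetic data method, used here only for illustration. It assumes that the list of categories is public and fixed in advance and that neighboring data sets differ by adding or removing one record, so the histogram has L1 sensitivity 1; the symptom-severity data and the function name dp_synthetic_categorical are hypothetical. Because clamping, normalizing, and sampling only postprocess the noisy counts, the synthetic records inherit the epsilon-DP guarantee.</p>
      <preformat preformat-type="computer-code">
```python
import numpy as np


def dp_synthetic_categorical(data, categories, epsilon, n_synthetic, rng):
    """Sample synthetic records from a Laplace-noised histogram (epsilon-DP).

    Assumes `categories` is a public, fixed domain and neighbors differ by
    adding or removing one record, so the histogram has L1 sensitivity 1.
    """
    data = np.asarray(data)
    counts = np.array([np.sum(data == c) for c in categories], dtype=float)
    noisy = counts + rng.laplace(loc=0.0, scale=1.0 / epsilon, size=len(categories))
    noisy = np.clip(noisy, 0.0, None)  # postprocessing: no negative counts
    if noisy.sum() == 0:
        probs = np.full(len(categories), 1.0 / len(categories))
    else:
        probs = noisy / noisy.sum()
    return rng.choice(categories, size=n_synthetic, p=probs)


rng = np.random.default_rng(seed=1)
categories = ["none", "mild", "moderate", "severe"]  # public, fixed domain
# Hypothetical self-reported symptom severity from 1000 app users.
original = rng.choice(categories, size=1000, p=[0.5, 0.3, 0.15, 0.05])
synthetic = dp_synthetic_categorical(original, categories, epsilon=1.0,
                                     n_synthetic=1000, rng=rng)

for c in categories:
    print(c, round(float(np.mean(original == c)), 3),
          round(float(np.mean(synthetic == c)), 3))
```
      </preformat>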
      <p>However, what does it mean to “be similar” to the original data set? One possibility is to require that correlations between attributes be preserved. For some time, it was believed that this could be done only with computationally intractable methods. Recent progress in this area [<xref ref-type="bibr" rid="ref7">7</xref>], however, has renewed interest in the possibility of generating statistically useful synthetic data that nevertheless provably protects user privacy.</p>
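      <p>The distinction can be illustrated with a small Python sketch on two hypothetical binary mHealth attributes (daily app use and meeting a step goal) that agree for about 80% of users. It compares two simple DP generators under the same total privacy budget, both assumed for illustration and neither being the algorithm of He et al [<xref ref-type="bibr" rid="ref7">7</xref>]: sampling each attribute independently from its own noisy one-way histogram (epsilon/2 each, by sequential composition), and sampling both attributes jointly from one noisy two-way histogram (full epsilon). Only the joint approach preserves the correlation. With d attributes, however, the joint histogram has exponentially many cells (2<sup>d</sup> for binary attributes), which is why preserving correlations efficiently is the hard part.</p>
      <preformat preformat-type="computer-code">
```python
import numpy as np


def noisy_hist(counts, epsilon, rng):
    """Laplace-noised, clamped, normalized histogram.

    epsilon-DP when neighbors differ by adding or removing one record
    (L1 sensitivity 1). Assumes the true counts are not all near zero.
    """
    noisy = counts + rng.laplace(0.0, 1.0 / epsilon, size=counts.shape)
    noisy = np.clip(noisy, 0.0, None)
    return noisy / noisy.sum()


def corr(x, y):
    return np.corrcoef(x, y)[0, 1]


rng = np.random.default_rng(seed=2)
n, epsilon = 2000, 1.0
# Hypothetical attributes: a = uses the app daily, b = meets the step goal.
a = (rng.random(n) > 0.5).astype(int)
b = np.where(rng.random(n) > 0.2, a, 1 - a)  # b agrees with a about 80% of the time

# (1) Independent one-way marginals, epsilon / 2 each (sequential composition).
pa = noisy_hist(np.bincount(a, minlength=2).astype(float), epsilon / 2, rng)
pb = noisy_hist(np.bincount(b, minlength=2).astype(float), epsilon / 2, rng)
a1 = rng.choice(2, size=n, p=pa)
b1 = rng.choice(2, size=n, p=pb)

# (2) One noisy 2 x 2 joint histogram using the full epsilon.
joint = np.zeros((2, 2))
np.add.at(joint, (a, b), 1.0)
cells = rng.choice(4, size=n, p=noisy_hist(joint, epsilon, rng).ravel())
a2, b2 = np.divmod(cells, 2)  # flat index = 2 * a + b

print("original  corr:", round(corr(a, b), 2))
print("marginals corr:", round(corr(a1, b1), 2))  # near 0: correlation lost
print("joint     corr:", round(corr(a2, b2), 2))  # close to the original
```
      </preformat>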
    </sec>
  </body>
  <back>
    <app-group/>
    <glossary>
      <title>Abbreviations</title>
      <def-list>
        <def-item>
          <term id="abb1">DP</term>
          <def>
            <p>differential privacy</p>
          </def>
        </def-item>
        <def-item>
          <term id="abb2">FL</term>
          <def>
            <p>federated learning</p>
          </def>
        </def-item>
        <def-item>
          <term id="abb3">GDPR</term>
          <def>
            <p>General Data Protection Regulation</p>
          </def>
        </def-item>
        <def-item>
          <term id="abb4">mHealth</term>
          <def>
            <p>mobile health</p>
          </def>
        </def-item>
        <def-item>
          <term id="abb5">PET</term>
          <def>
            <p>privacy-enhancing technology</p>
          </def>
        </def-item>
        <def-item>
          <term id="abb6">SMPC</term>
          <def>
            <p>secure multiparty computation</p>
          </def>
        </def-item>
      </def-list>
    </glossary>
    <fn-group>
      <fn fn-type="conflict">
        <p>None declared.</p>
      </fn>
    </fn-group>
    <ref-list>
      <ref id="ref1">
        <label>1</label>
        <nlm-citation citation-type="journal">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Brauneck</surname>
              <given-names>A</given-names>
            </name>
            <name name-style="western">
              <surname>Schmalhorst</surname>
              <given-names>L</given-names>
            </name>
            <name name-style="western">
              <surname>Majdabadi</surname>
              <given-names>MMK</given-names>
            </name>
            <name name-style="western">
              <surname>Bakhtiari</surname>
              <given-names>M</given-names>
            </name>
            <name name-style="western">
              <surname>Völker</surname>
              <given-names>U</given-names>
            </name>
            <name name-style="western">
              <surname>Baumbach</surname>
              <given-names>J</given-names>
            </name>
            <name name-style="western">
              <surname>Baumbach</surname>
              <given-names>L</given-names>
            </name>
            <name name-style="western">
              <surname>Buchholtz</surname>
              <given-names>G</given-names>
            </name>
          </person-group>
          <article-title>Federated Machine Learning, Privacy-Enhancing Technologies, and Data Protection Laws in Medical Research: Scoping Review</article-title>
          <source>J Med Internet Res</source>
          <year>2023</year>
          <volume>25</volume>
          <fpage>e41588</fpage>
          <comment>
            <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://www.jmir.org/2023/1/e41588/"/>
          </comment>
          <pub-id pub-id-type="doi">10.2196/41588</pub-id>
        </nlm-citation>
      </ref>
      <ref id="ref2">
        <label>2</label>
        <nlm-citation citation-type="journal">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Liu</surname>
              <given-names>JC</given-names>
            </name>
            <name name-style="western">
              <surname>Goetz</surname>
              <given-names>J</given-names>
            </name>
            <name name-style="western">
              <surname>Sen</surname>
              <given-names>S</given-names>
            </name>
            <name name-style="western">
              <surname>Tewari</surname>
              <given-names>A</given-names>
            </name>
          </person-group>
          <article-title>Learning from others without sacrificing privacy: simulation comparing centralized and federated machine learning on mobile health data</article-title>
          <source>JMIR Mhealth Uhealth</source>
          <year>2021</year>
          <month>03</month>
          <day>30</day>
          <volume>9</volume>
          <issue>3</issue>
          <fpage>e23728</fpage>
          <comment>
            <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://mhealth.jmir.org/2021/3/e23728/"/>
          </comment>
          <pub-id pub-id-type="doi">10.2196/23728</pub-id>
          <pub-id pub-id-type="medline">33783362</pub-id>
          <pub-id pub-id-type="pii">v9i3e23728</pub-id>
          <pub-id pub-id-type="pmcid">PMC8044739</pub-id>
        </nlm-citation>
      </ref>
      <ref id="ref3">
        <label>3</label>
        <nlm-citation citation-type="journal">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Shen</surname>
              <given-names>A</given-names>
            </name>
            <name name-style="western">
              <surname>Francisco</surname>
              <given-names>L</given-names>
            </name>
            <name name-style="western">
              <surname>Sen</surname>
              <given-names>S</given-names>
            </name>
            <name name-style="western">
              <surname>Tewari</surname>
              <given-names>A</given-names>
            </name>
          </person-group>
          <article-title>Exploring the relationship between privacy and utility in mobile health: a simulation of federated learning, differential privacy, and external attacks</article-title>
          <source>J Med Internet Res (forthcoming)</source>
          <year>2023</year>
          <pub-id pub-id-type="doi">10.2196/43664</pub-id>
        </nlm-citation>
      </ref>
      <ref id="ref4">
        <label>4</label>
        <nlm-citation citation-type="confproc">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Wainwright</surname>
              <given-names>MJ</given-names>
            </name>
          </person-group>
          <article-title>Constrained forms of statistical minimax: computation, communication and privacy</article-title>
          <source>Proceedings of the International Congress of Mathematicians</source>
          <year>2014</year>
          <conf-name>ICM 2014</conf-name>
          <conf-date>Aug 13-21, 2014</conf-date>
          <conf-loc>Seoul, South Korea</conf-loc>
          <comment>
            <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://people.eecs.berkeley.edu/~wainwrig/Barcelona14/Wainwright_ICM14.pdf"/>
          </comment>
        </nlm-citation>
      </ref>
      <ref id="ref5">
        <label>5</label>
        <nlm-citation citation-type="journal">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Dwork</surname>
              <given-names>C</given-names>
            </name>
            <name name-style="western">
              <surname>Kohli</surname>
              <given-names>N</given-names>
            </name>
            <name name-style="western">
              <surname>Mulligan</surname>
              <given-names>D</given-names>
            </name>
          </person-group>
          <article-title>Differential privacy in practice: expose your epsilons</article-title>
          <source>JPC</source>
          <year>2019</year>
          <month>10</month>
          <day>20</day>
          <volume>9</volume>
          <issue>2</issue>
          <pub-id pub-id-type="doi">10.29012/jpc.689</pub-id>
        </nlm-citation>
      </ref>
      <ref id="ref6">
        <label>6</label>
        <nlm-citation citation-type="journal">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>Slavković</surname>
              <given-names>A</given-names>
            </name>
            <name name-style="western">
              <surname>Seeman</surname>
              <given-names>J</given-names>
            </name>
          </person-group>
          <article-title>Statistical data privacy: a song of privacy and utility</article-title>
          <source>Annu Rev Stat Appl</source>
          <year>2022</year>
          <month>11</month>
          <day>18</day>
          <volume>10</volume>
          <issue>1</issue>
          <pub-id pub-id-type="doi">10.1146/annurev-statistics-033121-112921</pub-id>
        </nlm-citation>
      </ref>
      <ref id="ref7">
        <label>7</label>
        <nlm-citation citation-type="journal">
          <person-group person-group-type="author">
            <name name-style="western">
              <surname>He</surname>
              <given-names>Y</given-names>
            </name>
            <name name-style="western">
              <surname>Vershynin</surname>
              <given-names>R</given-names>
            </name>
            <name name-style="western">
              <surname>Zhu</surname>
              <given-names>Y</given-names>
            </name>
          </person-group>
          <article-title>Algorithmically effective differentially private synthetic data</article-title>
          <source>arXiv.</source>
          <comment>Preprint posted online Feb 11, 2023</comment>
          <pub-id pub-id-type="doi">10.48550/arXiv.2302.05552</pub-id>
        </nlm-citation>
      </ref>
    </ref-list>
  </back>
</article>
