This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
The transition to digital pathology usually takes months or years to complete. We were familiarizing ourselves with digital pathology solutions when the COVID-19 outbreak forced us to embark on an abrupt transition to digital pathology.
The aim of this study was to quantitatively describe how the abrupt transition to digital pathology might affect the quality of diagnoses, to identify possible causes through probabilistic modeling, and to qualitatively gauge the perception of this abrupt transition.
A total of 17 pathologists and residents participated in this study; each participant reviewed 25 additional test cases from the archives and completed a final psychological survey. For each case, participants performed several different diagnostic tasks, and their results were recorded and compared with the original diagnoses, which had been made using the gold standard method (ie, conventional microscopy). We performed Bayesian data analysis with probabilistic modeling.
The overall analysis, comprising 1345 different items, resulted in a 9% (117/1345) error rate in using digital slides. The task of differentiating a neoplastic process from a nonneoplastic one accounted for an error rate of 10.7% (42/392), whereas the distinction of a malignant process from a benign one accounted for an error rate of 4.2% (11/258). Apart from residents, senior pathologists generated the most discrepancies (7.9%, 13/164). Our model showed that these differences among career levels persisted even after adjusting for other factors.
Our findings are in line with previous reports, suggesting that the duration of the transition (ie, lengthy or abrupt) might not influence diagnostic performance. Moreover, our findings highlight that senior pathologists may be limited by a digital gap, which may negatively affect their performance with digital pathology. These results can guide the process of digital transition in the field of pathology.
Digital pathology (DP) intends to use computer workstations and digital whole slide imaging to diagnose a pathological process [
Most of the reported discordances between diagnoses in DP and those by the gold standard (ie, evaluation of a glass slide under a microscope) are less than 10% [
In this study, we aimed to (1) quantitatively describe how abrupt transition to DP might affect the quality of diagnosis, (2) model the possible causes via probabilistic modeling, and (3) qualitatively gauge the perception of this abrupt transition.
A detailed description of the study methods is described in
No ethics approval was required for this study. The study participants (ie, pathologists and residents) agreed to—and coauthored—the study.
This study involved 17 participants, divided into 4 groups (career levels) based on their pathology experience: (1) senior (pathologists with >20 years of experience, n=2), (2) expert (pathologists with 10-20 years of experience, n=5), (3) junior (pathologists with <10 years of experience, n=6), and (4) resident (1st year, n=1; 2nd year, n=3). Each of the 17 participants evaluated 25 digital cases (425 case evaluations in total). Overall, 1445 questions (ie, 85 per participant) were examined in the study.
In addition to their own diagnostic tasks, which were not considered in this study, the pathologists and residents received (1) a set of digital cases within the area of general surgical pathology, (2) specific questions to be addressed while reviewing the cases, and (3) a survey about their digital experience.
We set up 5 sets of digital cases representing 3 different specialties (breast: n=2; urology: n=1; gastrointestinal: n=2) and assigned them to each study participant. Each test comprised 5 cases, each represented by one or more slides from a single case previously diagnosed using conventional microscopy by the referral pathologist at our institution. The original diagnosis was considered the gold standard. To cover a spectrum of conditions reflecting routine practice, we considered both biopsy and surgical specimens (specimen type). Cases were digitized using the Aperio AT2 scanner (Leica Biosystems) and visualized using the WebViewer APERIO ImageScope (version 12.1). The slides used for the tests were from 8 nontumoral and 17 tumoral cases. Of the tumoral cases, 7 tumors were benign and 10 were malignant; all malignant tumors were infiltrative and equally distributed between grade 2 and grade 3. Overall, 14 cases were biopsy specimens and 11 were surgical specimens.
For each case, participants answered all or some of the following questions (ie, categories of diagnostic tasks): (1) Is it neoplastic or negative for neoplasia? (2) Is it a malignant (in situ or infiltrative) or a benign neoplasia? (3) What is the histopathological diagnosis? (4) What is the histotype of the lesion? (5) What is the grade of the lesion? Questions 1 and 3 were answered for all cases, question 2 only for neoplastic lesions, and questions 4 and 5 only for malignant neoplasms.
To model the data clusters, we used a varying-effects, multilevel (hierarchical) model [
For each pathologist (PID), their career level (LEVEL), the specific diagnostic question (CATEGORY), the specimen type (SPECIMEN), and the subspecialty of the case (SPECIALTY), we used the logit link function and modeled the varying intercepts as follows:

E_i ∼ Binomial(1, p_i)
logit(p_i) = ᾱ + α_PID[i] + β_LEVEL[i] + γ_CATEGORY[i] + δ_SPECIMEN[i] + ε_SPECIALTY[i]

The prior distributions for the intercepts and SD values were as follows:

α_PID ∼ Normal(0, σ_a); β_LEVEL ∼ Normal(0, σ_b); γ_CATEGORY ∼ Normal(0, σ_c); δ_SPECIMEN ∼ Normal(0, σ_d); ε_SPECIALTY ∼ Normal(0, σ_e)

Hyperpriors were placed on the hyperparameters, that is, on the average pathologist ᾱ and on the SD value σ for each category of cluster (see the supplementary materials for the full specification).
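As an illustration only, the varying-intercepts structure described above can be sketched as a prior predictive simulation in Python. The hyperprior values below (Normal(0, 1.5) for the average pathologist, Exponential(1) for the cluster SDs) are assumptions in the style of common multilevel-modeling practice, not necessarily the values used in the study:

```python
import numpy as np

rng = np.random.default_rng(42)

def inv_logit(x):
    """Inverse of the logit link: maps log-odds to an error probability."""
    return 1.0 / (1.0 + np.exp(-x))

n_sims = 10_000

# Hyperpriors (assumed values): the grand mean (the "average pathologist")
# and one SD per category of cluster (pathologist, career level,
# diagnostic task, specimen type, subspecialty).
alpha_bar = rng.normal(0.0, 1.5, n_sims)
sigmas = rng.exponential(1.0, (5, n_sims))

# One varying intercept per cluster type, each drawn from Normal(0, sigma).
offsets = sum(rng.normal(0.0, s) for s in sigmas)

# Prior predictive distribution of the error probability for one item.
p_error = inv_logit(alpha_bar + offsets)
print(p_error.mean(), p_error.std())
```

Because the priors are symmetric on the log-odds scale, the prior predictive error probability is centered on 0.5 with a wide spread; that is, before seeing the data, the model is agnostic about performance.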
The survey was inspired by previous published works [
The pathologists answered 1345 of the total 1445 questions (100 missing answers); of these, 1228 (91.30%) corresponded to the original diagnoses and were considered correct.
Proportion of errors among different groups.

| Group | Number of tasks performed | Number of errors | Error rate |
| --- | --- | --- | --- |
| Pathologist | | | |
| P1 | 84 | 5 | 0.06 |
| P2 | 78 | 4 | 0.05 |
| P3 | 82 | 7 | 0.09 |
| P4 | 67 | 1 | 0.01 |
| P5 | 82 | 7 | 0.09 |
| P6 | 82 | 6 | 0.07 |
| P7 | 83 | 2 | 0.02 |
| P8 | 84 | 3 | 0.04 |
| P9 | 82 | 5 | 0.06 |
| P10 | 83 | 3 | 0.04 |
| P11 | 82 | 9 | 0.11 |
| P12 | 83 | 3 | 0.04 |
| P13 | 81 | 26 | 0.32 |
| P14 | 64 | 9 | 0.14 |
| P15 | 84 | 12 | 0.14 |
| P16 | 79 | 9 | 0.11 |
| P17 | 65 | 6 | 0.09 |
| Career level | | | |
| Resident | 310 | 47 | 0.15 |
| Junior | 460 | 30 | 0.07 |
| Expert | 411 | 27 | 0.07 |
| Senior | 164 | 13 | 0.08 |
| Diagnostic task | | | |
| Neoplasia? | 392 | 42 | 0.11 |
| Malignant/benign? | 258 | 11 | 0.04 |
| Histopathological diagnosis? | 388 | 35 | 0.09 |
| Histotype? | 160 | 2 | 0.01 |
| Grade? | 147 | 27 | 0.18 |
| Specimen type | | | |
| Surgery | 716 | 40 | 0.06 |
| Biopsy | 629 | 77 | 0.12 |
| Subspecialty | | | |
| Breast | 550 | 64 | 0.12 |
| Gastrointestinal | 497 | 40 | 0.08 |
| Urology | 298 | 13 | 0.04 |
| Total | 1345 | 117 | 0.09 |
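The rates in the table are raw proportions. As an illustrative check using only the overall counts reported above, the overall error rate and a normal-approximation 95% CI can be recomputed as follows (the CI is our addition and was not reported in the study):

```python
import math

errors, tasks = 117, 1345  # overall counts from the table
rate = errors / tasks
se = math.sqrt(rate * (1 - rate) / tasks)  # normal-approximation standard error
ci_low, ci_high = rate - 1.96 * se, rate + 1.96 * se
print(f"{rate:.3f} (95% CI {ci_low:.3f}-{ci_high:.3f})")
```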
Error rates among different categories. This dot-bar plot depicts the median (IQR) values of error rates among different categories. The error rates showed the widest IQR among individual pathologists (PID), whereas the narrowest IQRs were noted for career level and specimen type (biopsy vs surgical).
Differences in error rates for two important tasks—differentiation between neoplastic and nonneoplastic processes and that between benign and malignant neoplastic processes—were observed among pathologists at different career levels and for different specimen types. The same error profile was observed across career levels, although the former task had a higher error rate (
Raw proportion of errors across (A) career levels and (B) specimen types in performing two important tasks: differentiation between neoplastic and nonneoplastic processes and between malignant and benign tumors.
Diagnostics of the model’s fit are shown in
Prediction of average pathologist performance. Pathologists at intermediate career levels perform better on average. The graph depicts the posterior predictive distributions for the multilevel model. Solid lines represent posterior means; shaded regions represent 89% highest posterior density intervals; dashed lines represent the raw data.
Most pathologists reported a very good score (ie, 4 or 5 indicating they “moderately agree” and “strongly agree,” respectively) for their attitude toward DP (44/68, 64%), confidence in DP (75/119, 63%), and satisfaction with DP (56/102, 54.9%). A detailed analysis of these parameters showed that the residents reported the highest value for confidence, junior pathologists reported the highest values for attitude and satisfaction, whereas expert and senior pathologists reported relatively lower levels of confidence in and satisfaction with DP (
Overview of the psychological aspect of the study. This series of graphs summarizes the results of the survey conducted among pathologists at different career levels (resident, junior, expert, and senior) to evaluate their attitudes toward, confidence in, and satisfaction with digital pathology solutions.
Our study showed an overall discordance rate of 9% between diagnoses performed on digital slides and those performed using the gold standard (ie, conventional microscopy). However, when we considered the different diagnostic tasks, this rate dropped to less than 5% for the category "benign versus malignant tumor," which is probably the most clinically impactful of the diagnostic tasks. A systematic review of 38 pertinent studies published before 2015 reported a 7.6% overall discordance rate between digital and glass slide diagnoses. Among these studies, 17 reported a discordance rate higher than 5%, and 8 reported a rate higher than 15% [
In our study, a high proportion of errors arose from small biopsy specimens (12.2%) and from diagnostic tasks involving tumor grading (23%). These results are consistent with those of the review by Williams et al [
Moreover, recent studies have consistently reported high, intermediate, and low discordant rates for bladder, breast, and gastrointestinal tract specimens, respectively [
Compared with the study by Hanna et al [
Our study describes how the abrupt transition to DP affected the quality of diagnoses and qualitatively gauged the psychological aspects of this abrupt transition. Moreover, our study model highlighted the potential causes for these challenges and might inform what could be expected in other laboratories. In conclusion, the exceptional conditions dictated by the COVID-19 pandemic highlighted that DP could be adopted safely for diagnostic purposes by any skilled pathologist, even abruptly.
Supplementary materials and methods.
Coefficients of model parameters from the prior predictive simulation.
Simulation from the prior. This figure shows the meaning of the priors (ie, what the model thinks before it sees the data).
Proportion of errors among individual pathologists. Upper left panel shows the overall error rates. Upper right panel shows the error rates among different diagnostic tasks. Lower left panel shows the error rate among different specimen types. Lower right panel highlights the different error rates among different case subspecialties. GI: gastrointestinal, Uro: urology.
Proportion of errors among different career levels. Upper left panel shows the overall error rates. Upper right panel shows the error rates among the different diagnostic tasks. Lower left panel shows the error rate among different specimen types. Lower right panel highlights the different error rates among different case subspecialties. GI: gastrointestinal, Uro: urology.
Traceplot of the model fit - part A.
Traceplot of the model fit - part B.
Traceplot of the model fit - part C.
Model coefficients. Graphical representation of the coefficients for the model parameters conditional on the data. The lowest box depicts the coefficients for the hyperparameter ᾱ (alpha_bar) and the SD values (σ: sigma_a through sigma_e) of the categories of clusters modeled. All other boxes depict the distributions of the mean value for each element of the category considered. From top to bottom: the first box depicts the parameters for the pathologists' performance; the second, the parameters for career level; the third, the diagnostic category analyzed; the fourth, the specimen type; and the fifth, the case subspecialty. Interpretation of the model at the parameter level is not possible because the parameters combine in a complicated way: prediction (ie, seeing how the model behaves on the outcome scale; Figure 4 in the manuscript) is the only practical way to understand what the model "thinks".
digital pathology
None declared.