Abstract
Background: Cataracts are a highly prevalent, potentially blinding eye condition, and effective approaches for their early diagnosis are needed, underscoring the clinical significance of this study.
Objective: This study aims to evaluate the performance of deep learning (DL) in cataract diagnosis, assess its potential as an effective tool for automated diagnosis, and compare the diagnostic accuracy of DL with that of traditional machine learning and human experts.
Methods: A systematic search was conducted in Web of Science, Embase, IEEE Xplore, PubMed, and Cochrane Library until April 1, 2025, for studies on image-based DL for cataract detection or clinical subtype classification. The included studies were assessed for the risk of bias (RoB) using Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2). Bivariate mixed effects models were used for data analyses, and publication bias was assessed by Deeks’ funnel plots.
Results: Sixty-three studies were included. The quality assessment indicated a high or unclear RoB in the patient selection (34 studies) and index test (44 studies) domains; in the reference standard domain, the RoB was high or unclear in only 2 studies. Image-based DL achieved a sensitivity of 96% (95% CI 95%‐97%) and a specificity of 98% (95% CI 96%‐98%) for cataract detection, with an area under the ROC curve (AUC) of 0.99 (95% CI 0.98‐1.00). For cataract classification, the sensitivity and specificity of image-based DL were 94% (95% CI 93%‐96%) and 97% (95% CI 96%‐98%), respectively, with an AUC of 0.99 (95% CI 0.98‐0.99). Despite the strong overall performance, the models’ generalization capability was challenged by lower performance on independent external datasets (detection: sensitivity 87%, specificity 93%; classification: sensitivity 89%, specificity 90%), potentially attributable to domain shift between the training and validation data.
Conclusions: Image-based DL has demonstrated high accuracy in the detection and classification of cataracts, showing potential advantages over traditional machine learning methods, though head-to-head validation remains limited. Its performance falls within the range of reported accuracy of human experts, supporting the feasibility of automated diagnosis. However, limited validation data, moderate-quality evidence, and high heterogeneity constrain the utility of DL as an auxiliary diagnostic tool. The models’ sensitivity dropped to 87% in external validation, indicating limited generalization capability, so caution should be exercised in broad clinical implementation.
doi:10.2196/78869
Keywords
Introduction
The lens is a biconvex, transparent structure in the anterior segment of the eye that focuses light to project images of objects at varying distances onto the retina. A cataract, defined as clouding of the lens, is primarily an age-related degenerative disease, although congenital and pediatric cataracts also occur []. Early cataracts are asymptomatic, but progressive clouding can lead to visual impairment, greatly reducing quality of life and productivity []. Cataracts are the leading cause of visual loss worldwide. In 2020, an estimated 15 million people aged 50 years and older were blind due to cataracts, and cataracts caused moderate to severe visual impairment in an estimated 79 million people in this age group []. These figures represent increases of 30% and 93%, respectively, over the year 2000 []. According to the WHO (World Health Organization) and ICD-11 (International Classification of Diseases, 11th Revision), blinding cataracts are cataracts causing severe visual impairment (visual acuity <6/60 to 3/60) or blindness (visual acuity <3/60) []. About 94% of blinding cataracts occur in low- and middle-income countries (LMICs), and cataract-related visual impairment is closely linked to poverty in low-resource settings (LRS) []. Because conventional cataract diagnosis relies on ophthalmologists and complex equipment, it is subjective and resource-intensive, making it difficult to meet the needs of large-scale screening. Moreover, limited infrastructure and a shortage of trained personnel in LMICs create inequalities in early cataract detection and severity classification.
Deep learning (DL) can process pixel-level information imperceptible to the human eye, contributing greatly to the analysis of medical images, assisting doctors in clinical decision-making, and enhancing screening efficiency. As a branch of artificial intelligence (AI), DL models are inspired by the brain and specialize in pattern recognition []. DL has therefore evolved rapidly, with broad prospects in medical image analysis, including disease detection, classification, segmentation, and image registration []. For example, convolutional neural networks (CNNs), the primary DL technique for image learning, excel at image classification and feature extraction, making them a cornerstone of medical imaging []. Residual networks (ResNet), a landmark CNN architecture, have advanced the development of DL, with outstanding benchmark performance in image recognition and classification []. The strength of ResNet in the medical field lies in its ability to train deep networks efficiently and improve the accuracy of image recognition [].
The strategic importance of AI lies in raising the quality of care and potentially reducing costs in high-income economies, and in addressing critical health care issues, easing staffing shortages, and providing access to specialized skills in LMICs []. At present, DL is developed mostly with data from high-income countries and regions and relies on high-resolution images and advanced electronic devices. In remote regions, however, fragmented health care systems are generally characterized by insufficient infrastructure, a shortage of professionals, and a lack of health care resources, making the quality of screening and diagnosis difficult to guarantee. Available DL models face the following problems: (1) lack of compatibility: mismatch with the low-cost equipment used in LRS []; (2) lack of generalization capability: marked declines in model performance in real-world scenarios, especially under different lighting conditions or in different patient populations []; (3) lack of clinical validation: a recent systematic review of studies assessing AI algorithms for medical image analysis found that only 6% of the included studies (n=516) conducted external validation, and assessment of diagnostic efficiency was lacking []; and (4) training data bias and a lack of diversity in data (race, age, and disease subtypes), which weaken the models’ generalization capability [].
In the past 5 years, breakthroughs have been made in applying DL to ophthalmic image analysis for diabetic retinopathy (DR), retinopathy of prematurity, and glaucoma. By automating the processing of large volumes of ophthalmic images, DL can achieve a more accurate and rapid diagnosis of cataracts, reducing the subjectivity and error inherent in conventional, clinician-dependent methods. Moreover, DL can fuse multiple imaging modalities (eg, slit lamp, fundus, and optical coherence tomography [OCT] images) for a comprehensive and accurate diagnosis through multimodal image analysis. In addition, DL models with real-time diagnostic capability return results quickly, greatly improving work efficiency and enabling large-scale screening and early diagnosis, especially in primary health care institutions and LRS []. Many DL-based diagnostic tools have been approved by the US Food and Drug Administration (FDA), but they require further evaluation and independent quality review []. For example, IDx-DR [,], the first AI system approved by the FDA, is a CNN-based system for automated DR screening with high sensitivity and specificity, which can contribute to early diagnosis and lower the risk of visual loss in clinical practice, especially in LRS.
Image-based DL has exhibited considerable potential for automatic cataract detection and classification using fundus and slit lamp images. However, existing findings remain heterogeneous, and a systematic review of DL algorithms for cataract image analysis is still lacking. Therefore, this study systematically assessed the performance of different DL models in cataract detection and classification in terms of sensitivity, specificity, and the area under the ROC curve (AUC), thereby characterizing methodological and reporting quality and contributing to the clinical translation of DL.
Methods
Registration and Study Design
This study adhered to the PRISMA-DTA (Preferred Reporting Items for Systematic Reviews and Meta-analyses of Diagnostic Test Accuracy Studies) reporting guidelines [], and the study protocol was registered in PROSPERO (International Prospective Register of Systematic Reviews; CRD420251030230). We acknowledge that registration was completed on April 10, 2025, after the initial literature search conducted on April 1, 2025; the registration is therefore retrospective.
Search Strategy, Eligibility Criteria, and Data Extraction
Based on the predefined criteria, 2 investigators (RXL and HYL) independently searched Web of Science, IEEE Xplore, Embase, PubMed, and Cochrane Library up to April 1, 2025, for studies published from 2019 to 2025.
The retrieved studies were first imported into EndNote to remove duplicate publications. Titles and abstracts were then screened, and the full text of clearly or potentially eligible studies was examined. No restrictions were imposed on geographical location or study setting. Letters, non–peer-reviewed reports, narrative reviews, animal studies, and conference abstracts were excluded. The search strategy, with specific search phrases, Boolean operators, and field restrictions, is shown in . The following studies were excluded during full-text screening: studies failing to report the key values, such as true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN), needed to construct a 2×2 contingency table; studies reporting only composite metrics such as sensitivity, specificity, or AUC without the underlying data; and studies reporting data ambiguously or in a manner that prevented accurate construction of a 2×2 contingency table. The inclusion and exclusion criteria are shown in . Discrepancies were resolved through discussion with a third investigator (SL or LHL) if needed.
| Variable | Inclusion criteria | Exclusion criteria |
| --- | --- | --- |
| Study design | Randomized controlled trials, prospective observational studies, retrospective diagnostic accuracy studies, and cross-sectional diagnostic studies | |
| Language | Full text in English | |
| Data extraction | | |
| Disease type | Cataracts | |
| Intervention | Cataract diagnosis (detection and/or classification) | |
Only studies evaluating the performance of image-based DL algorithms in cataract detection and classification were included, in accordance with the eligibility criteria ().
Two investigators (RXL and HYL) independently extracted the following data using standardized data extraction tables: basic characteristics (country, publication year, study site, and type), dataset characteristics (nature, number of images, and presence or absence of external validation), and performance metrics (sensitivity and specificity). Discrepancies were settled by discussion with a third investigator. Contingency tables were used to directly extract the data on binary diagnostic accuracy, including TP, FP, TN, and FN. During data extraction, double-checking was performed, and the original authors were contacted to obtain supplementary data. These data were then used to calculate pooled sensitivity, specificity, and other metrics. If a study provided multiple contingency tables for the same or different DL algorithms, they were assumed to be independent of each other.
| Studies | Inclusion criteria | Exclusion criteria | Labels | N | Median age (range) | Study design |
| --- | --- | --- | --- | --- | --- | --- |
| Lin et al (2019) [] | Patients aged less than 14 years, with or without eye symptoms, and with no history of eye surgery. All participants were required to undergo slit-lamp photography and were given sedatives such as chloral hydrate when necessary. | Patients who already had a definitive diagnosis of cataract, other ocular abnormalities, or ocular trauma. | Noncataract or cataract | 350 | 6.58 (6.13-7.03) | Multicenter randomized controlled trial (RCT) |
| Deepak and Bhat (2024) [] | NR | NR | Cataract or glaucoma or normal. | NR | NR | Cross-sectional diagnostic study |
| Zhao et al (2024) [] | NR | NR | Noncataract or mild cataract or moderate cataract or severe cataract. | NR | NR | Cross-sectional diagnostic study |
| Zia et al (2023) [] | NR | NR | Cataract or glaucoma or diabetic retinopathy or neutral. | NR | NR | Retrospective diagnostic accuracy study |
| Zhang et al (2023) [] | NR | NR | Normal or low-grade or high-grade. | 543 | NR | Prospective diagnostic accuracy study |
| Zeboulon et al (2022) [] | Patients of either one of the following clinical categories: clear lens or cataract. Patients with clear lens had no history of refractive surgery and had a best corrected visual acuity (BCVA) of at least 20/20. Patients with cataract had significant visual discomfort and were scheduled for surgery. All types of cataracts were included, and 4 experienced cataract and refractive surgeons of the department performed the patient inclusions (authors PZ, CP, WG, and DG). | Patients with any corneal disease. | Normal or cataract or background. | 157 | NR | Retrospective diagnostic accuracy study |
| Zhang et al (2024) [] | NR | NR | NC severity level: normal or mild or severe. | 530 | NR | Retrospective diagnostic accuracy study |
| Zhang et al (2022) [] | NR | NR | NC severity level: normal or mild or severe. | 543 | NR | Retrospective diagnostic accuracy study |
| Xie et al (2023) [] | Patients with cataract whose best corrected distance visual acuity (BCDVA) was good (>0.6) within 1 month after cataract surgery, and patients without cataract without refractive media opacities. The fundus images were captured without mydriasis before surgery. | Traumatic cataracts, congenital cataracts and lens dislocation, corneal diseases, asteroid hyalosis, vitreous hemorrhage, and severe retinal and optic nerve diseases. Poor-quality and unreadable images were also excluded: images out of focus; images underexposed; images overexposed; incomplete images with more than 1/3 peripheral halo. | Noncataract or mild cataracts or visually impairing cataracts. | 5245 | NR | Retrospective diagnostic accuracy study |
| Wu et al (2022) [] | NR | Patients with congenital cataract, intraocular lens, aphakic eye, severe eye trauma, or corneal opacity. | Cataract or noncataract with normal-quality images or noncataract with poor-quality images. | 30,668 | NR | Retrospective diagnostic accuracy study |
| Vasan et al (2023) [] | New patients of both the paid and free service facilities who were aged 40 years and older with the BCVA less than 20/40 in either eye. Participants were recruited immediately after the vision examination before further investigation and ophthalmologist examination. | Patients with trauma or vulnerabilities, patients who were unwilling to participate in the study or with dilated pupil. | Negative or positive or “can’t say” or “not asked.” | 1407 | NR | Prospective diagnostic study |
| Hassan et al (2024) [] | NR | NR | Normal or cataract or glaucoma or diabetic or uveitis. | NR | NR | Retrospective diagnostic accuracy study |
| Ueno et al (2024) [] | NR | NR | Normal or cataract or infectious keratitis or immunological keratitis or corneal scar or corneal deposits or bullous keratopathy or ocular surface tumor or primary angle-closure glaucoma. | NR | NR | Retrospective diagnostic accuracy study |
| Singh et al (2024) [] | NR | NR | Cataract or glaucoma or diabetic retinopathy. | NR | NR | Retrospective diagnostic accuracy study |
| Shafiq et al (2024) [] | NR | NR | Glaucoma or cataracts or diabetic retinopathy or myopia or macular degeneration. | NR | NR | Retrospective diagnostic accuracy study |
| Santone et al (2024) [] | NR | NR | Normal or cataract. | 4785 | NR | Retrospective diagnostic accuracy study |
| Jawad et al (2024) [] | NR | NR | Normal or glaucoma or cataract or myopia or others. | NR | NR | Retrospective diagnostic accuracy study |
| Janti et al (2024) [] | Patients aged 45 years and older, of both genders. | Patients found to be critically ill after the examination, and those who were not willing to participate in the study. | Cataract positive (mature or immature) or cataract negative (normal and intraocular lens). | 495 | 61.2 (NR-NR) | Prospective, observational diagnostic accuracy study |
| Emir and Colak (2024) [] | NR | NR | Healthy or diabetic retinopathy or glaucoma or cataract or age-related macular degeneration or hypertension/myopia or others. | NR | NR | Retrospective diagnostic accuracy study |
| Ogundokun et al (2024) [] | NR | NR | AMD or cataract or diabetes or glaucoma or hypertension myopia or normal. | NR | NR | Retrospective diagnostic accuracy study |
| Nguyen and Lin (2024) [] | NR | NR | Cataract or normal. | NR | NR | Retrospective diagnostic accuracy study |
| Mai et al (2024) [] | NR | NR | Control (no cataract present), without posterior polar cataract (PPC; cataract present without PPC), with PPC (cataract present with PPC). | 103 | NR | Retrospective diagnostic accuracy study |
| Raveenthini et al (2024) [] | NR | NR | AMD or cataract or diabetic retinopathy or glaucoma or normal. | NR | NR | Retrospective diagnostic accuracy study |
| Rafay et al (2023) [] | NR | NR | Cataract or diabetic retinopathy or glaucoma or normal. | NR | NR | Cross-sectional diagnostic study |
| Abbas et al (2023) [] | NR | NR | Glaucoma or diabetic retinopathy or cataract or normal. | 300 | NR | Cross-sectional diagnostic study |
| Uyar et al (2024) [] | NR | NR | Cataract or DR or glaucoma or normal. | NR | NR | Retrospective diagnostic accuracy study |
| Serwaa et al (2024) [] | NR | NR | Glaucoma-positive or glaucoma-negative or cataracts-positive or cataract-negative. | NR | NR | Retrospective diagnostic accuracy study |
| Zhang et al (2022) [] | NR | NR | NC severity level: normal, mild, or severe. | 543 | 61.30 (42.65-79.95) | Retrospective diagnostic accuracy study |
| Glaret Subin and Muthukannan (2022) [] | NR | NR | AMD, diabetic retinopathy, cataract, or glaucoma. | 5000 | NR | Retrospective diagnostic accuracy study |
| Xiao et al (2024) [] | NR | NR | Cortical cataract (CC) severity level: normal, mild, or severe. | 469 | NR | Retrospective diagnostic accuracy study |
| Wang et al (2024) [] | NR | NR | 2 coarse-grained types (noncataract [NoC] and posterior subcapsular cataract [PSC]), 7 fine-grained types (NC II, NC III, ≥NC IV, CC I, CC II, CC III, and CC IV). | NR | NR | Retrospective diagnostic accuracy study |
| Kumari and Saxena (2024) [] | NR | NR | Diseased class or normal class. | NR | NR | Retrospective diagnostic accuracy study |
| Devaraj et al (2024) [] | Immature cataract, mature cataract, no cataract, and prior cataract operation with intraocular lens (IOL) inserted. | NR | Cataract or noncataract. | 7726 | 50 (NR-NR) | Prospective observational study |
| Al-Saadi et al (2024) [] | NR | NR | Normal, early, moderate, or severe. | NR | NR | Retrospective diagnostic accuracy study |
| Elsawy et al (2023) [] | Phakic eyes. | Pseudophakic and aphakic eyes. | NC, cortical cataract (cortical lens opacity), or PSC. | 2573 | 69.84 (62.12-77.56) | Retrospective diagnostic accuracy study |
| Akram and Debnath (2020) [] | The symptoms of selected eye diseases include several visual abnormalities in the eye region, particularly a blurred, clouded, or yellowing lens, gray or white spots on the cornea, red or bloodshot eyes, yellow or greenish-yellow coatings on the eyes, foamy white spots in the sclera, swollen eyes, eyelid deformity such as the lower eyelid being turned out from the eye, or reddish bumps on the edge of an inner eyelid, depending on the specific disease. | NR | Bitot’s spot of vitamin A deficiency, cataracts, conjunctivitis, corneal ulcer, ectropion, healthy, or periorbital cellulitis or trachoma. | NR | NR | Cross-sectional diagnostic study |
| Jiang et al (2021) [] | NR | NR | Opacity area: Limited or Extensive | NR | 1.58 (0.7-2.46) | Retrospective diagnostic accuracy study |
| Yadav and Yadav (2023) [] | NR | NR | Severity: no, mild, moderate, or severe. | NR | NR | Retrospective diagnostic accuracy study |
| Yadav and Yadav (2023) [] | NR | NR | Severity: no, mild, moderate, or severe. | NR | NR | Retrospective diagnostic accuracy study |
| Subin and Kannan (2022) [] | NR | NR | Cataract, diabetic retinopathy, glaucoma, or normal. | NR | NR | Cross-sectional diagnostic study |
| Pratap and Kokil (2019) [] | NR | NR | Normal, mild, moderate, or severe. | NR | NR | Cross-sectional diagnostic study |
| Luo et al (2021) [] | NR | NR | Normal, glaucoma, cataract, or AMD. | NR | NR | Retrospective diagnostic accuracy study |
| Imran et al (2020) [] | NR | NR | Normal, mild, moderate, or severe. | NR | NR | Retrospective diagnostic accuracy study |
| Imran et al (2021) [] | NR | NR | Normal, mild, moderate, or severe. | NR | NR | Retrospective diagnostic accuracy study |
| Acar et al (2021) [] | NR | NR | Noncataract or cataract. | NR | NR | Retrospective diagnostic accuracy study |
| Olaniyan et al (2024) [] | NR | NR | Normal or cataract. | NR | NR | Retrospective diagnostic accuracy study |
| Ganokratanaa et al (2023) [] | NR | NR | Normal or cataract. | NR | NR | Retrospective diagnostic accuracy study |
| Gan et al (2023) [] | NR | NR | Four stages of cataract: incipient stage, intumescent stage, mature stage, or hypermature stage. | NR | NR | Cross-sectional diagnostic study |
| Tham et al (2022) [] | NR | Visual impairment caused by other pathologies, incomplete or missing data on cataract grading, or BCVA. | Normal or visually significant cataract. | 13,482 | NR | Retrospective diagnostic accuracy study |
| Siddique (2022) [] | NR | NR | Cataract, chalazion, normal, or squint. | NR | NR | Retrospective diagnostic accuracy study |
| Sirajudeen et al (2022) [] | NR | NR | Cataract or normal. | 200 | NR | Cross-sectional diagnostic study |
| Junayed et al (2021) [] | NR | NR | Cataract or noncataract. | NR | NR | Cross-sectional diagnostic study |
| Hu et al (2020) [] | NR | NR | Pronounced cataract (2) or early cataract (1) or normal (0). | NR | NR | Retrospective diagnostic accuracy study |
| Hu et al (2021) [] | NR | NR | Cataract or normal. | 38 | 58 (NR-NR) | Retrospective diagnostic accuracy study |
| Lai et al (2022) [] | NR | NR | Cataract or normal. | NR | NR | Retrospective diagnostic accuracy study |
| Askarian et al (2021) [] | NR | NR | Cataract or healthy. | NR | NR | Retrospective diagnostic accuracy study |
| Son et al, 2022 [] | Patients had available anterior segment photograph data. | Patients with pathologic features of the cornea, anterior chamber, lens, or iris that interfere with the detection of lens images (eg, corneal opacity or edema, uveitis, and iris defects including aniridia, coloboma, and iridocorneal endothelial syndrome) and a medical history of previous ophthalmic surgery (eg, keratoplasty, implantable Collamer lens, and cataract surgery); patients with retinal and vitreal diseases involving visual pathways that could interfere with visual acuity and final management plan. | Cortical opacity; nuclear color; nuclear opalescence; PSC: normal or mild or moderate or severe. | NR | NR | Cross-sectional diagnostic study |
| Saju and Rajesh, 2022 [] | NR | NR | Five types of cataracts: cortical or hyper mature or mature or nuclear or posterior. | NR | NR | Cross-sectional diagnostic study |
| Chellaswamy et al (2022) [] | NR | NR | Cataract or diabetic retinopathy or glaucoma or normal or AMD. | NR | NR | Retrospective diagnostic accuracy study |
| Lu et al (2022) [] | NR | Eyes with corneal opacity or other corneal disease that might significantly interfere with lens observation and blurred region of interest due to poor fixation or eyes with small pupils that prevented manual cataract evaluation. | Nuclear cataract or cortical cataract or posterior subcapsular cataract. | NR | NR | Retrospective diagnostic accuracy study |
| Al-Naji et al (2024) [] | NR | NR | Normal or cataract or foreign body or glaucoma or subconjunctival hemorrhage or viral conjunctivitis. | 645 | NR | Prospective observational study |
| Elloumi (2022) [] | NR | NR | Healthy or mild or moderate or severe cataract. | NR | NR | Retrospective diagnostic accuracy study |
| Zannah et al (2024) [] | NR | NR | Cataract or diabetic retinopathy or glaucoma or normal. | NR | NR | Retrospective diagnostic accuracy study |
aNR: not reported.
bNC: nuclear cataract.
cAMD: age-related macular degeneration.
Primary and Secondary Outcomes
Primary and secondary outcomes were defined to assess the performance of DL in cataract detection and classification. The primary outcomes included sensitivity, specificity, and positive and negative likelihood ratios.
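To illustrate how these primary outcomes are derived from each study's 2×2 contingency table, the following sketch (not the authors' analysis code; the counts are hypothetical) computes sensitivity, specificity, and the positive and negative likelihood ratios from TP, FP, FN, and TN:

```python
# Illustrative sketch: primary diagnostic accuracy metrics from a single
# 2x2 contingency table. The counts below are hypothetical, not study data.

def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Return sensitivity, specificity, and likelihood ratios."""
    sensitivity = tp / (tp + fn)               # true positive rate
    specificity = tn / (tn + fp)               # true negative rate
    lr_pos = sensitivity / (1 - specificity)   # positive likelihood ratio
    lr_neg = (1 - sensitivity) / specificity   # negative likelihood ratio
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "LR+": lr_pos,
        "LR-": lr_neg,
    }

# Hypothetical counts for one DL model on one test set
metrics = diagnostic_metrics(tp=96, fp=2, fn=4, tn=98)
print(metrics)  # sensitivity 0.96, specificity 0.98
```

Note that the pooled estimates reported in this review come from bivariate mixed effects models, which jointly model sensitivity and specificity across studies rather than simply averaging such per-study values.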
The secondary outcomes were used to compare the accuracy of DL with that of machine learning (ML) algorithms in cataract diagnosis through subgroup analyses, and to compare DL algorithms with human experts in studies using identical datasets. For each study, the datasets were also examined for the reference standard used, whether transfer learning was applied, the methods for model testing and validation, and the sources and characteristics of the datasets ().
| Studies | Device | Exclusion of poor-quality cases | Algorithm architecture | ML or DL | Transfer learning applied | Source of data | Number of cases (training/test/internal/external validation) | Data range | Open access data |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Lin et al (2019) [] | Slit-lamp photography | Yes | CC-Cruiser | DL | No | ZOC, located in Guangzhou in southern China. The other 4 eye clinics are affiliated with Shenzhen Eye Hospital, the Central Hospital of Wuhan, the Second Affiliated Hospital of Fujian Medical University, and Kaifeng Eye Hospital | NR/350/NR/NR | August 9, 2017-May 25, 2018 | NR |
| Deepak and Bhat (2024) [] | Retinal fundus camera | No | Darknet-53 | DL | Yes | Ocular Disease Intelligent Recognition (ODIR) | 4000/1000/NR/NR | NR | ODIR |
| Zhao et al (2024) [] | Slit-lamp photography | No | NCME-Net | DL | No | Shenzhen Eye Hospital and the Eye Hospital of Nanjing Medical University | 553/100/139/NR | NR | NR |
| Zia et al (2023) [] | Retinal fundus camera | No | Improved SqueezeNet model | DL | No | ODIR-IMAGE, Kaggle dataset | 1500/400/NR/NR | NR | ODIR-IMAGE, Kaggle dataset |
| Zhang et al (2023) [] | CASIA2 AS-OCT ophthalmology device (TOMEY Inc, Japan) | Yes | Ensemble Logistic Regression (EMLR) framework | ML | No | AS-OCT-NC2 dataset | 7831/3611/NR/NR | NR | NR |
| Zeboulon et al (2022) [] | Swept Source Optical Coherence Tomography (SS-OCT) | No | U-Net model | DL | No | Anterion (Heidelberg) | Development set/validation set: 504/1326 | NR | NR |
| Zhang et al (2024) [] | CASIA2 AS-OCT ophthalmology device (TOMEY Inc) | No | RCRNets | DL | No | AS-OCT-NC2 dataset | 9394/3390/3100/NR | NR | NR |
| Zhang et al (2022) [] | CASIA2 AS-OCT ophthalmology device (TOMEY Inc) | Yes | RIR-Net-2‐34 | DL | No | The local hospital | 7831/3611/NR/NR | NR | UCSD dataset, Heidelberg OCT dataset |
| Xie et al (2023) [] | Retinal fundus camera | Yes | DenseNet121 | DL | No | Zhejiang Eye Hospital at Wenzhou (ZEHWZ) | 4901/1048/1048/1398 | September 2020-March 2021 | NR |
| Wu et al (2022) [] | Retinal fundus camera | Yes | Anti-interference model (convolutional neural network [CNN]) | DL | No | The Chinese PLA (People’s Liberation Army) General Hospital | 14400/17765/1800/NR | September 2018-May 2021 | NR |
| Vasan et al (2023) [] | Smartphone camera | Yes | E-Paarvai App (CNN) | DL | No | A large eye care hospital in South India | 1400/NR/2619/NR | January 2022-April 2022 | NR |
| Hassan et al (2024) [] | Retinal fundus camera | No | OcularNET | DL | Yes | The Kaggle machine-learning platform | 4000/2200/NR/NR | NR | The Kaggle machine-learning platform. |
| Ueno et al (2024) [] | Smartphone camera | Yes | YOLO V.5 | DL | No | 23 tertiary eye centers in Japan | 5270/836/NR/NR | 2019‐2020 | NR |
| Singh et al (2024) [] | Retinal fundus camera | No | A novel ensembled deep learning CNN model | DL | Yes | The Kaggle database | 4217/NR/NR/NR | NR | The Kaggle database |
| Shafiq et al, (2024) [] | OCT, retinal fundus camera | No | The DualEye-FeatureNet model | DL | No | Structured analysis of the retina (STARE), DRIVE, high-resolution fundus (HRF) | NR/483/NR/NR | NR | STARE, DRIVE, HRF |
| Santone et al (2024) [] | Retinal fundus camera | No | The STANDARD_CNN model | DL | No | the ODIR 5K dataset | 6987/957/1627/NR | NR | the ODIR 5K dataset |
| Jawad et al (2024) [] | Retinal fundus camera | Yes | Swin Transformer models (Swin-T) | DL | No | ODIR, the Retina dataset, available on Kaggle | 7000/3400/NR/300 | NR | ODIR, the Retina dataset, available on Kaggle |
| Janti et al (2024) [] | Smartphone camera | Yes | Smartphone-based cataract screening application | DL | No | The AIIMS (All India Institute of Medical Sciences) Bibinagar, Hyderabad, Telangana, India | NR/990/NR/NR | April 2024-July 2024 | NR |
| Emir and Colak (2024) [] | Retinal fundus camera | Yes | The residual neural network (ResNet) 50 | DL | No | The ODIR dataset | 3198/930/471/NR | NR | The ODIR dataset |
| Ogundokun et al (2024) [] | Retinal fundus camera | No | MobileNetV2-SVM (support vector machine) | DL | No | Ocular dataset from the Kaggle repository | 16290/2012/1811/NR | NR | The Kaggle database |
| Nguyen and Lin (2024) [] | Retinal fundus camera | No | Hybrid CNN Approach | DL | Yes | The Kaggle database | 888/278/222/NR | NR | The Kaggle database |
| Mai et al (2024) [] | A Zeiss OPMI Lumera T surgical microscope | No | ConvNeXt-Tiny model | DL | No | Department of Ophthalmology, Far Eastern Memorial Hospital, New Taipei, Taiwan. | NR/NR/103/NR | January 1, 2018-December 31, 2021 | NR |
| M et al (2024) [] | Retinal fundus camera | No | XGB classifier model | ML | No | HRF, DR HAGIS, DIARET DB0, DRISHTI, KAGGLE, E-OPTHA, RIM ONE, ORIGA, ACRIMA, DRIONS-DB, STARE, ARIA, IDRID, ICHALLENGE AMD, ODIR, RFMID, KAGGLE CATARACT, HARVARD V1, DERBI DATA, ICHALLENGE, GLAUCOMA | 10447/2612/NR/NR | NR | HRF, DR HAGIS, DIARET DB0, DRISHTI, KAGGLE, E-OPTHA, RIM ONE, ORIGA, ACRIMA, DRIONS-DB, STARE, ARIA, IDRID, ICHALLENGE AMD, ODIR, RFMID, KAGGLE CATARACT, HARVARD V1, DERBI DATA, ICHALLENGE, GLAUCOMA |
| Rafay et al (2023) [] | Retinal fundus camera | No | EfficientNet B3 | DL | Yes | The Kaggle database | 2949/1268/NR/NR | NR | The Kaggle database |
| Abbas et al (2023) [] | Retinal fundus camera | No | Deep-ocular mode | DL | Yes | The retinal fundus multidisease image dataset (RFMiD) and ODIR | 1222/521/NR/NR | NR | RFMiD and ODIR |
| Uyar et al (2024) [] | Retinal fundus camera | No | ABC-based weighted ensemble model | DL | Yes | The Eye Disease Dataset (EDD), from the Kaggle | 3372/426/419/NR | NR | EDD, from the Kaggle |
| Serwaa et al (2024) [] | Retinal fundus camera | No | LBPSCN: Local Binary Pattern Scaled Capsule Network | DL | No | The Kaggle database | NR | NR | The Kaggle database |
| Zhang et al (2022) [] | CASIA2 AS-OCT ophthalmology device (TOMEY Inc) | Yes | Clinical-awareness attention network (CCA-Net) | DL | No | A local health physical center | 9619/3141/3441/NR | NR | The ACRIMA dataset, the UCSD dataset |
| Glaret Subin and Muthukannan (2022) [] | Fundus camera | No | FPOA-CNN | DL | No | Various medical centers in China collected by the Shanggong Medical Technology Co., Ltd. | NR | NR | ODIR database |
| Xiao et al (2024) [] | CASIA2 AS-OCT ophthalmology device (TOMEY Inc) | No | ResNet34-MSSA | DL | No | CASIA2 AS-OCT dataset | 3969/1271/1305/NR | NR | The LAG dataset, the SD-OCT dataset |
| Wang et al (2024) [] | Slit-lamp photography | No | MGCNet | DL | No | The Cataract Center of Beijing Tongren Hospital (BTH; Beijing, China) | 2912/970/970/NR | NR | The APTOS2019 dataset, The HAM10000 dataset |
| Kumari and Saxena (2024) [] | Retinal fundus camera | No | RINet | DL | No | Multiple repositories | NR | NR | NR |
| Devaraj et al (2024) [] | Smartphone camera | No | EfficientNet-v2 Model | DL | No | The ophthalmology departments at King George’s Medical University (KGMU) and Balrampur Hospital, Lucknow | 1708/753/275/NR | October 29, 2022-September 23, 2023 | NR |
| Al-Saadi et al (2024) [] | Retinal fundus camera | No | An Automated Wavelet Scattering Network | DL | No | The ODIR database | 357/155/NR/NR | NR | The ODIR database |
| Elsawy et al (2023) [] | Color fundus photography (CFP) | No | Deep Opacity Net | DL | Yes | The AREDS2 dataset, the Singapore Epidemiology of Eye Diseases study (SEED) | 12227/3514/1773/17088 | 2006‐2008 | NR |
| Akram and Debnath (2020) [] | Digital camera | No | A deep convolution neural network (DCNN) model | ML | No | International Center for Eye Health, clinical images for symptoms on faces from the University of Rochester, UCSD School of Medicine and VA Medical Center, the Primary Care Dermatology Society, and other different resources | 1402/350/NR/NR | NR | NR |
| Jiang et al (2021) [] | Slit-lamp photography | No | CCNN-Ensemble | DL | Yes | Zhongshan Ophthalmic Center of Sun Yat-sen University | 470 (Training and validation)/132/79 | June 2015-February 2020 | NR |
| Yadav and Yadav (2023) [] | Retinal fundus camera | Yes | CNN with 2D DFT | DL | No | HRF, STARE, MESSIDOR, DRIVE, DRIONS_DB, and IDRiD datasets, as well as images obtained from the internet | NR | NR | HRF, STARE, MESSIDOR, DRIVE, DRIONS_DB, and IDRiD datasets |
| Yadav and Yadav (2023) [] | Retinal fundus camera | No | CNN with ensemble of SVM, NB, RF | DL | No | HRF, STARE, MESSIDOR, DRIVE, DRIONS_DB, and IDRiD databases, as well as other images collected from the internet | NR | NR | HRF, STARE, MESSIDOR, DRIVE, DRIONS_DB, and IDRiD databases |
| Subin and Kannan (2022) [] | Retinal fundus camera | No | AMSO-RNN (recurrent neural network) Model | DL | No | ODIR database | 2240/960/NR/NR | NR | ODIR database |
| Pratap and Kokil (2019) [] | Retinal fundus camera | Yes | Pre-trained CNN | TL | Yes | HRF image database, STARE, standard diabetic retinopathy database (DIARETDB0), e-ophtha: a color fundus image database, MESSIDOR database, DRIVE database, fundus image registration (FIRE) dataset, digital retinal images for optic nerve segmentation database (DRIONS-DB), IDRiD, available datasets from Dr Hossein Rabbani, and other internet resources | 400/400/NR/NR | NR | HRF image database, STARE, DIARETDB0, e-ophtha: a color fundus image database, MESSIDOR database, DRIVE database, FIRE dataset, DRIONS-DB, IDRiD, available datasets from Dr Hossein Rabbani, and other internet resources |
| Luo et al (2021) [] | Retinal fundus camera | No | FCL-EfficientNet-B3 | DL | No | Shanggong Medical Technology Co, Ltd. OIA-ODIR dataset | 1000/274/NR/NR | NR | OIA-ODIR dataset |
| Imran et al (2020) [] | Retinal fundus camera | No | The combination of DL models (AlexNet, ResNet, and VGGNet) and SVM | DL | Yes | The Tongren Hospital, China | 6424/1607/NR/NR | NR | NR |
| Imran et al (2021) [] | High-resolution fundus camera Canon-EOS-40D with additional settings such as 72 DPI resolution, no-flash, manual exposure, and auto-white balance | No | A novel hybrid method, namely CRNN, based on CNN and RNN | DL | Yes | The Tongren Hospital, China | 6424/1606/NR/NR | NR | NR |
| Acar et al (2021) [] | Retinal fundus camera | No | VGGNet | DL | Yes | The Kaggle Ocular Disease Recognition database | 3891/1216/973/NR | NR | The Kaggle Ocular Disease Recognition database |
| Olaniyan et al (2024) [] | Slit-lamp photography | No | Hybrid Siamese-VGG16 model | DL | No | Kaggle’s public repository | NR/121/NR/NR | NR | Kaggle’s public repository |
| Ganokratanaa et al (2023) [] | Slit-lamp photography | No | LeNet-CNN | DL | No | NR | 5600/1400/NR/NR | NR | NR |
| Gan et al (2023) [] | Slit-lamp photography | No | Automatic segmentation DTL platform | DL | No | Department of Ophthalmology, Jiangxi Provincial People’s Hospital | 517/130/NR/NR | NR | NR |
| Tham et al (2022) [] | Retinal fundus camera | Yes | The ResNet-50 | DL | No | Singapore Eye Research Institute | 8045/1692/NR/16005 | NR | NR |
| Siddique (2022) [] | Photos from phones and internet | No | MobileNet | DL | No | 4 hospitals from Bangladesh | 1762/439/NR/NR | NR | NR |
| Sirajudeen et al (2022) [] | Retinal fundus camera | No | Novel Kernel-based CNN | DL | No | The Kaggle database | 320/80/NR/NR | NR | The Kaggle database |
| Junayed et al (2021) [] | Retinal fundus camera | No | CataractNet | DL | No | Multiple databases | 904/226/NR/NR | NR | Multiple databases |
| Hu et al (2020) [] | Smartphone with slit-lamp | No | UDFA (Faster-RCNN) | DL | No | Marked Slit Lamp Picture Project (MSLPP) | 11272/819/4831/NR | NR | MSLPP |
| Hu et al (2021) [] | The iSpector-mini mobile phone slit lamp developed by Shenyang EyeROBO Intelligent Technology Co, Ltd. | No | ACCV | DL | No | A cooperation hospital | 1064/304/152/NR | NR | NR |
| Lai et al (2022) [] | Digital camera | No | CNNDCI | DL | No | GitHub.com | 7735/193/89/NR | NR | GitHub.com |
| Askarian et al (2021) [] | Smartphone camera | No | SVM | ML | No | NR | 63/30/7/NR | NR | NR |
| Son et al (2022) [] | Slit-lamp photography | No | An ensemble of 3 AI algorithms: ResNet18, WideResNet50-2, and ResNext50 | DL | Yes | Local outpatient clinic | 2706/792/446/NR | January 2017-December 2020 | NR |
| Saju and Rajesh (2022) [] | Slit-lamp photography | No | Dense CNN+BE_ResNet101 classification model | DL | No | The DRIMDB dataset, various hospitals | NR/264/NR/NR | NR | The DRIMDB dataset |
| Chellaswamy et al (2022) [] | Retinal fundus camera | No | WODCNN method | DL | No | KAGGLE, MESSIDOR, ORIGA, DRIVE, STARE datasets | 1661/414/NR/NR | NR | KAGGLE, MESSIDOR, ORIGA, DRIVE, STARE datasets |
| Lu et al (2022) [] | Slit-lamp photography | Yes | Faster R-CNN and ResNet | DL | No | An internal dataset from the EENT Hospital of Fudan University and an external dataset from the Pujiang Eye Study | 964/214/156/NR | An internal dataset of slit lamp photographs of the anterior segment of cataract-affected eyes taken between 2018 and 2020. Another external dataset of slit lamp photographs taken between March 2018 and August 2019. | NR |
| Al-Naji et al (2024) [] | Retinal fundus camera | No | InceptionResNetV2 | DL | No | The Balad Ruz General Hospital and Ibn Al-Haitham Teaching Eye Hospital | 453/194/NR/NR | January 2, 2023-July 7, 2023 | NR |
| Elloumi (2022) [] | Retinal fundus camera | No | Ensemble Learning (InceptionV3, MobileNet-V2, and NasNet-Mobile) | DL | Yes | The Kaggle platform | 354/118/118/NR | NR | The “Cataract Dataset,” “Ocular Disease Recognition (ODiR)” |
| Zannah et al (2024) [] | Retinal fundus camera | No | BayeSVM500 model | ML | Yes | Cataract dataset, Glaucoma dataset, High-Resolution Fundus (HRF) Image Database, Kaggle, IEEE-Dataport, and Pattern Recognition Lab | 4144/1037/NR/NR | NR | The Kaggle database, High-Resolution Fundus (HRF) Image Database |
aML: machine learning.
bDL: deep learning.
cNR: not reported.
dDenseNet: dense convolutional network.
eMobileNet: efficient convolutional neural networks for mobile vision apps.
fEfficientNet: rethinking model scaling for convolutional neural networks.
gVGG: visual geometry group.
hFaster R-CNN: faster region-based convolutional neural network.
Quality Assessment
Three investigators independently assessed the risk of bias (RoB) and applicability concerns in the clinical context of the included studies using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool []. Assessments were cross-checked to enhance interrater agreement, and the methodology for the consistency assessment was explicitly reported. Deeks’ funnel plots were drawn to evaluate publication bias when more than 10 studies were included. Statistical significance was set at P<.05. The Deeks’ funnel plot asymmetry test was performed using the Deeks command within the MIDAS package in STATA 18.0 (StataCorp LLC).
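Deeks’ asymmetry test regresses the log diagnostic odds ratio (lnDOR) of each study against the inverse square root of its effective sample size, weighting by that sample size; a slope significantly different from zero suggests funnel plot asymmetry. The following is a minimal sketch of this idea only, not the MIDAS implementation; the function name and the 0.5 continuity correction are illustrative choices.

```python
import numpy as np
from scipy import stats

def deeks_test(tp, fp, fn, tn):
    """Illustrative Deeks' funnel plot asymmetry test: weighted regression of
    lnDOR on 1/sqrt(effective sample size); returns the slope and its P value."""
    # 0.5 continuity correction guards against zero cells (illustrative choice)
    tp, fp, fn, tn = (np.asarray(a, float) + 0.5 for a in (tp, fp, fn, tn))
    ln_dor = np.log(tp * tn / (fp * fn))          # log diagnostic odds ratio
    n_dis, n_nondis = tp + fn, fp + tn
    ess = 4 * n_dis * n_nondis / (n_dis + n_nondis)  # effective sample size
    x = 1 / np.sqrt(ess)
    w = ess                                        # regression weights
    # weighted least squares: lnDOR = a + b * x
    X = np.column_stack([np.ones_like(x), x])
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ ln_dor)
    resid = ln_dor - X @ beta
    dof = len(x) - 2
    sigma2 = (resid @ W @ resid) / dof
    cov = sigma2 * np.linalg.inv(X.T @ W @ X)
    t = beta[1] / np.sqrt(cov[1, 1])
    p = 2 * stats.t.sf(abs(t), dof)                # two-sided P for the slope
    return beta[1], p
```

A P value below .10 on the slope is conventionally taken as evidence of asymmetry for diagnostic accuracy meta-analyses.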
Statistical Analysis
STATA 18.0 and RevMan 5.4 (Review Manager; The Cochrane Collaboration) were used for data analyses. Summary receiver operating characteristic (SROC) curve plotting, assessment of heterogeneity, and analysis of publication bias were performed in STATA 18.0 to enhance transparency and reproducibility. Heterogeneity was assessed by the Cochran Q test and the I² statistic, adopting the thresholds proposed by Higgins et al []: I²≤25%, low; I²≈50%, moderate; I²≥75%, substantial. This strategy was designed to identify whether inconsistencies in study results stemmed from random factors or reflected substantive discrepancies. When significant heterogeneity (P<.05 or I²>50%) was identified [], a bivariate mixed effects model was adopted; the bivariate random-effects model was fitted using the MIDAS package in STATA 18.0. This method is particularly effective for synthesizing pooled estimates of sensitivity, specificity, and AUC across studies, as it systematically addresses metric variability while preserving the intrinsic correlation between sensitivity and specificity.
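The Cochran Q statistic and Higgins I² used throughout this review can be computed from study-level estimates and their variances. A minimal sketch under an inverse-variance, fixed-effect weighting assumption follows; the function name is illustrative.

```python
import numpy as np

def cochran_q_i2(estimates, variances):
    """Illustrative Cochran Q test statistic and Higgins I² for
    between-study heterogeneity, using inverse-variance weights."""
    est = np.asarray(estimates, float)
    w = 1.0 / np.asarray(variances, float)      # inverse-variance weights
    pooled = np.sum(w * est) / np.sum(w)        # fixed-effect pooled estimate
    q = np.sum(w * (est - pooled) ** 2)         # Cochran Q statistic
    dof = len(est) - 1
    # I² = proportion of total variation attributable to heterogeneity
    i2 = max(0.0, (q - dof) / q) * 100 if q > 0 else 0.0
    return q, i2
```

Under the thresholds adopted above, I²≤25% would be read as low, I²≈50% as moderate, and I²≥75% as substantial heterogeneity.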
To evaluate the accuracy of the DL algorithms, a hierarchical SROC curve was fitted. We calculated the corresponding 95% CIs for sensitivity, specificity, and AUC using the delta method, which linearizes the nonlinear logit transformation of sensitivity and specificity via a first-order Taylor expansion. The variance-covariance matrix of the parameter estimates was then propagated, whereas prediction intervals (PIs) incorporated between-study heterogeneity by modeling the covariance structure of sensitivity and specificity [,]. Sensitivity analyses were conducted on all evaluated DL algorithms rather than only the one with the highest accuracy. A random-effects model was used to account for potential between-study variability.
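The back-transformation step of such delta-method CIs can be illustrated as follows. This is a simplified univariate sketch only (the bivariate model propagates a full covariance matrix); the function names and the 1.96 critical value are illustrative assumptions.

```python
import math

def se_on_logit_scale(p_hat, se_p):
    """First-order (delta-method) SE of logit(p): d/dp logit(p) = 1/(p(1-p)),
    so the SE of p is scaled by that derivative."""
    return se_p / (p_hat * (1 - p_hat))

def logit_ci(p_hat, se_logit, z=1.96):
    """Illustrative 95% CI for a pooled proportion (eg, sensitivity):
    build a symmetric interval on the logit scale, then back-transform
    so the CI stays inside (0, 1)."""
    logit = math.log(p_hat / (1 - p_hat))
    lo, hi = logit - z * se_logit, logit + z * se_logit
    inv = lambda x: 1 / (1 + math.exp(-x))      # inverse logit
    return inv(lo), inv(hi)
```

Working on the logit scale is what keeps back-transformed bounds inside the unit interval even for pooled sensitivities near 1, as in this review.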
Ethical Considerations
This study required no informed consent or ethical approval. Data previously collected from human subjects in ethical/institutional review board–approved studies were used. All studies included adhered to the Declaration of Helsinki.
Results
Search Results
Initially, 2235 studies were retrieved, of which 492 duplicates were excluded. Following study screening, 1680 studies were excluded from quantitative synthesis (meta-analysis), including 1617 studies involving animal research, nondisease studies, surgical technique investigations, reviews, and conference reports; 10 studies not using deep learning algorithms; 46 studies lacking sufficient data for constructing 2×2 contingency tables or reporting data in formats incompatible with pooling (eg, AUC only); 4 studies not focusing on cataract diagnostic models; and 3 studies that did not address cataract disease (; ).

Study Characteristics
As shown in -, all the included studies were published in 2019‐2024, involving 171,416 images. Retrospective data were used in 46 studies, prospective data in only 5 studies, and cross-sectional data in 12 studies. The data came from open-access sources in 31 studies. The sample size was prespecified in one study, and low-quality images were excluded in 15 studies. External validation was conducted in 12 studies, while the remainder performed internal validation. Six studies compared DL with traditional ML models using the same dataset, while another 6 compared DL models with human experts. Forty-four studies focused on cataract detection, 17 on cataract classification, and 2 on both detection and classification. Cataract detection was categorized as binary detection (presence vs absence of cataracts, n=21) and multidisease detection (n=25). Moreover, cataracts were classified into mild (n=18), moderate (n=11), and severe (n=19) types. Additionally, 49 studies did not describe cataract classification; among the other studies, clinical subtypes included posterior subcapsular cataract (PSC; n=4), pediatric cataract (n=2), posterior polar cataract (PPC; n=1), nuclear cataract (NC; n=10), and cortical cataract (CC; n=6).
| Studies | Reference standard | Type of internal validation | External validation |
| Lin et al (2019) [] | Expert consensus, consistent label | Hold-out cross-validation | No |
| Deepak and Bhat (2024) [] | Clinical diagnosis by ophthalmologists; validated by trained readers | 5-fold cross-validation | No |
| Zhao et al (2024) [] | AREDS No 4 guidelines | 5-fold cross-validation | No |
| Zia et al (2023) [] | Clinical diagnosis by ophthalmologists; validated by trained readers | 5-fold cross-validation | No |
| Zhang et al (2023) [] | The LOCS III system | Hold-out cross-validation | No |
| Zeboulon et al (2022) [] | The cataract fraction (CF) | 5-fold cross-validation | Yes |
| Zhang et al (2024) [] | The LOCS III system | Hold-out cross-validation | Yes |
| Zhang et al (2022) [] | The LOCS III system | Hold-out cross-validation | Yes |
| Xie et al (2023) [] | Cataract specialists | Hold-out cross-validation | Yes |
| Wu et al (2022) [] | Cataract specialists | Hold-out cross-validation | Yes |
| Vasan et al (2023) [] | Slit lamp diagnosis with dilated eyes by an ophthalmologist | Hold-out cross-validation | No |
| Hassan et al (2024) [] | Clinical diagnosis by ophthalmologists; validated by trained readers | Hold-out cross-validation | No |
| Ueno et al (2024) [] | Corneal specialists | Hold-out cross-validation | No |
| Singh et al (2024) [] | Clinical diagnosis by ophthalmologists; validated by trained readers | Hold-out cross-validation | No |
| Shafiq et al (2024) [] | Clinical diagnosis by ophthalmologists | NR | No |
| Santone et al (2024) [] | Clinical diagnosis by ophthalmologists (based on electronic medical records and verified by trained readers) | Hold-out cross-validation | No |
| Jawad et al (2024) [] | Clinical diagnosis by ophthalmologists; validated by trained readers | Hold-out cross-validation | Yes |
| Janti et al (2024) [] | The LOCS III system | NR | No |
| Emir and Colak (2024) [] | Clinical diagnosis by ophthalmologists | Hold-out cross-validation | No |
| Ogundokun et al (2024) [] | Clinical diagnosis by ophthalmologists; validated by trained readers | Hold-out cross-validation | No |
| Nguyen and Lin (2024) [] | Clinical diagnosis by ophthalmologists; validated by trained readers | 5-fold cross-validation | No |
| Mai et al (2024) [] | Visual inspection of the surgical view of the cataract in the surgery video | 5-fold cross-validation | No |
| Raveenthini et al (2024) [] | Clinical diagnosis by ophthalmologists | 10-fold cross-validation | No |
| Rafay et al (2023) [] | Clinical diagnosis by ophthalmologists; validated by trained readers | Hold-out cross-validation | No |
| Abbas et al (2023) [] | Clinical diagnosis by ophthalmologists | Hold-out cross-validation | No |
| Uyar et al (2024) [] | Clinical diagnosis by ophthalmologists; validated by trained readers | 10-fold cross-validation | No |
| Serwaa et al (2024) [] | Clinical diagnosis by ophthalmologists; validated by trained readers | NR | No |
| Zhang et al (2022) [] | Experienced ophthalmologists | NR | Yes |
| Glaret Subin and Muthukannan (2022) [] | Clinical diagnosis by ophthalmologists; validated by trained readers | 10-fold cross-validation | No |
| Xiao et al (2024) [] | Clinical grading by ophthalmologists | NR | No |
| Wang et al (2024) [] | Clinical diagnosis by ophthalmologists | Hold-out cross-validation | No |
| Kumari and Saxena (2024) [] | Clinical diagnosis by ophthalmologists | 5-fold cross-validation | No |
| Devaraj et al (2024) [] | Clinical diagnosis by ophthalmologists | Hold-out cross-validation | No |
| Al-Saadi et al (2024) [] | Clinical diagnosis by ophthalmologists; validated by trained readers | Hold-out cross-validation | No |
| Elsawy et al (2023) [] | The Wisconsin Cataract Grading System, the AREDS2 NS severity scale. | Hold-out cross-validation | Yes |
| Akram and Debnath (2020) [] | Clinical grading by ophthalmologists | 10-fold cross-validation | No |
| Jiang et al (2021) [] | Three senior ophthalmologists | 5-fold cross-validation | Yes |
| Yadav and Yadav (2023) [] | A professional ophthalmologist | NR | No |
| Yadav and Yadav (2023) [] | Clinical diagnosis by ophthalmologists | NR | No |
| Subin and Kannan (2022) [] | Clinical diagnosis by ophthalmologists; validated by trained readers | Hold-out cross-validation | No |
| Pratap and Kokil (2019) [] | Clinical diagnosis by ophthalmologists; validated by trained readers | Hold-out cross-validation | No |
| Luo et al (2021) [] | Trained ophthalmologists | 5-fold cross-validation | No |
| Imran et al (2020) [] | Two retinal experts | 5-fold cross-validation | No |
| Imran et al (2021) [] | Clinical diagnosis by ophthalmologists | 5-fold cross-validation | No |
| Acar et al (2021) [] | Clinical diagnosis by ophthalmologists | Monte-Carlo cross-validation | No |
| Olaniyan et al (2024) [] | Clinical diagnosis by ophthalmologists; validated by trained readers | Hold-out cross-validation | No |
| Ganokratanaa et al (2023) [] | NR | 5-fold cross-validation | No |
| Gan et al (2023) [] | Experienced ophthalmologists | 5-fold cross-validation | No |
| Tham et al (2022) [] | Wisconsin cataract grading system or AREDS system | 5-fold cross-validation | Yes |
| Siddique (2022) [] | Clinical grading by ophthalmologists | 5-fold cross-validation | No |
| Sirajudeen et al (2022) [] | Clinical diagnosis by ophthalmologists; validated by trained readers | 5-fold cross-validation | No |
| Junayed et al (2021) [] | Composite reference standard (expert clinical diagnosis or grading from original source datasets) | 5-fold cross-validation | No |
| Hu et al (2020) [] | Ophthalmologists with more than 5 years of clinical experience | Hold-out cross-validation | No |
| Hu et al (2021) [] | The LOCS III system | Hold-out cross-validation | No |
| Lai et al (2022) [] | The LOCS III system | 5-fold cross-validation | Yes |
| Askarian et al (2021) [] | NR | 10-fold cross-validation | No |
| Son et al (2022) [] | The LOCS III system | Hold-out cross-validation | No |
| Saju and Rajesh (2022) [] | Clinical diagnosis by ophthalmologists | Hold-out cross-validation | No |
| Chellaswamy et al (2022) [] | Clinical diagnosis by ophthalmologists | Hold-out cross-validation | No |
| Lu et al (2022) [] | The LOCS III system | Hold-out cross-validation | Yes |
| Al-Naji et al (2024) [] | Clinical grading by ophthalmologists | Hold-out cross-validation | No |
| Elloumi (2022) [] | Clinical diagnosis by ophthalmologists; validated by trained readers | 5-fold cross-validation | No |
| Zannah et al (2024) [] | Clinical diagnosis by ophthalmologists; validated by trained readers | 5-fold cross-validation | No |
aLOCS III: Lens Opacities Classification System III.
bNR: not reported.
Pooled Performance of DL Algorithms
Finally, 63 studies [-] with sufficient data (97 contingency tables) were included in the assessment of DL performance in cataract detection and classification []. Hierarchical SROC curves for cataract detection (45 contingency tables) and classification (52 contingency tables) are provided in and , respectively. The classification task involved multiple classes (eg, mild, moderate, and severe cataracts), and separate SROC curves were generated for each category. For cataract detection, the pooled sensitivity and specificity of DL were 96% (0.95‐0.97) and 98% (0.97‐0.99), respectively, with an AUC of 0.99 (0.98‐1.00). For cataract classification, DL had a pooled sensitivity and specificity of 94% (0.93‐0.96) and 97% (0.96‐0.98), respectively, with an AUC of 0.99 (0.98‐1.00). Substantial heterogeneity and inconsistency were observed across cataract severity strata (mild: I²=99%, moderate: I²=96%, severe: I²=99%; P<.001), suggesting considerable variability in diagnostic thresholds or methodological approaches across studies. Twenty-five contingency tables for mild cataracts were drawn from 18 studies, 16 contingency tables for moderate cataracts from 11 studies, and 28 contingency tables for severe cataracts from 19 studies; this imbalance across classes may have influenced the performance assessment.


One or more DL algorithms were reported in most studies, and the DL algorithm with the highest accuracy was selected, ultimately yielding 53 contingency tables. For cataract detection, DL had a pooled sensitivity and specificity of 97% (96%‐98%) and 98% (97%‐99%), respectively, with an AUC of 0.99 (0.98‐1.00) (). For cataract classification, the pooled sensitivity and specificity were 95% (92%‐97%) and 98% (96%‐99%), respectively, with an AUC of 0.99 (0.98‐1.00) (). Threshold analyses were conducted using STATA 18.0 to investigate threshold effects. For diagnostic models in primary studies that did not prespecify a threshold, the sensitivity and specificity corresponding to the optimal threshold reported in the study were extracted. If a primary study reported results for multiple thresholds, we prioritized data corresponding to the threshold associated with the reported primary endpoint or optimal operating point. The spatial distribution of classification thresholds along the ROC curve was analyzed; each threshold corresponds to a unique point on the ROC curve, and systematic evaluation of these points allowed us to quantify performance variability across thresholds and to detect threshold-driven instability (defined as marked performance fluctuations within narrow threshold ranges). The SROC curve displayed no “shoulder-arm” distribution ( and ).
However, the model’s performance in independent external datasets (detection: sensitivity 87%, specificity 93%; classification: sensitivity 89%, specificity 90%) was lower than the overall estimates. Notably, this lower performance in external validation datasets challenges the generalization capability of the models and is potentially attributable to domain shift; caution is therefore warranted when applying these models to new populations or settings.
Subgroup Analyses
Overview
Traditional ML or DL algorithms were reported in the included studies. These studies varied in primary objectives (detection, classification, or both detection and classification). Due to overlapping objectives among studies, the sum of the number of studies on traditional ML and DL algorithms did not match the total number of included studies. According to the Lens Opacities Classification System III (LOCS III) [] and the methods of Mackenbrock et al [] and Gali et al [], we classified cataracts into mild, moderate, and severe types and further categorized cataracts into PSC, pediatric cataract, PPC, NC, and CC.
Detection
Algorithm Types
DL algorithms were described in 36 studies (45 contingency tables). The pooled sensitivity and specificity of DL were 96% (95%‐97%) and 98% (97%‐99%), respectively, with an AUC of 0.99 (0.98‐1.00; ). Additionally, traditional ML algorithms were described in 5 studies (13 contingency tables). The pooled sensitivity and specificity of ML were 90% (87%‐91%) and 94% (91%‐96%), respectively, with an AUC of 0.95 (0.93‐0.97; ).

Disease Types
Thirty-seven studies (57 contingency tables) focused on nonspecific or general cataracts; the pooled sensitivity, specificity, and AUC were 96% (94%‐97%), 98% (97%‐98%), and 0.99 (0.98‐1.00), respectively. The Cochran Q test revealed a statistically significant result (Q=199.023; P<.001) and high inconsistency (I²=99%; 95% CI 98‐99), indicating significant between-study variability. For detection of specific clinical subtypes (CC: 1 study with 1 contingency table, pediatric cataract: 2 studies with 8 contingency tables, NC: 2 studies with 3 contingency tables, PSC: 1 study with 1 contingency table, PPC: 1 study with 1 contingency table), the pooled sensitivity and specificity were 93% (90%‐95%) and 96% (92%‐98%), respectively, and the AUC was 0.97 (0.95‐0.98) in 4 studies (13 contingency tables). Because sample sizes across clinical subtypes were extremely limited, with most subtypes reported by only 1 study, the meta-analysis results were less robust, and only a pooled heterogeneity assessment could be conducted. The Cochran Q test revealed a statistically significant result (Q=9.167; P=.005) and high inconsistency (I²=78%; 95% CI 53‐100), indicating significant between-study variability ().

Validation Types
Substantial heterogeneity and inconsistency were observed across validation methods for cataract detection (I²=99% for both internal and external validation; P<.001), suggesting considerable variability in diagnostic thresholds or methodological approaches across studies. The Deeks’ funnel plots revealed no notable publication bias in either the internal validation (P=.68) or the external validation (P=.23). Internal validation was conducted in 38 studies (54 contingency tables), which showed a pooled sensitivity of 96% (95%‐97%) and a pooled specificity of 98% (97%‐98%), with an AUC of 0.99 (0.98‐1.00). External validation was performed in only 8 studies (15 contingency tables), which showed a pooled sensitivity of 87% (81%‐92%) and a pooled specificity of 93% (86%‐96%), with an AUC of 0.95 (0.93‐0.97; ).

Classification
Algorithm Types
DL algorithms were described in 17 studies (52 contingency tables); the pooled sensitivity and specificity of DL were 94% (93%‐96%) and 97% (96%‐98%), respectively, with an AUC of 0.99 (0.98‐1.00; ). Additionally, traditional ML algorithms were described in 6 studies (15 contingency tables); the pooled sensitivity and specificity of ML were 88% (85%‐90%) and 94% (90%‐96%), respectively, with an AUC of 0.94 (0.92‐0.96; ).
Disease Types
For mild cataracts, 18 studies (25 contingency tables) had a pooled sensitivity of 92% (89%‐94%) and a pooled specificity of 96% (94%‐97%), with an AUC of 0.98 (0.96‐0.99). For moderate cataracts, 11 studies (16 contingency tables) showed comparable performance: a sensitivity of 94% (90%‐96%), a specificity of 97% (95%‐98%), and an AUC of 0.99 (0.97‐0.99). For severe cataracts, 19 studies (28 contingency tables) had a sensitivity of 93% (90%‐95%) and a specificity of 98% (96%‐99%), with an AUC of 0.98 (0.97‐0.99). Substantial heterogeneity and inconsistency were observed across cataract severity strata (mild: I²=99%, moderate: I²=96%, severe: I²=99%; P<.001), suggesting considerable variability in diagnostic thresholds or methodological approaches across studies. Furthermore, 8 studies with 28 contingency tables focused on nonspecific or general cataracts, which had a pooled sensitivity of 95% (92%‐96%) and a pooled specificity of 98% (97%‐99%), with an AUC of 0.99 (0.98‐1.00). Substantial heterogeneity and inconsistency were also observed across these subgroups (nonspecific or general cataract: I²=96%, NC: I²=100%; P<.001). In the NC subgroup (9 studies with 27 contingency tables), extremely high heterogeneity was observed (I²=100%); consequently, to maintain statistical consistency, we did not calculate a pooled estimate for this subgroup. Instead, descriptive analysis showed a sensitivity of 89%‐93% and a specificity of 93%‐97% across these studies. Four studies with 15 contingency tables considered other clinical subtypes (PSC: 1 study with 3 contingency tables; CC: 4 studies with 12 contingency tables), with a pooled sensitivity, specificity, and AUC of 91% (84%‐95%), 96% (94%‐97%), and 0.98 (0.96‐0.99), respectively. Due to the extremely limited sample sizes for these subtypes, the meta-analysis results were less robust, and only a pooled heterogeneity assessment could be conducted.
The Cochran Q test revealed a statistically significant result (Q=120.355; P<.001) and high inconsistency (I²=98%; 95% CI 97‐99), indicating significant between-study variability ().

Validation Types
Internal validation was adopted in 17 studies (59 contingency tables), which showed a pooled sensitivity of 93% (92%‐95%) and a pooled specificity of 97% (96%‐98%), with an AUC of 0.99 (0.97‐0.99). External validation was adopted in only 4 studies (9 contingency tables), which showed a pooled sensitivity of 89% (83%‐92%) and a pooled specificity of 90% (86%‐92%), with an AUC of 0.95 (0.93‐0.97; ).

Comparison Between DL and ML Algorithms
Detection
Two studies compared DL (8 contingency tables) and ML algorithms (7 contingency tables) using the same dataset. The pooled sensitivity was 94% (92%‐96%) for DL and 91% (85%‐95%) for ML algorithms. The pooled specificity was 99% (97%‐100%) for DL and 90% (86%‐93%) for ML algorithms. The AUC was 0.97 (0.95‐0.98) for DL and 0.96 (0.94‐0.97) for ML algorithms. For DL algorithms, the Cochran Q test revealed a statistically significant result (Q=9.675; P=.004) and high inconsistency (I²=79%; 95% CI 55‐100). For ML algorithms, the Cochran Q test also revealed a statistically significant result (Q=5.853; P=.03) and high inconsistency (I²=66%; 95% CI 23‐100). Therefore, both DL and ML exhibited significant heterogeneity; however, given the overlapping CIs and the absence of direct between-group comparisons, the statistical significance of the between-group difference could not be conclusively established. The comparison between DL and ML algorithms for cataract detection was constrained by minimal evidence, with only 2 studies available for direct benchmarking, which severely limited statistical power, compromised assessment of generalizability, and may have inflated performance estimates for DL models ().

Classification
Four studies compared DL (10 contingency tables) and ML algorithms (10 contingency tables) using the same dataset. The pooled sensitivity was 95% (91%‐97%) for DL and 89% (86%‐91%) for ML algorithms. The pooled specificity was 98% (95%‐99%) for DL and 94% (89%‐97%) for ML algorithms. The AUC was 0.99 (0.98‐1.00) for DL and 0.94 (0.92‐0.96) for ML algorithms ().
DL Algorithms Versus Human Experts
Direct comparisons between DL algorithms (7 contingency tables) and human experts (10 contingency tables) were performed across 7 studies using the same datasets. For DL algorithms, quantitative pooling was not conducted due to extreme heterogeneity (Q=933.852; P<.001; I²=100%). Instead, a descriptive analysis revealed highly variable performance of DL algorithms, with sensitivity estimates ranging from 72% to 93% and specificity ranging from 64% to 99%. In contrast, human experts demonstrated moderate heterogeneity (Q=5.811; P=.03; I²=66%), allowing for a pooled analysis. The pooled sensitivity and pooled specificity for human experts were 93% (95% CI 77%‐98%) and 95% (95% CI 79%‐99%), respectively, with an AUC of 0.98 (95% CI 0.97‐0.99; ). These results highlight that DL algorithms exhibited markedly higher heterogeneity than human experts. The Deeks’ funnel plot indicated no significant publication bias for human experts (P=.16) but potential borderline publication bias for DL algorithms (P=.05). The extreme heterogeneity (I²=100%) observed for the DL algorithms underscores that their efficacy is highly dependent on specific study conditions and architectures, precluding a uniform performance metric.

Publication Bias and Heterogeneity
The Deeks’ funnel plot revealed no significant publication bias for cataract detection (P=.48). However, potential borderline publication bias was detected for cataract classification (P=.05); given the wide scatter of studies around the regression line in the plot, this result should be interpreted with caution ().

The included studies showed substantial heterogeneity. The high I² values (sensitivity: 95.00% and specificity: 97.11% for cataract detection; sensitivity: 95.94% and specificity: 98.55% for classification) suggested considerable between-study variability (P<.001), which was further explored by subgroup analyses ( and ).


RoB
The quality of the included studies was assessed using QUADAS-2, and a summary of findings is displayed in Figure S1 in Multimedia Appendix 1 (summary of the methodological quality of the 63 included studies). A detailed assessment of each RoB and applicability item is provided in Figure S2 in Multimedia Appendix 1 (diagrams of the methodological quality of the 63 included studies). In the patient selection domain, RoB was high or unclear in 34 studies due to a lack of randomization or eligibility criteria; in the index test domain, RoB was high or unclear in 44 studies due to the absence of a predefined threshold.
In the reference standard domain, RoB was high or unclear in 2 studies owing to inconsistent reporting (whether blinding was implemented and whether a predefined threshold was used). In the flow and timing domain, RoB was high or unclear in 34 studies because it was not reported whether the same gold standard was applied or whether an appropriate time gap was observed.
Applicability concerns in the patient selection domain were high or unclear in 1 study; applicability was unclear in the index test domain in 5 studies; and 2 studies showed applicability concerns in the reference standard domain.
Discussion
This meta-analysis included 63 studies and rigorously assessed the study quality using the QUADAS-2 to summarize the evidence to date on the performance of image-based DL in cataract diagnosis. The results revealed that DL algorithms might offer higher accuracy than traditional ML algorithms and fall within the range of reported accuracy of human experts in the detection of cataracts, demonstrating potential as tools for automated diagnosis. However, given the moderate quality and high heterogeneity of the current evidence base, these DL algorithms are considered primarily as adjuncts to cataract diagnosis.
DL has made tremendous progress in automated image analysis []. In clinical practice, a severe imbalance between supply and demand is present in ophthalmologic diagnosis. As AI advances, DL is expected to raise diagnostic efficiency and thus help alleviate health care resource inequality.
Four relevant systematic reviews and meta-analyses were identified. (1) Cheung et al [] found high sensitivity, specificity, and reproducibility but great heterogeneity of ML models for cataract diagnosis in children and adults, although the strength of evidence was limited because only 11 studies with 13 contingency tables were included. (2) Liu et al [] found comparable performance of DL and experts in medical imaging, but only 2 of 18 ophthalmology studies involved cataracts, necessitating in-depth research on DL in cataract diagnosis. (3) Aggarwal et al [] included 82 ophthalmology studies, none of which involved cataracts, and showed that DL possesses good sensitivity, specificity, and AUC for feature identification in other eye diseases, although algorithmic evaluation criteria remain to be standardized because of high heterogeneity. (4) Islam et al [] demonstrated that DL can be applied to retinal vessel segmentation and could therefore be extended to low- and middle-income countries (LMICs). Taken together, these reviews indicate that methodological limitations should be critically assessed during the clinical translation of DL to further improve its reliability.
This meta-analysis systematically evaluated the effectiveness of DL versus traditional ML in cataract detection and classification to provide a basis for clinical decision-making. For cataract detection, DL had a pooled sensitivity of 96% (95% CI 95%‐97%) and a pooled specificity of 98% (97%‐99%), with an AUC of 0.99 (0.98‐1.00); traditional ML had a pooled sensitivity of 90% (87%‐91%) and a pooled specificity of 94% (91%‐96%), with an AUC of 0.95 (0.93‐0.97). For cataract classification, the pooled sensitivity and specificity of DL were 94% (93%‐96%) and 97% (96%‐98%), with an AUC of 0.99 (0.98‐1.00); those of traditional ML were 88% (85%‐90%) and 94% (90%‐96%), with an AUC of 0.94 (0.92‐0.96). Available evidence suggests that DL models exhibit high sensitivity and specificity in automated cataract diagnosis and that medical image-based DL demonstrates superior robustness to ML despite the RoB in most studies. However, our conclusions were based partly on studies of low quality owing to a lack of external validation and nonstandardized reporting of performance metrics, which may overestimate algorithmic accuracy. The use of overlapping public repositories introduced potential clustering bias. To ensure robust generalizability, future research must prioritize validation on independent, multicenter, and nonpublic datasets. Furthermore, the lack of a sensitivity analysis excluding high-RoB studies implies that the high overall performance of DL models could be partially influenced by methodologically weaker studies, warranting validation in future high-quality trials. Finally, due to the limited number of eligible studies, we pooled smartphone-based data from varying modalities, including diffuse photography and slit-lamp adapters. While this aligned with our assessment of general mobile health accessibility, it introduced potential optical heterogeneity. Consequently, statistical quantification of performance differences between optical sectioning and diffuse lighting was not feasible, warranting separate future investigations.
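The pooled estimates above come from a bivariate mixed-effects model. As a simplified illustration of the underlying idea, the sketch below pools hypothetical per-study sensitivities on the logit scale with inverse-variance weights; this is a univariate fixed-effect simplification, not the model used in this review, and the TP/FN counts are invented:

```python
import math

# Simplified, univariate illustration of logit pooling of sensitivities.
# The review itself uses a bivariate random-effects model, which additionally
# models between-study variance and the sensitivity-specificity correlation.
# The per-study (TP, FN) counts below are hypothetical.

def pooled_sensitivity(counts):
    """Fixed-effect inverse-variance pooling on the logit scale."""
    num = den = 0.0
    for tp, fn in counts:
        p = (tp + 0.5) / (tp + fn + 1.0)      # continuity-corrected proportion
        logit = math.log(p / (1 - p))
        var = 1 / (tp + 0.5) + 1 / (fn + 0.5)  # approx. variance of the logit
        num += logit / var
        den += 1 / var
    pooled_logit = num / den
    return 1 / (1 + math.exp(-pooled_logit))   # back-transform to a proportion

studies = [(96, 4), (180, 12), (57, 3)]  # hypothetical (TP, FN) pairs
print(f"pooled sensitivity = {pooled_sensitivity(studies):.3f}")
```

Pooling on the logit scale keeps estimates within (0, 1) and weights larger, more precise studies more heavily, which is why pooled values can differ from a naive average of per-study sensitivities.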
We identified the potential of DL for clinical use from the included studies, but the number of studies directly comparing the performance of DL and ML in cataract diagnosis was limited, with only 2 in detection and 4 in classification. Traditional ML (eg, logistic regression, random forest, and support vector machine) depends on manual feature engineering and handcrafted classifiers, so its performance is limited by feature quality and domain knowledge []. By contrast, DL (eg, CNNs and transformer-based vision models) achieves higher accuracy in image recognition through automatic feature extraction but requires larger data volumes and greater computational power [].
Because the number of included studies was limited, image segmentation performance was not meta-analyzed. Image segmentation is essentially a pixel-level prediction task that relies on the ability to characterize discriminative features, which traditional ML often struggles to capture effectively. In contrast, DL performs significantly better by automatically learning complex patterns in highly heterogeneous data []. DL models contain millions of parameters that must be optimized through data-driven training, whereas traditional ML models display higher stability in small-sample multiclass tasks []. These findings suggest that traditional ML models may have the potential to raise accuracy in cataract classification, while DL is more advantageous in reducing misdiagnosis rates; a combination of the 2 may facilitate the clinical translation of AI in the future.
Despite the available evidence on the clinical translational potential of DL, only 7 studies compared the efficacy of DL and human experts in cataract detection, and no studies compared their performance in cataract classification. Notably, significant interrater variability was present in expert performance due to heterogeneity in cumulative clinical experience and health care resource allocation, highlighting the need for more comprehensive comparative studies. These studies generally reported positive results for DL, but insufficient sample sizes and methodological heterogeneity may produce an optimism bias in algorithm performance. Standardized study designs and transparent reporting norms are therefore urgently needed to improve the quality of evidence and the clinical translational value of DL. DL has been successfully applied to retinal fundus image analysis [] for the automated diagnosis of DR [] and glaucoma []. Considering the technical feasibility of DL in ophthalmic disease screening, it is recommended that DL serve as an adjunct to clinical diagnosis, optimizing the treatment process through human-machine collaboration. Additionally, no substantial publication bias was detected, but the findings should still be interpreted with caution. It is recommended that clinicians’ diagnostic efficiency be used as a core assessment metric and that real-world validation be incorporated in subsequent studies.
In the field of AI methodology, several standardized guidelines have been issued recently [,]. However, no unified consensus has been reached on AI for cataract diagnosis. At the time of this writing, computer-aided diagnostic techniques mainly combine medical image processing with AI, but research focuses mostly on diseases of the posterior segment (eg, DR and age-related macular degeneration) and less on the anterior segment (eg, cataracts). Currently, cataract diagnosis relies on slit-lamp microscopy of lens morphology, and cataract classification is based on the LOCS III [], the Oxford Clinical Cataract Classification and Grading System [], and the American Cooperative Cataract Research Group method []. DL has made preliminary breakthroughs in automated cataract classification, such as the turbidity-density-location assessment system developed by Lin et al [] based on anterior segment slit lamp images and the DL classification model developed by Zhou et al [] using retinal fundus images, but a standardized severity scale adapted to DL is urgently needed for clinical use. Therefore, future studies should standardize the criteria for performance assessment of DL algorithms, with a focus on improving the transparency of methodology reporting and the repeatability of validation processes.
Data scarcity and limited generalization capability are key scientific challenges for DL. In this systematic review, most included studies adopted retrospective designs, while prospective designs and multicenter clinical trials accounted for less than 5%, and their annotation criteria were not optimized for the needs of DL, restricting methodological rigor. Nonetheless, most studies used double-blind, annotated, and quality-controlled image data for model training, which effectively improved diagnostic accuracy and reduced the RoB while balancing data size and quality. The included studies generally used data augmentation techniques, reflecting the lack of high-quality annotated datasets and prospective validation studies.
The acquisition of representative data for clinical validation remains the major bottleneck. The clinical generalization capability of models is severely restricted by the time-consuming nature of pixel-level annotation of fundus images, domain shift resulting from cross-device and cross-race variations, and a lack of fine-grained pathology characterization. Future research should focus on multimodal data fusion for modeling (eg, OCT plus fundus images), deployment of edge computing architectures, and embedding of causal inference modules, thereby establishing intelligent diagnosis and treatment systems with both clinical credibility and engineering practicality.
In this study, great methodological heterogeneity was found in the reporting of DL performance metrics, including incomplete reporting of sensitivity or specificity, a lack of 2×2 contingency tables, and general overreliance on aggregated metrics such as AUC-ROC and F1-score. Notably, high AUC values (>0.90) may mask the risk of misjudging key positive events (eg, progressive cataract) in clinical scenarios with severely imbalanced category distributions. Based on these findings, it is recommended that the confusion matrix serve as a core reporting metric. These problems can be progressively addressed through high-quality studies.
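The recommendation to report the confusion matrix can be motivated by a small numeric illustration with hypothetical counts: under severe class imbalance, aggregate accuracy remains high even when the clinically critical sensitivity is poor.

```python
# Hypothetical confusion matrix illustrating why aggregate metrics can be
# misleading under class imbalance: with few positives (eg, progressive
# cataract), accuracy stays high even when many positives are missed.

def metrics(tp, fp, fn, tn):
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),   # the clinically critical quantity
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / total,
    }

# 50 positives among 1,000 images; the model misses 20 of them.
m = metrics(tp=30, fp=10, fn=20, tn=940)
print(m)  # accuracy 0.97 despite a sensitivity of only 0.60
```

The same masking effect applies to AUC and F1-score computed over an imbalanced test set, which is why the full 2×2 table is more informative for clinical risk assessment.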
Another key obstacle is the lack of consensus on the interpretability of DL decision-making mechanisms. The decision-making logic of DL models is difficult to understand intuitively because of their complex network structures and nonlinear feature extraction; this “black-box” characteristic has been deemed a challenge for clinical use. Recently, researchers have gradually revealed the intrinsic mechanisms of DL models through gradient-weighted class activation mapping, adversarial testing, and causal inference []. For example, Chang et al [] used adversarial samples to analyze the decision-making basis of DL models in glaucoma detection, and Araújo et al [] located the key regions of DR in fundus images by multiple-instance learning. Explainable AI (XAI) is breaking through the limitations of traditional AI by pairing decision results with attributional explanations []. Abràmoff et al [] systematically reviewed the interpretability framework for DL models in the medical field, laying a theoretical foundation for their clinical translation. Future research should further explore innovative XAI methods in medical image analysis to enhance clinical credibility.
This study showed that DL was applied primarily to the screening of eye diseases such as DR for which mature diagnostic guidelines have been established. Notably, differences in health care resource allocation should be considered when popularizing DL. In subsequent studies, the clinical effect and health economic benefits should be assessed across different DL algorithms, and the “black-box” problem of DL algorithms should be solved using interpretability methods to enhance clinical acceptance.
Acknowledgments
This work was supported by the Regional Science Fund Project of the National Natural Science Foundation of China (grant No. 82360203). The sponsors or funding organizations had no role in the design or conduct of this research.
Funding
The authors declared no financial support was received for this work.
Data Availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Authors' Contributions
Writing - original draft preparation: RL, HL
Writing - review and editing: RL, HL, DT
Conceptualization: RL, HL, SL, LL, DT, CL
Methodology: RL, HL, SL, LL, DT, CL
Formal analysis and investigation: RL, HL, SL, LL
Funding acquisition: DT
Resources: DT, CL
Supervision: DT, CL
All authors commented on previous versions of the manuscript, and all authors read and approved the final manuscript.
Conflicts of Interest
None declared.
Multimedia Appendix 1
Methodological quality summary and risk of bias graph of the studies included in the meta-analysis.
DOC File, 1647 KB

References
- Cicinelli MV, Buchan JC, Nicholson M, Varadaraj V, Khanna RC. Cataracts. Lancet. Feb 4, 2023;401(10374):377-389. [CrossRef] [Medline]
- Steinmetz JD, Bourne RRA, Briant PS, et al. Causes of blindness and vision impairment in 2020 and trends over 30 years, and prevalence of avoidable blindness in relation to VISION 2020: the Right to Sight: an analysis for the Global Burden of Disease Study. Lancet Glob Health. Feb 2021;9(2):e144-e160. [CrossRef]
- Bourne R, Steinmetz JD, Flaxman S, et al. Trends in prevalence of blindness and distance and near vision impairment over 30 years: an analysis for the Global Burden of Disease Study. Lancet Glob Health. Feb 2021;9(2):e130-e143. [CrossRef]
- Dawson-Squibb JJ, Davids EL, Viljoen M, Rice K, Stein DJ. The WHO international classification of diseases 11th revision (ICD-11). In: Matson JL, editor. Handbook of Clinical Child Psychology: Integrating Theory and Research into Practice. Springer International Publishing; 2023:53-78. [CrossRef]
- Ming S, Yao X, Guo X, et al. Performance of ChatGPT in ophthalmic registration and clinical diagnosis: cross-sectional study. J Med Internet Res. Nov 14, 2024;26:e60226. [CrossRef] [Medline]
- Chen X, Wang X, Zhang K, et al. Recent advances and clinical applications of deep learning in medical image analysis. Med Image Anal. Jul 2022;79:102444. [CrossRef] [Medline]
- Anwar SM, Majid M, Qayyum A, Awais M, Alnowami M, Khan MK. Medical image analysis using convolutional neural networks: a review. J Med Syst. Oct 8, 2018;42(11):226. [CrossRef] [Medline]
- He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. Presented at: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); Jun 27-30, 2016. [CrossRef]
- Hasanah SA, Pravitasari AA, Abdullah AS, Yulita IN, Asnawi MH. A deep learning review of ResNet Architecture for lung disease identification in CXR image. Appl Sci (Basel). 2023;13(24):13111. [CrossRef]
- Williams D, Hornung H, Nadimpalli A, Peery A. Deep learning and its application for healthcare delivery in low and middle income countries. Front Artif Intell. 2021;4:553987. [CrossRef] [Medline]
- Zhou K, Liu Z, Qiao Y, Xiang T, Loy CC. Domain generalization: a survey. IEEE Trans Pattern Anal Mach Intell. Apr 2023;45(4):4396-4415. [CrossRef] [Medline]
- Kim DW, Jang HY, Kim KW, Shin Y, Park SH. Design characteristics of studies reporting the performance of artificial intelligence algorithms for diagnostic analysis of medical images: results from recently published papers. Korean J Radiol. Mar 2019;20(3):405-410. [CrossRef] [Medline]
- Kelly CJ, Karthikesalingam A, Suleyman M, Corrado G, King D. Key challenges for delivering clinical impact with artificial intelligence. BMC Med. Oct 29, 2019;17(1):195. [CrossRef] [Medline]
- Li T, Bo W, Hu C, et al. Applications of deep learning in fundus images: a review. Med Image Anal. Apr 2021;69:101971. [CrossRef] [Medline]
- Benjamens S, Dhunnoo P, Meskó B. The state of artificial intelligence-based FDA-approved medical devices and algorithms: an online database. NPJ Digit Med. 2020;3:118. [CrossRef] [Medline]
- Khan Z, Gaidhane AM, Singh M, et al. Diagnostic accuracy of IDX-DR for detecting diabetic retinopathy: a systematic review and meta-analysis. Am J Ophthalmol. May 2025;273:192-204. [CrossRef] [Medline]
- Verbraak FD, Abramoff MD, Bausch GCF, et al. Diagnostic accuracy of a device for the automated detection of diabetic retinopathy in a primary care setting. Diabetes Care. Apr 2019;42(4):651-656. [CrossRef] [Medline]
- McInnes MDF, Moher D, Thombs BD, et al. Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy Studies: the PRISMA-DTA statement. JAMA. Jan 23, 2018;319(4):388-396. [CrossRef] [Medline]
- Lin H, Li R, Liu Z, et al. Diagnostic efficacy and therapeutic decision-making capacity of an artificial intelligence platform for childhood cataracts in eye clinics: a multicentre randomized controlled trial. EClinicalMedicine. Mar 2019;9:52-59. [CrossRef] [Medline]
- Deepak GD, Bhat SK. Deep learning-based CNN for multiclassification of ocular diseases using transfer learning. Comput Methods Biomech Biomed Eng Imaging Vis. Dec 31, 2024;12(1). [CrossRef]
- Zhao J, Wan C, Li J, Zhang Z, Yang W, Li K. NCME-Net: nuclear cataract mask encoder network for intelligent grading using self-supervised learning from anterior segment photographs. Heliyon. Jul 2024;10(14):e34726. [CrossRef]
- Zia A, Mahum R, Ahmad N, Awais M, Alshamrani AM. Eye diseases detection using deep learning with BAM attention module. Multimed Tools Appl. 2024;83(20):59061-59084. [CrossRef]
- Zhang X, Xiao Z, Wu X, et al. Nuclear cataract classification in anterior segment OCT based on clinical global–local features. Complex Intell Syst. Apr 2023;9(2):1479-1493. [CrossRef]
- Zéboulon P, Panthier C, Rouger H, Bijon J, Ghazal W, Gatinel D. Development and validation of a pixel wise deep learning model to detect cataract on swept-source optical coherence tomography images. J Optom. 2022;15 Suppl 1(Suppl 1):S43-S49. [CrossRef] [Medline]
- Zhang X, Xiao Z, Yang B, Wu X, Higashita R, Liu J. Regional context-based recalibration network for cataract recognition in AS-OCT. Pattern Recognit DAGM. Mar 2024;147:110069. [CrossRef]
- Zhang X, Xiao Z, Fu H, et al. Attention to region: region-based integration-and-recalibration networks for nuclear cataract classification using AS-OCT images. Med Image Anal. Aug 2022;80:102499. [CrossRef] [Medline]
- Xie H, Li Z, Wu C, et al. Deep learning for detecting visually impaired cataracts using fundus images. Front Cell Dev Biol. 2023;11:1197239. [CrossRef] [Medline]
- Wu X, Xu D, Ma T, et al. Artificial intelligence model for antiinterference cataract automatic diagnosis: a diagnostic accuracy study. Front Cell Dev Biol. 2022;10:906042. [CrossRef] [Medline]
- Vasan CS, Gupta S, Shekhar M, et al. Accuracy of an artificial intelligence-based mobile application for detecting cataracts: results from a field study. Indian J Ophthalmol. Aug 2023;71(8):2984-2989. [CrossRef] [Medline]
- Ul Hassan M, Al-Awady AA, Ahmed N, et al. A transfer learning enabled approach for ocular disease detection and classification. Health Inf Sci Syst. Dec 2024;12(1):36. [CrossRef] [Medline]
- Ueno Y, Oda M, Yamaguchi T, et al. Deep learning model for extensive smartphone-based diagnosis and triage of cataracts and multiple corneal diseases. Br J Ophthalmol. Sep 20, 2024;108(10):1406-1413. [CrossRef] [Medline]
- Singh S, Banoub R, Sanghvi HA, et al. An artificial intelligence driven approach for classification of ophthalmic images using convolutional neural network: an experimental study. Curr Med Imaging. 2024;20:e15734056286918. [CrossRef] [Medline]
- Shafiq M, Fan Q, Alghamedy FH, Obidallah WJ. DualEye-FeatureNet: a dual-stream feature transfer framework for multi-modal ophthalmic image classification. IEEE Access. 2024;12:143985-144008. [CrossRef]
- Santone A, Cesarelli M, Colasuonno E, Bevilacqua V, Mercaldo F. A method for ocular disease diagnosis through visual prediction explainability. Electronics (Basel). Jul 2024;13(14):2706. [CrossRef]
- Jawad MA, Khursheed F, Nawaz S, Mir AH. Towards improved fundus disease detection using Swin Transformers. Multimed Tools Appl. 2024;83(32):78125-78159. [CrossRef]
- Janti SS, Saluja R, Tiwari N, et al. Evaluation of the clinical impact of a smartphone application for cataract detection. Cureus. Oct 2024;16(10):e71467. [CrossRef] [Medline]
- Emir B, Colak E. Performance analysis of pretrained convolutional neural network models for ophthalmological disease classification. Arq Bras Oftalmol. 2023;87(5):e20220124. [CrossRef] [Medline]
- Ogundokun RO, Awotunde JB, Akande HB, Lee CC, Imoize AL. Deep transfer learning models for mobile-based ocular disorder identification on retinal images. CMC. 2024;80(1):139-161. [CrossRef]
- Nguyen VV, Lin CL. Enhancing cataract detection through hybrid CNN approach and image quadration: a solution for precise diagnosis and improved patient care. Electronics (Basel). 2024;13(12):2344. [CrossRef]
- Mai ELC, Chen BH, Su TY. Innovative utilization of ultra-wide field fundus images and deep learning algorithms for screening high-risk posterior polar cataract. J Cataract Refract Surg. Jun 1, 2024;50(6):618-623. [CrossRef] [Medline]
- Raveenthini M, Lavanya R, Benitez R. Interpretable diagnostic system for multiocular diseases based on hybrid meta-heuristic feature selection. Comput Biol Med. Jan 2025;184:109486. [CrossRef] [Medline]
- Rafay A, Asghar Z, Manzoor H, Hussain W. EyeCNN: exploring the potential of convolutional neural networks for identification of multiple eye diseases through retinal imagery. Int Ophthalmol. Oct 2023;43(10):3569-3586. [CrossRef] [Medline]
- Abbas Q, Albathan M, Altameem A, Almakki RS, Hussain A. Deep-ocular: improved transfer learning architecture using self-attention and dense layers for recognition of ocular diseases. Diagnostics (Basel). Oct 10, 2023;13(20):3165. [CrossRef] [Medline]
- Uyar K, Yurdakul M, Taşdemir Ş. Abc-based weighted voting deep ensemble learning model for multiple eye disease detection. Biomed Signal Process Control. Oct 2024;96:106617. [CrossRef]
- Serwaa M, Mensah PK, Adekoya AF, Ayidzoe MA. LBPSCN: local binary pattern scaled capsule network for the recognition of ocular diseases. Int J Adv Comput Sci Appl. Jan 2024;15(6). [CrossRef]
- Zhang X, Xiao Z, Hu L, et al. CCA-Net: clinical-awareness attention network for nuclear cataract classification in AS-OCT. Knowl Based Syst. Aug 2022;250:109109. [CrossRef]
- Glaret Subin P, Muthukannan P. Optimized convolution neural network based multiple eye disease detection. Comput Biol Med. Jul 2022;146:105648. [CrossRef] [Medline]
- Xiao Z, Zhang X, Zheng B, Guo Y, Higashita R, Liu J. Multi-style spatial attention module for cortical cataract classification in AS-OCT image with supervised contrastive learning. Comput Methods Programs Biomed. Feb 2024;244:107958. [CrossRef] [Medline]
- Wang Q, Sun H, Feng Y, Dong Z, Bai C. MGCNet: multi-granularity cataract classification using denoising diffusion probabilistic model. Displays. Jul 2024;83:102716. [CrossRef]
- Kumari P, Saxena P. Cataract detection and visualization based on multi-scale deep features by RINet tuned with cyclic learning rate hyperparameter. Biomed Signal Process Control. Jan 2024;87:105452. [CrossRef]
- Devaraj M, Namasivayam V, Srichandan SS, et al. Development and testing of artificial intelligence-based mobile application to achieve cataract backlog-free status in Uttar Pradesh, India. Asia Pac J Ophthalmol (Phila). 2024;13(5):100094. [CrossRef] [Medline]
- Al-Saadi EH, Khdiar AN, Al-Saadi LH. An automated wavelet scattering network classification using three stages of cataract disease. Baghdad SciJ. Feb 2024;21(9):3044. URL: https://bsj.uobaghdad.edu.iq/home/vol21/iss9/12/ [Accessed 2026-04-24] [CrossRef]
- Elsawy A, Keenan TDL, Chen Q, et al. A deep network DeepOpacityNet for detection of cataracts from color fundus photographs. Commun Med (Lond). Dec 16, 2023;3(1):184. [CrossRef] [Medline]
- Akram A, Debnath R. An automated eye disease recognition system from visual content of facial images using machine learning techniques. Turk J Elec Eng Comp Sci. Apr 2020;28(2):917-932. [CrossRef]
- Jiang J, Wang L, Fu H, et al. Automatic classification of heterogeneous slit-illumination images using an ensemble of cost-sensitive convolutional neural networks. Ann Transl Med. Apr 2021;9(7):550. [CrossRef] [Medline]
- Yadav S, Yadav J. Automatic cataract severity detection and grading using deep learning. J Sens. Jan 2023;2023(1). URL: https://onlinelibrary.wiley.com/doi/10.1155/2023/2973836 [Accessed 2026-04-24] [CrossRef]
- Yadav S, Yadav J. Enhancing cataract detection precision: a deep learning approach. Trait Signal. Aug 31, 2023;40(4):1413-1424. URL: https://www.iieta.org/journals/ts/paper/10.18280/ts.400410 [Accessed 2026-04-24] [CrossRef]
- Subin PG, Kannan PM. Multiple eye disease detection using hybrid adaptive mutation swarm optimization and RNN. Int J Adv Comput Sci Appl. 2022;13(9). [CrossRef]
- Pratap T, Kokil P. Computer-aided diagnosis of cataract using deep transfer learning. Biomed Signal Process Control. Aug 2019;53:101533. [CrossRef]
- Luo X, Li J, Chen M, Yang X, Li X. Ophthalmic disease detection via deep learning with a novel mixture loss function. IEEE J Biomed Health Inform. Sep 2021;25(9):3332-3339. [CrossRef] [Medline]
- Imran A, Li J, Pei Y, Akhtar F, Yang JJ, Dang Y. Automated identification of cataract severity using retinal fundus images. Comput Methods Biomech Biomed Eng Imaging Vis. Nov 1, 2020;8(6):691-698. [CrossRef]
- Imran A, Li J, Pei Y, Akhtar F, Mahmood T, Zhang L. Fundus image-based cataract classification using a hybrid convolutional and recurrent neural network. Vis Comput. Aug 2021;37(8):2407-2417. [CrossRef]
- Acar E, Türk Ö, Ertuğrul Ö, Aldemi̇r E. Employing deep learning architectures for image-based automatic cataract diagnosis. Turk J Elec Eng Comp Sci. Sep 2021;29(SI-1):2649-2662. [CrossRef]
- Olaniyan J, Olaniyan D, Obagbuwa IC, Esiefarienrhe BM, Odighi M. Transformative transparent hybrid deep learning framework for accurate cataract detection. Appl Sci (Basel). 2024;14(21):10041. [CrossRef]
- Ganokratanaa T, Ketcham M, Pramkeaw P. Advancements in cataract detection: the systematic development of LeNet-convolutional neural network models. J Imaging. Sep 26, 2023;9(10):197. [CrossRef] [Medline]
- Gan F, Liu H, Qin WG, Zhou SL. Application of artificial intelligence for automatic cataract staging based on anterior segment images: comparing automatic segmentation approaches to manual segmentation. Front Neurosci. 2023;17:1182388. [CrossRef] [Medline]
- Tham YC, Goh JHL, Anees A, et al. Detecting visually significant cataract using retinal photograph-based deep learning. Nat Aging. Mar 2022;2(3):264-271. [CrossRef] [Medline]
- Siddique M. Convolutional neural network modeling for eye disease recognition. Int J Onl Eng. Jul 2022;18(9):115-130. [CrossRef]
- Sirajudeen A, Balasubramaniam A, Karthikeyan S. Novel angular binary pattern (NABP) and kernel based convolutional neural networks classifier for cataract detection. Multimed Tools Appl. Nov 2022;81(27):38485-38512. [CrossRef]
- Junayed MS, Islam MB, Sadeghzadeh A, Rahman S. CataractNet: an automated cataract detection system using deep learning for fundus images. IEEE Access. 2021;9:128799-128808. [CrossRef]
- Hu S, Wang X, Wu H, et al. Unified diagnosis framework for automated nuclear cataract grading based on smartphone slit-lamp images. IEEE Access. 2020;8:174169-174178. [CrossRef] [Medline]
- Hu S, Luan X, Wu H, et al. ACCV: automatic classification algorithm of cataract video based on deep learning. Biomed Eng Online. Aug 5, 2021;20(1):78. [CrossRef] [Medline]
- Lai CJ, Pai PF, Marvin M, Hung HH, Wang SH, Chen DN. The use of convolutional neural networks and digital camera images in cataract detection. Electronics (Basel). 2022;11(6):887. [CrossRef]
- Askarian B, Ho P, Chong JW. Detecting cataract using smartphones. IEEE J Transl Eng Health Med. 2021;9:3800110. [CrossRef] [Medline]
- Son KY, Ko J, Kim E, et al. Deep learning-based cataract detection and grading from slit-lamp and retro-illumination photographs: model development and validation study. Ophthalmol Sci. Jun 2022;2(2):100147. [CrossRef] [Medline]
- Saju B, Rajesh R. Eye-vision net: cataract detection and classification in retinal and slit lamp images using deep network. Int J Adv Comput Sci Appl. 2022;13(12). [CrossRef]
- Chellaswamy C, Geetha TS, Ramasubramanian B, Abirami R, Archana B, Divya Bharathi A. Optimized convolutional neural network based multiple eye disease detection and information sharing system. Presented at: 2022 6th International Conference on Intelligent Computing and Control Systems (ICICCS); May 25-27, 2022. [CrossRef]
- Lu Q, Wei L, He W, et al. Lens opacities classification system III-based artificial intelligence program for automatic cataract grading. J Cataract Refract Surg. May 1, 2022;48(5):528-534. [CrossRef] [Medline]
- Al‐Naji A, Khalid GA, Mahmood MF, Chahl JS. Computer vision for eye diseases detection using pre‐trained deep learning techniques and raspberry Pi. J Eng. Jul 2024;2024(7). URL: https://ietresearch.onlinelibrary.wiley.com/toc/20513305/2024/7 [CrossRef]
- Elloumi Y. Cataract grading method based on deep convolutional neural networks and stacking ensemble learning. Int J Imaging Syst Tech. May 2022;32(3):798-814. URL: https://onlinelibrary.wiley.com/toc/10981098/32/3 [CrossRef]
- Zannah TB, Abdulla-Hil-Kafi M, Sheakh M, et al. Bayesian optimized machine learning model for automated eye disease classification from fundus images. Computation. 2024;12(9):190. [CrossRef]
- Whiting PF, Rutjes AWS, Westwood ME, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. Oct 18, 2011;155(8):529-536. [CrossRef] [Medline]
- Higgins JPT, Thompson SG, Deeks JJ, Altman DG. Measuring inconsistency in meta-analyses. BMJ. Sep 6, 2003;327(7414):557-560. [CrossRef] [Medline]
- Reitsma JB, Glas AS, Rutjes AWS, Scholten R, Bossuyt PM, Zwinderman AH. Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. J Clin Epidemiol. Oct 2005;58(10):982-990. [CrossRef] [Medline]
- Dwamena B. MIDAS: Stata module for meta-analytical integration of diagnostic test accuracy studies. Statistical Software Components. Feb 2009;14. URL: https://www.semanticscholar.org/paper/MIDAS%3A-Stata-module-for-meta-analytical-integration-Dwamena/ba5239784c4ae152163c214e8d022c7d6c1202f2 [Accessed 2026-04-23]
- Chu Y, Hu S, Li Z, et al. Image analysis-based machine learning for the diagnosis of retinopathy of prematurity: a meta-analysis and systematic review. Ophthalmol Retina. Jul 2024;8(7):678-687. [CrossRef] [Medline]
- Chylack LT, Wolfe JK, Singer DM, Leske MC, Bullimore MA, Bailey IL. The Lens Opacities Classification System III. Arch Ophthalmol. Jun 1, 1993;111(6):831. [CrossRef]
- Mackenbrock LHB, Labuz G, Baur ID, Yildirim TM, Auffarth GU, Khoramnia R. Cataract classification systems: a review. Klin Monbl Augenheilkd. Jan 2024;241(1):75-83. [CrossRef] [Medline]
- Gali HE, Sella R, Afshari NA. Cataract grading systems: a review of past and present. Curr Opin Ophthalmol. Jan 2019;30(1):13-18. [CrossRef] [Medline]
- Cheung R, So S, Malvankar-Mehta MS. Diagnostic accuracy of machine learning classifiers for cataracts: a systematic review and meta-analysis. Expert Rev Ophthalmol. Nov 2, 2022;17(6):427-437. [CrossRef]
- Liu X, Faes L, Kale AU, et al. A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis. Lancet Digit Health. Oct 2019;1(6):e271-e297. [CrossRef] [Medline]
- Aggarwal R, Sounderajah V, Martin G, et al. Diagnostic accuracy of deep learning in medical imaging: a systematic review and meta-analysis. NPJ Digit Med. Apr 7, 2021;4(1):65. [CrossRef] [Medline]
- Islam MM, Poly TN, Walther BA, Yang HC, Li YCJ. Artificial intelligence in ophthalmology: a meta-analysis of deep learning models for retinal vessels segmentation. J Clin Med. Apr 3, 2020;9(4):1018. [CrossRef] [Medline]
- Shen D, Wu G, Zhang D, Suzuki K, Wang F, Yan P. Machine learning in medical imaging. Comput Med Imaging Graph. Apr 2015;41:1-2. [CrossRef] [Medline]
- Castiglioni I, Rundo L, Codari M, et al. AI applications to medical images: from machine learning to deep learning. Phys Med. Mar 2021;83:9-24. [CrossRef] [Medline]
- Islam MM, Yang HC, Poly TN, Jian WS, Jack Li YC. Deep learning algorithms for detection of diabetic retinopathy in retinal fundus photographs: a systematic review and meta-analysis. Comput Methods Programs Biomed. Jul 2020;191:105320. [CrossRef] [Medline]
- Buisson M, Navel V, Labbé A, et al. Deep learning versus ophthalmologists for screening for glaucoma on fundus examination: a systematic review and meta-analysis. Clin Exp Ophthalmol. Dec 2021;49(9):1027-1038. [CrossRef] [Medline]
- Rivera SC, Liu X, Chan AW, Denniston AK, Calvert MJ, SPIRIT-AI and CONSORT-AI Working Group. Guidelines for clinical trial protocols for interventions involving artificial intelligence: the SPIRIT-AI Extension. BMJ. Sep 9, 2020;370:m3210. [CrossRef] [Medline]
- Liu X, Rivera SC, Moher D, Calvert MJ, Denniston AK, SPIRIT-AI and CONSORT-AI Working Group. Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the CONSORT-AI Extension. BMJ. Sep 9, 2020;370:m3164. [CrossRef] [Medline]
- Sparrow JM, Bron AJ, Brown NAP, Ayliffe W, Hill AR. The Oxford clinical cataract classification and grading system. Int Ophthalmol. Dec 1986;9(4):207-225. [CrossRef] [Medline]
- Chylack LT, Lee MR, Tung WH, Cheng HM. Classification of human senile cataractous changes by the American Cooperative Cataract Research Group (CCRG) method. I. Instrumentation and technique. Invest Ophthalmol Vis Sci. Apr 1983;24(4):424-431. [Medline]
- Zhou Y, Li G, Li H. Automatic cataract classification using deep neural network with discrete state transition. IEEE Trans Med Imaging. Feb 2020;39(2):436-446. [CrossRef] [Medline]
- Chang J, Lee J, Ha A, et al. Explaining the rationale of deep learning glaucoma decisions with adversarial examples. Ophthalmology. Jan 2021;128(1):78-88. [CrossRef] [Medline]
- Araújo T, Aresta G, Mendonça L, et al. DR|GRADUATE: uncertainty-aware deep learning-based diabetic retinopathy grading in eye fundus images. Med Image Anal. Jul 2020;63:101715. [CrossRef] [Medline]
- van der Velden BHM, Kuijf HJ, Gilhuijs KGA, Viergever MA. Explainable artificial intelligence (XAI) in deep learning-based medical image analysis. Med Image Anal. Jul 2022;79:102470. [CrossRef] [Medline]
- Abràmoff MD, Cunningham B, Patel B, et al. Foundational considerations for artificial intelligence using ophthalmic images. Ophthalmology. Feb 2022;129(2):e14-e32. [CrossRef] [Medline]
Abbreviations
AI: artificial intelligence
AUC: area under the receiver operating characteristic curve
CC: cortical cataract
CNN: convolutional neural network
DL: deep learning
DR: diabetic retinopathy
FDA: US Food and Drug Administration
FN: false negative
FP: false positive
ICD-11: International Classification of Diseases, 11th Revision
LMIC: low- and middle-income country
LOCS III: Lens Opacities Classification System III
LRS: low-resource settings
ML: machine learning
NC: nuclear cataract
OCT: optical coherence tomography
PI: prediction interval
PPC: posterior polar cataract
PRISMA-DTA: Preferred Reporting Items for Systematic Reviews and Meta-analyses of Diagnostic Test Accuracy
PROSPERO: International Prospective Register of Systematic Reviews
PSC: posterior subcapsular cataract
QUADAS-2: Quality Assessment of Diagnostic Accuracy Studies-2
ResNet: residual network
RevMan: Review Manager
RoB: risk of bias
SROC: summary receiver operating characteristic curve
TN: true negative
TP: true positive
WHO: World Health Organization
XAI: explainable artificial intelligence
Edited by Andrew Coristine; submitted 11.Jun.2025; peer-reviewed by Gary L Legault, Hirotaka Tanabe, Xiaomin Wang, Yi Lu; final revised version received 14.Feb.2026; accepted 18.Feb.2026; published 29.Apr.2026.
Copyright © Ruixi Li, Hongyi Li, Chong Li, Shuo Li, Linhong Lei, Dan Tao. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 29.Apr.2026.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research (ISSN 1438-8871), is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.