Stigmatizing Language in Gender-Expansive Patient Records: Corpus, Disparity Analysis, and NLP-based Detection
Date Submitted: Jan 8, 2026
Open Peer Review Period: Jan 8, 2026 - Mar 5, 2026
Background: Stigmatizing language (SL) in electronic health records (EHRs) can influence clinical decision-making, propagate bias across care encounters, and undermine patient trust. Gender-expansive patients (GEPs) may be particularly vulnerable to documentation-based stigma, yet large-scale quantitative evidence and fairness-aware evaluation of automated SL detection methods remain limited.

Objective: To quantify stigmatizing language in clinical documentation for gender-expansive patients by introducing a labeled corpus, analyzing demographic disparities, and evaluating fairness-aware natural language processing (NLP) methods for SL detection.

Methods: We developed a corpus of 780 discharge summaries from a large academic health system, annotated for SL and its subtypes. Notes were categorized by GEP versus non-GEP status. We fit logistic regression models to assess the association between GEP status and the presence of SL, adjusting for demographics. Multiple NLP models, including transfer learning approaches, were benchmarked for SL detection, and we applied fairness-aware thresholding to reduce subgroup performance gaps.

Results: SL appeared in 61.9% of GEP notes compared with 26.1% of non-GEP notes. After adjustment, GEP status remained a significant predictor of SL (odds ratio > 4). Baseline NLP models exhibited subgroup disparities, with substantial gaps in accuracy, true positive rate (TPR), and false positive rate (FPR) between GEP and non-GEP patients. Fairness-aware thresholding reduced these error-rate disparities (ΔFPR from 21.16% to 6.65%; ΔTPR from 21.11% to 0.00%) while maintaining overall accuracy.

Conclusions: Stigmatizing language is common in EHR documentation and disproportionately affects gender-expansive patients, and automated detection models show persistent subgroup performance gaps. This study introduces the first annotated corpus focused on SL in GEP documentation, quantifies demographic disparities, and demonstrates practical fairness-aware NLP strategies that reduce error-rate inequities while preserving accuracy. These findings support equity-focused interventions and can inform digital health workflows aimed at reducing stigmatizing language in EHRs.
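
The abstract reports that fairness-aware thresholding reduced ΔFPR and ΔTPR between GEP and non-GEP notes with minimal accuracy loss, but does not spell out the procedure. The sketch below illustrates one common post-hoc approach: a grid search over per-group decision thresholds that minimizes the combined TPR/FPR gap subject to an overall-accuracy constraint. All function names, the two-group coding, the threshold grid, and the synthetic data are assumptions for illustration and are not taken from the study.

```python
# Minimal sketch of post-hoc, group-specific threshold selection to reduce
# TPR/FPR gaps between two subgroups. Names and data here are hypothetical;
# the study's actual fairness-aware thresholding procedure may differ.
import numpy as np

def rates(y_true, y_score, thr):
    """Return (accuracy, TPR, FPR) for predictions y_score >= thr."""
    y_pred = (y_score >= thr).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    acc = (tp + tn) / len(y_true)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return acc, tpr, fpr

def fair_thresholds(y_true, y_score, group,
                    grid=np.linspace(0.05, 0.95, 19), max_acc_loss=0.01):
    """Search per-group thresholds minimizing |ΔTPR| + |ΔFPR| while keeping
    overall accuracy within max_acc_loss of the best single-threshold accuracy."""
    # Baseline: best overall accuracy achievable with a single shared threshold.
    base_acc = max(rates(y_true, y_score, t)[0] for t in grid)
    groups = np.unique(group)
    assert len(groups) == 2, "sketch assumes exactly two subgroups"
    a, b = groups
    best, best_gap = None, np.inf
    for ta in grid:
        for tb in grid:
            thr = np.where(group == a, ta, tb)
            acc = np.mean((y_score >= thr).astype(int) == y_true)
            if acc < base_acc - max_acc_loss:
                continue  # reject threshold pairs that cost too much accuracy
            _, tpr_a, fpr_a = rates(y_true[group == a], y_score[group == a], ta)
            _, tpr_b, fpr_b = rates(y_true[group == b], y_score[group == b], tb)
            gap = abs(tpr_a - tpr_b) + abs(fpr_a - fpr_b)
            if gap < best_gap:
                best, best_gap = {a: ta, b: tb}, gap
    return best, best_gap

# Usage with synthetic data (illustration only, not the study data):
rng = np.random.default_rng(0)
n = 1000
group = rng.integers(0, 2, n)          # hypothetical coding: 0 = non-GEP, 1 = GEP
y_true = rng.integers(0, 2, n)
noise = np.where(group == 0, 0.15, 0.30)  # simulate a noisier classifier for group 1
y_score = np.clip(y_true * 0.7 + rng.normal(0.15, noise, n), 0, 1)
thresholds, gap = fair_thresholds(y_true, y_score, group)
print(f"per-group thresholds: {thresholds}, remaining TPR+FPR gap: {gap:.3f}")
```

The accuracy constraint (max_acc_loss) mirrors the abstract's claim that error-rate disparities were reduced "while maintaining overall accuracy"; in practice the thresholds would be selected on a validation split and reported on held-out data.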
