
Highlight
This study compares the Geriatric End-of-Life Screening Tool (GEST), an AI-based model, with the traditional physician-answered surprise question (SQ) for predicting 6-month mortality among elderly emergency department (ED) patients. The collaborative GEST+SQ model improved calibration over GEST alone, and a sequential screening protocol substantially reduced physician burden while maintaining prediction accuracy.
Study Background
Emergency departments frequently encounter older patients with complex health statuses, and identifying those near the end of life is critical for timely palliative care discussions and appropriate resource allocation. Traditional prognostication often relies on clinicians’ intuitive assessments such as the “surprise question” (SQ): “Would you be surprised if this patient died in the next 6 months?” Although widely used, the SQ is limited by subjectivity and variable accuracy. Artificial intelligence (AI) models that draw on electronic health record data offer an opportunity to improve prediction reliability and complement clinicians’ assessments in acute care settings.
Study Design
This prospective cohort study was conducted at a single tertiary academic emergency department from November 2022 to June 2023. It enrolled patients aged 65 years and older presenting to the ED. Three prediction modalities for 6-month mortality were compared:
- The clinical surprise question (SQ), answered by physicians at ED disposition and recorded in the electronic health record
- The Geriatric End-of-Life Screening Tool (GEST), an AI model incorporating laboratory results, vital signs, demographics, and medical history to calculate mortality risk
- A novel combined logistic regression model (GEST+SQ) integrating both SQ and GEST outputs
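As a rough illustration of how the third modality could combine the other two, the sketch below passes a GEST risk and a binary SQ answer through a logistic function. The function names, the choice to enter GEST on the log-odds scale, and all coefficient values are assumptions made for illustration; the study's fitted model is not described in this summary.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def combined_risk(
    gest_risk: float,
    sq_not_surprised: bool,
    intercept: float = -0.5,  # placeholder, not a fitted value
    w_gest: float = 1.0,      # placeholder, not a fitted value
    w_sq: float = 1.0,        # placeholder, not a fitted value
) -> float:
    """Hypothetical GEST+SQ logistic model returning a 6-month mortality risk.

    sq_not_surprised is True when the physician would NOT be surprised
    if the patient died within 6 months (a positive SQ screen).
    """
    sq_term = 1.0 if sq_not_surprised else 0.0
    z = intercept + w_gest * logit(gest_risk) + w_sq * sq_term
    return 1.0 / (1.0 + math.exp(-z))
```

With any positive weight on the SQ term, a "not surprised" answer raises the combined risk for the same GEST score, which is the sense in which the physician's judgment adjusts the automated estimate.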
Mortality data were obtained through electronic and state records. The study evaluated sensitivity, specificity, area under the receiver operating characteristic curve (ROC-AUC) for discrimination, and expected calibration error for model calibration. Additionally, the authors designed a sequential screening pathway where patients initially stratified by GEST into low- and high-risk categories bypassed further SQ screening, while only intermediate-risk patients received physician SQ assessment.
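The sequential pathway described above can be sketched as a simple triage function. The two cutoffs and the rule that the SQ answer decides the intermediate band are hypothetical; this summary does not report the thresholds or the exact decision rule used in the study.

```python
from typing import Callable, Tuple


def risk_band(gest_risk: float, low_cutoff: float = 0.05, high_cutoff: float = 0.40) -> str:
    """Assign a GEST risk to a band; both cutoffs are hypothetical placeholders."""
    if gest_risk < low_cutoff:
        return "low"
    if gest_risk >= high_cutoff:
        return "high"
    return "intermediate"


def sequential_screen(
    gest_risk: float,
    ask_physician: Callable[[], bool],
    low_cutoff: float = 0.05,
    high_cutoff: float = 0.40,
) -> Tuple[bool, bool]:
    """Return (screen_positive, physician_consulted).

    ask_physician is only called for intermediate-risk patients and should
    return True when the physician would NOT be surprised by death in 6 months.
    """
    band = risk_band(gest_risk, low_cutoff, high_cutoff)
    if band == "low":
        return (False, False)   # screened negative without physician input
    if band == "high":
        return (True, False)    # screened positive without physician input
    return (ask_physician(), True)  # assumption: the SQ answer decides this band
```

Counting how often the second element of the result is True across a cohort gives the fraction of patients who still require a physician assessment.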
Key Findings
Out of 9,256 eligible patients, 3,479 had completed SQ responses (37.6%), with a 6-month mortality rate of 13.3%. When GEST sensitivity was matched to that of the SQ (83.8%), GEST demonstrated superior specificity (61.5% vs. 50.8%). Conversely, at matched specificity (50.8%), GEST sensitivity (90.0%) surpassed that of SQ (83.8%). The ROC-AUC was 0.79 for GEST versus 0.80 for the combined GEST+SQ model, indicating modest improvement in overall discrimination.
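Because the SQ yields a single yes/no operating point while GEST yields a continuous score, comparing them requires choosing a GEST threshold that matches one of the SQ's metrics. The sketch below shows one way to do the sensitivity-matched comparison; the function names are illustrative and the procedure is an assumption, not the authors' documented method.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(predicted: Sequence[bool], died: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity and specificity of binary predictions against 0/1 outcomes."""
    tp = sum(1 for p, y in zip(predicted, died) if p and y == 1)
    fn = sum(1 for p, y in zip(predicted, died) if not p and y == 1)
    tn = sum(1 for p, y in zip(predicted, died) if not p and y == 0)
    fp = sum(1 for p, y in zip(predicted, died) if p and y == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one death and one survivor")
    return tp / (tp + fn), tn / (tn + fp)


def threshold_for_sensitivity(scores: Sequence[float], died: Sequence[int], target: float) -> float:
    """Highest cutoff whose sensitivity (score >= cutoff) reaches the target."""
    for cutoff in sorted(set(scores), reverse=True):
        predicted = [s >= cutoff for s in scores]
        sensitivity, _ = sensitivity_specificity(predicted, died)
        if sensitivity >= target:
            return cutoff
    raise ValueError("target sensitivity cannot be reached")
```

In use, the target would be set to the SQ's observed sensitivity (83.8% in this cohort), and the model's specificity at the returned cutoff would then be compared with the SQ's specificity. The specificity-matched comparison follows the same pattern with the roles of the two metrics swapped.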
Importantly, the GEST+SQ model significantly improved calibration metrics (expected calibration error 0.01 vs. 0.042 for GEST alone), suggesting better alignment between predicted and actual mortality risk. Implementing a sequential screening strategy allowed for physician SQ input in only 5% of patients (the intermediate-risk subgroup), potentially reducing clinician assessment burden by 95% compared to SQ-only approaches.
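For readers unfamiliar with the calibration metric, a minimal sketch of expected calibration error follows. It assumes ten equal-width probability bins, which is a common convention; the binning scheme used in the study is not stated in this summary.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    predicted_risk: Sequence[float],
    died: Sequence[int],
    n_bins: int = 10,  # assumption: equal-width bins over [0, 1]
) -> float:
    """Weighted mean gap between predicted risk and observed mortality per bin."""
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(predicted_risk, died):
        index = min(int(p * n_bins), n_bins - 1)  # keep p == 1.0 in the last bin
        bins[index].append((p, y))

    total = len(predicted_risk)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_predicted = sum(p for p, _ in members) / len(members)
        observed_rate = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(mean_predicted - observed_rate)
    return ece
```

A value near zero means that, within each risk bin, the average predicted risk closely tracks the observed death rate; lower values therefore indicate better calibration.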
Overall, the AI-based GEST outperformed the physician SQ in mortality prediction, and the collaborative model enhanced risk calibration without significantly improving discrimination. The sequential screening model combining automated risk scoring with targeted physician input is practical and resource-efficient for ED settings.
Expert Commentary
This study addresses a critical challenge in emergency medicine: identifying older adults near end-of-life in a high-paced environment. AI tools like GEST leverage routinely collected clinical data to provide objective, reproducible risk assessment. While the SQ offers valuable clinical intuition, it is subject to heterogeneous interpretation and incomplete application, as seen in the 37.6% SQ completion rate in this study cohort.
The modest improvement in ROC-AUC from integrating the SQ with GEST suggests that AI tools already capture much of the prognostic information, but clinician judgment remains valuable for refining risk calibration. The sequential screening approach is innovative and pragmatic, potentially enhancing workflow by minimizing physicians’ cognitive load and focusing their efforts on patients for whom clinical judgment adds the most value.
Limitations include the single-center design and incomplete SQ response data, which may affect generalizability. Future research should validate these findings in diverse practice settings and assess the impact of such screening strategies on clinical outcomes and palliative care referrals.
Conclusion
The study demonstrates that an AI-driven screening tool, GEST, modestly outperforms the traditional clinician surprise question for predicting 6-month mortality in elderly ED patients. A collaborative model combining GEST and SQ improves risk calibration and, when applied in a sequential screening pathway, can substantially reduce physician effort. These findings support integrating automated AI screening tools with clinician input to enhance end-of-life prognostication and resource allocation in emergency care.
Funding and Clinical Trials
The original study’s funding sources and clinical trial registration were not reported in the abstract. Readers are encouraged to consult the full published article for detailed disclosures.
References
- Haimovich AD, Erion-Barner G, Nathanson LA, et al. Improving End-of-Life Screening in the Emergency Department With Collaborative Artificial Intelligence. Ann Emerg Med. 2026 Jun 18. PMID: 42313042.
- Downar J, Goldman R, Pinto R, et al. The “Surprise Question” for Predicting Death in Seriously Ill Patients: A Systematic Review and Meta-analysis. CMAJ. 2017;189(13):E484-E493.
- Tomasini C, Bursi F, Petrini L, et al. Artificial Intelligence and Machine Learning in Palliative Care: A Narrative Review. J Palliat Med. 2021;24(10):1542-1558.