Summary: In a post hoc analysis of 251 biopsy-confirmed NASH patients (fibrosis F1-F3) from a 72-week phase II randomized placebo-controlled trial, machine-learning (ML) digital pathology was compared with expert pathologists for scoring liver histology after subcutaneous semaglutide. Both methods detected significantly greater NASH resolution without worsening of fibrosis with semaglutide 0.4 mg versus placebo (pathologist 58.5% vs 22.0%, p<0.0001; ML 36.9% vs 11.9%, p=0.0015), and ML continuous scores flagged an antifibrotic signal (p=0.0099) not captured by categorical reads.
PICO Summary
| Element | Detail |
|---|---|
| Population | 251 patients with biopsy-confirmed NASH and fibrosis stage F1-F3; post hoc analysis of a subset from a 72-week randomized, double-blind, placebo-controlled phase II trial (NCT02970942); multinational. |
| Intervention | Once-daily subcutaneous semaglutide 0.1, 0.2, or 0.4 mg; week-72 biopsies digitized and scored by PathAI machine-learning models using categorical assessments and continuous scores (fibrosis, steatosis, inflammation, ballooning). Primary comparison reported for the 0.4 mg arm. |
| Comparison | Placebo, with the same biopsies independently read by two expert pathologists (reference standard) alongside the ML models. |
| Outcome | NASH resolution without worsening of fibrosis (semaglutide 0.4 mg vs placebo): pathologist 58.5% vs 22.0%, p<0.0001; ML categorical 36.9% vs 11.9%, p=0.0015. Secondary endpoint (fibrosis improvement without NASH worsening): numerically higher with semaglutide but nonsignificant for both methods. ML continuous scores detected a quantitative fibrosis reduction (p=0.0099) not seen on pathologist or ML categorical assessment. No CI, ARR, or NNT reported in the abstract. |
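
The table notes that no ARR or NNT is reported. As a rough illustration only, the sketch below derives an absolute difference and a number needed to treat from the reported week-72 percentages for the primary endpoint. The function name `arr_and_nnt` is hypothetical, the outputs are back-of-envelope arithmetic rather than results reported by the authors, and no confidence interval is attempted because per-arm sample sizes are not given here.

```python
import math
from typing import Tuple


def arr_and_nnt(treated_pct: float, control_pct: float) -> Tuple[float, int]:
    """Return (absolute difference in percentage points, NNT rounded up).

    Assumption: inputs are the responder percentages quoted in the summary.
    """
    # Round to one decimal first so floating-point noise cannot push the
    # NNT across an integer boundary (e.g. 36.9 - 11.9 is not exactly 25.0).
    diff_pp = round(treated_pct - control_pct, 1)
    if diff_pp <= 0:
        raise ValueError("no benefit over control: NNT is undefined")
    # NNT is the reciprocal of the absolute difference, conventionally rounded up.
    nnt = math.ceil(100.0 / diff_pp)
    return diff_pp, nnt


# Percentages quoted for semaglutide 0.4 mg vs placebo (primary endpoint).
reported = {
    "pathologist": (58.5, 22.0),
    "ml_categorical": (36.9, 11.9),
}

results = {method: arr_and_nnt(*pcts) for method, pcts in reported.items()}

for method, (diff_pp, nnt) in results.items():
    print(f"{method}: difference {diff_pp} percentage points, NNT about {nnt}")
```

Under these assumptions the difference is 36.5 percentage points (NNT about 3) on pathologist reads and 25.0 percentage points (NNT about 4) on ML categorical reads. Both figures come from a post hoc subset and describe a histological endpoint, so they should not be read as estimates of clinical benefit.
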
AI vs Pathologist Scoring in Semaglutide NASH
Phase II RCT · NASH F1-F3 · 72 weeks
Both pathologist and ML categorical reads confirmed greater NASH resolution with semaglutide 0.4 mg vs placebo. ML continuous scoring added an antifibrotic signal that categorical reads missed; this finding is hypothesis-generating, and the ML tool was supplied by its commercial vendor.
Expert Commentary
The verdict is measured: this is a methods-validation exercise, not a fresh efficacy claim. As a post hoc analysis of a subset from a phase II trial, its purpose was to test whether a machine-learning pathology model reproduces what trained pathologists already saw, and it largely did. The categorical concordance for NASH resolution is reassuring, and the continuous scoring surfaced an antifibrotic signal that conventional categorical reads missed, which is the genuinely interesting hint. That signal, however, should be read as hypothesis-generating rather than established: the categorical fibrosis-improvement endpoint was nonsignificant for both methods, and a post hoc continuous-score finding from a subset cannot carry confirmatory weight.
The single limitation that matters most is sponsorship. The trial was funded by the drug manufacturer and the ML tool was supplied by its commercial vendor, so both the intervention and the measuring instrument originate from interested parties. In addition, absolute effect sizes, confidence intervals, and number-needed-to-treat are not provided here.
Can I use this with my patients? Not yet as a clinical tool. Nothing here changes prescribing, and AI biopsy scoring is not ready for the bedside; it is a trial-efficiency and reproducibility instrument that still requires independent, prospective validation against hard outcomes. I would like to see this continuous-score antifibrotic signal confirmed in an adequately powered phase III dataset with non-vendor adjudication before it informs care.
References
Ratziu V, Francque S, Behling CA, Cejvanovic V, Cortez-Pinto H, Iyer JS, et al. Artificial intelligence scoring of liver biopsies in a phase II trial of semaglutide in nonalcoholic steatohepatitis. Hepatology. 2024;80(1):173-185. doi:10.1097/HEP.0000000000000723
