Defining ground truth for prostate segmentation of transrectal ultrasound images: Inter‐ and intra‐observer variability of manual versus semi‐automatic methods

Abstract

Background: Accurate prostate segmentation in transrectal ultrasound (TRUS) imaging is essential for diagnosis, treatment planning, and developing artificial intelligence (AI) algorithms. Although manual segmentation is often recommended as the ground truth for AI training, it is time‐consuming, prone to inter‐ and intra‐observer variability, and rarely used in everyday clinical practice. Semi‐automatic methods provide a faster alternative but lack thorough multi‐operator evaluations. Understanding variability in segmentation methods is crucial to defining a reliable reference standard for future AI training.

Purpose: To investigate the inter‐individual variability in manual and semi‐automatic prostate contour segmentation on 3D TRUS images, and to compare both approaches to determine the most consistent method that could serve as a reference standard for future AI model development.

Methods: This study is a methodological investigation and not an AI study. Four urology experts independently performed manual and semi‐automatic segmentation on 100 prostate 3D TRUS exams obtained from patients undergoing fusion prostate biopsy. Inter‐individual and intra‐individual variability for manual segmentation was assessed using the Average Surface Distance (ASD) between manually placed points and a reference mesh. Two methods were used to create the reference prostate mesh after manual point positioning: a statistical shape model (manual_SSM) and a deformable model (manual_soft‐SSM). Semi‐automatic segmentations were evaluated using ASD, Dice similarity coefficient, and Hausdorff distance. A Simultaneous Truth and Performance Level Estimation (STAPLE)-like consensus method was applied to assess variability across experts in semi‐automatic segmentation. Statistical comparisons used Wilcoxon tests, and effect sizes were calculated using Cohen's d. Bonferroni correction was applied for multiple comparisons. A significance level of p < 0.05 (adjusted as needed) was used.

Results: Inter‐individual variability of manual segmentation was higher with the manual_SSM method [ASD = 2.6 mm (interquartile range (IQR) 2.3–3.0)] than with the manual_soft‐SSM method [ASD = 1.5 mm (IQR 1.2–1.8), p < 0.001]. Intra‐individual variability was also lower with manual_soft‐SSM than with manual_SSM [ASD 1.0 (0.8–1.1) versus 2.2 (1.9–2.6), p < 0.001]. For semi‐automatic segmentation, inter‐individual variability yielded an ASD of 1.4 mm (IQR 1.1–1.9), a Dice coefficient of 0.90 (IQR 0.88–0.92), and a Hausdorff distance of 5.7 mm (IQR 4.47–7.36). Comparing manual with semi‐automatic segmentation yielded an ASD of 1.43 mm (IQR 1.20–1.90).

Conclusions: The semi‐automatic segmentation method evaluated in this study demonstrated comparable accuracy to manual segmentation while reducing inter‐ and intra‐individual variability. These findings suggest that the tested semi‐automatic approach can serve as a reliable reference standard for AI training in prostate segmentation.
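As an illustration of the three agreement metrics named in the abstract (Dice similarity coefficient, ASD, and Hausdorff distance), the following is a minimal sketch in pure Python. It is not the authors' implementation. It assumes that each segmentation is available either as a set of voxel indices (for Dice) or as a list of 3D surface points in millimetres (for ASD and Hausdorff), and it approximates surface distances by nearest-point distances between two point sets, whereas the study measures distances from manually placed points to a reference mesh. All function names are hypothetical.

```python
import math


def dice(voxels_a, voxels_b):
    """Dice similarity coefficient between two sets of voxel indices."""
    a, b = set(voxels_a), set(voxels_b)
    if not a and not b:
        return 1.0  # two empty segmentations agree perfectly
    return 2.0 * len(a & b) / (len(a) + len(b))


def nearest_distances(src, dst):
    """For each point in src, the Euclidean distance to its nearest point in dst."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def average_surface_distance(surf_a, surf_b):
    """Symmetric ASD: mean nearest-point distance over both directions.

    Assumption: surfaces are sampled as point lists; a point-to-mesh
    distance would replace nearest_distances in a real pipeline.
    """
    d_ab = nearest_distances(surf_a, surf_b)
    d_ba = nearest_distances(surf_b, surf_a)
    return (sum(d_ab) + sum(d_ba)) / (len(d_ab) + len(d_ba))


def hausdorff_distance(surf_a, surf_b):
    """Symmetric Hausdorff distance: the worst-case nearest-point distance."""
    return max(
        max(nearest_distances(surf_a, surf_b)),
        max(nearest_distances(surf_b, surf_a)),
    )


# Toy data (made up for illustration, not taken from the study).
vox_a = {(0, 0, 0), (1, 0, 0), (2, 0, 0)}
vox_b = {(1, 0, 0), (2, 0, 0), (3, 0, 0)}
pts_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pts_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 3.0)]

print("Dice:", dice(vox_a, vox_b))
print("ASD (mm):", average_surface_distance(pts_a, pts_b))
print("Hausdorff (mm):", hausdorff_distance(pts_a, pts_b))
```

The brute-force nearest-point search is quadratic in the number of points, which is acceptable for a toy example; a spatial index such as a k-d tree would be the usual choice for dense meshes. The sketch also shows why the two distance metrics behave differently: ASD averages over all points, while the Hausdorff distance is driven by the single worst outlier, which is consistent with the larger Hausdorff values reported above.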

Main file

118227_2_merged_1752128452-last.pdf (6.41 MB)

Origin: Files produced by the author(s)

Contributor: Jocelyne Troccaz

https://hal.science/hal-05261332

Submitted on: Monday, 15 September 2025, 14:01:03

Last modified on: Thursday, 18 September 2025, 03:19:29

Dates and versions

hal-05261332, version 1 (15-09-2025)

License

Attribution

Identifiers

HAL Id: hal-05261332, version 1
DOI: 10.1002/mp.18025

Cite

Louis Lenfant, Clément Beitone, Jocelyne Troccaz, Gaelle Fiard, Bernard Malavaud, et al. Defining ground truth for prostate segmentation of transrectal ultrasound images: Inter‐ and intra‐observer variability of manual versus semi‐automatic methods. Medical Physics, 2025, 52 (8), ⟨10.1002/mp.18025⟩. ⟨hal-05261332⟩