Penalized likelihood optimization for censored missing value imputation in proteomics

Lucas Etourneau; Laura Fancello; Samuel Wieczorek; Nelle Varoquaux; Thomas Burger

doi:10.1093/biostatistics/kxaf006

Journal article, Biostatistics, Year: 2025


Lucas Etourneau

Role: Author

EDyP - Etude de la dynamique des protéomes

Laura Fancello

Role: Author

Samuel Wieczorek

Role: Author

EDyP - Etude de la dynamique des protéomes

Nelle Varoquaux

Role: Author
PersonId: 15044
IdHAL: nelle-varoquaux
ORCID: 0000-0002-8748-6546
IdRef: 192799991

Thomas Burger

Role: Author
PersonId: 753188
IdHAL: thomas-burger
ORCID: 0000-0003-3539-3564
IdRef: 126286469

EDyP - Etude de la dynamique des protéomes

Abstract

Label-free bottom-up proteomics using mass spectrometry and liquid chromatography has long been established as one of the most popular high-throughput analysis workflows for proteome characterization. However, it produces data hindered by complex and heterogeneous missing values, whose imputation has long remained problematic. To address this, we introduce Pirat, an algorithm that tackles this challenge using an original likelihood maximization strategy. Notably, it models the instrument limit by learning a global censoring mechanism from the available data. Moreover, it estimates the covariance matrix between enzymatic cleavage products (i.e., peptides or precursor ions), while offering a natural way to integrate complementary transcriptomic information when multi-omic assays are available. Our benchmarking on several datasets covering a variety of experimental designs (number of samples, acquisition mode, missingness patterns, etc.) and using a variety of metrics (differential analysis ground truth or imputation errors) shows that Pirat outperforms all pre-existing imputation methods. Beyond the interest of Pirat as an imputation tool, these results pinpoint the need for a paradigm change in proteomics imputation, as most pre-existing strategies could be boosted by incorporating similar models to account for the instrument censorship or for the correlation structures, either grounded in the analytical pipeline or arising from a multi-omic approach.
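
To make the idea of likelihood maximization under instrument censoring more concrete, here is a toy sketch. It is not the Pirat algorithm. It assumes a single peptide whose log-intensities are Gaussian, a known detection threshold below which values go missing, and a plain grid search; none of this comes from the paper. It omits the penalty term, the learned global censoring mechanism, and the covariance between peptides that the abstract describes. The function names (`censored_loglik`, `fit_and_impute`) are hypothetical.

```python
import math
import numpy as np

def norm_cdf(z):
    # Standard normal cumulative distribution function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    # Standard normal density.
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def censored_loglik(mu, sigma, observed, n_missing, threshold):
    # Observed values contribute the usual Gaussian log-density.
    z = (observed - mu) / sigma
    ll_obs = np.sum(-0.5 * z**2 - math.log(sigma) - 0.5 * math.log(2.0 * math.pi))
    # Each missing value contributes log P(x < threshold): the
    # left-censoring term (assumption: threshold is known).
    p_below = max(norm_cdf((threshold - mu) / sigma), 1e-300)
    return ll_obs + n_missing * math.log(p_below)

def fit_and_impute(values, threshold):
    values = np.asarray(values, dtype=float)
    missing = np.isnan(values)
    observed = values[~missing]
    n_missing = int(missing.sum())
    # Coarse grid search over (mu, sigma); a real implementation
    # would use a proper numerical optimizer.
    mus = np.linspace(observed.min() - 3.0, observed.max(), 121)
    sigmas = np.linspace(0.1, 3.0, 59)
    _, mu, sigma = max(
        (censored_loglik(m, s, observed, n_missing, threshold), m, s)
        for m in mus for s in sigmas
    )
    # Impute with the mean of the Gaussian truncated above at the threshold.
    a = (threshold - mu) / sigma
    imputed = mu - sigma * norm_pdf(a) / max(norm_cdf(a), 1e-300)
    completed = values.copy()
    completed[missing] = imputed
    return mu, sigma, completed
```

Calling `fit_and_impute(x, threshold)` on an array `x` with `NaN` for missing entries returns the estimated mean, the estimated standard deviation, and a completed copy of the array. Because the censoring term enters the likelihood, the estimated mean is not biased upward the way it would be if only the observed values were averaged.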

Domains

Statistics [math.ST]

Main file

2023.11.09.566355v2.full.pdf (12.08 MB)

Origin	Files produced by the author(s)
License	HAL authorization


https://hal.science/hal-05002295

Submitted on: Sunday, March 23, 2025, 16:03:04

Last modified on: Saturday, February 7, 2026, 05:30:43

Dates and versions

hal-05002295, version 1 (23-03-2025)


Identifiers

HAL Id: hal-05002295, version 1
DOI: 10.1093/biostatistics/kxaf006

Cite

Lucas Etourneau, Laura Fancello, Samuel Wieczorek, Nelle Varoquaux, Thomas Burger. Penalized likelihood optimization for censored missing value imputation in proteomics. Biostatistics, 2025, 26 (1), ⟨10.1093/biostatistics/kxaf006⟩. ⟨hal-05002295⟩
