« Health insurance data analysis for occupational health surveillance of French agricultural workers »
Place: Amphitheatre of the Grenoble Institut des Neurosciences (GIN), 31 Chemin Fortuné Ferrini, 38700 La Tronche
Thesis supervision:
- Pr Vincent Bonneterre, Professeur des Universités - Praticien Hospitalier, Université Grenoble Alpes, Director
- Pr Olivier François, Professeur des Universités, Grenoble INP, Co-director
Jury:
- Dr Pierre LEBAILLY, Maître de conférences, Centre régional de lutte contre le cancer François Baclesse, INSERM - UMR 1086 «ANTICIPE» (Reporter)
- Pr Marie ZINS, Professeur des universités - praticien hospitalier, Hôpital Paul Brousse, INSERM - UMS 011 « Cohortes en population » (Reporter)
- Dr Florence FORBES, Directrice de recherche, INRIA Grenoble Rhône Alpes (Examiner)
- Dr Rémy SLAMA, Directeur de recherche, Inserm U1209 / CNRS UMR 5309, IAB, Université Grenoble Alpes (President)
Introduction: Health surveillance and vigilance, in particular the identification of new risks, are a major challenge in occupational health. In addition to classical epidemiological studies, the systematic, hypothesis-free analysis of routinely collected data could be an asset for the early detection of work-related diseases. In this context, the social protection scheme dedicated to French agricultural workers, the “Mutualité Sociale Agricole” (MSA), wanted to develop its vigilance activity by exploiting its medico-administrative data, which are used for the reimbursement of health expenditures. A data mining project was set up in partnership with the French Agency for Food, Environmental and Occupational Health & Safety (ANSES), and this thesis is part of it. More precisely, the aim of the thesis is to test, without any prior assumptions, for associations between agricultural activities and pathologies recognized as long-term diseases (LTD).
Method: The work was conducted on the self-employed population (heads of farms or enterprises) affiliated to the MSA. It relied, on the one hand, on a contributors’ database containing individual-level information on occupational activities and demographic and socio-economic characteristics and, on the other hand, on a medico-administrative database containing declarations of long-term diseases (LTD) and associated information such as ICD-10 disease codes. With the agreement of the French Data Protection Authority (CNIL), a unique identifier was created so that, for the first time, these administrative and medico-administrative data could be merged and restructured for modelling. Logistic regression models were fitted, with variable selection adapted to each LTD and cross-validation used to limit over-fitting. Several methods were tested to better account for potential confounders. The resulting models were evaluated using robustness measures and applied at two levels of precision for pathology (LTD and ICD-10). The statistical association between each combination of occupational activity and LTD was characterized by a p-value, corrected for multiple testing, and an odds ratio.
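As a rough illustration of the last step only, the sketch below computes an unadjusted odds ratio and a Wald p-value for each activity/LTD pair from a 2x2 table, then corrects the p-values for multiple testing. It is not the thesis's pipeline: the thesis uses logistic regression adjusted for confounders, whereas a 2x2 odds ratio corresponds to a logistic model with a single binary predictor. The Benjamini-Hochberg procedure is an assumption (the abstract does not name the correction used), and the counts, activity names, and function names are all hypothetical.

```python
import math


def odds_ratio_test(a, b, c, d):
    """Unadjusted odds ratio and two-sided Wald p-value for a 2x2 table.

    a: exposed with disease,   b: exposed without disease,
    c: unexposed with disease, d: unexposed without disease.
    All four counts must be strictly positive.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = math.log(odds_ratio) / se_log_or
    # Two-sided tail probability of a standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return odds_ratio, p_value


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts (a, b, c, d) for three activity/LTD pairs; invented
# for illustration and unrelated to the MSA data.
tables = {
    ("activity_A", "LTD_1"): (20, 80, 10, 90),
    ("activity_B", "LTD_1"): (15, 85, 14, 86),
    ("activity_C", "LTD_2"): (40, 60, 12, 88),
}

results = {pair: odds_ratio_test(*counts) for pair, counts in tables.items()}
adjusted = benjamini_hochberg([p for _, p in results.values()])

for (pair, (odds_ratio, p_raw)), p_adj in zip(results.items(), adjusted):
    print(pair, f"OR={odds_ratio:.2f}", f"p={p_raw:.4f}", f"p_adj={p_adj:.4f}")
```

In the thesis setting, the per-pair p-values and odds ratios would come from the adjusted logistic models rather than from raw 2x2 tables; only the correction step would carry over unchanged.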
Results: Data management yielded a study population of 899,212 self-employed individuals affiliated between 2006 and 2016. Among them, 100,706 individuals had at least one LTD declaration over the observation period. The methodology revealed 54 statistically significant associations between an occupational activity and an LTD, capturing already known or suspected health determinants and also generating new hypotheses. After adjustment for confounding factors, the agricultural sectors most associated with LTD among the self-employed are viticulture, timber harvesting, and landscaping and gardening or reforestation.
Discussion: This thesis provides a first demonstration of the feasibility and relevance of systematically analysing data routinely collected for insurance purposes, covering the agricultural population as a whole, to search for health risks associated with occupational activities. The statistical "signals" thus highlighted will then be investigated by a group of experts from different scientific and occupational fields. Other models, such as survival models, should also be tested. This approach may thus be a valuable tool contributing to the health surveillance system dedicated to agricultural workers.
Keywords: Medico-administrative databases, Health insurance, Epidemiologic surveillance, Data mining, Occupational risks, Agricultural workers