Event date:
Event location:
Organized by:
Background: In the era of big data and artificial intelligence, managing and understanding data quality remains a challenge in medicine. Data quality has generated a large body of work in many domains, including computer science, biology and medicine, and this research covers the many dimensions of quality, from missing data to correctness. In this work, we studied the impact of internal and external factors on two types of data, laboratory data and ICD-10 codes used for billing purposes, focusing in particular on changes occurring over time. This dimension, referred to as timeliness (as defined by Saez et al.), can strongly affect retrospective studies and any study that considers data over long periods of time.
Methods: We explored timeliness in two types of data. (1) We observed the distributions of laboratory data over a period of 17 years in a single-center clinical data warehouse. We identified patterns of evolution, including breakpoints, discretization and trends, and proposed methods to characterize these patterns. (2) We then considered a nationwide dataset: the diagnostic codes of hospital discharge summaries (PMSI). We performed temporal clustering on the relative frequencies of ICD codes to identify evolution profiles.
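The two analyses above can be illustrated with a minimal Python sketch. The abstract does not specify the algorithms that were used, so the techniques here are assumptions swapped in for illustration: a least-squares search for a single mean-shift breakpoint in a laboratory-value series, and plain k-means clustering of z-normalized relative-frequency trajectories of diagnostic codes. All function names (`single_breakpoint`, `relative_frequencies`, `znorm`, `kmeans`), the placeholder code labels `C1` to `C4` (not real ICD-10 codes) and the toy numbers are hypothetical and invented; they do not come from the study or from PMSI.

```python
from statistics import mean, pstdev

# Hypothetical helpers for illustration only; not the study's actual methods.


def single_breakpoint(series):
    """Index k splitting `series` into two constant-mean segments
    series[:k] and series[k:] with the lowest total squared error."""
    def sse(segment):
        m = mean(segment)
        return sum((x - m) ** 2 for x in segment)

    best_k, best_cost = None, float("inf")
    for k in range(1, len(series)):  # keep both segments non-empty
        cost = sse(series[:k]) + sse(series[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k


def relative_frequencies(counts_by_year):
    """{year: {code: count}} -> {code: [share of that year's total, per year]}."""
    years = sorted(counts_by_year)
    codes = sorted({c for y in years for c in counts_by_year[y]})
    totals = {y: sum(counts_by_year[y].values()) for y in years}
    return {
        code: [counts_by_year[y].get(code, 0) / totals[y] for y in years]
        for code in codes
    }


def znorm(xs):
    """Z-normalize a trajectory so clustering compares shape, not volume."""
    m, s = mean(xs), pstdev(xs)
    return [0.0] * len(xs) if s == 0 else [(x - m) / s for x in xs]


def kmeans(vectors, k, iters=50):
    """Plain k-means on {name: vector}; deterministic init with the
    first k names in sorted order. Returns {name: cluster index}."""
    names = sorted(vectors)
    centroids = [list(vectors[n]) for n in names[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for n in names:
            dists = [sum((a - b) ** 2 for a, b in zip(vectors[n], c))
                     for c in centroids]
            clusters[dists.index(min(dists))].append(n)
        # Recompute each centroid; an empty cluster keeps its old centroid.
        updated = [
            [mean(col) for col in zip(*(vectors[m] for m in members))]
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return {n: i for i, members in enumerate(clusters) for n in members}


# Invented toy data: a lab value whose level shifts (e.g. after an analyzer change).
lab_monthly_median = [4.1, 4.0, 4.2, 4.1, 5.0, 5.1, 4.9, 5.0]
print(single_breakpoint(lab_monthly_median))  # → 4

# Invented yearly counts for four placeholder codes (two rising, two falling).
counts = {
    2010: {"C1": 10, "C2": 40, "C3": 5, "C4": 45},
    2011: {"C1": 20, "C2": 30, "C3": 10, "C4": 40},
    2012: {"C1": 30, "C2": 20, "C3": 15, "C4": 35},
}
profiles = {c: znorm(f) for c, f in relative_frequencies(counts).items()}
print(kmeans(profiles, k=2))  # → {'C1': 0, 'C3': 0, 'C2': 1, 'C4': 1}
```

Z-normalization groups codes by the shape of their trajectory (rising, falling, stepping) rather than by their overall volume. A real analysis would likely need to handle multiple breakpoints, choose the number of clusters, and use a more robust initialization than this sketch does.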
Results: In both applications, we identified intrinsic and extrinsic factors influencing the timeliness of the data. We searched for possible explanations of the observed changes and were able to identify extrinsic factors responsible for some of them, such as the replacement of a laboratory analyzer or the implementation of a new health policy in a given territory. Analyzing quality issues in bulk allowed us to detect biases that would not have been apparent in small samples.
Conclusion: Data quality is essential to ensure that study results are meaningful, especially in the era of big data and longitudinal data warehouses. Awareness of data quality should be a prerequisite for the secondary use of data, and it is as much the responsibility of those who build clinical data warehouses as of the investigators who use them.