In January, Future4care launched the January4data campaign to highlight the essential role played by healthcare data in the value chain. To this end, we solicited the expertise of members of our ecosystem, which has given rise to these notes. The genesis of healthcare data, technological developments and uses, ethics and applications in the hospital sector are just some of the topics covered in this series of articles.
What is your perspective on the use of healthcare data?
For over five decades, I've witnessed and played an active role in major developments in the field of healthcare data. My work has always been driven by the conviction that quality data are essential for decision-making and research in general, and particularly for research into rare diseases, a field that Orphanet has been instrumental in helping to develop.
When I began my career in the 1970s, the landscape of data collection and organization was sparse and unstructured, limited to research survey data due to the lack of accessible technology for digitizing real-life data. Moreover, there was no standardization of nomenclatures, which was a major obstacle to collaborative research.
The emergence of medical informatics in the 1980s enabled progress to be made. Medical records are coded using the ICD, the WHO's International Classification of Diseases. The ICD listed around a hundred diseases in the 19th century and lists several thousand today. However, genetic diseases and rare non-genetic diseases were largely absent from this classification. ICD-10, currently in use, contains only 450 of these, and Orphanet has helped to introduce another 4,500 into ICD-11, which has been adopted by the WHO but is unfortunately not yet in use, as changing versions is costly for public players.
In parallel, as early as 1966, Victor McKusick proposed Mendelian Inheritance in Man (now available online as OMIM), a catalog of human genes and genetic diseases. This nomenclature is widely used in research, but it does not allow diseases to be coded properly, because some diseases are caused by multiple genes and some genes can cause several diseases.
Drawing on my long experience in genetic epidemiology and medical genetics, in 1997 I founded Orphanet, a relational, exhaustive and polyhierarchical database that considers different perspectives and reconciles the various existing nomenclatures. Today, Orphanet lists more than 11,000 rare diseases and offers classifications adapted to the specific needs of research, care, public health and the pharmaceutical industry. Orpha codes enable diseases to be coded either zoomed out (broad categories useful for health system management) or zoomed in (the molecular level essential for biomedical R&D).
What are the challenges facing the use of healthcare data?
It took a long time for medical informatics to be adopted, due to a lack of awareness among doctors and decision-makers, but above all because of the cumbersome nature of the equipment, the cost of the technologies and the lack of user-friendly access to the data.
The data collections that were built up suffered from problems of data quality and interoperability. The databases were disparate and poorly structured, which limited their usefulness. At the same time, the constraints imposed by national and European regulations, although justified to protect directly identifying personal data, have complicated the use of data for research, even though research data are always pseudonymized. Is it justified to protect such data to the extreme when the risk of re-identification is extremely low and the potential harm is low? This position, prevalent in Europe but taken to extremes in France, slows down research, or even blocks it, even when the projects are desirable for public health, improved care and innovation. This is detrimental to clinical research in general, and industrial clinical research in particular.
An emblematic example concerns genetic data, often considered as indirectly identifiable, which prevents their systematic digitization in medical files, even though this is key information for both care and research. Doctors circumvent these restrictions by scanning the paper results they need for care, an inefficient practice that compromises both the usefulness and security of the data.
How do you see the future of healthcare data?
Today, we have the technological tools and infrastructures to collect and analyze quality data, thanks to the construction of healthcare data warehouses and the existence of cohorts, registers and research databases. However, this potential remains under-exploited due to a lack of regulatory harmonization, insufficient investment and operator training, and psychological resistance. Primary health data must be protected, but pseudonymized secondary data must be made reasonably accessible. There is no known case of collective or individual harm resulting from the use of pseudonymized data. On the other hand, the benefits of using such data can be significant for the community (e.g. relevance of care, prevention of health scandals, evaluation of innovations) and for individuals (e.g. development of new treatments). The benefit/risk balance is therefore very much in favor of the benefits. What remains to be done is to remove the obstacles to moving towards a one-stop shop. We need to stop communicating only about risks and start communicating about benefits, especially as well-informed citizens are in favor of using their data for research. An audit of the obstacles to research in the current system would be useful to convince the reluctant.
The European Health Data Space represents a major opportunity for us in France, as we have already structured the recommended national node: the Health Data Hub.
Together, we can build a future where data is used to save lives and improve public health. There is widespread public support for this approach.