The large-scale reuse of health data is a major challenge for advancing research: the data must be structured and easy to access for the players in the ecosystem, while guaranteeing respect for patients' rights. In this context, Institut Curie has joined forces with Arkhn and Owkin, alongside leading cancer institutions, in the OncoLab project to structure health data and thus facilitate research and innovation in oncology.

The OncoLab project aims to make oncology data from healthcare institutions accessible to all ecosystem stakeholders for research and innovation purposes. The multi-modal data stored in the institutions will be standardized, structured, and studied in a synchronized manner.

"The OncoLab project is a great opportunity for Institut Curie to share, within a leading consortium, the expertise in data standardization that it began building several years ago, and to amplify this momentum at the national level before looking to the international level. Data-driven cancer research will necessarily involve dialogue between decentralized databases, which places access to harmonized technical foundations and shared standards at the heart of the matter,"

says Amaury Martin, PhD, deputy director of Institut Curie and head of Carnot Curie Cancer.

A public-private consortium of expert centers to facilitate oncology research and innovation

The OncoLab project aims to deploy data architectures for research and innovation in oncology at four leading institutions in the field: Institut Curie, Institut Bergonié, IUCT-Oncopole and Toulouse University Hospital. The goal is to respond to the various current challenges of managing and accessing health data, by providing a common and standardized technical foundation for health institutions and their partners. These data architectures will be developed by Arkhn, the project leader, and studied in a decentralized manner thanks to Owkin's expertise in data science and artificial intelligence, in order to preserve the confidentiality of the data and the sovereignty of the healthcare institutions. In total, the project has a budget of nearly 11 million euros.

Towards standardized data warehouses that guarantee the sovereignty of institutions to facilitate access to health data

The data architectures deployed by OncoLab will integrate all types of oncology data (reports, examinations, imaging, biology, etc.) for all types of cancers, collected from hundreds of thousands of patients monitored by healthcare institutions. A secure platform will considerably simplify technical access to data for each center wishing to make it available for research and innovation projects, thus drastically reducing costs and implementation times. This direct access to standardized data will open up new opportunities for the healthcare institutions where the data originate and for their partners. The institutions will retain full control over their patient data, notably through modern technologies such as Federated Learning, which allows research projects to be conducted without data being extracted from the institutions.
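
To illustrate the principle behind Federated Learning, here is a minimal sketch of federated averaging in Python. It is not OncoLab's or Owkin's implementation: the function names, the toy one-parameter model, and the sample data are all assumptions chosen for illustration. Each site trains on its own records and shares only a model weight and a sample count, never the records themselves.

```python
from typing import List, Tuple

# Hypothetical toy setting: each site holds (x, y) pairs and fits y = weight * x.
Dataset = List[Tuple[float, float]]


def local_update(weight: float, data: Dataset, lr: float = 0.01, epochs: int = 5) -> float:
    """Run a few gradient-descent steps on one site's local data only."""
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to the weight.
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
    return weight


def federated_average(updates: List[Tuple[float, int]]) -> float:
    """Combine local weights, weighting each site by its number of samples."""
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total


def train(sites: List[Dataset], rounds: int = 50) -> float:
    """Alternate local training and central averaging for a number of rounds."""
    weight = 0.0
    for _ in range(rounds):
        # Only (weight, sample count) leaves each site; raw data stays local.
        updates = [(local_update(weight, data), len(data)) for data in sites]
        weight = federated_average(updates)
    return weight


if __name__ == "__main__":
    # Two illustrative sites whose data both follow y = 2 * x.
    sites = [
        [(1.0, 2.0), (2.0, 4.0)],
        [(3.0, 6.0), (4.0, 8.0), (5.0, 10.0)],
    ]
    print(train(sites))
```

In a real deployment the model would be far larger and the exchange would be secured, but the division of labor is the same: computation moves to the data rather than the data moving to a central server.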

Training advanced Artificial Intelligence algorithms to better structure and exploit data for cancer research

The OncoLab project will improve the Natural Language Processing (NLP) methods needed to automatically analyze tens of thousands of medical documents (prescriptions, hospitalization reports, liaison letters, etc.) and extract information relevant to research. Developed jointly by the project partners, these methods are based on state-of-the-art Artificial Intelligence models developed by Inria's ALMAnaCH project team.

