Data Enrichment and Curation: Eliminating Barriers in Access to Robust Data

Christian Evans
Thu 12 November 2020

The life sciences industry has a significant opportunity at hand as its reliance on data grows. With the passage of the 21st Century Cures Act, data has become the fuel for accelerating drug development and innovation. The Cures Act defines real-world evidence (RWE) as clinical evidence about the usage and potential benefits or risks of a medical product, derived from the analysis of real-world data (RWD). RWE offers an accurate view of patient populations spanning a wider range of socioeconomic, racial, age, and gender diversity, most of which is under-represented in randomized controlled trials (RCTs).

Although the industry is rapidly adopting RWD, harnessing the data’s full potential remains a challenge. The problem lies not in collecting data, but in collecting the ‘right’ data in a robust, standardized form. Data enrichment and curation sit at the transition from real-world data to real-world evidence, where the information trapped in unorganized data sets has to be extracted.

Data curation refers to the process of improving the quality of data collated from disparate sources, usually by altering or removing cases and measures that don’t meet a quality standard. Examples include removing cases or observations that have missing values, fail inclusion criteria, or carry negative quality annotations. Data enrichment, on the other hand, is about adding the information required to make the data asset more valuable. It enables businesses to draw detailed conclusions from a pool of unstructured, raw data.
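
To make the distinction concrete, here is a minimal sketch in Python. Everything in it is hypothetical and for illustration only: the records, the field names (patient_id, age, zip3, hba1c), the adult-only inclusion criterion, and the region lookup table. Curation drops records that fail a quality rule; enrichment adds a new attribute from a reference source.

```python
# Hypothetical raw records pooled from several sources (illustrative data only).
raw_records = [
    {"patient_id": "P1", "age": 54, "zip3": "021", "hba1c": 7.2},
    {"patient_id": "P2", "age": None, "zip3": "100", "hba1c": 6.1},  # missing value
    {"patient_id": "P3", "age": 17, "zip3": "606", "hba1c": 8.4},    # fails inclusion
    {"patient_id": "P4", "age": 63, "zip3": "606", "hba1c": 9.0},
]

# Hypothetical reference table used for enrichment.
region_by_zip3 = {"021": "Northeast", "100": "Northeast", "606": "Midwest"}

REQUIRED_FIELDS = ("patient_id", "age", "zip3", "hba1c")


def curate(records, min_age=18):
    """Keep only records with all required fields that meet the age criterion."""
    kept = []
    for record in records:
        if any(record.get(field) is None for field in REQUIRED_FIELDS):
            continue  # drop records with missing values
        if record["age"] < min_age:
            continue  # drop records that fail the (assumed) inclusion criterion
        kept.append(record)
    return kept


def enrich(records, lookup):
    """Return copies of the records with a 'region' attribute added."""
    return [{**record, "region": lookup.get(record["zip3"])} for record in records]


curated = curate(raw_records)
enriched = enrich(curated, region_by_zip3)
for row in enriched:
    print(row)
```

In practice the quality rules and the enrichment sources would come from the study protocol and the organization's own reference data rather than from hard-coded values like these.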

Struggle in Attaining the ‘Right’ Data

While RWD offers significant opportunities for improving healthcare research, innovation, and decision-making, several challenges stand in the way of leveraging its full potential. These challenges range from the technical to the ethical to the analytical utility of the data.

  • Data Quality: Since most RWD is not collected for research purposes, data collection is episodic and reactive and, at best, offers a partial picture of the context. As a result, RWD is scattered and requires statistically rigorous, valid methods to clean the data and correct its inconsistencies. Careful curation of both structured and unstructured data is essential to avoid missing any data point.
  • Interoperability: Standards for developing and maintaining data assets have not kept pace with the rapid evolution of RWD. Lack of interoperability between real-world databases hinders combined analysis and collaboration between data holders, and the absence of consolidated or centralized data storage makes it difficult to analyze data across data sets.
  • Analytical Platforms: Despite the growing adoption and use of electronic health records (EHRs), extracting meaningful data from them accurately and efficiently remains a significant challenge. A considerable portion of high-value clinical information in EHRs is stored in unstructured, free-text clinical documents that are inaccessible to algorithms without layers of pre-processing.
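
As a rough illustration of the pre-processing that free-text clinical documents require, the sketch below pulls two structured values out of a made-up clinical note using regular expressions from Python's standard library. The note text, the patterns, and the output field names are all assumptions for this example; real clinical text is far more varied and typically calls for dedicated clinical NLP tooling rather than a pair of patterns.

```python
import re

# Made-up free-text note (illustrative only).
note = "Pt seen for follow-up. BP 142/90 mmHg. HbA1c 7.8%. Denies chest pain."

# Assumed patterns: blood pressure written as "BP <systolic>/<diastolic>",
# and HbA1c written as "HbA1c <number>%".
BP_PATTERN = re.compile(r"\bBP\s*(\d{2,3})\s*/\s*(\d{2,3})\b", re.IGNORECASE)
HBA1C_PATTERN = re.compile(r"\bHbA1c\s*(\d+(?:\.\d+)?)\s*%", re.IGNORECASE)


def extract_measurements(text):
    """Return a dict of the measurements found in the text (may be empty)."""
    result = {}
    bp_match = BP_PATTERN.search(text)
    if bp_match:
        result["systolic"] = int(bp_match.group(1))
        result["diastolic"] = int(bp_match.group(2))
    hba1c_match = HBA1C_PATTERN.search(text)
    if hba1c_match:
        result["hba1c_percent"] = float(hba1c_match.group(1))
    return result


structured = extract_measurements(note)
print(structured)
```

Once values like these are in structured fields, they can be validated and combined with other data sets in the same way as any other tabular data.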

Business Benefits of Data Enrichment and Curation

  • Access to simplified data
  • Improved data accuracy
  • Reduced time consumption in data management
  • Integration of data from disparate sources
  • Enhanced data security

The Road Ahead

Data enrichment and curation work towards an integrated layer of standardized data that is easily consumable by any application or workflow within a life sciences organization. Together they empower decision-makers to make precise decisions based on facts, trends, and real-world studies, while delivering personalized solutions that meet patient needs and demonstrating the full value of their pharmaceutical portfolios.

