The Data Dilemma: How Can Life Science Progress with De-identified Patient Information?

Christian Evans

Wed 16 Sep 2020


There aren't many industries where data management is as complex as in life science and healthcare. Numerous data sources generate an astonishing amount of information. It has been predicted that real-world data will grow at a compound annual growth rate (CAGR) of 15 percent between 2019 and 2024. This sounds like a promising development, as robust databases are generally a precursor to rich data analytics, potentially driving significant improvements in care delivery and health outcomes. However, some unique data challenges in the healthcare ecosystem need to be addressed to build a usable database in the first place.
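To put that growth rate in perspective, a quick calculation shows what 15 percent compounded annually implies. The sketch below assumes five compounding periods (2019 to 2024) and an arbitrary starting volume of 1.0; neither the baseline nor the function name comes from the forecast itself.

```python
def compound_growth(start: float, rate: float, years: int) -> float:
    """Value after `years` periods of growth at a constant annual `rate`."""
    return start * (1 + rate) ** years

# Illustrative baseline of 1.0 unit in 2019 (an assumption, not a reported figure).
growth_factor = compound_growth(1.0, 0.15, 5)
print(f"Growth factor 2019-2024: {growth_factor:.2f}x")  # roughly 2.01x
```

In other words, under these assumptions the volume of real-world data would roughly double over the period.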

The data challenge of the life science industry

For clinical trials, organizations have to accumulate individuals’ healthcare data. The data is collected when patients receive care, pay for clinical services, and take health surveys. Once collected, it is organized so that records of individuals sharing a particular characteristic are grouped, or data from disparate sources corresponding to a specific person is linked. From a research perspective, this is the ideal situation: a complete view of each patient.

However, it’s important to remember that the data was collected under promises and legal obligations of privacy, confidentiality and security. Federal regulations under the Privacy Rule of the Health Insurance Portability and Accountability Act (HIPAA) restrict the disclosure of an individual's health data. Exposure can have repercussions for a person's health insurance eligibility and employment opportunities. It may also discourage people from seeking clinical care at the right time or from participating in trials altogether.

Addressing the privacy, confidentiality and security concerns

The solution to the privacy and security concerns of clinical trials is to protect the participants' identities by 'de-identifying' or 'anonymizing' the data so that it no longer includes any personal information about the participants.

However, it is essential to ensure that the data can't be traced back to the corresponding participants. In 2013, two separate genomic studies, one by a Harvard researcher and one by a few researchers at the Whitehead Institute, used publicly accessible online resources to re-identify participants. The Harvard study re-identified 241 out of 1,130 participants, and the Whitehead study nearly 50.
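One common way to gauge this kind of re-identification risk is a k-anonymity check: count how many records share each combination of quasi-identifiers (attributes such as ZIP code, birth year and sex that are not names but can single someone out when combined). The sketch below is illustrative only; the field names and sample records are hypothetical, and it is not the method used in either 2013 study.

```python
from collections import Counter

def smallest_group_size(records, quasi_identifiers):
    """Return k: the size of the smallest group sharing the same quasi-identifier values."""
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers) for record in records
    )
    return min(groups.values())

# Hypothetical de-identified records: no names, but quasi-identifiers remain.
records = [
    {"zip3": "021", "birth_year": 1960, "sex": "F", "diagnosis": "asthma"},
    {"zip3": "021", "birth_year": 1960, "sex": "F", "diagnosis": "diabetes"},
    {"zip3": "021", "birth_year": 1975, "sex": "M", "diagnosis": "asthma"},
]

k = smallest_group_size(records, ["zip3", "birth_year", "sex"])
print(k)  # 1: one record is unique on these fields, so it is easier to re-identify
```

A low k suggests that removing names alone is not enough; the remaining attributes may need to be generalized or suppressed.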

To guard against such incidents, the U.S. Department of Health and Human Services (HHS) has laid down ground rules to protect the privacy of participants. Under HIPAA, only two de-identification methods are acceptable: “a formal determination by a qualified expert” (known as Expert Determination), or “the removal of specified individual identifiers as well as absence of actual knowledge by the covered entity that the remaining information could be used alone or in combination with other information to identify the individual” (known as Safe Harbor).
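The second method, removal of specified identifiers, can be sketched in a few lines. The example below is a simplified illustration, not a compliant implementation: the field names are hypothetical, the identifier set covers only a handful of the categories HIPAA specifies, and the date handling assumes ISO-formatted strings.

```python
# Hypothetical field names for direct identifiers; HIPAA's own list is longer.
DIRECT_IDENTIFIERS = {
    "name", "ssn", "email", "phone", "street_address", "medical_record_number",
}

def strip_identifiers(record: dict) -> dict:
    """Return a copy of `record` without direct identifiers and with coarsened dates."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Keep only the year of a date tied to the individual ("YYYY-MM-DD" assumed).
    if "birth_date" in cleaned:
        cleaned["birth_year"] = cleaned.pop("birth_date")[:4]
    return cleaned

# Made-up sample record for illustration.
patient = {
    "name": "Ada Smith",
    "ssn": "000-00-0000",
    "birth_date": "1984-03-12",
    "diagnosis": "asthma",
}
deidentified = strip_identifiers(patient)
print(deidentified)
```

The original record is left untouched, so the identified and de-identified copies can be stored and governed separately.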

The Problem

Using participant data that is devoid of personal information is well within privacy laws. However, once the data is no longer traceable to the corresponding participant, linking that participant's data from electronic health records, billing, surveys and other sources becomes extremely challenging. To this day, many life science organizations still match and categorize patient data manually.
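One widely used approach to this linkage problem is tokenization: before identifiers are removed, each source derives the same keyed hash from them, so records can later be matched on the token rather than on the person's identity. The sketch below is a minimal illustration under stated assumptions; the secret key, the choice of fields and the normalization are all hypothetical, and it does not describe how any particular vendor's product works.

```python
import hashlib
import hmac

# Hypothetical key held by a trusted party; never hard-code a real key.
SECRET_KEY = b"example-key-held-by-a-trusted-party"

def patient_token(first_name: str, last_name: str, birth_date: str) -> str:
    """Derive a stable keyed token from normalized identifiers."""
    normalized = "|".join(
        part.strip().lower() for part in (first_name, last_name, birth_date)
    )
    return hmac.new(SECRET_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()

# Two sources tokenize the same (made-up) person independently, then drop the identifiers.
ehr_record = {"token": patient_token("Ada", "Smith", "1984-03-12"), "diagnosis": "asthma"}
claim_record = {"token": patient_token(" ada", "SMITH ", "1984-03-12"), "billed_amount": 120.0}

same_person = ehr_record["token"] == claim_record["token"]
print(same_person)  # True
```

Because the hash is keyed, someone without the key cannot recompute tokens from a list of known names, which is why the key would need to be tightly controlled.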

Gaining a complete patient view with a FHIR-enabled Data Activation Platform

Life science organizations need to overcome their data challenges to drive innovation. To achieve that, they must activate the data and consolidate it into a single longitudinal record. Innovaccer's FHIR-enabled Data Activation Platform has a schema that links and harmonizes de-identified data from different sources, integrates it, normalizes it, runs quality checks and maps it to a common data model to produce a unified patient record.
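In general terms, the steps named above (normalize, quality-check, map to a common model, merge per patient) can be sketched as follows. This is a generic illustration, not Innovaccer's schema or API: the source names, field mappings and the `token` linkage key are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical per-source field mappings into a common data model.
FIELD_MAPS = {
    "ehr": {"pt_token": "token", "dx": "diagnosis"},
    "billing": {"patient_tok": "token", "amt": "billed_amount"},
}

def to_common_model(source: str, raw: dict) -> dict:
    """Rename source-specific fields to common names and normalize string values."""
    mapping = FIELD_MAPS[source]
    mapped = {mapping[k]: v for k, v in raw.items() if k in mapping}
    return {k: v.strip().lower() if isinstance(v, str) else v for k, v in mapped.items()}

def passes_quality_check(record: dict) -> bool:
    """Minimal check: a record must carry a non-empty linkage token."""
    return bool(record.get("token"))

def unify(batches: dict) -> dict:
    """Merge records from all sources into one record per token."""
    unified = defaultdict(dict)
    for source, rows in batches.items():
        for raw in rows:
            record = to_common_model(source, raw)
            if passes_quality_check(record):
                unified[record["token"]].update(record)
    return dict(unified)

# Made-up input: the second billing row has no token and is rejected.
batches = {
    "ehr": [{"pt_token": "abc123", "dx": " Asthma "}],
    "billing": [{"patient_tok": "ABC123", "amt": 120.0}, {"patient_tok": "", "amt": 50.0}],
}
unified_records = unify(batches)
print(unified_records)
```

A production pipeline would also keep event timestamps and provenance so that the unified record stays longitudinal rather than overwriting earlier values, which this sketch does not attempt.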

In addition to providing a 360-degree patient view, the platform offers actionable insights. Accessing the right information at the right time can allow life science organizations to take advantage of changing market conditions and develop and launch new products successfully. The life science industry is fiercely competitive, and the early-adopter advantage can be quite beneficial.

The Road Ahead

Siloed data is obstructing innovation and efficiency in life sciences. An array of sources and systems needs to be connected to foster collaboration and thereby improve clinical outcomes and patient experience. As in other industries, Big Data has the potential to drive considerable progress in life science. However, because of stringent regulations and the sheer number of data sources, most organizations are still determining how to leverage Big Data and extract valuable insights from it.

Today, life science is on the cusp of a breakthrough. A new data curation and enrichment ecosystem is materializing that will help organizations demonstrate the full value of their pharmaceutical portfolios and deliver personalized solutions that meet patient demands with faster iteration cycles. Furthermore, it will help organizations comply with evolving regulatory requirements that promote and incentivize the adoption of application programming interfaces (APIs) and trust frameworks.