The phrase “data is the new oil” is often heard today, reflecting how valuable data has become to the global economy: almost all industries now run on and generate huge quantities of data. However, just like oil, data isn’t useful in its raw, unprocessed form. Organizations must therefore be able to source, curate, and analyze data properly in order to unlock its benefits.
The healthcare industry is an area where demand for data is at an unprecedented level. In 2020, the U.S. healthcare system generated around 2,314 exabytes of data, or roughly 30% of the world’s data by volume. By 2025, the compound annual growth rate of healthcare data is projected to reach 36%, outstripping other major industries such as manufacturing, financial services, and media and entertainment.
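To give a sense of what a 36% compound annual growth rate implies, the minimal sketch below compounds a normalized starting volume. The function name and the five-year horizon are illustrative assumptions, not figures from the estimate cited above.

```python
def project_volume(base_volume: float, annual_rate: float, years: int) -> float:
    """Return the volume after compounding `annual_rate` once per year for `years` years."""
    return base_volume * (1 + annual_rate) ** years


# Assumption: a normalized starting volume of 1.0 and a constant 36% rate
# held for five years. The article does not state that the rate is constant.
growth_factor = project_volume(1.0, 0.36, 5)
print(f"{growth_factor:.2f}x")  # prints 4.65x
```

In other words, a rate of 36% sustained for five years would multiply the starting volume roughly 4.65 times.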
According to Advanced Data Sciences (ADS), a data research organization focusing on the healthcare industry, pharmaceutical and biotechnology companies have increased their focus on data gathered outside clinical trials, known as real-world data (RWD), to strengthen their operations. RWD is generated not from randomized clinical trials but from electronic health records, insurance billing and claims, product and disease registries, and patient-generated data gathered by personal devices and health applications. These sources can provide insights that clinical trials cannot, as well as create economic benefits. A new, increasingly important use for data is training AI models, including the development of game-changing Generative AI solutions that can generate content, automate routine knowledge work, and help develop new concepts.
“ADS helps organizations realize the deeper potential that healthcare data offers for improving patient care effectiveness while managing costs and improving operations,” says Doug Foster, Partner at Advanced Data Sciences. “Our principals have more than 60 years of combined experience sourcing, curating, and processing healthcare data. We have built configurable data pipelines, unified and curated tens of millions of patient records, and structured complex public-private partnerships for healthcare data sharing and commercialization. We have developed LLM-based Generative AI solutions designed specifically for healthcare data. Making sure that healthcare data is available and optimized for improving patient care is our passion and our focus.”
In 2020, a leading global management consulting firm estimated that an average large pharmaceutical company can save $300 million by adopting real-world evidence (RWE) analytics across its whole value chain. These savings have prompted most major drug companies to establish departments focused on the use of healthcare data across multiple diseases.
Furthermore, the historical barriers that created healthcare data silos, such as limited incentives to share, security and privacy concerns, and technical inconsistencies, have largely been reduced through federal legislation or technological advances. This has made healthcare data more accessible than ever before.
With multiple RWD sources now available, ADS says healthcare companies must choose the right data sources to avoid wasting time and money. It classifies these sources into two major types: primary stakeholders, or entities involved in the delivery of care, and secondary stakeholders, or commercial entities.
Primary Stakeholder Data
According to ADS, primary stakeholder data sources are the local data stores of those who create healthcare data, including providers, labs, insurance companies, and patients. The data storage systems they use include electronic health records (EHRs) for clinical data, picture archiving and communication systems (PACS) for images, lab information systems for lab data, and claims databases for claims. Sourcing data directly from these systems is therefore well suited to highly customized data pulls. And because bidirectional integrations connect to the same systems that providers use, they also offer the opportunity to integrate into workflows and audit source files. These types of connections are not uncommon. For example, one of the most prestigious medical institutions in the U.S. and the world had licensed access to its de-identified patient data to 16 companies as of 2020.
However, ADS says that working with primary stakeholder sources can be difficult and time-consuming, as there is still some hesitancy to share data because of existing security, privacy, and technical requirements and the potential for liability. Institutions, clinics, and other care settings frequently adhere to different data standards, diluting the benefit of any single standard. ADS also estimates that the vast majority of primary stakeholder data is unstructured, creating a barrier to syntactic and semantic interoperability.
Secondary Stakeholder Data
Secondary stakeholder data sources are third parties that aggregate data, or the permissions for the data, and license, or grant, access to a consumer. These sources fall into three categories: patient registries, data vendors, and data marketplaces. According to ADS, registries are good for retrospective research on specific patient populations or therapeutic areas. With few exceptions, they are used for research rather than for commercial purposes. A significant disadvantage of registries is that the consumer has very little control over the content and data curation strategies.
Data vendors are an efficient data source for retrospective analyses of reasonably well-normalized datasets. Their main advantage is that they bring together many disparate data sources into a common database and typically have something for everyone. The drawbacks are high costs and a lack of workflow integrations with the data sources. Data marketplaces, which are essentially brokers of data, have the advantage of speed, allowing analytics to start almost immediately. But this speed may come with deficiencies in detail, as most datasets from marketplaces are offered on a ‘what-you-see-is-what-you-get’ basis.
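The trade-offs described for primary and secondary stakeholder sources can be summarized as a small lookup structure. The sketch below is only an illustration: the class and field names are hypothetical, and the strengths and drawbacks are paraphrased from the descriptions above rather than taken from any ADS tool.

```python
from dataclasses import dataclass
from enum import Enum


class StakeholderType(Enum):
    PRIMARY = "primary"      # entities involved in the delivery of care
    SECONDARY = "secondary"  # commercial entities that aggregate and license data


@dataclass(frozen=True)
class SourceProfile:
    name: str
    stakeholder: StakeholderType
    best_for: str
    drawbacks: tuple[str, ...]


# Hypothetical catalogue; the entries paraphrase the article's descriptions.
SOURCES = (
    SourceProfile("primary stakeholder systems", StakeholderType.PRIMARY,
                  "highly customized data pulls and workflow integration",
                  ("difficult and time-consuming to set up",
                   "inconsistent data standards",
                   "mostly unstructured data")),
    SourceProfile("patient registries", StakeholderType.SECONDARY,
                  "retrospective research on specific populations or therapeutic areas",
                  ("little control over content and curation",)),
    SourceProfile("data vendors", StakeholderType.SECONDARY,
                  "retrospective analyses of reasonably well-normalized datasets",
                  ("high costs", "no workflow integration with the data sources")),
    SourceProfile("data marketplaces", StakeholderType.SECONDARY,
                  "starting analytics almost immediately",
                  ("possible deficiencies in detail",)),
)


def sources_by_type(kind: StakeholderType) -> list[str]:
    """Return the names of all catalogued sources of the given stakeholder type."""
    return [s.name for s in SOURCES if s.stakeholder is kind]


print(sources_by_type(StakeholderType.SECONDARY))
```

A structure like this makes it easy to filter candidate sources by type before weighing their drawbacks against a project's needs.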
According to ADS, each of these sources of real-world data has its advantages and drawbacks. Having an expert data research organization in their corner helps healthcare companies source, curate, and analyze data as effectively as possible, so that they realize the benefits sooner. ADS conducts data diligence to determine whether a dataset is relevant and of sufficient quality for the client’s purposes. ADS then brings a detailed understanding of the data context and broad knowledge of customer requirements, applying the right AI and data science tools and methods to curate clients’ data to meet specific objectives.
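As a rough illustration of what one step of data diligence might involve, the sketch below measures how completely a dataset fills the fields a project needs. It is not ADS’s method: the function names, the field names, the 90% threshold, and the sample records are all hypothetical and synthetic.

```python
def field_completeness(records: list[dict], required_fields: list[str]) -> dict[str, float]:
    """Return, for each required field, the share of records with a non-empty value."""
    total = len(records)
    scores = {}
    for field in required_fields:
        filled = sum(1 for r in records if r.get(field) not in (None, ""))
        scores[field] = filled / total if total else 0.0
    return scores


def passes_diligence(records: list[dict], required_fields: list[str],
                     min_completeness: float = 0.9) -> bool:
    """True if every required field meets the (assumed) completeness threshold."""
    scores = field_completeness(records, required_fields)
    return all(score >= min_completeness for score in scores.values())


# Synthetic records for illustration only; no real patient data.
records = [
    {"patient_id": "p1", "diagnosis_code": "E11.9", "encounter_date": "2020-01-15"},
    {"patient_id": "p2", "diagnosis_code": "", "encounter_date": "2020-02-03"},
    {"patient_id": "p3", "diagnosis_code": "I10", "encounter_date": None},
    {"patient_id": "p4", "diagnosis_code": "J45.909", "encounter_date": "2020-03-22"},
]
required = ["patient_id", "diagnosis_code", "encounter_date"]

print(field_completeness(records, required))
print(passes_diligence(records, required))
```

A real diligence process would also assess relevance, provenance, and consistency across sources; completeness is only the simplest of these checks to automate.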