All Posts By nashvillebios


The Healthcare Divide: Privatized US vs Public European Models

By | Healthcare

Key Takeaways:

  • The US has a primarily private, market-based healthcare system, while most European countries have universal, tax-funded public healthcare systems.
  • Healthcare spending per capita is substantially higher in the US compared to Europe, yet the US lags behind on metrics like life expectancy and infant mortality.
  • European healthcare systems aim to provide comprehensive coverage to all citizens, while millions remain uninsured in the US despite the Affordable Care Act.
  • Wait times for non-urgent care tend to be longer in European public systems, though the US has longer wait times for emergency room visits.
  • The US healthcare system is dominated by for-profit providers and private insurance companies, leading to higher costs but more cutting-edge treatments.

The Opposing Models

For decades, the healthcare systems of the United States and Europe have taken vastly different approaches, sparking heated debate over which model delivers better care at a sustainable cost. While the US relies primarily on private insurance and market competition, most European nations have adopted tax-funded universal healthcare as a basic right for all citizens.

System Type
  • United States: Private, market-based healthcare system.
  • Europe: Universal, tax-funded public healthcare systems.

Healthcare Spending
  • United States: Substantially higher per capita compared to Europe ($11,072 per person in 2019).
  • Europe: Lower per capita compared to the US (2019 average of $5,505 across wealthy European nations).

Health Outcomes
  • United States: Lags behind Europe on metrics like life expectancy and infant mortality.
  • Europe: Better life expectancy and lower infant mortality rates compared to the US.

Coverage
  • United States: Millions remain uninsured despite the Affordable Care Act; approximately 28 million non-elderly Americans lack coverage.
  • Europe: Comprehensive, nearly universal coverage for all citizens with little to no out-of-pocket costs for services.

Provider System
  • United States: Dominated by for-profit providers and private insurance companies.
  • Europe: Predominantly publicly funded systems, with some countries incorporating a mix of private and public insurance schemes.

Wait Times
  • United States: Longer wait times for emergency room visits; more efficient access to specialized treatments and newly approved therapies.
  • Europe: Longer wait times for non-urgent care in public systems due to budget and capacity management; generally quicker access to emergency services.

Innovation and Costs
  • United States: High healthcare spending contributes to cutting-edge medical innovations; costs driven up by profit motives, administrative complexities, and a fee-for-service model.
  • Europe: Emphasizes cost controls, standardized fee schedules, and integrated care delivery; while innovative, may have slower access to some new treatments due to budget considerations.

Future Challenges
  • United States: Aging population, rising chronic disease burden, and debates over the extent of government involvement, reflected in calls for “Medicare for All.”
  • Europe: Similar challenges with aging populations and chronic diseases; experiments with public-private hybrid schemes and value-based reimbursements to maintain sustainability.

Philosophical Approach
  • United States: Healthcare often viewed as a market commodity, with ongoing debates about its status as a human right.
  • Europe: Healthcare generally considered a basic right for all citizens, funded through taxation.

 

US Private Markets vs. European Public Coverage

At its core, the American healthcare system is a private, decentralized collection of for-profit hospitals, clinics, insurance companies, and other providers incentivized to maximize revenues. Employers typically offer private insurance plans, and government programs like Medicare and Medicaid cover the elderly and low-income populations. However, an estimated 28 million non-elderly Americans remain uninsured despite the Affordable Care Act’s coverage expansions.

In stark contrast, European healthcare systems are overwhelmingly publicly funded through taxation to ensure universal coverage for all legal residents. Countries like the UK, Spain, and Sweden operate single-payer national health services, while others like Germany, France, and the Netherlands have multi-payer universal systems that incorporate a mix of private and public insurance schemes.

Higher Costs, Lagging Outcomes in the US

One indisputable fact is that the US healthcare system is exorbitantly more expensive per capita than any European model, yet its health outcomes lag behind on metrics like life expectancy and infant mortality. In 2019, US healthcare spending reached $11,072 per person – over double the average of $5,505 across wealthy European nations.

Many experts attribute the US system’s high costs to profit-driven incentives, administrative complexities associated with private insurers, and a fee-for-service payment structure that encourages more treatment over quality outcomes. European systems emphasize cost controls, standardized fee schedules, and integrated care delivery as more efficient alternatives.

The Trade-Off: Innovation vs. Wait Times

However, America’s high healthcare spending does contribute to cutting-edge medical innovations and reduced wait times for specialized treatments like newly approved cancer therapies. European systems often face longer delays for non-urgent services as public health authorities aim to manage finite budgets and capacity. That said, the US has lengthier emergency room wait times on average than many European countries.

Universal Coverage vs. Uninsured Millions

Access to healthcare also differs significantly between the US and Europe. The Affordable Care Act brought the US’s uninsured rate below 10% for the first time, but millions still lack coverage and face potentially bankrupting medical bills. European universal systems cover all citizens cradle-to-grave – services like preventative screenings, hospital stays, specialist visits, surgeries, prescribed medications, and prenatal care are fully covered with little or no out-of-pocket costs beyond modest copays.

Challenges for the Future

Both systems face mounting challenges from aging populations and rising costs associated with chronic diseases and new technologies. European nations are experimenting with public-private hybrid schemes and value-based reimbursements, while American policymakers continue debating the role of government involvement amidst calls for “Medicare for All.”

Differing Philosophies

Ultimately, the US and Europe represent vastly different philosophical and economic approaches to the simple question: Should healthcare be considered a human right or a market commodity? The answer, and the path forward, remains highly contentious on both sides of the Atlantic, with no easy solutions in sight.

 

Sources:

  • “U.S. Health Care Resources.” American Hospital Association, 2021.
  • “Health Insurance Coverage in the United States.” Centers for Disease Control and Prevention, 2022.
  • “Health Systems Characteristics.” OECD Health Statistics, 2021.
  • “Health Care Systems in the EU.” European Union, 2021.
  • “How Does the Quality of the U.S. Health-Care System Compare to Other Countries?” Peterson-KFF Health System Tracker, 2022.
  • “Health Expenditure Per Capita.” OECD Health Statistics, 2022.
  • Sawyer, B., et al. “Why Do Health Care Costs Keep Rising?” Peterson-KFF, 2022.
  • “The United States Leads Rising Availability of Cancer Medicines.” IQVIA, 2021.
  • “Waiting Times for Health Services Next?” EuroHealth, 2020.
  • “U.S. Emergency Department Visit Data Visualizations.” CDC, 2018.
  • “Universal Health Coverage.” World Health Organization, 2021.
  • “Value-Based Healthcare in Europe.” EIT Health, 2022.

From Doctors' Notes to New Therapies: The Promise of Unstructured Data

By | Health Data Types, Healthcare Data

Key Takeaways:

  • Unstructured data from sources like clinical notes can provide valuable real-world insights to augment structured clinical trial data in drug development.
  • Natural language processing (NLP) enables mining of unstructured text data for information on drug efficacy, side effects, patient behaviors, and more.
  • Challenges include data privacy, integration across sources, and developing reliable NLP models to extract accurate insights.
  • Proper governance and cross-functional collaboration are needed to safely and effectively leverage unstructured data.
  • Responsible use of unstructured notes has the potential to accelerate drug development, improve safety monitoring, and support value-based care models.

The Untapped Potential of Unstructured Data

In the meticulous world of drug development, every data point is precious. Clinical trials generate a wealth of rigorously structured efficacy and safety data. During routine clinical care, computerized physician order entry systems and electronic health records capture structured data that can enhance drug efficacy and safety monitoring. However, an underutilized treasure trove of real-world information exists in the unstructured text of clinical notes, hospital records, and other loosely formatted sources gathered as part of standard medical practice.

Unleashing Insights with Natural Language Processing

Historically, this unstructured data has been difficult to integrate and analyze alongside its structured counterparts. This is partly due to variability in documentation practices among different healthcare providers. Additionally, the extraction of relevant data has traditionally relied on manual review and interpretation by clinically trained personnel. But major pharmaceutical companies are now investing heavily in natural language processing (NLP) to mine these unstructured sources for insights.

While NLP does not eliminate the need for human involvement, it can significantly streamline the process. NLP serves as a tool that works in conjunction with human interaction, combining the efficiency of intelligent automation with the ability to incorporate human feedback. This combination allows for more effective extraction of insights from unstructured data, which ultimately aids in accelerating research, optimizing clinical trials, and enhancing drug safety monitoring.

Applications: From Safety Signals to Patient Experiences

So when and how can unstructured data provide value? One key application is using NLP models trained on doctors’ notes to identify potential safety signals that may not surface until after a drug is approved and prescribed at scale. These real-world signals can prompt further investigations and narrow the “surrogate to reality” gap between clinical trials and clinical practice.

Unstructured data has also shown promise in two critical areas: better defining appropriate inclusion/exclusion criteria for clinical trials and identifying under-represented patient populations who may benefit from a treatment. By processing clinical records from diverse practices, researchers can find more suitable study cohorts for targeted recruitment efforts to ensure that clinical trials are more representative of real-world patient populations. Furthermore, analyzing unstructured data allows for a better understanding of real-world behaviors like treatment adherence or self-reporting of side effects.

Another valuable application of NLP is in tracking and codifying patients’ experiences based on anecdotal descriptions found in clinical notes. For example, phrases such as “The medicine made me feel queasy” can provide qualitative context around drug effects and quality of life. This context could support reporting requirements for post-marketing adverse events and complement clinical scoring tools used in the trial setting, potentially expediting label expansions for new indications.
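
As a rough illustration of this kind of extraction, the sketch below uses simple keyword matching to map anecdotal phrases in a synthetic note to candidate adverse-event terms. This is a minimal sketch only: the note text, symptom lexicon, and mappings are hypothetical, and a production system would rely on trained clinical NLP models plus human review rather than a hand-built pattern list.

```python
import re

# Hypothetical lexicon mapping colloquial phrases to candidate adverse-event terms.
SYMPTOM_LEXICON = {
    r"\bqueasy\b|\bnauseous\b|\bsick to (my|the) stomach\b": "nausea",
    r"\bdizzy\b|\blightheaded\b": "dizziness",
    r"\bitchy\b|\bhives\b|\brash\b": "rash/pruritus",
}

def extract_candidate_events(note_text: str) -> list[str]:
    """Return candidate adverse-event terms mentioned in a clinical note."""
    found = []
    for pattern, event_term in SYMPTOM_LEXICON.items():
        if re.search(pattern, note_text, flags=re.IGNORECASE):
            found.append(event_term)
    return found

# Synthetic example note (not real patient data).
note = "Patient reports the medicine made her feel queasy and a little dizzy at night."
print(extract_candidate_events(note))  # ['nausea', 'dizziness']
```

In practice, extracted mentions would typically be normalized to a standard terminology and reviewed by clinically trained staff before entering any pharmacovigilance workflow.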

Overcoming Obstacles to Implementation

Despite the opportunities, integrating unstructured data is not without challenges. Concerns around patient privacy and data security pose hurdles. While unstructured text provided for research purposes is typically de-identified, residual identifying information can remain. Utilizing data sources that have gone through multiple layers of de-identification is crucial to mitigate this risk effectively. Reliably extracting structured insights from unstructured text across multiple source systems using NLP presents further difficulties.

Developing robust, production-grade NLP models requires immense training data, careful tuning to the healthcare/biomedical domain, and systematic quality testing. Merging unstructured insights with existing structured pipelines is also an intricate systems engineering challenge.

Data Governance and the Path Forward

Looking ahead, stakeholders agree that, when managed responsibly and embedded in robust data governance frameworks, unstructured real-world datasets can help healthcare reach its full potential. Pharmaceutical companies may find accelerated paths to drug approvals and label expansions. Payers could gain transparency to optimize formularies and pricing models. And ultimately, patients may benefit from better targeted treatments.

 

Sources:

  • “Unlocking the Power of Unstructured Data in Drug Discovery.” Drug Discovery Today, 2021.
  • “Natural Language Processing in Drug Safety.” Pharmaceutical Medicine, 2021.
  • “Bridging the ‘Efficacy-to-Effectiveness’ Gap.” BioPharmaDive, 2022.
  • “NLP for Clinical Trial Eligibility Criteria.” NEJM Catalyst, 2021.
  • “NLP Applications in Life Sciences and Healthcare.” Optum, 2022.
  • “Challenges of Integrating Unstructured Data in Healthcare.” MIT, 2020.
  • “Data, RWE and the Future of Value-Based Care.” IQVIA, 2022.

 

 


Diversity in Data: Why It Matters for Drug Discovery

By | Healthcare Data

Key Takeaways:

  • The absence of diversity in clinical trial data can lead to biases and inequities in healthcare.
  • Regulators like the FDA are emphasizing the need for more diverse and representative data through initiatives like the Real World Evidence Program.
  • Having accurate, diverse real world data leads to more equitable and effective treatments by ensuring safety and efficacy across populations.
  • Pharmaceutical companies should prioritize capturing diverse real-world data and applying advanced analytics to identify variabilities in treatment response.

In recent years, a growing understanding has emerged regarding the critical need for diversity and representation in clinical research data. Historically, certain demographic groups such as women, minorities, and the elderly have been underrepresented in many clinical trials. This lack of diversity in the underlying data can lead to significant biases and inequities when new therapies are approved and launched.

For example, a landmark study in the early 1990s showed that women had been excluded from most major clinical trials, leading to gaps in knowledge about women’s responses to medications. The study found that eight out of ten prescription drugs withdrawn from the market posed greater health risks to women than men. This exemplifies the real dangers of not gathering data across diverse populations.

More recently, the COVID-19 pandemic has further revealed disparities in health outcomes and treatment responses between different demographic groups. Regulators have emphasized the need for clinical trials that are more representative of real-world diversity. In the United States, the Food and Drug Administration (FDA) now requires inclusion of underrepresented populations in clinical trials under the Improving Representation in Clinical Trials initiative.

The FDA has also created the Real World Evidence Program to evaluate the potential use of real-world data (RWD) from sources like electronic health records, insurance claims databases, and registries. The goal is to complement data from traditional trials with more diverse, real-world information on safety, effectiveness, and treatment response variabilities across patient subgroups.

Having access to accurate, representative real-world data enables more equitable and effective treatments in several key ways:

  1. Identifying safety issues or side effects that disproportionately impact certain populations based on factors like age, race, or comorbidities. This allows for better labeling and monitoring.
  2. Ensuring adequate efficacy across all segments of the patient population. Understanding variabilities in treatment response is key for optimal dosing guidance.
  3. Enabling development of targeted therapies for population subgroups where the risk-benefit profile may differ, such as pregnant women.
  4. Avoiding biases and inequities in access to treatment. Diverse data helps prevent therapies from being indicated for only limited populations.
  5. Informing appropriate use criteria and payor coverage decisions based on real-world comparative effectiveness across groups.

From a regulatory compliance perspective, lack of representation in trial data can also lead to delays or rejection of new drug and device applications. The FDA has advised that drugs may not be approvable if safety and efficacy has not been demonstrated across demographics.

Looking ahead, embracing diversity and representativeness throughout the drug discovery process will be critical. Pharmaceutical companies should make gathering inclusive, real-world data a priority. Advanced analytics techniques like machine learning can then help unlock insights about treatment response variabilities within diverse patient populations.
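
As a simple illustration of the kind of subgroup analysis this enables, the sketch below compares response rates across demographic groups in a small synthetic dataset. The column names and values are hypothetical; a real analysis would use proper statistical modeling, larger cohorts, and adjustment for confounders.

```python
import pandas as pd

# Synthetic real-world dataset (hypothetical columns and values).
df = pd.DataFrame({
    "age_group": ["18-39", "40-64", "65+", "65+", "40-64", "18-39", "65+", "40-64"],
    "sex":       ["F",     "M",     "F",   "M",   "F",     "M",     "F",   "M"],
    "responded": [1,       1,       0,     0,     1,       1,       0,     1],
})

# Response rate and sample size per subgroup: large gaps flag populations
# that may need different dosing, monitoring, or further study.
summary = (
    df.groupby(["age_group", "sex"])["responded"]
      .agg(response_rate="mean", n="size")
      .reset_index()
)
print(summary)
```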

Ultimately, leveraging diverse and representative data will lead to more equitable, effective personalized healthcare and better outcomes for all patients.

Sources:

  • Improving Representation in Clinical Trials and Research: FDA’s New Efforts to Bridge the Gap – FDA
  • Real-World Evidence – FDA
  • Racial and Ethnic Differences in Response to Medicines: Towards Individualized Pharmaceutical Treatment – NIH
  • Addressing sex, gender, and intersecting social identities across the translational science spectrum – NIH
  • Utilizing Real-World Data for Clinical Trials: The Role of Data Curators – NIH

Crafting Compelling Stories from Clinical Trial Data: Leveraging Real-World Insights

By | Clinical Trials

Key Takeaways:

  • Integrating elements of storytelling enriches the presentation of clinical trial data, making it more engaging and informative.
  • Analyzing raw clinical trial data reveals hidden trends and patterns upon which to build a compelling story.
  • Adhering to ethical and regulatory standards is imperative when crafting narratives from clinical trial findings.
  • Augmenting clinical trial data with real-world outcomes allows for a more comprehensive understanding of treatment effectiveness, representing diverse demographics and lifestyles often underrepresented in the controlled clinical trial environment.
  • Communicating the real-world impact of treatments beyond statistical outcomes is crucial for showcasing scientific advancement to a broad audience.

In pharmaceutical research and development, the path from clinical trials to market is strictly defined: data must be analyzed, studies conducted, and regulatory compliance maintained to establish a drug’s safety and efficacy. Although clinical trial data are often presented scientifically, they contain the potential for powerful storytelling beyond statistical tables and regulatory filings.

In this blog post, we explore the art of storytelling with clinical trial data and discuss how real-world perspectives can augment these insights.

Structuring the Story

Great storytelling hinges on certain fundamental elements. Below are a few of these elements and ways each can be applied to clinical trial data to help inform and engage the audience.

  • Character. In clinical trials, “character” refers to the intervention under study. Highlight the features, benefits, and potential impacts of the intervention.
  • Plot. Create a clear storyline. Start with the problem (condition or disease), introduce an intervention (treatment), describe the process (methodology), and conclude with the results.
  • Setting. Provide context by explaining the background and significance of the studied condition and why the clinical trial is crucial for advancing treatment.
  • Conflict. Discuss challenges faced during the clinical trial, such as recruitment difficulties or unexpected side effects.
  • Resolution. Share the clinical trial outcomes. Present statistical results, efficacy, safety, and any breakthrough findings.
  • Visual aids. Incorporate visuals such as infographics, charts, and interactive dashboards to make complex information more accessible and engaging.

Uncovering the Story

Interpreting clinical trial results can be challenging. To ensure findings are accessible to a broad audience, it is important to construct a compelling narrative around the data.

Clinical trial data serve as the backbone of drug development. In raw form, these data include patient demographics, treatment protocols, treatment responses, adverse events, and efficacy outcomes presented as numbers, tables, and graphs. However, analyses are often required to help uncover the stories within clinical trial data.

Analyzing clinical trial data reveals trends and patterns that are not immediately apparent. Whether it’s the impact of a specific treatment on a particular subgroup or long-term effects beyond the clinical trial period, these insights contribute to a richer and more comprehensive storyline.

Beyond numerical outcomes and statistical significance, the impact of treatment extends to its ability to address unmet medical needs and improve patient outcomes. A balanced presentation of risks and benefits empowers healthcare professionals, policymakers, and patients to understand the data’s implications and make conscientious decisions.

Maintaining ethical standards and transparency is crucial to this process. Adhering to regulatory guidelines and clearly articulating limitations and biases ensures integrity in storytelling with clinical trial data.

Supporting the Story

Clinical trials are only one phase of a drug development lifecycle. Understanding the long-term safety and effectiveness of a treatment often requires evaluation beyond the controlled clinical trial environment.

For example, supplementing clinical trial data with real-world outcomes helps validate hypotheses, identify potential biomarkers, and uncover post-marketing insights. This approach accommodates variations in demographics, socioeconomics, and lifestyle choices often excluded from controlled clinical trials.

In addition, communicating how a treatment translates into improved quality of life or reduced healthcare burden further solidifies the narrative.

Summary

Clinical trial data are critical for advancing healthcare but are often challenging to interpret. Transforming clinical trial data into compelling stories involves focusing on the character, plot, setting, conflict, and resolution of the study while enriching the data with real-world perspectives. Crafting narratives that resonate with diverse stakeholders is crucial for conveying the true impact of treatments, driving research, and advancing healthcare for the benefit of everyone.


From Genes to Drugs: The Role of Genetics in Modern R&D

By | Clinical Genomics

Key Takeaways:

  • Human genetics research can elucidate mechanisms of disease and help identify new drug targets.
  • Studying genetic variants linked to disease risk or drug response helps stratify patients and inform clinical trials.
  • Genomic data enables the development of precision medicines targeted to patients’ genetic profiles.
  • Pharmacogenomics and genetic screening guide optimal drug usage and minimize adverse reactions.
  • Advancements in genetic analysis technologies are enabling more rapid and expansive use of genomic data in drug R&D.

The Value of Human Genetics in Drug R&D

Developing new drugs is a lengthy and expensive process with a high failure rate. On average, it takes 10-15 years and over $1 billion to bring a new drug to market. The pharmaceutical industry is looking to human genetics research to improve R&D efficiency, success rates and the personalized utility of new medicines.

Understanding the genetic factors underlying diseases can point the way to new drug targets. Identifying genetic variants linked to disease risk helps elucidate biological pathways involved. Druggable targets can then be identified to modulate relevant pathways and processes. Genetics also helps establish causal mechanisms to avoid spurious associations.

Pharmacogenomics focuses on how genetic variability affects drug response. It enables matching patients to treatments according to genotype to maximize effectiveness and avoid adverse reactions. Testing for pharmacogenomic biomarkers can guide dosing, or indicate alternate treatments when genetics point to likely non-response.

Genetic screening also aids patient stratification and clinical trial optimization. Enriching trial participant selection for those most likely to respond or exhibit a clinical effect improves statistical power with smaller sample sizes. Genetic variables allow better control for confounding factors. Pharmacogenomic testing of participants also helps explain differential responses.

Studying rare genetic variants with large effects (“genetic supermodels”) provides another window into disease biology. The study of extreme genotypes helps unravel mechanisms and identify new targets.

Once a drug is developed, genetics continues to inform optimal use. Screening programs using pharmacogenomic biomarkers guide treatment choices and minimize risks. Genetics also aids mechanistic understanding of how therapies work, illuminating additional applications and opportunities.

The plummeting costs of genome sequencing and advances in big data analytics are enabling more extensive use of human genetic data. Pete Hulick, lead for molecular biology at Eli Lilly, described human genetics as “intersecting with everything that we do” in drug R&D.

Applications in Discovery Research

Early in the R&D process, human genetic insights can point the way to promising disease targets. Scientists look for associations between genetic variants, such as single nucleotide polymorphisms (SNPs), and disease risk. Genome-wide association studies (GWAS) uncover SNP differences between disease and control cohorts. Significant associations indicate genes and biological pathways involved in the disease that may be amenable to pharmacological intervention.
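
As an illustrative sketch of the underlying association test (not a full GWAS pipeline), the code below compares allele counts at one hypothetical SNP between case and control cohorts using a chi-square test. The counts are made up; real studies test millions of variants, adjust for population structure, and apply stringent multiple-testing corrections.

```python
from scipy.stats import chi2_contingency

# Hypothetical allele counts at a single SNP
# (rows: cases/controls; columns: risk allele vs. alternate allele).
table = [
    [620, 380],  # cases
    [540, 460],  # controls
]

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.2f}, p={p_value:.3g}")
# A genome-wide significant association would typically require p < 5e-8.
```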

Once potential targets are identified, downstream lab research explores how to modulate them. Developing a drug is an iterative process, but human genetics provides clues on where to start.

Genetics also offers validation when biological hypotheses emerge from other experiments. Confirming that tweaking a gene or pathway affects disease risk strengthens the case for pursuing it as a drug target.

Patient Stratification & Clinical Trials

Patient heterogeneity is a major obstacle in clinical trials. Varied treatment responses lower statistical power and necessitate larger trial sizes. Genetic analysis enables better patient stratification to minimize heterogeneity and identify relevant subgroups.

For example, the cystic fibrosis drug Kalydeco works for patients with a particular CFTR gene mutation. Prescreening patients’ genetics enables targeted trial recruitment. Similar approaches minimize heterogeneity in cancer trials by selecting patients with tumors exhibiting specific mutations.

Genotyping trial participants helps explain differential responses and may uncover additional genotype-specific effects. Genetic associations can also point to new indications for the drug mechanism.

Precision Medicine

The emergence of targeted precision therapies relies directly on human genetics. Cancer treatments like Herceptin and Gleevec target tumors with specific genomic variants. HIV drugs are tailored to individual viral genotypes. Gene therapies introduce corrected genes to compensate for defective inherited genes.

This personalized approach promises greater efficacy for those most likely to respond. By targeting drugs based on genetic profiles, precision medicine seeks to maximize benefit while minimizing unnecessary treatment.

Pharmacogenomics for Safety & Optimization

Pharmacogenomic testing assesses how genetic variability affects reactions to drugs. It can identify patients likely to experience adverse events or suboptimal responses. This enables selecting safer treatments, dosage adjustment or more intense monitoring.

The blood thinner warfarin, for example, demonstrates significant pharmacogenomic effects. Genotyping helps guide ideal dosing to balance effectiveness and bleeding risks. The FDA added pharmacogenomic guidance on warfarin labeling in 2007.
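
A minimal sketch of genotype-informed dose categorization follows. The genotype-to-dose-band mapping is illustrative only, loosely modeled on the idea behind genotype-guided dosing tables rather than their actual values; real dosing decisions rest on validated algorithms and clinical judgment.

```python
# Hypothetical mapping from (CYP2C9, VKORC1) genotype pairs to starting-dose bands.
# Values are illustrative placeholders, not clinical guidance.
DOSE_BANDS = {
    ("*1/*1", "GG"): "standard",
    ("*1/*2", "GA"): "reduced",
    ("*2/*3", "AA"): "strongly reduced",
}

def warfarin_dose_band(cyp2c9: str, vkorc1: str) -> str:
    """Return an illustrative dose band for a genotype pair."""
    return DOSE_BANDS.get((cyp2c9, vkorc1), "no guidance: use clinical algorithm")

print(warfarin_dose_band("*1/*2", "GA"))  # reduced
print(warfarin_dose_band("*3/*3", "AA"))  # no guidance: use clinical algorithm
```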

Wider adoption of pharmacogenomic testing has the potential to reduce adverse drug events that represent a significant public health burden. More optimal treatment through genetic guidance also contributes to pharmacoeconomic goals.

Looking Ahead

The expanding use of human genetics is transforming every phase of drug R&D. While challenges remain in interpreting and applying genetic findings, the value in accelerating discovery, precision medicine and optimized therapeutics is evident. Advances in high-throughput genomics, big data analytics and machine learning will further incorporate human genetics into tomorrow’s medicines.

 

Sources:

  • Relling & Evans, Nature Reviews Drug Discovery 2015
  • Roden & Denny, Annual Review of Medicine 2019
  • Genomics England PanelApp Pharmacogenetics Gene Curation Group, NPJ Genomic Medicine 2020
  • Li et al., Nature Reviews Genetics 2020
  • Manolio et al., JAMA 2020
  • Xu et al., Nature Reviews Drug Discovery 2022

The Basics of OMOP – Data Standardization in Healthcare

By | Health Data Types, Healthcare Data

Key Takeaways:

  • OMOP Defined: The Observational Medical Outcomes Partnership (OMOP) is a common data model for organizing healthcare data from various sources.
  • Objective: OMOP aims to standardize and integrate diverse healthcare data, facilitating analysis and research.
  • Data Structuring: It organizes data into standard tables and fields (observations, procedures, drug exposures, conditions, etc.), enhancing analytics across datasets.
  • Enhanced Analysis: A common data model allows for larger data pooling, increasing statistical power in analysis.
  • Privacy Protection: OMOP prioritizes patient privacy, using de-identified data while retaining analytical utility.

What is OMOP Data?

OMOP represents a collaborative effort to standardize the transformation and analysis of healthcare data from diverse sources. Its goal is to optimize observational data for comparative research and analytics. The OMOP Common Data Model (CDM) prescribes a structured format for organizing heterogeneous healthcare data, encompassing demographics, encounters, procedures and more. This facilitates cross-platform analytics and queries. Notably, OMOP is a blueprint for data organization, not a database. It supports data standardization across platforms, leading to more robust datasets.

Key Features of OMOP:

  • Vocabulary Standards: For coding concepts like conditions and medications.
  • Standard Formats: For dates, codes and relational data structures.
  • Person-Centric Model: Data connected to individuals over time.
  • Support for Various Data Types: Like EHR, claims, registries, etc.
  • Open Source Licensing: Promotes free implementation and continuous evolution of the standard.

OMOP’s standardization ensures key clinical concepts are represented uniformly, balancing analytical utility with patient privacy.

Use of OMOP Data:

OMOP facilitates practical medical research by standardizing observational data, enabling:

  • Cross-platform analytics on combined datasets.
  • Reproduction of analyses and sharing of methods.
  • Application of predictive models across diverse data types.
  • Support for safety surveillance and pharmacovigilance.
  • Conducting population health studies and comparative effectiveness research.

Implemented by a variety of organizations, OMOP enables significant analytical use cases, including drug safety signal detection, real-world treatment outcome analysis and population health forecasting. By creating a common language for healthcare data, OMOP fosters data integration and analysis on a larger scale, accelerating health research.
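
As a small illustration of what the person-centric table structure enables, the sketch below runs a cross-table query against a hypothetical SQLite copy of an OMOP CDM instance. The table and column names (person_id, condition_occurrence, drug_exposure) follow the CDM specification, but the file name and concept IDs are placeholders.

```python
import sqlite3

# Hypothetical local SQLite export of an OMOP CDM instance.
conn = sqlite3.connect("omop_cdm.sqlite")

# Count patients with a given condition who were later exposed to a given drug.
# 201826 and 1503297 are placeholder standard concept IDs.
query = """
SELECT COUNT(DISTINCT co.person_id) AS n_patients
FROM condition_occurrence AS co
JOIN drug_exposure AS de
  ON de.person_id = co.person_id
 AND de.drug_exposure_start_date >= co.condition_start_date
WHERE co.condition_concept_id = 201826
  AND de.drug_concept_id = 1503297;
"""

print(conn.execute(query).fetchone()[0])
```

Because every source dataset is mapped to the same tables and vocabularies, the same query can, in principle, run unchanged across any OMOP-conformant database.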

Sources:

Observational Health Data Sciences and Informatics (OHDSI) OMOP Common Data Model Specifications https://www.ohdsi.org/data-standardization/the-common-data-model/

Hripcsak, G., Duke, J.D., Shah, N.H., Reich, C.G., Huser, V., Schuemie, M.J. et al. Observational Health Data Sciences and Informatics (OHDSI): Opportunities for Observational Researchers. Stud Health Technol Inform. 2015

Overview of the OMOP Common Data Model – HealthIT.gov https://www.healthit.gov/sites/default/files/ptp13-700hhs_white.pdf


Human Genetics as a Strategic Imperative to Accelerate Drug Discovery: The Alliance for Genomic Discovery

By | Clinical Genomics

Key Takeaways:

  • Pharmaceutical development is high-risk and resource-intensive, with a 90% failure rate in clinical trials, often due to inadequate efficacy, toxicity, drug properties, or commercial viability.
  • Incorporating human genetic evidence doubles drug approval rates, paving the way for innovative therapies and new molecular entities.
  • Techniques like GWASs and PheWAS linking genetic data to phenotypic data enhance drug development by identifying associations between rare alleles and diseases.
  • Published human genetic studies, primarily centered on individuals of European descent, hinder our understanding of genetic diversity and impede the development of new therapies suitable for diverse populations; therefore, establishing study cohorts with under-represented populations is crucial for promoting health equality and identifying novel drug targets based on diverse genetic variants.
  • The Alliance for Genomic Discovery (AGD) aims to reshape drug development by sequencing 250,000 diverse samples, providing a powerful resource for pharmaceutical members to correlate genetic variations with clinical outcomes and, in turn, enabling these companies to better serve a global population.

 

The Struggle to Discover New Therapies

Discovering and developing pharmaceuticals is a resource-intensive and high-risk endeavor, sometimes spanning 15 years with costs exceeding $2 billion for their approval (Hinkson et al., 2020). Shockingly, about nine out of ten potential therapies, upon progressing to clinical trials, fail before approval (Dowden & Munro, 2019; Sun et al., 2022). The four primary contributors to the staggering 90% failure rate in drug development are inadequate clinical efficacy, unmanageable toxicity, suboptimal drug-like properties and a lack of commercial viability (Dowden & Munro, 2019; Harrison, 2016; Sun et al., 2022). To increase the chances of a drug target passing these critical checkpoints, considerable endeavors can be directed towards incorporating human genetic evidence into drug development.

In the drug development pipeline, all compounds must undergo rigorous testing in animal models before entering clinical phases, providing significant evidence of their potential to treat diseases. However, despite promising results in preclinical studies, the translation of efficacy and safety from animal models to human clinical trials is often elusive. Integrating human genetic evidence into the drug development process has recently emerged as a crucial strategy to navigate this challenge. Drugs grounded in such evidence exhibit a twofold increase in approval rates (Nelson et al., 2015), contributing to a higher prevalence of first-in-class therapies and new molecular entities (NMEs) (King et al., 2019). This not only accelerates the approval process but also streamlines the discovery of more effective and targeted treatments. Leveraging human genetic data empowers researchers with valuable insights into the genetic basis of diseases, facilitating the identification of better drug targets. The substantial presence of genetic evidence in FDA-approved drugs in 2021 (Ochoa et al., 2022) underscores its instrumental role in advancing drug discovery and fostering the emergence of innovative pharmaceutical solutions.

Linking Genetics to Clinical Data for Drug Discovery

To incorporate genetics into therapeutic development, researchers can link the genetic code of an individual to their Electronic Health Records (EHRs). Researchers can use techniques like genome-wide association studies (GWASs), phenome-wide association studies (PheWAS), Mendelian randomization, or loss/gain-of-function variant analysis to discover associations between rare alleles and human disease (Krebs & Milani, 2023). Using these techniques, drugs tailored for Mendelian disorders have achieved notable success in clinical trials and approvals (Heilbron et al., 2021). For instance, the genetic disease autosomal dominant hypercholesterolemia (ADH) confers an increased risk of coronary artery disease (CAD) through elevated levels of plasmatic low-density lipoprotein (LDL). By linking phenotypic data with genetic data, researchers were able to identify the association of the PCSK9 gene with high LDL levels (Abifadel et al., 2003). This kickstarted a series of studies that culminated in the approval of two monoclonal antibodies that inhibit PCSK9, Repatha (Evolocumab) and Praluent (Alirocumab) (Krebs & Milani, 2023; Robinson et al., 2015), with their treatment reducing the rate of major adverse cardiovascular events by half (Kaddoura et al., 2020). Indeed, therapies derived from these kinds of impactful rare alleles exhibit a 6-7.2 times greater likelihood of receiving approval due to their substantial effect on symptoms (Nelson et al., 2015; King et al., 2019). However, for many prevalent diseases, heritable risk is predominantly associated with numerous common variants, each having smaller individual effect sizes. This intricate genetic landscape complicates the identification of therapeutic targets, making the discovery of new avenues for therapy challenging and necessitating new strategies.

So far, a disproportionate number of published human genetic studies have centered on individuals of European descent (Fatumo et al., 2022). However, this narrow focus restricts our understanding to a limited diversity of alleles and genetic disorders, hindering the development of new therapies. To promote health equality, it’s crucial to establish study cohorts that include under‐represented populations. After all, individuals of European descent represent only a fraction of the total human genetic variation (Heilbron et al., 2021). Diverse cohorts represent unique opportunities for identifying novel drug targets based on genetic variants that are less frequent or even absent in people of European ancestry. Genetic discoveries will have greater discovery power in populations where a disease is more prevalent and, hence, with larger disease cohorts; at the same time, these discoveries will be more relevant and beneficial for these populations.

Founding the Alliance for Genomic Discovery

This need to identify rare genetic variants in diverse patient cohorts has driven the collaboration of NashBio and Illumina Inc. to establish AGD. AGD, comprising eight member organizations—AbbVie, Amgen, AstraZeneca, Bayer, Merck, Bristol Myers Squibb (BMS), GlaxoSmithKline Pharmaceuticals (GSK), and Novo Nordisk (Novo)—aims to expedite therapeutic development through whole-genome sequencing (WGS) of 250,000 samples from Vanderbilt University Medical Center’s (VUMC) biobank repository, BioVU®. As the first phase in AGD, deCODE genetics performed WGS on the first 35,000 VUMC samples, primarily made up of DNA from individuals of African ancestry. Moving forward, deCODE/Amgen will sequence the remaining samples, and Alliance members will have access to the resulting data for drug discovery and therapeutic development. The WGS data will then be linked with structured EHR data from NashBio and VUMC, creating a valuable resource for pharmaceutical members to correlate genetic variations with clinical outcomes.

Summary

AGD marks a pivotal step in reshaping drug development, offering a solution to the challenges plaguing the pharmaceutical industry. With a staggering 90% failure rate in clinical trials, the incorporation of human genetic evidence into drug development by AGD aims to increase the approval likelihood of drug targets, fostering the discovery of more effective and targeted treatments. AGD also aims to address the limitations of existing genetic resources and studies. The WGS of 250,000 samples, encompassing diverse populations and linked with structured EHR data, provides pharmaceutical members with a powerful resource. This not only accelerates drug discovery but also facilitates the development of tailored therapies. AGD represents a significant step toward healthcare equality, highlighting the importance of diverse genetic studies in progressing drug discovery for the benefit of all people.

 

References

Abifadel, M., Varret, M., Rabès, J.-P., Allard, D., Ouguerram, K., Devillers, M., Cruaud, C., Benjannet, S., Wickham, L., Erlich, D., Derré, A., Villéger, L., Farnier, M., Beucler, I., Bruckert, E., Chambaz, J., Chanu, B., Lecerf, J.-M., Luc, G., … Boileau, C. (2003). Mutations in PCSK9 cause autosomal dominant hypercholesterolemia. Nature Genetics, 34(2), 154–156. https://doi.org/10.1038/ng1161

Dowden, H., & Munro, J. (2019). Trends in clinical success rates and therapeutic focus. Nature Reviews. Drug Discovery, 18(7), 495–496. https://doi.org/10.1038/d41573-019-00074-z

Fatumo, S., Chikowore, T., Choudhury, A., Ayub, M., Martin, A. R., & Kuchenbaecker, K. (2022). A roadmap to increase diversity in genomic studies. Nature Medicine, 28(2), 243–250. https://doi.org/10.1038/s41591-021-01672-4

Harrison, R. K. (2016). Phase II and phase III failures: 2013-2015. Nature Reviews. Drug Discovery, 15(12), 817–818. https://doi.org/10.1038/nrd.2016.184

Heilbron, K., Mozaffari, S. V, Vacic, V., Yue, P., Wang, W., Shi, J., Jubb, A. M., Pitts, S. J., & Wang, X. (2021). Advancing drug discovery using the power of the human genome. The Journal of Pathology, 254(4), 418–429. https://doi.org/10.1002/path.5664

Hinkson, I. V., Madej, B., & Stahlberg, E. A. (2020). Accelerating Therapeutics for Opportunities in Medicine: A Paradigm Shift in Drug Discovery. Frontiers in Pharmacology, 11. https://doi.org/10.3389/fphar.2020.00770

Kaddoura, R., Orabi, B., & Salam, A. M. (2020). PCSK9 Monoclonal Antibodies: An Overview. Heart Views : The Official Journal of the Gulf Heart Association, 21(2), 97–103. https://doi.org/10.4103/HEARTVIEWS.HEARTVIEWS_20_20

King, E. A., Davis, J. W., & Degner, J. F. (2019). Are drug targets with genetic support twice as likely to be approved? Revised estimates of the impact of genetic support for drug mechanisms on the probability of drug approval. PLOS Genetics, 15(12), e1008489. https://doi.org/10.1371/journal.pgen.1008489

Krebs, K., & Milani, L. (2023). Harnessing the Power of Electronic Health Records and Genomics for Drug Discovery. Annual Review of Pharmacology and Toxicology, 63(1), 65–76. https://doi.org/10.1146/annurev-pharmtox-051421-111324

Nelson, M. R., Tipney, H., Painter, J. L., Shen, J., Nicoletti, P., Shen, Y., Floratos, A., Sham, P. C., Li, M. J., Wang, J., Cardon, L. R., Whittaker, J. C., & Sanseau, P. (2015). The support of human genetic evidence for approved drug indications. Nature Genetics, 47(8), 856–860. https://doi.org/10.1038/ng.3314

Ochoa, D., Karim, M., Ghoussaini, M., Hulcoop, D. G., McDonagh, E. M., & Dunham, I. (2022). Human genetics evidence supports two-thirds of the 2021 FDA-approved drugs. Nature Reviews. Drug Discovery, 21(8), 551. https://doi.org/10.1038/d41573-022-00120-3

Robinson, J. G., Farnier, M., Krempf, M., Bergeron, J., Luc, G., Averna, M., Stroes, E. S., Langslet, G., Raal, F. J., El Shahawy, M., Koren, M. J., Lepor, N. E., Lorenzato, C., Pordy, R., Chaudhari, U., & Kastelein, J. J. P. (2015). Efficacy and Safety of Alirocumab in Reducing Lipids and Cardiovascular Events. New England Journal of Medicine, 372(16), 1489–1499. https://doi.org/10.1056/NEJMoa1501031

Sun, D., Gao, W., Hu, H., & Zhou, S. (2022). Why 90% of clinical drug development fails and how to improve it? Acta Pharmaceutica Sinica. B, 12(7), 3049–3062. https://doi.org/10.1016/j.apsb.2022.02.002


De-identification: Balancing Privacy and Utility in Healthcare Data

By | Healthcare Data

Key Takeaways:

  • De-identification is the process of removing or obscuring personal health information in medical records to protect patient privacy.
  • De-identification is critical for enabling the sharing of data for secondary research purposes such as public health studies while meeting privacy regulations like HIPAA.
  • Common de-identification techniques include suppression, generalization, perturbation and synthetic data generation.
  • There is often a balance between data utility and privacy risk that must be evaluated on a case-by-case basis when de-identifying data.
  • Emerging privacy-enhancing computation methods like federated learning and differential privacy offer complementary approaches to de-identification.

What is De-identification and Why is it Important for Healthcare?

Patient health information is considered highly sensitive data in need of privacy protections. However, medical data sharing enables critically important research on public health, personalized medicine and more. De-identification techniques that remove identifying information and decrease the risk of exposing protected health information serve a crucial role in balancing these needs for privacy and innovation.

Definitions and Concepts

The HIPAA Privacy Rule defines de-identification as the process of preventing a person’s identity from being connected with health information. Once data has been de-identified per the Privacy Rule’s standards, it is no longer considered protected health information (PHI) and can be freely shared for research use cases like public health studies, therapeutic effectiveness studies and medical informatics analytics.

Perfect de-identification that carries no risk of re-identification of patients is very difficult, if not impossible, to accomplish with current technology. As a result, regulations like HIPAA allow for formal designations of “de-identified” health data based on achieving sufficient pseudonymity through the suppression or generalization of identifying data elements. HIPAA also defines a limited data set containing certain scrubbed identifiers that can be shared with a data use agreement rather than fully stripped identifiers.

The re-identification risk spectrum ranges from blatant identifiers like names, home addresses and social security numbers to quasi-identifiers like birthdates and narrowed locations that would not directly name the patient but could be pieced together to deduce identity in combination, especially as external data sources grow more public over time. State-of-the-art de-identification evaluates both blatant and quasi-identifier risks to minimize traceability while maximizing analytic utility.

Motivating Use Cases

Research and public health initiatives rely on the sharing of de-identified health data to drive progress on evidence and outcomes. The Cancer Moonshot’s data sharing efforts highlight the massive potential impact of medical databases, cohorts and real-world evidence generation on accelerating cures via de-identified data aggregation and analytics. The openFDA program demonstrates governmental encouragement of privacy-respecting access to regulatory datasets to inform digital health entrepreneurs. Patient matching in these fragmented healthcare datasets would be impossible using directly identifiable data. Apple’s ResearchKit and CareKit frameworks facilitate de-identified mobile health data sharing for app developers to build new participatory research applications.

Data marketplaces and trusted third parties are emerging to certify and exchange research-ready, consented data assets like clinico-genomic data underlying scientific publications and clinical trials. Startups and health systems manage data sharing agreements and audit logs around distributed sites leveraging de-identified data. Rich metadata combined with privacy-preserving record linkage techniques that avoid direct identifiers enables specific patient subgroup analytics without compromise.

Overall research efficiency improves when more participants openly share their health data. But none of this research progress would be possible if stringent de-identification practices were not implemented to earn patient trust in data sharing.

De-Identification Techniques and Standards

There are two high level categories of common de-identification protocols in healthcare: 1) suppressing blatant identifiers, typically following frameworks like HIPAA, and 2) actively transforming the data itself through various forms of generalization, perturbation or synthetic data production.

Suppressing Identifiers

The HIPAA Privacy Rule designates 18 categories of Protected Health Information identifiers that must be removed to achieve de-identified status, including names, geographic details narrower than state level, all dates other than years, contact information, IDs and record numbers, vehicle and device identifiers, URLs, IP addresses, biometrics etc.

Messages, images and unstructured data require specialized redaction processes to scrub both blatant and quasi-identifiers related to the patient, provider, institution or researchers involved. Named entity recognition and text annotation techniques help automate the detection of identifiable concepts. Voice data and video are more challenging mediums to de-identify.
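
A toy illustration of rule-based redaction follows. The regular expressions cover only a few obvious identifier formats and the example note is fabricated; real de-identification pipelines combine curated patterns, named entity recognition models, and human quality review.

```python
import re

# Minimal, non-exhaustive patterns for a few blatant identifiers.
PATTERNS = {
    "PHONE": r"\b\d{3}[-.]\d{3}[-.]\d{4}\b",
    "SSN":   r"\b\d{3}-\d{2}-\d{4}\b",
    "DATE":  r"\b\d{1,2}/\d{1,2}/\d{2,4}\b",
    "EMAIL": r"\b[\w.+-]+@[\w-]+\.[\w.]+\b",
}

def redact(text: str) -> str:
    """Replace matched identifier spans with bracketed category labels."""
    for label, pattern in PATTERNS.items():
        text = re.sub(pattern, f"[{label}]", text)
    return text

note = "Seen on 04/12/2021. Call 615-555-0142 or email j.doe@example.com to follow up."
print(redact(note))
# Seen on [DATE]. Call [PHONE] or email [EMAIL] to follow up.
```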

Generalization and Aggregation

When formal dates, locations, ages over 89 and other quasi-identifiers cannot be completely suppressed without losing analytic value from the structured data, generalization techniques help band these details into abstract categories to preserve some descriptive statistics while hiding individual values.

Aggregating masked data across many patient records also prevents isolation of individuals. Row level de-identification risks in sparse data featuring outliers and uncommon combinations of traits can be mitigated by pooling data points into broader summaries before release rather than allowing raw access.
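
The sketch below shows two common generalization moves on a synthetic record set: banding exact ages into ranges (with a top band for ages 90 and over, mirroring the over-89 rule mentioned above) and coarsening dates and ZIP codes. Column names and values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "age": [34, 67, 91, 45],
    "admit_date": pd.to_datetime(["2021-03-14", "2020-11-02", "2022-06-30", "2021-08-09"]),
    "zip": ["37203", "37215", "37027", "37211"],
})

# Generalize quasi-identifiers: age bands, year only, 3-digit ZIP prefix.
df["age_band"] = pd.cut(df["age"], bins=[0, 18, 45, 65, 90, 200], right=False,
                        labels=["0-17", "18-44", "45-64", "65-89", "90+"])
df["admit_year"] = df["admit_date"].dt.year
df["zip3"] = df["zip"].str[:3]

print(df[["age_band", "admit_year", "zip3"]])
```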

Perturbation

Perturbation encompasses a wide array of mathematical and statistical data alteration techniques that aim to distort the original data values and distributions while maintaining the general trends and correlations warranting analysis.

Value distortion methods include quantization to normalize numbers into ranges, squeezing and stretching value dispersion, rounding or truncating decimals, swapping similar records and discretizing continuous variables. Objects can be clustered into groups that are then analyzed in aggregate. Multiple perturbed versions of the dataset can be safely released to enable reproducible confirmation of discovered associations while avoiding leakage of the precise source data.

Combinations of generalization and perturbation provide flexibility for particular data types and contexts. The strengths, weaknesses and tuning of parameters merit a technical deep dive. The key is calibrating perturbation to maximize analytic integrity while minimizing correlation risk. Ongoing access rather than static publication also allows refinement of data treatment to meet evolving security assumptions and privacy regulations.
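
A toy perturbation example follows: small multiplicative noise and coarse rounding applied to a synthetic lab-value column, with a check that the overall mean is roughly preserved. Everything here is illustrative; real implementations calibrate the noise model carefully and validate that analytic conclusions survive the distortion.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic lab values (e.g., a continuous biomarker) for 1,000 records.
values = rng.normal(loc=100.0, scale=15.0, size=1000)

# Perturb: multiplicative noise within ±5%, then round to the nearest 5 units.
perturbed = values * rng.uniform(0.95, 1.05, size=values.shape)
perturbed = np.round(perturbed / 5) * 5

print(f"original mean:  {values.mean():.2f}")
print(f"perturbed mean: {perturbed.mean():.2f}")  # close, though individual values are distorted
```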

Synthetic Data

Synthetic datasets represent an emerging approach for modeling realistic artificial data distributions that resemble an actual patient group without containing the original records. Once the statistical shape of data properties is learned from the genuine dataset, simulated synthetic data can be sampled from generative models that emulate plausible features and relationships without allowing deduction of the real samples.

The underlying models must sufficiently capture multidimensional interactions and representation of minority groups within the patient population. Features such as ethnicity, outcomes, treatments and behaviors must be appropriately represented instead of using simplistic or biased summary statistics that ignore important correlations. Synthetic data techniques applying machine learning and differential privacy mechanisms to reconstruct distributions show significant promise for shared data sandbox environments. Cloud vendors like AWS, Google Cloud and Microsoft Azure now provide synthetic data services.

Evaluating the Risk-Utility Tradeoff

Ideally, de-identified health data removes enough identifying risk to prevent adversaries from recognizing individuals while retaining enough fidelity to offer scientific utility for the intended analyses by qualified researchers. But optimizing both privacy protection and analytic value requires navigating technical and ethical nuances around plausible re-identification vulnerabilities and scenarios balanced against access restrictions on derivative insights in the public interest.

Quantitative statistical metrics like k-anonymity models attempt to mathematically define anonymity sets with at least k records containing a combination of quasi-identifiers to avoid isolation. L-diversity metrics further require a variety of sensitive values within each group to limit confidence in guessing a matching identity. Closeness metrics measure how far the distribution of sensitive attributes within a group deviates from the overall distribution. Quantifying information loss helps data curators shape treatment processes and inclusion of synthetic records. Interpreting these model-based metrics requires understanding their assumptions and limitations with respect to adversary means and background knowledge.
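
As a minimal illustration of the k-anonymity idea, the sketch below counts how many records share each combination of quasi-identifiers in a synthetic table and flags combinations that fall below a chosen k. Production tooling handles this far more rigorously; the data and threshold here are arbitrary.

```python
import pandas as pd

K = 3  # required minimum group size

df = pd.DataFrame({
    "age_band": ["18-44", "18-44", "18-44", "45-64", "45-64", "65-89"],
    "zip3":     ["372",   "372",   "372",   "372",   "370",   "372"],
    "sex":      ["F",     "F",     "F",     "M",     "M",     "F"],
})

group_sizes = df.groupby(["age_band", "zip3", "sex"]).size()
violations = group_sizes[group_sizes < K]

print(f"dataset satisfies {K}-anonymity: {violations.empty}")
print(violations)  # quasi-identifier combinations that isolate fewer than K records
```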

More meaningful measures account for qualitative harms of identity traceability for affected groups based on socioeconomic status, minority populations, immigration factors, substance history, abuse status, disability needs and other cultural contexts that influence vulnerability irrespective of mathematical protections. Trusted access policies should require a verifiable rationale when more granular data is requested from data stewards, who can then evaluate situational sensitivity factors.

Overall responsibility falls upon custodial institutions and data safe havens to conduct contextual integrity assessments ensuring fair data flows to legitimate purposes. This means formally evaluating both welfare impacts on individuals and excluded populations, as well as potential data misuses or manipulative harms at population scale, such as discriminatory profiling. Updated governance mechanisms must address modern re-identification realities and connective threats.

Future Directions

Traditional de-identification practices struggle with handling high-dimensional, heterogeneous patient profiles across accumulating data types, modalities, apps, sensors, workflows and research studies. While valuable on their own, these techniques may fail to fully protect individuals as the ubiquity of digital traces multiplies potential quasi-identifiers. Absolute anonymity also severely limits permissible models and computations.

Emerging areas like federated analytics and differential privacy relax the goal of total de-identification by keeping raw records secured on distributed data servers and only allowing mathematical summaries to be queried from a central service so that statistical patterns can be discovered from many sites without exposing actual inputs from any one site. Legally defined limited data sets similarly bridge consented data access with managed identity risks for pre-vetted analysts.

Differentially private computations introduce mathematically calibrated noise to guarantee that the presence of any one patient, and the sensitive attributes tied to them, will be masked across many patients. This masking allows research insights to be uncovered without revealing individual contributions. Secure multiparty computation and homomorphic encryption also enable certain restricted computations like aggregates, means and distributions executed on sensitive inputs while keeping the underlying data encrypted.
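
A bare-bones sketch of the Laplace mechanism for a single count query is shown below; the sensitivity is 1 and the privacy parameter epsilon is chosen arbitrarily. Real deployments track a privacy budget across all queries and use vetted libraries rather than hand-rolled noise.

```python
import numpy as np

rng = np.random.default_rng()

def dp_count(true_count: int, epsilon: float, sensitivity: float = 1.0) -> float:
    """Return a count with Laplace noise calibrated to sensitivity / epsilon."""
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# E.g., number of patients in a cohort matching some sensitive criterion.
print(dp_count(true_count=1284, epsilon=0.5))  # noisier, stronger privacy
print(dp_count(true_count=1284, epsilon=5.0))  # less noisy, weaker privacy
```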

Such cryptographic methods and privacy-enhancing technologies provide complementary assurances to traditional de-identification practices. But governance, interpretation and usability remain active areas of improvement to fulfill ethical promises in practice. Holistic data safe havens must align emerging privacy-preserving computation capabilities with rigorous curation, context-based de-identification protocols and trust-based oversight mechanisms that can demonstrably justify public interest usages while preventing tangible harms to individuals and communities whose sensitive data fuels research.

Sources:
https://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/index.html 

https://www.ncsl.org/research/telecommunications-and-information-technology/hipaa-de-identification-state-laws.aspx 

https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0234962 

https://www.cancer.gov/research/key-initiatives/moonshot-cancer-initiative 

Health data

Understanding Key Health Data Types: Clinical Trials, Claims, EHRs

By | Clinical Trials, EHR, Health Data Types

Key Takeaways:

  • Key healthcare data types include clinical trials, insurance claims, and electronic health records (EHRs), each with distinct purposes.
  • Clinical trial data directly captures efficacy and safety of interventions, but availability is limited until publication and may lack generalizability.
  • Insurance claims provide large-scale utilization patterns, outcomes metrics across diverse groups, and cost analysis, but lack clinical precision.
  • EHR data offers longitudinal individual patient history and care details captured in operational workflows, but quality and standardization vary.
  • Combining evidence across clinical trials, claims data, and EHRs enables real-world monitoring of interventions to guide optimal decisions and policies.

In an era of big data and analytics-driven healthcare, evidence informing clinical and policy decisions draws from an expanding variety of data sources that capture different aspects of patient care and outcomes. Three vital sources of health data include structured databases tracking results of clinical trials, administrative insurance claims systems, and electronic health records (EHRs) compiled at hospitals and health systems. Each data type serves distinct purposes with inherent strengths and limitations.

This article explains the defining characteristics, appropriate use cases, and limitations of clinical trials, insurance claims data, and EHRs for healthcare and life science researchers, operators, and innovators. Combining complementary dimensions across data types enables robust real-world monitoring of healthcare interventions to guide optimal decisions and policies matched to specific populations.

Clinical Trials

The randomized controlled trial (RCT) serves as the gold standard for evaluating safety and efficacy of diagnostic tests, devices, biologics, and therapeutics prior to regulatory approvals. Clinical trials compare treatments in specific patient groups, following strict protocols and monitoring outcomes over a set study period. Data elements captured include administered treatments, predefined clinical outcomes, patient-reported symptoms, clinician assessments, precision diagnostics, genomic biomarkers, other quantifiable endpoints, and adverse events.

RCT datasets supply the most scientifically valid assessment of efficacy and toxicity for an intervention compared to alternatives like placebos or other drugs, because influential variables are intentionally balanced across study arms through eligibility criteria and random assignment. This internal validity comes at the cost of reduced generalizability and applicability, making it difficult to translate benefits and risks accurately to heterogeneous real-world populations. Published trial findings often overstate effectiveness when applied more broadly, so additional data from pragmatic studies is needed to complement classical efficacy findings along the product lifecycle.

Supplemental data integration is required to expand evidence beyond the limited snapshots of clinical trial participants and into continuous monitoring of outcomes across wider populations who are prescribed the treatments clinically. Here the high-level perspectives of insurance claims data and granular clinical details contained in EHRs play a vital role.

Insurance Claims

Administrative claims systems maintained by public and commercial health insurers serve payment and reimbursement purposes rather than research goals. Yet analysis of population-level claims data containing coded diagnoses, procedures performed, medications dispensed, specialty types, facilities visited, and costs billed and reimbursed yields insights into usage trends, treatment patterns, acute events, and cost efficiency that complement clinical trials.

Claims give researchers a broad window into diagnoses, prescribed interventions, and health outcomes, frequently spanning millions of covered lives across geographic regions and populations absent from most trials. Claims data encompasses all covered care delivered rather than isolated interventions. Examining trends over longer timeframes, across more diverse patients than strict trial eligibility allows, enables assessment of real-world utilization frequencies, comparative effectiveness versus alternatives, clinical guideline adherence, acute complication rates, mortality metrics, readmission trends, and direct plus indirect medical costs.
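
For instance, one such claims-derived metric, a 30-day all-cause readmission rate, might be sketched as below from hypothetical inpatient claim records. The field names and the simple definition (any new admission within 30 days of a discharge, with no exclusions) are illustrative assumptions rather than any payer's actual schema or measure specification.

```python
# Hypothetical claims extract: one row per inpatient stay.
import pandas as pd

claims = pd.DataFrame({
    "member_id": ["A", "A", "B", "C", "C"],
    "admit":     pd.to_datetime(["2023-01-02", "2023-01-20", "2023-02-05", "2023-03-01", "2023-05-15"]),
    "discharge": pd.to_datetime(["2023-01-06", "2023-01-25", "2023-02-09", "2023-03-04", "2023-05-20"]),
})

claims = claims.sort_values(["member_id", "admit"])
# Days from each discharge to the same member's next admission, if any.
claims["next_admit"] = claims.groupby("member_id")["admit"].shift(-1)
claims["days_to_next"] = (claims["next_admit"] - claims["discharge"]).dt.days

claims["readmit_30d"] = claims["days_to_next"].le(30)  # NaN (no later admission) counts as False
print(f"30-day readmission rate: {claims['readmit_30d'].mean():.0%}")  # 1 of 5 discharges here
```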

However, claims data lacks the precise clinical measures systematically captured in trials and EHR records. Billing codes often fail to specify clinical severity or capture quality-of-life impacts, and the available data elements focus primarily on how much and how often healthcare services are used rather than on qualitative clinical details or patient-reported outcomes. Underlying diagnoses and coding accuracy may require supplementary validation. Despite these limitations, claims data provides essential information for healthcare professionals, researchers, and policymakers, serving as a valuable tool for monitoring diverse aspects of the healthcare system and helping ensure efficient, safe, and effective treatments.

While abbreviated claims codes document utilization events at a population level and clinical trials quantify experience for circumscribed groups, the patient-centric electronic health record (EHR) details comprehensive individual-level clinical data as a cumulative record built up over years of encounters across care settings. The longitudinal EHR chronicles detailed diagnoses, signs and symptoms, lab orders and results, exam findings, procedures conducted, prescriptions written, physician notes and orders, referral details, communications around critical results, and other discrete or unstructured elements reflecting patient complexity often excluded from claims data and trials.

EHRs

EHRs provide fine-grained data for precision medicine inquiries into subsets of patients with common clinical trajectories, risk profiles, comorbidities, socioeconomic factors, access challenges, genomic risks, family histories of related illnesses, lifestyle behaviors like smoking, and personalized interventions based on advanced molecular markers. EHR data supports deep phenotyping algorithms and temporal pattern analyses that can extract cohort comparisons not feasible solely from claims.
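
As a simplified illustration of rule-based phenotyping, the sketch below flags a hypothetical type 2 diabetes cohort when a diagnosis code co-occurs with either an elevated HbA1c result or a metformin prescription. The code list, threshold, and fields are illustrative assumptions, not a validated phenotype definition.

```python
# Toy rule-based EHR phenotype: diagnosis code plus a lab or medication signal.
import pandas as pd

patients = pd.DataFrame({
    "patient_id":   [1, 2, 3],
    "has_e11_code": [True, True, False],  # an ICD-10 E11.* diagnosis on record
    "max_hba1c":    [7.1, 6.0, 8.2],      # highest HbA1c result on record (%)
    "on_metformin": [False, False, True],
})

patients["t2dm_phenotype"] = patients["has_e11_code"] & (
    (patients["max_hba1c"] >= 6.5) | patients["on_metformin"]
)
print(patients[["patient_id", "t2dm_phenotype"]])  # only patient 1 qualifies here
```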

Secondary use of EHR data faces several challenges: limited representativeness when data come from a single health system rather than national networks, variability in coding terminologies and data entry fields across platforms, fragmentation that forces linkage across separate specialties and sites of care, semi-structured formats mixing discrete codified fields with free text, and data quality gaps arising from clinician workflow constraints. Population-based claims data, by contrast, captures patients seeking care across all available providers rather than just one health system.

Integrating Complementary Evidence

Definitive clinical trial efficacy remains the gold standard when initially evaluating medical interventions, while large-scale claims data offers a complementary view of broader utilization patterns and comparative outcomes across more diverse populations who are receiving interventions in clinical practice. However, as interventions diffuse beyond the research setting, reliable acquisition of clinical details requires merging population-based signals from claims with deep clinical data contained uniquely within EHRs.

Combining evidence across clinical trials, claims databases, and EHR repositories maximizes the strengths of each data type while overcoming the inherent limitations of any single source. Clinical trials establish efficacy, while merging insights from large-scale claims data with the detailed clinical information in EHRs is crucial for assessing interventions as they transition from research into routine practice and for improving healthcare overall.


Aspect Clinical Trial Data Claims Data EHR Data
Primary Purpose Research and development of new treatments Billing and reimbursement for services Patient care and health record keeping
Data Source Controlled clinical studies Insurance companies, healthcare providers Healthcare providers
Data Types Included Patient demographics, treatment details, outcomes Patient demographics, services rendered, cost Patient demographics, medical history, diagnostics, treatment plans
Data Structure Highly structured and standardized Structured but varies with payer systems Structured and unstructured (e.g., doctor’s notes)
Temporal Span Limited to the duration of the trial Longitudinal, covering the duration of coverage Longitudinal, covering comprehensive patient history
Access and Privacy Restricted, subject to clinical trial protocols Restricted, governed by Health Insurance Portability and Accountability Act (HIPAA) regulations Restricted, governed by HIPAA and patient consent
Primary Users Researchers, pharmaceutical companies Healthcare providers, payers, policy makers Healthcare providers, patients
Data Volume and Variety Relatively limited, focused on specific conditions Large, diverse covering a wide range of conditions and services Large, diverse, includes a wide range of medical information
Use in Healthcare Drug development, understanding treatment effectiveness Healthcare economics, policy making, fraud detection Direct patient care, diagnosis, treatment planning
Challenges Limited generalizability, high cost Variability in coding, potential for missing data Inconsistent data entry, variability in EHR systems


Sources:
https://www.ncbi.nlm.nih.gov/books/NBK11597/ 

https://pubmed.ncbi.nlm.nih.gov/10146871/

https://www.fda.gov/drugs/types-applications/new-drug-application-nda 

https://www.nia.nih.gov/research/blog/2017/06/pragmatic-clinical-trials-testing-treatments-real-world

Polygenic Risk Scores

An Introduction to Polygenic Risk Scores: Aggregating Small Genetic Effects to Stratify Disease Risk

By | Polygenic Risk Scores

Key Takeaways:

  • Polygenic risk scores aggregate the effects of thousands of genetic variants to estimate an individual’s inherited risk for complex diseases.
  • Polygenic risk is based on genome-wide association studies that identify common variants associated with modest increases in disease risk.
  • Polygenic scores provide risk stratification beyond family history, but most disease risk is not yet explained by known variants.
  • Clinical validity and utility of polygenic scores will improve as more disease-associated variants are discovered through large genomic studies.
  • Polygenic risk models may one day guide targeted screening and preventive interventions, but face challenges related to clinical interpretation and implementation.

Introduction to Polygenic Risk Scores

The vast majority of common, chronic diseases do not follow simple Mendelian inheritance patterns, but rather are complex genetic conditions arising from the combined small effects of thousands of genetic variations interacting with lifestyle and environmental factors. Polygenic risk scores aggregate information across an individual’s genome to estimate their inherited susceptibility for developing complex diseases like heart disease, cancer, diabetes and neuropsychiatric disorders.

Polygenic risk scores are constructed using data from genome-wide association studies (GWAS) that scan markers across the genomes of thousands to millions of individuals to identify genetic variants associated with specific disease outcomes. While most disease-associated variants have very small individual effects, the combined effect of thousands of these common, single nucleotide polymorphisms (SNPs) can stratify disease risk in a polygenic model.

Polygenic Scores vs. Single Gene Mutations

In monogenic diseases like cystic fibrosis and Huntington’s disease, mutations in a single gene are sufficient to cause disease. Genetic testing for causal mutations in specific disease-linked genes provides a clear-cut diagnostic assessment. In contrast, no single gene variant accounts for more than a tiny fraction of risk for complex common diseases. Polygenic risk models aggregate the effects of disease-associated variants across the genome, each imparting a very modest increase or decrease in risk. An individual’s polygenic risk score reflects the cumulative impact of thousands of small risk effects spread across their genome.

While polygenic scores are probabilistic and estimate only inherited genetic susceptibility, monogenic mutations convey deterministic information about disease occurrence. However, for many individuals with elevated polygenic risk scores, modifiable lifestyle and environmental factors may outweigh their inherited predisposition, allowing prevention through early intervention.

GWAS and Polygenic Scores

Human genome-wide association studies utilize DNA microarray ‘chips’ containing hundreds of thousands to millions of SNPs across the genome. Comparing SNP frequencies between thousands of disease cases and controls reveals variants associated with disease diagnosis. Each SNP represents a common genetic variant, typically defined as one present in at least 1-5% of the population. Individually, SNP effects on disease risk are very modest, usually less than a 20% increase in relative risk.
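
The core per-SNP comparison can be pictured with the toy sketch below, which tests whether the risk-allele frequency differs between cases and controls using a chi-square test on allele counts; the counts are invented for illustration.

```python
# Toy single-SNP association test on allele counts (two alleles per genotyped person).
from scipy.stats import chi2_contingency

#                   [risk allele, other allele]
cases_alleles    = [1200, 800]
controls_alleles = [1000, 1000]

chi2, p_value, _, _ = chi2_contingency([cases_alleles, controls_alleles])
print(f"chi2 = {chi2:.1f}, p = {p_value:.2e}")  # a real GWAS repeats this, with covariates, genome-wide
```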

However, by aggregating the effects of disease-associated SNPs, polygenic risk models can categorize individuals along a spectrum of low to high inherited risk. Polygenic scores typically explain 7-12% of disease variance, though up to 25% for some cancers. The more powerful the original GWAS in terms of sample size, the better the polygenic score will be at predicting an individual’s predisposition.

Constructing Polygenic Scores

Various methods exist for constructing polygenic scores after identifying disease-associated SNPs through GWAS. Most commonly, a SNP effect size is multiplied by the number of risk alleles (0, 1 or 2) for that SNP in a given individual. These products are summed across all chosen SNPs to derive an overall polygenic risk score. SNPs strongly associated with disease receive more weight than weakly associated markers.
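
A minimal sketch of that additive calculation, using made-up effect sizes and genotypes, could look like the following:

```python
# Additive polygenic score: weighted sum of risk-allele counts.
import numpy as np

betas = np.array([0.12, 0.05, -0.08, 0.20])   # per-allele effect sizes from a GWAS (illustrative)
risk_allele_counts = np.array([2, 1, 0, 1])   # one individual's genotypes at the same SNPs

prs = float(np.dot(betas, risk_allele_counts))
print(f"Polygenic risk score: {prs:.2f}")      # 0.12*2 + 0.05*1 - 0.08*0 + 0.20*1 = 0.49
```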

Rigorous validation in independent sample sets evaluates the predictive performance of polygenic scores. Optimal SNP inclusion thresholds are selected to maximize predictive ability. Polygenic models lose power with too few or too many SNPs included. Ideal thresholds retain SNPs explaining at least 0.01% of disease variance based on GWAS significance levels.

Applications and Limitations

Polygenic risk models are currently most advanced for coronary artery disease, breast and prostate cancer, type 2 diabetes and inflammatory bowel disease. Potential clinical applications include:

  • Risk stratification to guide evidence-based screening recommendations beyond family history.
  • Targeted prevention and lifestyle modification for individuals at elevated genetic risk.
  • Informing reproductive decision-making and genetic counseling based on polygenic risk.
  • Improving disease prediction, subtyping and prognosis when combined with clinical risk factors.

However, limitations and ethical concerns exist around polygenic score implementation:

  • Most heritability remains unexplained. Adding more SNPs only incrementally improves prediction.
  • Polygenic testing may prompt unnecessary interventions if clinical validity and utility are not adequately demonstrated.
  • Possible psychological harm and discrimination stemming from probabilistic genetic risk information.
  • Unequal health benefits if not equitably implemented across populations.

While polygenic scores currently identify individuals with modestly increased or decreased disease risks, their predictive utility is anticipated to grow substantially with million-person biobank efforts and whole-genome sequencing. Harnessing the full spectrum of genomic variation contributing to polygenic inheritance will enable more personalized risk assessment and clinical decision-making for complex chronic diseases.

Sources:

  1. Torkamani A, Wineinger NE, Topol EJ. The personal and clinical utility of polygenic risk scores. Nat Rev Genet. 2018 Sep;19(9):581-590.
  2. Lewis CM, Vassos E. Polygenic risk scores: from research tools to clinical instruments. Genome Med. 2020 May 6;12(1):44.
  3. Khera AV, Chaffin M, Zekavat SM, et al. Whole-genome sequencing to characterize monogenic and polygenic contributions in patients hospitalized with COVID-19. Nat Commun. 2021 Jan 20;12(1):536.
  4. Torkamani A, Erion G, Wang J, et al. An evaluation of polygenic risk scores for predicting breast cancer. Breast Cancer Res Treat. 2019 Apr;175(2):493-503.
  5. Mars N, Koskela JT, Ripatti P, et al. Polygenic and clinical risk scores and their impact on age at onset and prediction of cardiometabolic diseases and common cancers. Nat Med. 2020 Nov;26(11):1660-1666.