
The Role of Polygenic Risk Scores in Clinical Genomics


Introduction

We were promised the end of genetic diseases: all we needed to do was unlock the human genome. Unfortunately, life has a way of being more complicated than we expect. Many genetic disorders turned out to result from the interplay of multiple genetic factors, creating a need for analytical tools that could interrogate the combined effects of many genetic variants and link them to disease. One such technique, the Polygenic Risk Score (PRS), emerged as a powerful tool for quantifying the cumulative effect of multiple genetic variants on an individual's predisposition to a specific disease.

The Evolution of Polygenic Risk Scores

The genesis of PRS can be traced back to the early 2000s when researchers sought to comprehend the collective impact of multiple genetic variants on disease susceptibility. Initially viewed through a biological lens, the focus was on enhancing the prediction of diseases by analyzing subtle genomic variations. Studies concentrated on prevalent yet complex diseases such as diabetes, cardiovascular diseases, and cancer, laying the groundwork for a comprehensive understanding of their genetic architecture.

 

That changed when Dr. Sekar Kathiresan's group showed that a PRS could be just as clinically useful as a single high-risk variant (Khera et al., 2018). Instead of looking at the percentage of people with a given PRS in each group (with or without a disease), his group demonstrated a much more striking effect: the difference in risk between the people with the highest and lowest scores. They could then show a dramatic difference in risk between these two tails of the population.

 

In the initial stages, PRSs included only the most statistically significant variants from genome-wide association studies. Geneticists often simply counted risk variants without weighting them by the size of their effect on disease risk. As the scores were refined, scientists challenged arbitrary significance cutoffs and advocated including all variants to maximize statistical power, on the assumption that variants with no true effect would, on average, appear positively or negatively correlated with the trait in equal measure. However, the proximity of variants on a chromosome presented another challenge: variants that sit close together are less likely to be separated during recombination (linkage disequilibrium). Such variants can carry the signal of a nearby variant with a true effect, potentially leading to that signal being counted more than once.
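Throughout these refinements, the basic calculation has stayed the same: a PRS is a sum over risk-allele dosages, weighted or unweighted. A minimal sketch, with hypothetical variant names and effect sizes:

```python
# Minimal sketch of a weighted polygenic risk score: each variant's
# allele dosage (0, 1, or 2 copies of the risk allele) is multiplied
# by its GWAS effect size (beta) and summed. Variant names and betas
# below are illustrative, not from any real GWAS.

def polygenic_risk_score(dosages, betas):
    """Weighted sum of risk-allele dosages; the early unweighted scores
    amount to setting beta = 1 for every variant."""
    assert dosages.keys() == betas.keys()
    return sum(dosages[v] * betas[v] for v in dosages)

# One individual's genotype at three hypothetical variants.
dosages = {"rs_a": 2, "rs_b": 1, "rs_c": 0}
betas = {"rs_a": 0.12, "rs_b": 0.05, "rs_c": 0.30}

print(round(polygenic_risk_score(dosages, betas), 3))  # 0.29
```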

 

To deal with this, geneticists used tools to remove signals within a specified block unless their correlation with the strongest signal fell below a threshold. One of the first packages, PRSice (Choi & O'Reilly, 2019), used an approach called pruning and thresholding. Scientists would choose a block size, say 200,000 base pairs, and a program would slide that block along the genome. If a block contained more than one signal, the program would remove (or "prune") all but the strongest, unless a variant's correlation with the strongest signal fell below the chosen threshold. The cost was that in a region with many partially correlated variants that each truly affected disease risk, signal could be lost.
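The pruning step described above can be sketched as a greedy pass over variants ranked by significance. This is an illustrative simplification, not the actual PRSice implementation, and the variants and correlations are hypothetical:

```python
# Greedy pruning sketch: keep variants from the strongest signal down;
# any variant within the window whose squared correlation (r2) with an
# already-kept variant exceeds the threshold is pruned. All names,
# positions, p-values, and r2 values are hypothetical.

def prune(variants, r2, window=200_000, r2_threshold=0.1):
    """variants: list of (name, position, p_value); r2: dict mapping
    frozenset({name1, name2}) -> squared correlation between them."""
    kept = []
    for name, pos, p in sorted(variants, key=lambda v: v[2]):  # strongest first
        clumped = any(
            abs(pos - kept_pos) <= window
            and r2.get(frozenset({name, kept_name}), 0.0) > r2_threshold
            for kept_name, kept_pos in kept
        )
        if not clumped:
            kept.append((name, pos))
    return [name for name, _ in kept]

variants = [("v1", 100_000, 1e-8), ("v2", 150_000, 1e-5), ("v3", 400_000, 1e-6)]
r2 = {frozenset({"v1", "v2"}): 0.8, frozenset({"v1", "v3"}): 0.05}
print(prune(variants, r2))  # ['v1', 'v3']: v2 is pruned (near v1, highly correlated)
```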

 

Criticism from biostatisticians prompted a shift towards a Bayesian approach, reducing over-counting while better accounting for partially independent signals. Implementation was challenged by the extensive computational resources needed to update the signal at each genetic location based on the linkage disequilibrium of the surrounding SNPs. One program, PRS-CS (Ge et al., 2019), implemented a method that could apply changes to a whole linkage block at once, satisfying both the geneticists' demand for a system that delivers results with available computational tools and the biostatisticians' demand for accuracy and retained information.
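The Bayesian intuition, shrinking each observed effect toward zero in proportion to its noise rather than keeping or discarding it outright, can be illustrated with a toy normal-normal model (far simpler than what PRS-CS actually does):

```python
# Toy illustration of Bayesian shrinkage, NOT the PRS-CS algorithm:
# with a normal prior on the true effect, each observed GWAS effect is
# shrunk toward zero. Strong, precisely measured effects barely move;
# small, noisy ones shrink heavily. Numbers are illustrative.

def shrink(beta_hat, se, prior_var=0.01):
    """Posterior mean under beta ~ N(0, prior_var), beta_hat ~ N(beta, se^2)."""
    return beta_hat * prior_var / (prior_var + se**2)

print(round(shrink(0.30, se=0.02), 4))  # 0.2885: precise signal, little shrinkage
print(round(shrink(0.05, se=0.2), 4))   # 0.01: noisy signal, heavy shrinkage
```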

 

Despite these advancements, accuracy challenges persisted, particularly when applying scoring systems across populations with different genetic ancestries. Linkage disequilibrium turned out to be a pervasive problem: its patterns differ between people of different genetic ancestries, and even summary statistics of those patterns, such as the average block size, differ. Recognizing the need for improvement, ongoing efforts to refine PRSs aim to address these challenges, paving the way for more accurate and reliable applications. As researchers delve deeper into these complexities, the evolving landscape of PRSs continues to shape the future of clinical research.

Polygenic Risk Scores in Clinical Research Settings

To harness the full potential of PRS in clinical practice, a crucial shift is needed—from population-level insights to personalized predictions for individual patients. This transformation involves converting relative risks, which compare individuals across the PRS spectrum with a baseline group, into absolute risks for the specific disease (Lewis & Vassos, 2020). The current emphasis is on identifying individuals with a high genetic predisposition to disease, forming the foundation for effective risk stratification. This information guides decisions related to participation in screening programs, lifestyle modifications, or preventive treatments when deemed suitable.
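At its core, the conversion from relative to absolute risk is a simple calculation: scale the population baseline risk by the relative risk of the individual's PRS stratum. A hedged sketch with illustrative numbers:

```python
# Sketch of converting relative risk to absolute risk for a PRS
# stratum, in the spirit of Lewis & Vassos (2020). The baseline risk
# and relative risk below are illustrative, not from any study.

def absolute_risk(baseline_risk, relative_risk):
    """Absolute risk for a PRS stratum, given the population baseline
    and that stratum's relative risk versus the baseline group."""
    return baseline_risk * relative_risk

# e.g. a 10% baseline lifetime risk and a 3-fold relative risk for the
# top PRS stratum imply a 30% absolute lifetime risk for that group.
print(round(absolute_risk(0.10, 3.0), 2))  # 0.3
```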

 

In practical applications, PRS demonstrates promise in patient populations with a high likelihood of disease. Consider a recent study in an East Asian population, where researchers developed a PRS for Coronary Artery Disease (CAD) using 540 genetic variants (Lu et al., 2022). Tested on 41,271 individuals, the top 20% had a three-fold higher risk of CAD compared to the bottom 20%, with lifetime risks of 15.9% and 5.8%, respectively. Adding PRS to clinical risk assessment slightly improved accuracy. Notably, individuals with intermediate clinical risk and high PRS reached risk levels similar to high clinical risk individuals with intermediate PRS, indicating the potential of PRS to refine risk assessment and identify those requiring targeted interventions for CAD.

 

Another application of PRS lies in improving screening for individuals carrying major disease risk alleles (Roberts et al., 2023). A recent breast cancer risk assessment study explored pathogenic variants in high- and moderate-risk genes (Gao et al., 2021). Over 95% of BRCA1, BRCA2, and PALB2 carriers had a lifetime breast cancer risk exceeding 20%. Conversely, integrating PRS identified over 30% of CHEK2 and almost half of ATM carriers as falling below the 20% threshold.

 

This trend extends to other diseases, such as prostate cancer, where a separate investigation focused on men with elevated levels of prostate-specific antigen (PSA) (Shi et al., 2023). Through the application of PRS, researchers pinpointed over 100 genetic variants linked to increased PSA levels. Ordinarily, elevated PSA prompts a prostate biopsy to assess potential prostate cancer. By incorporating PRS into the screening process, doctors could account for benign, genetically driven variation in PSA levels and prevent unnecessary escalation of clinical care. Together, these studies suggest that integrating PRS into health screening enhances accuracy, preventing unnecessary tests and enabling more personalized risk management.
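The idea behind genetically adjusted screening can be sketched as follows; the multiplier and cutoff below are illustrative, not values from the Shi et al. study:

```python
# Hedged sketch of genetically adjusted PSA screening: if part of a
# man's measured PSA is explained by benign genetic variation, divide
# the measured value by his genetically predicted multiplier before
# applying the referral cutoff. Values are illustrative only.

def adjusted_psa(measured_psa, genetic_multiplier):
    """Remove the genetically predicted (benign) component of PSA."""
    return measured_psa / genetic_multiplier

BIOPSY_CUTOFF = 4.0  # ng/mL, a commonly used referral threshold

measured = 4.8
multiplier = 1.3  # this man's genotype predicts 30% higher benign PSA
adj = adjusted_psa(measured, multiplier)
print(round(adj, 2), adj < BIOPSY_CUTOFF)  # 3.69 True: biopsy may be avoidable
```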

 

In the realm of pharmacogenetics, efforts to optimize treatment responses continue. While progress has been made in identifying rare high-risk variants linked to adverse drug events, predicting treatment effectiveness remains challenging. The evolving role of PRS in treatment response is particularly evident in statin use for reducing initial coronary events. In a real-world cohort without prior myocardial infarction, an investigation revealed that statin effectiveness varied based on CHD PRSs, with the highest impact in the high-risk group, intermediate in the intermediate-risk group, and the smallest effect in the low-risk group (Oni-Orisan et al., 2022). Post-hoc analyses like this for therapeutics could potentially allow for more targeted enrollment for clinical trial design, substantially reducing the number of participants needed to demonstrate trial efficacy (Fahed et al., 2022).

Conclusion

As the field of genetics continues to advance, PRSs emerge as a potent tool with the potential to aid clinical research. Validated PRSs show promise in enhancing the design and execution of clinical trials, refining disease screening, and developing personalized treatment strategies to improve the overall health and well-being of patients. However, it's crucial to acknowledge that the majority of PRS studies rely heavily on datasets biased toward European ancestry. To refine and improve PRS, a comprehensive understanding of population genetic characteristics, such as linkage disequilibrium, is essential for people of all backgrounds. Moving forward, the integration of PRS into clinical applications must prioritize datasets with diverse ancestry to ensure equitable and effective utilization across all patient backgrounds. As research in this field progresses, the incorporation of PRS is poised to become an indispensable tool for expediting the development of safer and more efficacious therapeutics.

 

References

Choi, S. W., & O’Reilly, P. F. (2019). PRSice-2: Polygenic Risk Score software for biobank-scale data. GigaScience, 8(7). https://doi.org/10.1093/gigascience/giz082

 

Fahed, A. C., Philippakis, A. A., & Khera, A. V. (2022). The potential of polygenic scores to improve cost and efficiency of clinical trials. Nature Communications, 13(1), 2922. https://doi.org/10.1038/s41467-022-30675-z

 

Gao, C., Polley, E. C., Hart, S. N., Huang, H., Hu, C., Gnanaolivu, R., Lilyquist, J., Boddicker, N. J., Na, J., Ambrosone, C. B., Auer, P. L., Bernstein, L., Burnside, E. S., Eliassen, A. H., Gaudet, M. M., Haiman, C., Hunter, D. J., Jacobs, E. J., John, E. M., … Kraft, P. (2021). Risk of Breast Cancer Among Carriers of Pathogenic Variants in Breast Cancer Predisposition Genes Varies by Polygenic Risk Score. Journal of Clinical Oncology : Official Journal of the American Society of Clinical Oncology, 39(23), 2564–2573. https://doi.org/10.1200/JCO.20.01992

 

Ge, T., Chen, C.-Y., Ni, Y., Feng, Y.-C. A., & Smoller, J. W. (2019). Polygenic prediction via Bayesian regression and continuous shrinkage priors. Nature Communications, 10(1), 1776. https://doi.org/10.1038/s41467-019-09718-5

 

Khera, A. V., Chaffin, M., Aragam, K. G., Haas, M. E., Roselli, C., Choi, S. H., Natarajan, P., Lander, E. S., Lubitz, S. A., Ellinor, P. T., & Kathiresan, S. (2018). Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nature Genetics, 50(9), 1219–1224. https://doi.org/10.1038/s41588-018-0183-z

 

Lewis, C. M., & Vassos, E. (2020). Polygenic risk scores: from research tools to clinical instruments. Genome Medicine, 12(1), 44. https://doi.org/10.1186/s13073-020-00742-5

 

Lu, X., Liu, Z., Cui, Q., Liu, F., Li, J., Niu, X., Shen, C., Hu, D., Huang, K., Chen, J., Xing, X., Zhao, Y., Lu, F., Liu, X., Cao, J., Chen, S., Ma, H., Yu, L., Wu, X., … Gu, D. (2022). A polygenic risk score improves risk stratification of coronary artery disease: a large-scale prospective Chinese cohort study. European Heart Journal, 43(18), 1702–1711. https://doi.org/10.1093/eurheartj/ehac093

 

Oni-Orisan, A., Haldar, T., Cayabyab, M. A. S., Ranatunga, D. K., Hoffmann, T. J., Iribarren, C., Krauss, R. M., & Risch, N. (2022). Polygenic Risk Score and Statin Relative Risk Reduction for Primary Prevention of Myocardial Infarction in a Real-World Population. Clinical Pharmacology and Therapeutics, 112(5), 1070–1078. https://doi.org/10.1002/cpt.2715

 

Roberts, E., Howell, S., & Evans, D. G. (2023). Polygenic risk scores and breast cancer risk prediction. Breast (Edinburgh, Scotland), 67, 71–77. https://doi.org/10.1016/j.breast.2023.01.003

 

Shi, M., Shelley, J. P., Schaffer, K. R., Tosoian, J. J., Bagheri, M., Witte, J. S., Kachuri, L., & Mosley, J. D. (2023). Clinical consequences of a genetic predisposition toward higher benign prostate-specific antigen levels. EBioMedicine, 97, 104838. https://doi.org/10.1016/j.ebiom.2023.104838


Bridging the Gap: How AI Companies in the TechBio Space are Revolutionizing Biopharma using Genomics and Drug Response Data


Introduction

Innovations in Artificial Intelligence (AI) have propelled pharmaceutical companies to revolutionize their approaches to designing, testing, and bringing precision medicine and healthcare solutions to the market. Two key elements in advancing precision medicine include early disease detection and understanding drug responders within distinct populations. By leveraging genomics and clinical notes, AI companies, specifically in the TechBio space, are transforming the way biopharma industries identify, understand, and cater to individuals rather than whole populations.

The Challenge: Precision Medicine and Drug Response

Traditional drug development methods often analyze the success of a drug treatment as its effect on a patient population, leading to highly variable outcomes and adverse effects among individual patients. This is despite the fact that, for many diseases, the underlying mechanisms driving symptoms can differ considerably from person to person. This lack of individualization in treatment can hinder apparent therapeutic efficacy at the group level despite effectiveness for certain individuals. If we hope to accelerate drug development and get cures into people's hands faster, future research needs intelligent, cost-effective methods to stratify patients based on the contribution of different disease mechanisms and drug processing capabilities. AI companies are helping biopharma address this challenge by systematically incorporating genomics and insights garnered from those individuals' de-identified clinical charts.

Genomics: The Blueprint of Personalization

The genomic revolution has undoubtedly paved the way for precision medicine in Biopharma. By analyzing an individual’s genetic data, scientists can identify variations that may influence drug metabolism and response. This approach has already proven highly effective, particularly in the case of breast cancer patients. In some instances of breast cancer, there is an overexpression of the HER2/neu protein (Gutierrez & Schiff, 2011). When genomic markers for this overexpression are identified, anti-HER2 antibodies can be incorporated into the treatment regimen, significantly enhancing survival rates. AI companies are at the forefront of continuing this research by utilizing genomics for the creation of genetic sub-groups essential for biomarker discovery and predicting the most effective drug treatments for individual patients (Quazi, 2022).

Disease Detection and Monitoring with AI-Enhanced Biomarker Research

Early detection and monitoring of disease progression are paramount for improving patient survival rates. Traditionally, biomarker research has focused on identifying individual molecules or transcripts that can serve as early indicators of future severe illness. However, the field is evolving beyond the notion of a single-molecule biomarker diagnostic. Instead, it is turning to AI to examine the relationships between molecules and transcripts, offering a more comprehensive approach to identifying the onset of significant diseases (Vazquez-Levin et al., 2023). Over the past decade, cancer research and clinical decision-making have undergone a significant transformation, shifting from qualitative data to a wealth of quantitative digital information.

Universities and clinical institutions globally have contributed a vast trove of biomarkers and imaging data. This extensive dataset encompasses insights from genomics, proteomics, metabolomics, and various omics disciplines, as well as inputs from oncology clinics, epidemiology, and medical imaging. AI, uniquely positioned to integrate this diverse information, holds the potential to spearhead the development of pioneering predictive models for drug responses, paving the way for groundbreaking advancements in disease diagnosis, treatment prediction, and overall decision-making concerning novel therapies. 

With growing collections of data, it is becoming easier to model how a drug will shift an individual's biology for better or worse. A recent example of this modeling is the Cancer Patient Digital Twin (CPDT) project, where multimodal temporal data collected from cancer patients is used to build a Digital Twin (a virtual replica of a patient's biological processes and health status), allowing for in silico experimentation that may guide testing, treatment, or decision points (Stahlberg et al., 2022).

One example is improving the detection of metastatic disease over time from radiology reports. Researchers exposed prediction models to historical information using Natural Language Processing (NLP) (Batch et al., 2022). The authors extracted and encoded relevant features from medical text reports, then used these features to develop, train, and validate models. Over 700,000 radiology reports were used for model development to predict the presence of metastatic disease. Results from this study suggest that NLP models can extract cancer progression patterns from multiple consecutive reports and predict the presence of metastatic disease in multiple organs with higher performance than previous analytical techniques. Early knowledge of disease states or disease risk could lead to revised risk-benefit assessments for treatments and testing, potentially influencing patients' choices. As a result, patients with otherwise comparable profiles may opt for treatments or tests they would not have otherwise considered. Even in cases where we do not have good biomarkers for a disease (for example, Alzheimer's disease, where most biomarkers are quite invasive to collect), knowing earlier that a person has a higher disease risk can enable research that leads to better biomarkers and, ultimately, better treatments.
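A heavily simplified sketch of this kind of pipeline, extracting features from report text and scoring metastasis probability with a logistic model, might look like the following. The features, weights, and report text are all invented for illustration; the actual study trained models on hundreds of thousands of structured reports:

```python
import math

# Toy feature extraction plus logistic scoring for metastasis mentions
# in a radiology report. Features, weights, and text are illustrative.

def extract_features(report):
    text = report.lower()
    return {
        "mentions_metastasis": int("metasta" in text),
        "mentions_new_lesion": int("new lesion" in text),
        "negated": int("no evidence of metasta" in text),
    }

def predict_probability(features, weights, bias=-2.0):
    # Logistic model: probability = sigmoid(bias + weighted feature sum).
    z = bias + sum(weights[k] * v for k, v in features.items())
    return 1 / (1 + math.exp(-z))

weights = {"mentions_metastasis": 3.0, "mentions_new_lesion": 1.5, "negated": -4.0}

report = "Interval development of a new lesion in the liver, suspicious for metastasis."
p = predict_probability(extract_features(report), weights)
print(round(p, 2))  # 0.92: flagged as likely metastatic progression
```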

AI-Driven Pharmacogenomics: Revolutionizing Precision Medicine and Clinical Trials

While traditional approaches have paved the way for tailored medical treatments, the integration of AI can supercharge these efforts by leveraging an individual's genetic information. Consider the case of warfarin, a widely prescribed anticoagulant. Accurate dosing is critical at the start of warfarin treatment, a period that carries higher risks of bleeding and clotting complications. Over decades, dose-response models have been developed to better understand how this drug affects the human body (Holford, 1986). To improve warfarin anticoagulation therapy, algorithms have incorporated genetic information to help identify the factors behind clotting issues, such as warfarin clearance rate, improving dosing and therapy (Gong et al., 2011).
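Genotype-guided dosing of this sort can be sketched as a lookup that scales a standard starting dose by the patient's metabolizer status. The genotype categories follow the genes commonly implicated in warfarin response (CYP2C9 and VKORC1), but the multipliers below are hypothetical, not clinically validated coefficients:

```python
# Illustrative sketch only: a genotype-aware warfarin starting dose.
# Real dosing algorithms use clinically validated models; the dose
# fractions below are hypothetical placeholders.

DOSE_FRACTION_BY_GENOTYPE = {
    # fraction of the standard starting dose retained
    ("CYP2C9*1/*1", "VKORC1-GG"): 1.00,  # normal metabolism and sensitivity
    ("CYP2C9*1/*3", "VKORC1-GA"): 0.65,  # reduced clearance, higher sensitivity
    ("CYP2C9*3/*3", "VKORC1-AA"): 0.35,  # poor metabolizer, high sensitivity
}

def starting_dose(standard_dose_mg, cyp2c9, vkorc1):
    """Scale the standard dose by the patient's genotype category."""
    return standard_dose_mg * DOSE_FRACTION_BY_GENOTYPE[(cyp2c9, vkorc1)]

print(starting_dose(5.0, "CYP2C9*1/*3", "VKORC1-GA"))  # 3.25 (mg)
```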

Now, with the power of AI, researchers can expedite the personalization of treatments for various disorders and medications, similar to what was accomplished with Warfarin but in a fraction of the time. AI algorithms are starting to analyze an individual’s genetic profile to predict their specific responses to various medications. This approach enables healthcare providers to fine-tune treatment plans, taking into account an individual’s unique genetic makeup, thus optimizing the effectiveness of therapies and reducing the potential for adverse effects. The integration of AI not only enhances the precision of pharmacogenomics but also streamlines the process, ultimately leading to safer and more efficient medical care tailored to each patient’s genetic characteristics.

The ultimate aspiration is to develop a sophisticated AI-driven system that can accurately forecast how each individual will react to specific medications, with the potential to bypass the conventional, time-consuming method of starting with the lowest effective dose and incrementally adjusting it upwards. This trial-and-error approach often leads to prolonged periods of uncertainty and potential adverse side effects for patients. Such advancements not only boost the precision of healthcare but also elevate the overall quality of life for patients seeking rapid relief and improved well-being.

Moreover, the integration of AI in pharmacogenomics has the potential to significantly expedite clinical trial programs. By tailoring medication doses to specific genetic backgrounds, AI can aid all three phases of the clinical trial process. This approach not only streamlines the trials but also offers substantial time and cost savings. The ability to tailor treatments to different genetic subgroups makes clinical trials more efficient, bringing new therapies to market faster and ultimately benefiting patients in need.

Conclusion

The union of genomics and clinical notes, facilitated by AI, is ushering in a new era of precision medicine in biopharma. With the ability to predict individual drug responses and identify targeted therapies, this approach holds immense promise for improved treatment outcomes and a patient-centric view of medicine. As AI companies continue to advance their capabilities, the future of precision medicine for many diseases is looking closer than ever. The key to unlocking its full potential lies in the availability of high-quality data that comprehensively spans the entire patient journey. The integration of such diverse health-related data is central to driving valuable insights for drug development, making AI a driving force in the future of healthcare.

 

References

Batch, K. E., Yue, J., Darcovich, A., Lupton, K., Liu, C. C., Woodlock, D. P., El Amine, M. A. K., Causa-Andrieu, P. I., Gazit, L., Nguyen, G. H., Zulkernine, F., Do, R. K. G., & Simpson, A. L. (2022). Developing a Cancer Digital Twin: Supervised Metastases Detection From Consecutive Structured Radiology Reports. Frontiers in Artificial Intelligence, 5. https://doi.org/10.3389/frai.2022.826402

Gong, I. Y., Schwarz, U. I., Crown, N., Dresser, G. K., Lazo-Langner, A., Zou, G., Roden, D. M., Stein, C. M., Rodger, M., Wells, P. S., Kim, R. B., & Tirona, R. G. (2011). Clinical and genetic determinants of warfarin pharmacokinetics and pharmacodynamics during treatment initiation. PloS One, 6(11), e27808. https://doi.org/10.1371/journal.pone.0027808

Gutierrez, C., & Schiff, R. (2011). HER2: biology, detection, and clinical implications. Archives of Pathology & Laboratory Medicine, 135(1), 55–62. https://doi.org/10.5858/2010-0454-RAR.1

Holford, N. H. (1986). Clinical pharmacokinetics and pharmacodynamics of warfarin. Understanding the dose-effect relationship. Clinical Pharmacokinetics, 11(6), 483–504. https://doi.org/10.2165/00003088-198611060-00005

Quazi, S. (2022). Artificial intelligence and machine learning in precision and genomic medicine. Medical Oncology (Northwood, London, England), 39(8), 120. https://doi.org/10.1007/s12032-022-01711-1

Stahlberg, E. A., Abdel-Rahman, M., Aguilar, B., Asadpoure, A., Beckman, R. A., Borkon, L. L., Bryan, J. N., Cebulla, C. M., Chang, Y. H., Chatterjee, A., Deng, J., Dolatshahi, S., Gevaert, O., Greenspan, E. J., Hao, W., Hernandez-Boussard, T., Jackson, P. R., Kuijjer, M., Lee, A., … Zervantonakis, I. (2022). Exploring approaches for predictive cancer patient digital twins: Opportunities for collaboration and innovation. Frontiers in Digital Health, 4, 1007784. https://doi.org/10.3389/fdgth.2022.1007784

Vazquez-Levin, M. H., Reventos, J., & Zaki, G. (2023). Editorial: Artificial intelligence: A step forward in biomarker discovery and integration towards improved cancer diagnosis and treatment. Frontiers in Oncology, 13, 1161118. https://doi.org/10.3389/fonc.2023.1161118


Claims Data vs EHRs: Distinct but United in Real-World Research


In healthcare research, real-world data from patients is invaluable for gaining insights into disease patterns, treatment effectiveness, and outcomes. Two major sources of real-world data are claims data and electronic health records (EHRs). Both data types have distinct advantages and limitations that impact their utility for different research applications. This article examines the key differences between claims data and EHR data and how researchers can leverage both data sources to answer critical healthcare questions.

 

What is Claims Data?

Health insurance claims data contains information submitted by healthcare providers to payers to receive reimbursement for services rendered to patients. Claims data includes demographic details about the patient such as age, gender, location, insurance details, diagnosis codes, procedure codes, prescription details, costs and reimbursement information. Claims data provides a longitudinal view of a patient’s interactions across healthcare systems, as it captures data each time a claim is filed over months or years of care (Pivovarov et al., 2019).

Large claims clearinghouses aggregate data from millions of patients across different payers, providing massive real-world datasets. For example, IBM MarketScan databases contain claims information on over 240 million US patients collected from employers, health plans and government health programs (IBM Watson Health, 2022). Other major claims aggregators include Optum Clinformatics, Premier Healthcare Database and PharMetrics Plus.

 

Key Details Captured in Claims Data

  • Patient demographics – age, gender, location, insurance details
  • Diagnoses – ICD diagnosis codes
  • Procedures – CPT and HCPCS procedure codes
  • Medications – NDC codes, dose, number of prescription refills
  • Costs – total and itemized costs, amount paid by payer and patient responsibility

Claims data is extremely valuable for comparative effectiveness research, pharmacoepidemiology, health economics and outcomes research (Berger et al., 2017). The large sample sizes and longitudinal view make claims databases ideal for studying disease incidence, treatment patterns, medication adherence, healthcare costs and utilization across different patient demographics and therapeutic areas.
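One of the workhorse claims analyses mentioned above, medication adherence, is often measured as the Proportion of Days Covered (PDC): the fraction of days in an observation window on which the patient had medication on hand, computed from dispense dates and days' supply. A small sketch with illustrative fills:

```python
from datetime import date, timedelta

# Sketch of medication adherence from pharmacy claims via the
# Proportion of Days Covered (PDC). Fill dates are illustrative.

def pdc(fills, window_start, window_end):
    """fills: list of (dispense_date, days_supply) from pharmacy claims."""
    covered = set()
    for dispensed, days_supply in fills:
        for offset in range(days_supply):
            day = dispensed + timedelta(days=offset)
            if window_start <= day <= window_end:
                covered.add(day)
    window_days = (window_end - window_start).days + 1
    return len(covered) / window_days

# Two 30-day fills with a 15-day gap between them: 60 of 75 days covered.
fills = [(date(2023, 1, 1), 30), (date(2023, 2, 15), 30)]
print(pdc(fills, date(2023, 1, 1), date(2023, 3, 16)))  # 0.8
```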

 

Limitations of Claims Data

While claims data offers unparalleled scale and longitudinal perspective, researchers must be aware of its limitations:

  • Diagnosis codes are not indicative of a confirmatory diagnosis; for example, a provider might submit a claim with a diagnosis code that is being considered during a diagnostic workup. 
  • Diagnosis and procedure codes may be inaccurate or incomplete if providers submit improper codes, and important clinical details are missing.
  • Prescription records lack information about whether the medication was taken as directed or refilled properly.
  • Available data elements are restricted to what is required for reimbursement. No additional clinical context is provided.
  • Inability to link family members or track individuals who change payers over time.
  • Variable data quality and completeness across different claims sources.
  • Biased sampling based on specific payer population characteristics. May not represent the general population.

Despite these limitations, claims data remains highly useful for epidemiologic studies, health economics research, population health analyses and other applications where large sample sizes are critical. Researchers should account for the nuances of claims data during study design and analysis.

 

What are Electronic Health Records?

Electronic health records (EHRs) are digital documentation of patient health information generated throughout clinical care. EHRs are maintained by healthcare organizations and contain various data elements documenting patient encounters, including (Hersh et al., 2013):

  • Demographics – age, gender, race, ethnicity, language
  • Medical history – conditions, diagnoses, allergies, immunizations, procedures
  • Medications – prescriptions, dosing instructions
  • Vital signs – blood pressure, heart rate, weight, height
  • Lab test results
  • Radiology images
  • Clinical notes – physician progress notes, discharge summaries

A key advantage of EHR data is its rich clinical context. While claims data only captures billing codes, EHRs include detailed narratives, quantitative measures, images and comprehensive documentation of each patient visit. This facilitates a better understanding of disease presentation and progression, treatment rationale and response, and patient complexity.
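As a small illustration of how that clinical context becomes usable data, here is a sketch of pulling a structured value out of free-text notes. The note and pattern are invented; production systems use full NLP pipelines rather than a single regular expression:

```python
import re

# Toy extraction of a structured value (ejection fraction) from a
# free-text clinical note. The note and pattern are illustrative.

EF_PATTERN = re.compile(r"ejection fraction\D{0,15}?(\d{1,2})\s*%", re.IGNORECASE)

def extract_ef(note):
    """Return the ejection fraction (%) if the note mentions one, else None."""
    match = EF_PATTERN.search(note)
    return int(match.group(1)) if match else None

note = "Echo today. Ejection fraction estimated at 35%. Continue beta-blocker."
print(extract_ef(note))  # 35
```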

EHR databases aggregate records across large healthcare networks to compile real-world data on millions of patients. For instance, Vanderbilt University Medical Center’s Synthetic Derivative database contains de-identified medical records for over 3.7 million subjects and their BioVU® database contains over 310,000 DNA samples linked to de-identified medical records for genomics research (Roden et al., 2008).


 

Benefits of EHR Data

EHR data enables researchers to (Cowie et al., 2017):

  • Obtain granular clinical details beyond billing codes
  • Review physician notes and narratives for patient context
  • Link lab results, pathology reports, radiology images for additional phenotyping
  • Study unstructured data through natural language processing
  • Identify patient cohorts based on complex inclusion/exclusion criteria
  • Examine longitudinal disease patterns and treatment journeys

EHR data yields insights unattainable through claims data alone. The rich clinical details enable researchers to understand nuances in patient populations, disease manifestation and therapy response.

 

Challenges with EHR Data

While valued for its clinical context, EHR data also has some inherent limitations:

  • Incomplete or missing records if providers fail to properly document encounters
  • Incomplete records if patient receives care at multiple, unlinked healthcare networks
  • Inconsistent use of structured fields vs free text notes across systems
  • Lack of national standards in data formats, terminologies and definitions
  • Biased datasets dependent on specific health system patient population
  • Difficulty normalizing data across disparate EHR systems
  • Requires data science skills to analyze unstructured notes and documents
  • Requires clinical background to appropriately interpret unstructured notes and documents
  • More resource intensive for data extraction and processing compared to claims data

EHR data analysis requires specialized skills and infrastructure, especially to interpret unstructured data. Despite limitations, EHRs remain an invaluable data source on their own or as complements to other data sources like claims for comprehensive real-world evidence generation.

 

Integrating Claims and EHR Datasets

Given the complementary strengths of claims data and EHRs, there is significant value in integrating these datasets to conduct robust real-world studies. This can be accomplished by (Maro et al., 2019):

  • Linking claims and EHR data at the patient level via unique identifiers
  • Building cohorts based on diagnosis codes from claims data, then reviewing clinical data for each patient in the EHR
  • Using natural language processing on EHR notes to extract additional details not available in claims
  • Applying claims analysis algorithms on EHR data to identify lapses in care, adverse events, etc.
  • Incorporating prescription fills from claims with medication orders in EHRs to assess adherence
  • Using cost data from claims combined with clinical data for health economic studies

Major research networks like PCORnet have developed infrastructure to integrate claims and EHR data to support large-scale patient-centered outcomes research. When thoughtfully combined, these complementary data sources enable multifaceted real-world studies not possible using either source alone.
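The first integration pattern in the list above, patient-level linkage via a shared identifier, reduces to a join between the two datasets. A minimal sketch with invented records (real linkage typically uses tokenized or hashed identifiers to protect privacy):

```python
# Sketch of patient-level linkage between a claims extract and an EHR
# extract via a shared identifier. Records are invented for illustration.

claims = [
    {"patient_id": "P1", "icd10": "E11.9", "paid_amount": 240.0},
    {"patient_id": "P2", "icd10": "I10", "paid_amount": 95.0},
]
ehr = {
    "P1": {"hba1c": 8.2, "note": "Type 2 diabetes, poorly controlled."},
    # P2 has no record in this EHR network, a common situation in practice.
}

# Left-join claims rows onto EHR data by patient_id.
linked = [{**row, **ehr.get(row["patient_id"], {})} for row in claims]

for row in linked:
    print(row["patient_id"], row.get("hba1c"))  # P1 8.2 / P2 None
```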

Claims data and EHRs both provide invaluable real-world evidence on patient populations, but have distinct strengths and limitations. Claims data allows longitudinal analysis of diagnosis, procedure and prescription patterns at scale, but lacks clinical granularity. EHRs provide rich clinical context like physician notes, lab results and images, but lack continuity across health systems and data standardization. By integrating these sources, researchers can conduct robust real-world studies leveraging the advantages of both datasets. Careful consideration of the nuances of each data type allows generation of comprehensive real-world evidence to inform healthcare decisions and improve patient outcomes.

At NashBio, we use EHR data for most of our analytic activities because of its depth and additional clinical context, which helps us build the highest fidelity study populations for our clients.

 

References

Berger, M. L., Sox, H., Willke, R. J., Brixner, D. L., Eichler, H.-G., Goettsch, W., … Schneeweiss, S. (2017). Good practices for real-world data studies of treatment and/or comparative effectiveness: Recommendations from the joint ISPOR-ISPE Special Task Force on real-world evidence in health care decision making. Pharmacoepidemiology and Drug Safety, 26(9), 1033–1039. https://doi.org/10.1002/pds.4297

Cowie, M. R., Blomster, J. I., Curtis, L. H., Duclaux, S., Ford, I., Fritz, F., … Zalewski, A. (2017). Electronic health records to facilitate clinical research. Clinical Research in Cardiology, 106(1), 1–9. https://doi.org/10.1007/s00392-016-1025-6

Hersh, W. R., Weiner, M. G., Embi, P. J., Logan, J. R., Payne, P. R. O., Bernstam, E. V., Lehmann, H. P., Hripcsak, G., Hartzog, T. H., Cimino, J. J., & Saltz, J. H. (2013). Caveats for the use of operational electronic health record data in comparative effectiveness research. Medical care, 51(8 0 3), S30–S37. https://doi.org/10.1097/MLR.0b013e31829b1dbd

IBM Watson Health. (2022). IBM MarketScan Databases. https://www.ibm.com/products/marketscan-research-databases

Maro, J. C., Platt, R., Holmes, J. H., Stang, P. E., Steiner, J. F., & Douglas, M. P. (2019). Design of a study to evaluate the comparative effectiveness of analytical methods to identify patients with irritable bowel syndrome using administrative claims data linked to electronic medical records. Pharmacoepidemiology and Drug Safety, 28(2), 149–157. https://doi.org/10.1002/pds.4698

Pivovarov, R., Albers, D. J., Hripcsak, G., Sepulveda, J. L., & Elhadad, N. (2019). Temporal trends of hemoglobin A1c testing. Journal of the American Medical Informatics Association, 26(1), 41–48. https://doi.org/10.1093/jamia/ocy137

Roden, D. M., Pulley, J. M., Basford, M. A., Bernard, G. R., Clayton, E. W., Balser, J. R., & Masys, D. R. (2008). Development of a large-scale de-identified DNA biobank to enable personalized medicine. Clinical pharmacology and therapeutics, 84(3), 362–369. https://doi.org/10.1038/clpt.2008.89