Key Takeaways

  • Conventional oncology datasets treat diagnosis as the starting line, missing the pre-diagnostic period when routine biomarkers may already encode signals of emerging disease.
  • Evaluating biomarkers over time reveals patterns that single measurements miss: in one longitudinal wellness cohort, circulating protein markers deviated from baseline years before metastatic cancer was diagnosed.
  • Rigorous longitudinal oncology research requires anchoring clinical data to confirmed diagnostic endpoints, linking serial laboratory measurements with cancer registry outcomes.
  • NashBio’s platform integrates longitudinal health records with registry-confirmed cancer diagnoses and outcomes, enabling analyses that span the complete patient journey, from pre-diagnostic care through treatment and follow-up.

Introduction

Cancer does not begin at diagnosis; it develops gradually through molecular and physiological changes that occur long before clinical detection. Yet the datasets most widely used in oncology research are structured as if diagnosis were the starting point. Most oncology real-world data (RWD) platforms index patients at or near the point of cancer detection, capturing treatment patterns and post-diagnosis outcomes in detail while offering little visibility into the preceding clinical history. The result is a systematic gap that limits our understanding of early disease progression, delays the identification of high-risk patients, and narrows the window for therapeutic intervention.

The clinical record that accumulates during routine primary and specialty care, such as serial laboratory values, screening results, and metabolic panels, can contain pre-diagnostic signals that conventional oncology datasets typically do not capture. A protein level that climbs steadily over a decade, a shifting inflammatory marker profile, or a subtle metabolic trend may carry predictive value that any single measurement would miss. Closing the pre-diagnosis gap requires a data infrastructure that links deep, longitudinal health records to rigorous, registry-confirmed cancer endpoints. Only with this foundation can researchers reconstruct the full patient journey, from before diagnosis through treatment and long-term surveillance.

The Clinical Limitation of Oncology Snapshots

Many oncology RWD platforms are sourced from specialty oncology networks that, by design, index patients at or after cancer diagnosis. While these datasets are invaluable for understanding treatment patterns and post-diagnosis outcomes, the pre-diagnosis period remains a blind spot.

Starting the record at diagnosis makes it difficult to characterize baseline health status, metabolic trajectory, or early disease activity. Longitudinal trends in routine biomarkers such as hemoglobin (Boennelykke et al., 2022), lipid panels (Neshat et al., 2022), and prostate-specific antigen (PSA) (Eggener et al., 2020) can carry meaningful pre-diagnostic signal, but none of these trajectories is visible when the dataset begins at biopsy confirmation. This gap is a significant methodological limitation for researchers studying early disease progression, therapeutic targets, or cancer risk stratification.

Reconstructing the “Before”: Pre-Diagnosis Biomarker Trajectories

Routine clinical care generates longitudinal biomarker data through standard laboratory panels, imaging, and screening tests. When linked to confirmed cancer outcomes, these measurements offer a window into pre-clinical disease activity that single time-point analyses cannot provide.

The predictive value of longitudinal biomarker tracking extends across multiple analyte classes. In a multi-year wellness cohort of adults with serial blood measurements, circulating proteins including CEACAM5, CALCA, DLK1, and ERBB2 showed detectable deviations from baseline years before metastatic cancer was diagnosed (Magis et al., 2020). These findings point to the broader potential of evaluating biomarkers over time. PSA illustrates the same principle in a setting where it already has direct diagnostic relevance. Rather than relying on a single absolute PSA threshold to trigger concern, a baseline PSA measured in midlife provides substantial predictive information about long-term risk of metastasis and lethal prostate cancer (Preston et al., 2016; Vickers et al., 2013).
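To make "deviation from baseline" concrete, here is a minimal sketch that flags later measurements of a single analyte falling far from a person's own early values. It is an illustration under stated assumptions, not the analysis used by Magis et al. (2020): the `Measurement` type, the `flag_baseline_deviations` function, the three-measurement baseline window, and the z-score threshold of 3 are all hypothetical choices.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass(frozen=True)
class Measurement:
    """One serial result for a single analyte (hypothetical structure)."""
    years_from_entry: float  # time since the person's first measurement
    value: float             # analyte level, in the assay's units


def flag_baseline_deviations(series, n_baseline=3, z_threshold=3.0, min_sd=1e-6):
    """Flag measurements that deviate strongly from a person's own baseline.

    Baseline = mean and sample standard deviation of the first `n_baseline`
    measurements (ordered by time). Each later measurement is converted to a
    z-score against that baseline and returned if |z| >= z_threshold.
    """
    if n_baseline < 2:
        raise ValueError("need at least two baseline points to estimate spread")
    ordered = sorted(series, key=lambda m: m.years_from_entry)
    if len(ordered) <= n_baseline:
        return []  # no follow-up beyond the baseline window
    baseline_values = [m.value for m in ordered[:n_baseline]]
    mu = mean(baseline_values)
    # Floor the SD so a perfectly flat baseline does not cause division by zero.
    sd = max(stdev(baseline_values), min_sd)
    flagged = []
    for m in ordered[n_baseline:]:
        z = (m.value - mu) / sd
        if abs(z) >= z_threshold:
            flagged.append((m.years_from_entry, z))
    return flagged


if __name__ == "__main__":
    # Illustrative values only, not real patient data.
    series = [Measurement(t, v) for t, v in
              [(0, 2.0), (1, 2.2), (2, 1.8), (3, 2.1), (4, 2.3), (5, 2.9)]]
    for years, z in flag_baseline_deviations(series):
        print(f"year {years}: z = {z:.1f}")  # prints "year 5: z = 4.5"
```

A z-score against a personal baseline is one of the simplest ways to turn a series of values into a trajectory-aware signal; a real analysis would likely also need to account for assay variability, irregular sampling intervals, and age-related drift.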

Grounding the Timeline in Rigorous Clinical Data

Studying pre-diagnosis biomarker trends at population scale requires both longitudinal health records spanning routine care and pathologically confirmed diagnostic endpoints against which those records can be anchored. The importance of combining rigorous diagnostic validation with longitudinal follow-up has been demonstrated in landmark prostate cancer studies such as the CEASAR (Comparative Effectiveness Analysis of Surgery and Radiation) study from Vanderbilt University Medical Center (VUMC). In CEASAR, men with localized prostate cancer were enrolled and followed prospectively, with detailed clinical outcomes and patient-reported quality-of-life measures captured over three years of follow-up (Barocas et al., 2017).

The NashBio platform extends this principle to real-world data at population scale. By integrating structured laboratory data, including serial PSA measurements, directly with cancer registry-confirmed diagnoses, NashBio datasets enable researchers to reconstruct pre-diagnostic biomarker trajectories across large real-world populations using data generated during routine clinical care.
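Anchoring serial lab values to a registry-confirmed diagnosis date is, at its core, a join followed by a re-indexing of time. The sketch below shows one way this could look for PSA, assuming a simple hypothetical schema (`LabResult`, `RegistryDiagnosis`), a hypothetical `pre_diagnostic_trajectories` function, and a 10-year lookback window; it is an illustration, not NashBio's data model or pipeline.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LabResult:
    """A structured lab value from routine care (hypothetical schema)."""
    patient_id: str
    test: str        # e.g. "PSA"
    drawn_on: date
    value: float


@dataclass(frozen=True)
class RegistryDiagnosis:
    """A registry-confirmed cancer diagnosis (hypothetical schema)."""
    patient_id: str
    site: str            # e.g. "prostate"
    diagnosed_on: date


def pre_diagnostic_trajectories(labs, diagnoses, test, site, lookback_years=10.0):
    """Re-index each diagnosed patient's `test` results to years before diagnosis.

    Returns {patient_id: [(years_before_dx, value), ...]}, earliest result first.
    Patients without a registry diagnosis for `site` are excluded, as are results
    drawn after diagnosis or more than `lookback_years` before it.
    """
    # Anchor each patient to their earliest registry-confirmed diagnosis at this site.
    dx_date = {}
    for d in diagnoses:
        if d.site != site:
            continue
        current = dx_date.get(d.patient_id)
        if current is None or d.diagnosed_on < current:
            dx_date[d.patient_id] = d.diagnosed_on

    trajectories = {}
    for lab in labs:
        anchor = dx_date.get(lab.patient_id)
        if anchor is None or lab.test != test:
            continue
        years_before = (anchor - lab.drawn_on).days / 365.25
        if 0 <= years_before <= lookback_years:  # a draw on the diagnosis day counts as 0
            trajectories.setdefault(lab.patient_id, []).append((years_before, lab.value))

    for points in trajectories.values():
        points.sort(key=lambda p: p[0], reverse=True)  # largest lead time first
    return trajectories


if __name__ == "__main__":
    # Toy records for illustration only.
    labs = [
        LabResult("p1", "PSA", date(2010, 1, 1), 1.1),
        LabResult("p1", "PSA", date(2015, 1, 1), 2.4),
        LabResult("p1", "PSA", date(2021, 1, 1), 6.0),  # after diagnosis: dropped
        LabResult("p2", "PSA", date(2016, 1, 1), 0.9),  # no registry diagnosis: dropped
    ]
    diagnoses = [RegistryDiagnosis("p1", "prostate", date(2020, 1, 1))]
    print(pre_diagnostic_trajectories(labs, diagnoses, test="PSA", site="prostate"))
```

Using the earliest registry diagnosis per patient keeps the anchor unambiguous when someone has more than one record for the same site, and expressing time as years before diagnosis lets trajectories from patients diagnosed in different calendar years be aligned on a common axis.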

Looking Ahead: The Power of a Fully Mapped Journey

By following patients from routine screening through diagnosis, treatment, and long-term follow-up, researchers can move beyond studies that start at diagnosis and instead examine how a patient’s full clinical history connects to their outcomes. Drug target identification, real-world treatment response, and survival outcomes can all be evaluated against a complete clinical backdrop.

Because NashBio’s oncology data is refreshed on a biannual basis, biomarker trajectories can be tracked over time alongside evolving treatment histories, enabling ongoing analyses as additional clinical data accrue. As leading oncology research communities call for higher-quality real-world evidence built on rigorous data linkage and longitudinal depth (Ramsey et al., 2024), platforms that connect pre-diagnosis biomarker trajectories with registry-confirmed endpoints are well positioned to meet this standard.

In our final Oncology blog post, we will bring these two concepts together (biopsy-proven precision and longitudinal tracking) to officially introduce NashBio’s Clinical Specialty (CS) Oncology, our new integrated dataset built for multimodal oncology research spanning diagnosis, treatment, and outcomes.

References

  • Barocas, D.A., Alvarez, J., Resnick, M.J., Koyama, T., Hoffman, K.E., Tyson, M.D., Conwill, R., McCollum, D., Cooperberg, M.R., Goodman, M., Greenfield, S., Hamilton, A.S., Hashibe, M., Kaplan, S.H., Paddock, L.E., Stroup, A.M., Wu, X.-C., Penson, D.F., 2017. Association Between Radiation Therapy, Surgery, or Observation for Localized Prostate Cancer and Patient-Reported Outcomes After 3 Years. JAMA 317, 1126–1140. https://doi.org/10.1001/jama.2017.1704
  • Boennelykke, A., Jensen, H., Østgård, L.S.G., Falborg, A.Z., Hansen, A.T., Christensen, K.S., Vedsted, P., 2022. Cancer risk in persons with new-onset anaemia: a population-based cohort study in Denmark. BMC Cancer 22, 805. https://doi.org/10.1186/s12885-022-09912-7
  • Eggener, S.E., Rumble, R.B., Armstrong, A.J., Morgan, T.M., Crispino, T., Cornford, P., van der Kwast, T., Grignon, D.J., Rai, A.J., Agarwal, N., Klein, E.A., Den, R.B., Beltran, H., 2020. Molecular Biomarkers in Localized Prostate Cancer: ASCO Guideline. J Clin Oncol 38, 1474–1494. https://doi.org/10.1200/JCO.19.02768
  • Magis, A.T., Rappaport, N., Conomos, M.P., Omenn, G.S., Lovejoy, J.C., Hood, L., Price, N.D., 2020. Untargeted longitudinal analysis of a wellness cohort identifies markers of metastatic cancer years prior to diagnosis. Sci Rep 10, 16275. https://doi.org/10.1038/s41598-020-73451-z
  • Neshat, S., Rezaei, A., Farid, A., Sarallah, R., Javanshir, S., Ahmadian, S., Chatrnour, G., Daneii, P., Heshmat-Ghahdarijani, K., 2022. The tangled web of dyslipidemia and cancer: Is there any association? J Res Med Sci 27, 93. https://doi.org/10.4103/jrms.jrms_267_22
  • Preston, M.A., Batista, J.L., Wilson, K.M., Carlsson, S.V., Gerke, T., Sjoberg, D.D., Dahl, D.M., Sesso, H.D., Feldman, A.S., Gann, P.H., Kibel, A.S., Vickers, A.J., Mucci, L.A., 2016. Baseline Prostate-Specific Antigen Levels in Midlife Predict Lethal Prostate Cancer. J Clin Oncol 34, 2705–2711. https://doi.org/10.1200/JCO.2016.66.7527
  • Ramsey, S.D., Onar-Thomas, A., Wheeler, S.B., 2024. Real-World Database Studies in Oncology: A Call for Standards. J Clin Oncol 42, 977–980. https://doi.org/10.1200/JCO.23.02399
  • Vickers, A.J., Ulmert, D., Sjoberg, D.D., Bennette, C.J., Björk, T., Gerdtsson, A., Manjer, J., Nilsson, P.M., Dahlin, A., Bjartell, A., Scardino, P.T., Lilja, H., 2013. Strategy for detection of prostate cancer based on relation between prostate specific antigen at age 40-55 and long term risk of metastasis: case-control study. BMJ 346, f2023. https://doi.org/10.1136/bmj.f2023