Smarter data that enables precise medical care

Precision medicine, an approach to patient care that is tailored to the individual, is an attainable goal. A key component of precision medicine is a patient’s genes and how those genes influence their risk of disease and its progression, and their response to treatment. NashBio’s data covers both phenotypic and genotypic patient data, enabling the interrogation of these interactions.

Understanding how genes influence disease plays a crucial role in drug discovery and development. By leveraging human genetic evidence, companies can increase their chances of a drug succeeding in clinical trials by 2-7x¹.

Genetic Data by the Numbers

DNA samples from the BioVU® biobank were assayed on Illumina's Expanded Multi-Ethnic Genotyping Array (MEGAEX). MEGAEX covers 2 million variants and was developed to provide extensive genotyping coverage of European, East Asian and South Asian populations. The genotype data is available in PLINK format.
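
For readers new to PLINK format, the variant information file (.bim) is plain text with six whitespace-separated columns per variant: chromosome, variant ID, genetic position, base-pair position, and two alleles. The sketch below is illustrative only; it counts variants per chromosome from .bim-style lines. The function name and the sample rows are made up for the example and are not taken from NashBio data.

```python
from collections import Counter
import io


def count_variants_by_chromosome(bim_lines):
    """Count variants per chromosome from the lines of a PLINK .bim file.

    Each non-empty line is expected to hold six whitespace-separated fields:
    chromosome, variant ID, genetic position, base-pair position, allele 1, allele 2.
    """
    counts = Counter()
    for line in bim_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 6:
            raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
        counts[fields[0]] += 1
    return counts


# Hypothetical rows for illustration; real files would be opened with open(path).
sample_bim = io.StringIO(
    "1\tvar_a\t0\t10583\tA\tG\n"
    "1\tvar_b\t0\t10611\tC\tG\n"
    "2\tvar_c\t0\t20412\tT\tC\n"
    "X\tvar_d\t0\t30155\tG\tA\n"
)

counts = count_variants_by_chromosome(sample_bim)
print(dict(counts))
```

Because the function accepts any iterable of lines, the same call works on an open file handle, which avoids loading a multi-million-variant file into memory at once.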

90K patients

12.4 years: median length of patient EHR

54 visits: median per patient

[Charts: Ancestry*; EHR-Reported Gender; Age at Last EHR]

*NashBio acknowledges the complexities surrounding ancestry, genetic ancestry calculations and the use of ancestry in genomic analysis. Here we classify the population into relative majority 1000 Genomes super-groups.

Imputed Data

Fewer anomalies for enhanced usability

Imputing genomic datasets can greatly increase the number of represented variants and can enhance genomic analyses, such as genome-wide association studies. NashBio has imputed the 90,000-subject MEGAEX dataset using multiple industry-standard pipelines (Michigan, TOPMed) and multiple reference datasets (1000 Genomes, HRC, TOPMed). The imputed datasets include between 30 million and 300 million variants (15x-150x MEGAEX). Imputed datasets are available in PLINK format for EUR and AFR ancestry cohorts.

Interested in learning more about our imputed data? Get the whitepaper here.




    Genomic Sequencing

    NashBio has whole exome sequences (WES) and whole genome sequences (WGS) for a subset of subjects.

    Some of the disease populations include:

    Fatty liver disease (NAFLD/NASH)*
    Type 2 diabetes
    Diabetic nephropathy
    Focal segmental glomerulosclerosis (FSGS)
    Diverse Ancestry Cohort (DAC)

    Additional sequence data will be available in 2024.

    *Sequencing performed prior to adoption of MAFLD/MASH terminology.

    All NashBio data modalities are fully normalized and have been cross-referenced to provide a harmonized data experience. NashBio is committed to patient privacy. To learn more, see Unwavering Commitment to Patient Privacy.

    ¹References: Nelson et al., Nature Genetics, 2015; 47:856-860. King et al., PLoS Genetics, 2019; 15(12). Estrada et al., Nature Communications, 2021; 12(2224). Wang et al., Nature, 2021; 597(7877):527-532.