BPC CRC v2.0-public

The GENIE BPC CRC v2.0-public dataset contains 1,485 colorectal cancer (CRC) patients from three institutions: MSKCC, DFCI, and VICC.

Data Access: 

  • A subset of the data is available through cBioPortal, where both the genomic and clinical (phenomic) data can be explored and visualized through a user-friendly interface.
  • The complete, post-processed data are available on Synapse.   
  • The second BPC dataset, containing 1,485 colorectal cancer patients, is now available.

See an overview of the data set


What is included in GENIE BPC data?  

  • Genomic data: Clinical-grade next-generation sequencing data for each patient from the GENIE Registry. Genomic profiling was performed between 2014 and 2018.  
  • Treatment Histories: All anti-neoplastic systemic therapies (intravenous and oral chemotherapies) are included in the data set. Dates are provided as intervals from diagnosis to the start and stop of each drug. Investigational drugs are masked, and no dosing information is included.
  • PRISSMM™: The BPC CRC dataset uses the PRISSMM™ framework, developed at the Dana-Farber Cancer Institute, to ascertain cancer treatment outcomes from retrospective real-world data. Additional information can be found in the analytic data guide; information about licensing PRISSMM™ can be obtained by emailing [email protected].
  • Pathologic information: Each pathology specimen from diagnosis through death or last follow-up is curated with specimen type, site, and histology.  
  • Imaging information: Each CT, MRI, and PET-CT scan from diagnosis through death or last follow-up is curated for the presence or absence of cancer and for an evaluation of whether the cancer was stable, responding, or progressing. These data are used to compute progression-free survival-imaging (PFS-I). Sites of tumor involvement are also recorded.
  • Medical oncologist’s evaluations: Medical oncology notes (one per month) have been curated to ascertain the presence or absence of cancer and whether the cancer was stable, responding, or progressing. These data are used to compute progression-free survival-medonc (PFS-M) from diagnosis through death or date of last follow-up.
  • Additional relevant biomarkers: Information about select biomarkers not included on the NGS panels, including PDL1, MSI, and MMR, is also curated.
  • There are no patient (self-) reported outcomes in the data.  
  • The CRC diagnosis is considered the index tumor for this patient cohort. Data are also included about other cancer diagnoses that preceded or followed the CRC.
  • Overall survival is based on death with censoring at date of last contact known alive. Ascertainment of death varies by institution.  
  • Exact date fields have been masked to preserve confidentiality. However, exact intervals between dates are available, allowing calculation of the time between events (e.g., diagnosis, treatment start, treatment end) and of endpoints such as PFS-I, PFS-M, and OS.
  • Analytical Data Guide: A more comprehensive overview of the data can be found in the Analytical Data Guide, and a description and location of each variable collected can be found in the variable synopsis spreadsheet.
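
Because dates are released only as intervals from diagnosis, time-to-event quantities reduce to simple arithmetic on those intervals. The Python sketch below illustrates this, along with a basic Kaplan-Meier estimate of overall survival with censoring at last contact. It is a minimal illustration under stated assumptions, not the analysis method used by the project: the field names (dx_to_last_contact_days, died) and the helper names are hypothetical, the patient values are made up for demonstration, and the actual variable names are listed in the variable synopsis spreadsheet.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical fields for illustration only; consult the variable
    # synopsis spreadsheet for the dataset's real variable names.
    dx_to_last_contact_days: int  # diagnosis to death or last contact known alive
    died: bool                    # True if death was observed, False if censored


def interval_between(start_days: int, end_days: int) -> int:
    """Days between two events, each given as an interval from diagnosis."""
    return end_days - start_days


def kaplan_meier(patients):
    """Return [(time_days, survival_probability)] at each observed death time."""
    events = sorted((p.dx_to_last_contact_days, p.died) for p in patients)
    at_risk = len(events)
    survival = 1.0
    curve = []
    i = 0
    while i < len(events):
        t = events[i][0]
        deaths = 0
        leaving = 0
        # Group everyone whose follow-up ends at time t.
        while i < len(events) and events[i][0] == t:
            deaths += events[i][1]
            leaving += 1
            i += 1
        if deaths:
            # Patients censored at t still count as at risk at t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Made-up example values, not taken from the dataset.
cohort = [
    Patient(100, True),
    Patient(200, False),
    Patient(300, True),
    Patient(400, False),
]
curve = kaplan_meier(cohort)  # [(100, 0.75), (300, 0.375)]

# A drug started 30 days and stopped 120 days after diagnosis ran for 90 days.
treatment_days = interval_between(30, 120)
```

The same interval arithmetic applies to PFS-I and PFS-M: replace the death indicator with the relevant progression indicator and the follow-up interval with the interval to progression or last assessment. Since ascertainment of death varies by institution, any such estimate should be interpreted with that caveat in mind.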