BPC Prostate 1.0 – Public
The Project GENIE BPC Prostate v1.0-public dataset contains 1,116 prostate cancer patients from four institutions: MSK, DFCI, UHN, and VICC.
Data Access:
- A subset of the data is available through cBioPortal. Both the genomic and clinical (phenomic) data can be evaluated in cBioPortal with opportunities for data exploration and visualization using a user-friendly interface.
- The complete, post-processed data are available on Synapse.
See an overview of the data set.
What is included in GENIE BPC data?
- Genomic Data: Clinical-grade next-generation sequencing data for each patient from the GENIE Registry. Genomic profiling was performed between 2013 and 2018; patients were aged 37-88 at the time of genomic sequencing.
- Cancer Diagnosis: Prostate cancer diagnosis is considered the index tumor for this patient cohort. There are data about other cancer diagnoses antecedent and subsequent to the prostate cancer.
- Prostate cancer-specific fields: Includes medical oncologist assessment of hormone/castrate/androgen-independent status, assessment of change in PSA level, PSA and testosterone lab values, highest Gleason score recorded on pathology report, margin status on excision, and specialized histologic pattern across all prostate specimens.
- Pathologic Information: Each pathology specimen from diagnosis through death or last follow-up is curated with specimen type, site, and histology.
- Treatment Histories: All anti-neoplastic systemic therapies (intravenous and oral chemotherapies) are included in the data set. Dates are provided as intervals from diagnosis to start and stop of each drug. Investigational drugs are masked; no dosing information is included.
- Imaging Information: Each CT, MRI, and PET-CT scan from diagnosis through death or last follow-up is curated for the presence or absence of cancer and an evaluation of whether the cancer was stable, responding, or progressing. These data are used to compute progression-free survival-imaging (PFS-I). Sites of tumor involvement are also recorded.
- Medical Oncologist’s Evaluations: Medical oncology notes (one per month) have been curated to ascertain the presence or absence of cancer and whether the cancer was stable, responding, or progressing. These data are used to compute progression-free survival-medonc (PFS-M) from diagnosis through death or date of last follow-up.
- Overall Survival: Overall survival is based on death, with censoring at the date last known alive. Ascertainment of death varies by institution.
- Additional Relevant Biomarkers: Information about select biomarkers not included on the NGS panels, including PSA, testosterone, PD-L1, MSI, and MMR, is also curated.
- Patient-Reported Outcomes: No patient-reported outcomes are available in this dataset.
- Date Masking: Exact dates are masked to preserve confidentiality; however, date intervals are available, allowing calculation of event times such as diagnosis, treatment start, treatment end, PFS-I, PFS-M, and OS.
- Analytical Data Guide: A more comprehensive overview of the data can be found in the data guide, and a description and location of the variables collected can be found in the variable synopsis spreadsheet.
- Other Resources: There is a dedicated project wiki that describes each of the files.
- Training Videos:
- Demo of GENIE Data on the Synapse and cBioPortal Platforms: here
- BPC-specific cBioPortal video training playlist: here
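The Overall Survival and Date Masking items above describe survival as a death event with censoring at the date last known alive, measured through intervals from diagnosis rather than exact dates. The following is a minimal sketch of how such intervals could feed a Kaplan-Meier estimate. The field layout (days from diagnosis to death, days from diagnosis to last known alive) and the four example patients are hypothetical, chosen for illustration only; consult the data guide and variable synopsis for the actual variable names and definitions.

```python
from typing import List, Optional, Tuple


def os_time_and_event(
    dx_to_death_days: Optional[int], dx_to_last_alive_days: int
) -> Tuple[int, int]:
    """Return (time, event) for one patient.

    event is 1 when death was observed, 0 when the patient is censored
    at the interval to the date last known alive. Both argument names
    are hypothetical, not the dataset's real variable names.
    """
    if dx_to_death_days is not None:
        return dx_to_death_days, 1
    return dx_to_last_alive_days, 0


def kaplan_meier(records: List[Tuple[int, int]]) -> List[Tuple[int, float]]:
    """Return (time, survival probability) at each observed death time."""
    at_risk = len(records)
    survival = 1.0
    curve = []
    for t in sorted({time for time, _ in records}):
        deaths = sum(1 for time, event in records if time == t and event == 1)
        leaving = sum(1 for time, _ in records if time == t)
        if deaths:
            # Multiply by the fraction of at-risk patients surviving past t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after t.
        at_risk -= leaving
    return curve


# Hypothetical patients: (days to death or None, days to last known alive).
patients = [(400, 400), (None, 900), (650, 650), (None, 300)]
records = [os_time_and_event(death, alive) for death, alive in patients]
curve = kaplan_meier(records)
```

Because ascertainment of death varies by institution, a real analysis would likely need to account for that difference; this sketch treats all censoring as uninformative.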

