AACR Project GENIE: Data

The first set of cancer genomic data aggregated through AACR Project Genomics Evidence Neoplasia Information Exchange (GENIE) was made available to the global community in January 2017. The tenth data set, GENIE 10.0-public, was released in July 2021. With this release, the registry now contains nearly 120,000 sequenced samples from more than 111,000 patients, making the AACR Project GENIE registry among the largest fully public cancer genomic data sets released to date.

The combined data set includes de-identified genomic records collected from patients who were treated at each of the consortium’s participating institutions. New data sets are released to the public every six months. The eleventh data set, GENIE 11.0-public, is scheduled for public release in January 2022.

With the most recent data release, the registry now contains genomic information from roughly 18,000 non-small cell lung carcinomas, nearly 13,000 breast cancers, and more than 12,000 colorectal cancers.

The most commonly mutated genes in the sequenced tumors are TP53, KRAS, and PIK3CA, with CDKN2A and CDKN2B deletions and PDE4DIP and UBR5 amplifications still representing the most common copy number alterations.

For guidance on using the data, consult the data guide and the release notes.

Users can explore the data through cBioPortal or download the data from Sage Bionetworks. This handy cBioPortal tutorial will help you navigate the site. Users will need to create an account for either site and agree to the terms of access.

For frequently asked questions, visit our FAQ page.

Date Updated: 07/13/2021