AACR Project GENIE: Data

The first set of cancer genomic data aggregated through AACR Project Genomics Evidence Neoplasia Information Exchange (GENIE) was made available to the global community in January 2017. The ninth data set, GENIE 9.0-public, was released in February 2021. With this release, GENIE reached another milestone: the registry now contains nearly 113,000 sequenced samples from more than 104,000 patients, making the AACR Project GENIE registry among the largest fully public cancer genomic data sets released to date.

The combined data set includes de-identified genomic records collected from patients who were treated at the consortium’s participating institutions. These data will be released to the public every six months. The public release of the tenth data set, GENIE 10.0-public, will take place in July 2021.

With the most recent data release, the registry now contains genomic information from nearly 17,000 non-small cell lung carcinomas, nearly 12,000 breast cancers, and more than 11,000 colorectal cancers.

The most commonly mutated genes in the sequenced tumors are TP53, KRAS, and PIK3CA, with CDKN2A and CDKN2B deletions and PDE4DIP and UBR5 amplifications now representing the most common copy number alterations.
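For readers who want to reproduce a tally like the one above from a downloaded release, the following is a minimal sketch, not the method GENIE itself uses. It assumes the mutation data are in a tab-delimited, MAF-style file with the standard `Hugo_Symbol` and `Tumor_Sample_Barcode` columns, and it ranks genes by the number of distinct samples carrying at least one mutation. The function name `top_mutated_genes` and the toy rows are hypothetical and are not GENIE data.

```python
import csv
import io
from collections import defaultdict


def top_mutated_genes(maf_lines, n=3):
    """Rank genes by the number of distinct samples carrying a mutation.

    maf_lines: any iterable of text lines from a tab-delimited, MAF-style
    file (assumed columns: Hugo_Symbol, Tumor_Sample_Barcode).
    """
    # Skip leading comment lines such as "#version ...".
    reader = csv.DictReader(
        (line for line in maf_lines if not line.startswith("#")),
        delimiter="\t",
    )
    samples_by_gene = defaultdict(set)
    for row in reader:
        # A set ensures a sample with several mutations in one gene counts once.
        samples_by_gene[row["Hugo_Symbol"]].add(row["Tumor_Sample_Barcode"])
    counts = {gene: len(samples) for gene, samples in samples_by_gene.items()}
    # Sort by descending sample count, then by gene name for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Made-up toy rows for illustration only; not GENIE data.
TOY_MAF = (
    "#version 2.4\n"
    "Hugo_Symbol\tTumor_Sample_Barcode\n"
    "TP53\tS1\n"
    "TP53\tS1\n"
    "TP53\tS2\n"
    "KRAS\tS2\n"
    "KRAS\tS3\n"
    "PIK3CA\tS3\n"
)

if __name__ == "__main__":
    print(top_mutated_genes(io.StringIO(TOY_MAF)))
```

To run the same tally on a real file, pass an open text file handle instead of the `io.StringIO` wrapper. Counting distinct samples rather than rows is a design choice: it keeps a single heavily mutated sample from inflating a gene's rank.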

For guidance on using the data, consult the data guide and the release notes.

Users can explore the data directly in cBioPortal or download it from Sage Bionetworks. This cBioPortal tutorial will help you navigate the site. Users will need to create an account on either site and agree to the terms of access.

For frequently asked questions, visit our FAQ page.

Date Updated: 03/23/2021