AACR Project GENIE: Data

The first set of cancer genomic data aggregated through AACR Project Genomics Evidence Neoplasia Information Exchange (GENIE) is now available to the global community. The data set includes nearly 19,000 de-identified genomic records from patients treated at the consortium's participating institutions, making it one of the largest fully public cancer genomic data sets released to date. These data will continue to be updated quarterly.

The release includes data on 59 major cancer types, including nearly 3,000 patients with lung cancer, more than 2,000 with breast cancer, and more than 2,000 with colorectal cancer. For details about the data and how to use them, consult the data guide.

Users can explore the data through cBioPortal or download them directly from Sage Bionetworks. Either route requires creating an account on that site and agreeing to its terms of access.

AACR Project GENIE Frequently Asked Questions (FAQs)

If the answer to your question is not found here or in the data guide, email

Q. What percentage of the GENIE samples are also in The Cancer Genome Atlas (TCGA)?
A. A very small percentage of the samples (~0.1%) may also be in TCGA.

Q. Will there be therapeutic response data included?
A. Yes; however, these data will be included only for specific subsets of patients and will not be available until the relevant studies are published.

Q. How do I get access to the data?
A. Go to and request access.

Q. Can I download the data directly?
A. Yes, go to and request access.

Q. Can I use a non-Gmail account to log in to cBioPortal?
A. Yes. We use Google Accounts to manage access to the data. All Gmail accounts are Google Accounts, but it is possible to create a Google Account without Gmail, even with your institutional email address. Sign up for an account.

Q. Can I learn more details about the data themselves?
A. Yes, download the data guide for a detailed description.

Q. Can my lab/institution/company sponsor a clinical study using the data?
A. Yes, please email

Q. Can my institution apply to become a participating center?
A. Yes, please email

Q. What is the distribution of cancer types in the dataset?
A. The distribution differs between releases. To see it for a given release, tabulate the OncoTree codes in that release's clinical file.
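As a minimal sketch, the tally can be done with Python's standard library. The file layout assumed here (tab-delimited, cBioPortal-style metadata lines beginning with "#", codes in a column named ONCOTREE_CODE) is an assumption; confirm the actual column names in the data guide for your release.

```python
import csv
from collections import Counter
from typing import Iterable


def oncotree_distribution(lines: Iterable[str], column: str = "ONCOTREE_CODE") -> Counter:
    """Count samples per OncoTree code in a tab-delimited clinical file.

    Assumes cBioPortal-style metadata lines starting with '#' precede the
    header row and that the codes live in a column named ONCOTREE_CODE.
    """
    # Drop metadata lines so the first remaining line is the header row.
    data_lines = (line for line in lines if not line.startswith("#"))
    reader = csv.DictReader(data_lines, delimiter="\t")
    # Rows with a missing or empty code are ignored.
    return Counter(row[column] for row in reader if row.get(column))
```

With a release on disk, open the sample-level clinical file (for example, a hypothetical data_clinical_sample.txt) with newline="" and pass the handle in; most_common(10) on the result lists the ten most frequent cancer types.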

Q. What genomic regions are covered by a center's gene panel?
A. All panels are described in the combined BED file, available at
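A short script can check whether a given panel covers a position. This sketch assumes standard BED coordinates (0-based start, end-exclusive) and that the panel identifier sits in a fixed column, passed as the hypothetical panel_col parameter; the real column layout of the combined file should be checked against the data guide.

```python
from typing import Iterable


def panel_covers(
    bed_lines: Iterable[str],
    panel_id: str,
    chrom: str,
    pos: int,
    panel_col: int = 3,  # hypothetical: column holding the panel identifier
) -> bool:
    """Return True if a 1-based position falls in any interval of one panel.

    Assumes standard BED coordinates (0-based start, exclusive end); the
    panel column index is an assumption, not a documented GENIE layout.
    """
    target = chrom.removeprefix("chr")  # treat "chr7" and "7" alike
    for line in bed_lines:
        # Skip blank, comment, and track/browser header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= panel_col or fields[panel_col] != panel_id:
            continue
        if fields[0].removeprefix("chr") != target:
            continue
        start, end = int(fields[1]), int(fields[2])
        # 1-based p is 0-based p - 1, covered when start <= p - 1 < end.
        if start < pos <= end:
            return True
    return False
```

Scanning the file linearly keeps the sketch simple; for many lookups, grouping intervals by panel and chromosome first would be faster.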

Q. What annotation method was used for the variant data?
A. The data were annotated with the Ensembl Variant Effect Predictor (VEP), using vcf2maf as a wrapper.
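Because vcf2maf emits Mutation Annotation Format (MAF), the annotated calls can be read as an ordinary tab-delimited table. The sketch below uses the standard MAF columns Hugo_Symbol, Variant_Classification and Tumor_Sample_Barcode; excluding Silent calls by default is an illustrative choice, not a GENIE convention.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, Set


def mutated_samples_by_gene(
    maf_lines: Iterable[str],
    skip_classes: frozenset = frozenset({"Silent"}),
) -> Dict[str, Set[str]]:
    """Map each Hugo_Symbol to the samples carrying at least one kept call.

    Hugo_Symbol, Variant_Classification and Tumor_Sample_Barcode are standard
    MAF columns; skipping Silent calls by default is an illustrative choice.
    """
    # vcf2maf output typically starts with a '#version' line; skip comments.
    rows = csv.DictReader(
        (line for line in maf_lines if not line.startswith("#")), delimiter="\t"
    )
    genes: Dict[str, Set[str]] = defaultdict(set)
    for row in rows:
        if row.get("Variant_Classification") in skip_classes:
            continue
        genes[row["Hugo_Symbol"]].add(row["Tumor_Sample_Barcode"])
    return dict(genes)
```

Collecting sample sets rather than counting rows means a sample with several calls in one gene is counted once.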

Q. Are the FASTQ or BAM data available?
A. No, only the annotated variant calls are provided.

Q. Will additional data be released?
A. Yes, new data on the current and additional patients will be periodically added; we will notify the community when new releases are available.

Q. What kind of data is available for download?
A. You can download mutation, fusion, and DNA copy-number calls.
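Copy-number calls in cBioPortal-style downloads are commonly laid out as a gene-by-sample matrix of discrete values (the cBioPortal convention runs from -2 for deep deletion to 2 for amplification). This sketch assumes that layout, with Hugo_Symbol in the first column and blank or NA cells for samples without a call; file names and the exact value scheme should be confirmed in the data guide.

```python
import csv
from typing import Dict, Iterable


def cna_calls_for_gene(cna_lines: Iterable[str], gene: str) -> Dict[str, float]:
    """Return {sample_id: copy-number call} for one gene in a gene-by-sample matrix.

    Assumes the first column is Hugo_Symbol, each further column is a sample,
    and cells are numeric calls, with blank or 'NA' cells treated as missing
    (for example, samples whose panel does not cover the gene).
    """
    reader = csv.reader(
        (line for line in cna_lines if not line.startswith("#")), delimiter="\t"
    )
    header = next(reader, None)
    if header is None:
        return {}
    samples = header[1:]
    for row in reader:
        if row and row[0] == gene:
            # Keep only samples that have a call for this gene.
            return {
                sample: float(value)
                for sample, value in zip(samples, row[1:])
                if value not in ("", "NA")
            }
    return {}
```

Values are parsed as floats rather than integers so that any non-integer calls a center reports are preserved instead of truncated.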

Date Updated: 1/30/17