National Cancer Institute (NCI) Clinical Proteomic Tumor Analysis Consortium (CPTAC) scientists have released a dataset of proteins and phosphopeptides identified through deep proteomic and phosphoproteomic analysis of breast tumor samples that were previously analyzed genomically by The Cancer Genome Atlas (TCGA). This is the largest-ever public dataset of proteins designed to complement deep genomic sequencing data on the same breast tumors, and it is publicly available at the CPTAC Data Portal (data released on February 20, 2014).
Researchers from the Broad Institute of Harvard and Massachusetts Institute of Technology, Fred Hutchinson Cancer Research Center, and Washington University in St. Louis worked together to produce this comprehensive dataset. The dataset comprises 105 TCGA breast cancer samples from clinical tumors, analyzed using iTRAQ protein quantification methods. The samples selected for analysis represent each of the four breast tumor subtypes (Luminal A, Luminal B, Basal-like, HER2-enriched) described in the previous TCGA publication. On average, more than 8,000 proteins and 12,000 phosphopeptides are quantified per tumor sample.
This dataset gives researchers the opportunity to develop and test novel proteogenomic integration tools and algorithms to extend our knowledge of the biological underpinnings of cancer. It is also the second release of a dataset of proteins comprehensively identified through deep proteomic analysis of tumors previously analyzed genomically by TCGA (the colorectal cancer deep proteomic dataset was released on September 4, 2013). The data are embargoed for publication until the global analysis paper is published or until May 26, 2015.