Data Repository

Commercial HMCL Phase1 CGH — WARNING: NEWER VERSION AVAILABLE. The linked file is a segmentation file that can be loaded directly into the IGV genome browser for visualization. The encoded positions are GRCh37/hg19. The copy number values are smoothed representations from 0 to 10, with whole numbers reflecting the assumed absolute copy number (i.e., 0 represents a homozygous deletion and 2 represents a normal copy number state).

HMCL_Agilent_400k_Matrix_All.txt — This file contains the raw probe-by-probe log2 copy number estimates from the Agilent 400k CGH array.
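
For readers who want to relate the log2 values in this matrix to the whole-number copy states used in the segmentation files, the conversion can be sketched as follows. This is a minimal sketch that assumes the log2 values are ratios against a diploid (2-copy) reference and applies no correction for sample ploidy; the listing does not state how the values were normalized, and the function names are hypothetical.

```python
import math


def log2_ratio_to_copy_number(log2_ratio, reference_ploidy=2):
    """Estimate copy number from a log2 ratio.

    Assumption: the ratio is measured against a reference with
    `reference_ploidy` copies (2 for a diploid reference).
    """
    return reference_ploidy * 2 ** log2_ratio


def copy_number_to_log2_ratio(copy_number, reference_ploidy=2):
    """Inverse conversion; a copy number of 0 has no finite log2 ratio."""
    if copy_number <= 0:
        return float("-inf")
    return math.log2(copy_number / reference_ploidy)
```

Under these assumptions a log2 ratio of 0 corresponds to 2 copies, and a ratio of -1 corresponds to a single-copy loss.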

HMCL_All_ADM2_Smooth_Raw_CNVremoved.seg — This is a segmentation file created using Agilent's ADM-2 algorithm. Identified segments contain at least 3 probes, are continuous over a 10 kb interval, and result in a log2 change in excess of 0.2. Known CNV regions have been removed.
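
The three segment criteria above can be expressed as a simple filter. This is only an illustrative sketch, not Agilent's ADM-2 implementation: the `Segment` fields and the `passes_filters` name are hypothetical, "continuous over a 10 kb interval" is interpreted here as a segment span of at least 10,000 bp, and the log2 threshold is applied to the absolute value so that both gains and losses qualify.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    # Hypothetical field names, loosely following a typical .seg layout.
    sample: str
    chrom: str
    start: int
    end: int
    num_probes: int
    log2_mean: float


def passes_filters(seg, min_probes=3, min_span_bp=10_000, min_abs_log2=0.2):
    """Return True if a segment meets all three stated criteria."""
    span_bp = seg.end - seg.start
    return (
        seg.num_probes >= min_probes
        and span_bp >= min_span_bp
        and abs(seg.log2_mean) > min_abs_log2
    )


# Made-up segments for illustration only.
segments = [
    Segment("SAMPLE_A", "1", 100_000, 150_000, 12, -0.85),  # kept
    Segment("SAMPLE_A", "1", 200_000, 204_000, 5, 0.60),    # span too short
    Segment("SAMPLE_A", "2", 300_000, 400_000, 2, 0.90),    # too few probes
    Segment("SAMPLE_A", "3", 500_000, 600_000, 20, 0.10),   # change too small
]
kept = [s for s in segments if passes_filters(s)]
```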

HMCL_All_CBS.seg — This is a segmentation file created using Circular Binary Segmentation (CBS) with the DNAcopy package in R. Known CNV regions are not removed.

HMCL66_Gene_Expression_Counts — UPDATED-V2. This is a matrix text file containing the gene expression levels of each cell line as measured from mRNAseq data. Reads were aligned with TopHat2, and count-based expression estimates were calculated with HTSeq using Ensembl 64 gene models.

HMCL66_Gene_Expression_FPKM — This is a matrix text file containing the gene expression levels of each cell line as measured from mRNAseq data. Reads were aligned with TopHat2, and FPKM expression estimates were calculated with Cufflinks2 using Ensembl 64 gene models.

HMCL66_Transcript_Expression_FPKM — This is a matrix text file containing the transcript expression levels of each cell line as measured from mRNAseq data. Reads were aligned with TopHat2, and FPKM expression estimates were calculated with Cufflinks2 using Ensembl 64 gene models.

HMCL69_Preliminary_Mutation_List — This Excel file contains all the preliminary mutations we have identified in the 69 cell lines tested to date. The data come from Agilent SureSelect V4+UTR exome captures: reads were aligned with BWA, realigned, and recalibrated, duplicates were removed, and variants were called with samtools. Any variant found in 1000G or NHLBI has been removed. Variants in dbSNP were removed unless the same variant exists in COSMIC.
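
The database filtering rules described above reduce to a small decision function. This is a sketch of the stated logic only, assuming each variant has already been annotated with membership flags for the four databases; the function and parameter names are hypothetical and do not reflect the pipeline actually used.

```python
def keep_variant(in_1000g, in_nhlbi, in_dbsnp, in_cosmic):
    """Apply the stated database filters to one annotated variant."""
    # Any variant present in 1000G or NHLBI is removed outright.
    if in_1000g or in_nhlbi:
        return False
    # dbSNP variants are removed unless the same variant is in COSMIC.
    if in_dbsnp and not in_cosmic:
        return False
    return True
```

Note that under this reading a COSMIC entry rescues a variant only from the dbSNP filter, not from the 1000G or NHLBI filters.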

HtSeq-Stranded Versus Unstranded — Demo File for R graphics examples