### Blog

#### DNA Copy Number and its Effects on Log2 Values and Allelic Ratios at Different Tumor Purities

posted Jan 27, 2018, 10:33 AM by Jonathan Keats   [ updated Jan 29, 2018, 9:20 AM by Keats Lab ]

This is a post that I've been meaning to put together for many years on what log2 values in DNA copy number analysis actually mean.  I'll start by admitting my own error as a post-doc, when I incorrectly assumed that a log2 of -1 = 1 copy state, 0 = 2 copy state, and 1 = 3 copy state.  Hopefully if you are reading this post you already know that the 3 copy state is actually a log2 of 0.58 and that a log2 of 1.0 actually represents a 4 copy state.  Over the years I've made quick tables in Excel to convince myself, and show others, what log2 values you expect in theory for each copy number state of a tumor population compared to a normal diploid population.  I've also done it over and over again to establish the expected log2 value when an event is in 100% of the tumor cells but the purity of tumor cells in the tested population is not 100%.  So I've finally broken down and put it all together once and for all so I never have to do it in Excel again.  Thus this post is mostly for my own sanity and time management, but hopefully it's useful to someone else out there as well.

Outlining the Problem
Whenever we look at tumor samples and try to assess their copy number state, we usually compare the tumor sample to a normal sample. This might use any number of methods, from CGH or SNP microarrays to exome or genome sequencing assays, but for the most part we have a population of tumor cells and a population of normal cells to compare.  The two most common approaches are to compare the signal or counts from the tumor to the normal and then log2 transform the ratio (i.e. value = log2 ( Tumor / Normal ) ), or to look at the B (alternate) allele frequency as an absolute allelic percentage.  The absolute allele frequency is calculated by first identifying the heterozygous positions in the normal and then subtracting the normal heterozygous frequency of 0.5 (50%) from the observed frequency at each matching position in the tumor (i.e. absolute allele frequency = abs ( Observed B allele frequency - 0.5 ) ).  The absolute value is used because the reference allele is equally distributed between the maternally and paternally derived chromosomes (haplotypes), so at copy number states without an equal mixture of A and B alleles the raw B allele frequencies along a chromosome split into two different values (Figure 1).
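
The two transformations described above can be sketched in a few lines of Python. This is only an illustration: the function names and the example depths and frequencies are made up here and are not taken from any particular pipeline.

```python
import math


def log2_ratio(tumor_signal: float, normal_signal: float) -> float:
    """log2(Tumor / Normal) for one region (signal or read counts)."""
    return math.log2(tumor_signal / normal_signal)


def abs_b_allele_frequency(observed_baf: float) -> float:
    """abs(observed B allele frequency - 0.5) at a site that is
    heterozygous in the matched normal."""
    return abs(observed_baf - 0.5)


# Hypothetical region with 150 counts in the tumor and 100 in the normal.
region_log2 = log2_ratio(150, 100)

# Hypothetical heterozygous SNPs in a 3 copy region: some sit on the
# gained haplotype (ABB, ~0.667) and some on the other one (AAB, ~0.333).
raw_bafs = [1 / 3, 2 / 3]
folded = [round(abs_b_allele_frequency(b), 3) for b in raw_bafs]

print(round(region_log2, 2), folded)
```

Folding around 0.5 collapses the two mirrored B allele frequency bands into a single value, which is why the absolute frequency is easier to interpret than the raw ratio.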

Figure 1 - Example Copy Number States and Associated Allele Frequencies
Representative cells are shown for a normal diploid cell and tumor cells with copy number states from 0 to 4. For simplicity the two possible homozygous allele states for the 4 copy state are not shown.  In all cases the copy number of a region or allele frequency is represented by the vertical blue or purple bars.  Loss of heterozygosity cells are noted as is the unique situation of copy-neutral LOH that can be observed in the two copy number state.  Personally I would never call this uniparental disomy (UPD) as it is a somatic process.

Basics of Raw Copy Number Estimation
To establish the log2 copy number value for each potential state, my standard practice is to build a table assuming there are 100 cells in the tumor population and that each chromosome counts for 1 signal, or count, value.  Then for each chromosome lost or gained I alter the tumor signal/count proportionally. These theoretical tables and the resulting distributions are outlined in Table 1 and Figure 2.

Table 1 - Theoretical Log2 Values for Distinct Copy Number States
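| CN State | Normal | Tumor | T/N Ratio | Log2 |
|---|---|---|---|---|
| 0 | 200 | 1* | 0.005* | -5* |
| 1 | 200 | 100 | 0.5 | -1.00 |
| 2 | 200 | 200 | 1.0 | 0.00 |
| 3 | 200 | 300 | 1.5 | 0.58 |
| 4 | 200 | 400 | 2.0 | 1.00 |

* The zero copy number state truly has a signal/count of 0, but since that would result in a log2 value of negative infinity, for presentation purposes I show a single signal/count value and hard code the log2 value to -5 (see Figure 2)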
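
The table can be regenerated with a short script instead of a spreadsheet. The sketch below follows the same assumptions as the table (100 cells per population, a diploid normal, one count per chromosome copy); the constant and function names are illustrative, and the zero copy state is simply returned as the hard-coded display value of -5 rather than computed.

```python
import math

CELLS = 100            # cells in each population, as assumed in the text
NORMAL_COPIES = 2      # diploid normal comparator
ZERO_COPY_LOG2 = -5.0  # display value for 0 copies (true value is -infinity)


def expected_log2(tumor_copies: int) -> float:
    """Theoretical log2(T/N) for a pure tumor population."""
    if tumor_copies == 0:
        return ZERO_COPY_LOG2
    normal_signal = CELLS * NORMAL_COPIES
    tumor_signal = CELLS * tumor_copies
    return math.log2(tumor_signal / normal_signal)


for copies in range(0, 5):
    print(f"{copies}\t{CELLS * NORMAL_COPIES}\t{CELLS * copies}\t"
          f"{expected_log2(copies):.2f}")
```

Extending the range in the loop gives the values for higher copy number states, for example `expected_log2(10)` for a 10 copy amplification.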

Figure 2 - Theoretical Log2 Values for Distinct Copy Number States in Pure Tumor Populations Compared to a Normal Diploid Control
Horizontal dotted dark mustard lines define our established mean+/-5SD cutoff used in the MMRF CoMMpass study, and the vertical light green dotted line indicates the expected value for a normal diploid tumor population.  In both plots the homozygous (bi-allelic) deletion, the 0 copy number state, which actually has a log2 value of negative infinity, is hard coded to -5.  This is within the functional range of -4 to -6 that I've personally encountered over the years with aCGH or NGS.  A) Theoretical log2 values for copy number states 0-50. B) Theoretical log2 values for copy number states 0-10 with the actual expected log2 value noted.

Basics of B Allele Frequency Assessment
When used alone, B allele frequency has limited utility, as the observed B allele ratio and absolute frequency can be associated with multiple copy number states (Table 2).  But when integrated with the known copy number state it can be very valuable in determining the allelic state, and it can be leveraged to calculate the purity of a sample. So to establish the absolute B allele frequency I build a table assuming there are 100 tumor cells in the population and then assign 1 count to each allele present in the theoretical scenario (Table 2). In the most simplistic scenario you can use the absolute allele frequency to create an expected distribution of the allele frequency for each copy number state, assuming the copy number is only changing for one allele while the other allele stays constant at the normal state of 1 copy (Figure 3).  As illustrated in Table 2, this is a simplification, but in my personal experience it is also not uncommon at copy number states 1-4, as I'd say the most common allelic states observed are AO, OB, AB, AA, BB, AAB, ABB, AABB, AAAB, and ABBB.

Table 2 - Absolute B Allele Frequencies for Distinct Copy Number States
| CN State | Genotype | A Allele Count | B Allele Count | B Allele Ratio | Abs B Allele |
|---|---|---|---|---|---|
| 1 | A | 100 | 0 | 0.00 | 0.50 |
| 1 | B | 0 | 100 | 1.00 | 0.50 |
| 2 | AA | 200 | 0 | 0.00 | 0.50 |
| 2 | AB | 100 | 100 | 0.50 | 0.00 |
| 2 | BB | 0 | 200 | 1.00 | 0.50 |
| 3 | AAA | 300 | 0 | 0.00 | 0.50 |
| 3 | AAB | 200 | 100 | 0.333 | 0.166 |
| 3 | ABB | 100 | 200 | 0.666 | 0.166 |
| 3 | BBB | 0 | 300 | 1.00 | 0.50 |
| 4 | AAAA | 400 | 0 | 0.00 | 0.50 |
| 4 | AAAB | 300 | 100 | 0.25 | 0.25 |
| 4 | AABB | 200 | 200 | 0.50 | 0.00 |
| 4 | ABBB | 100 | 300 | 0.75 | 0.25 |
| 4 | BBBB | 0 | 400 | 1.00 | 0.50 |
| 5 | AAAAA | 500 | 0 | 0.00 | 0.50 |
| 5 | AAAAB | 400 | 100 | 0.20 | 0.30 |
| 5 | AAABB | 300 | 200 | 0.40 | 0.10 |
| 5 | AABBB | 200 | 300 | 0.60 | 0.10 |
| 5 | ABBBB | 100 | 400 | 0.80 | 0.30 |
| 5 | BBBBB | 0 | 500 | 1.00 | 0.50 |

Figure 3 - Theoretical B Allele Frequency Assuming the B Allele Count is the Only One Changing
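
The rows of Table 2 can be enumerated programmatically. The sketch below assumes the same setup as the table (100 tumor cells, 1 count per allele copy) and lists every A/B combination for a given copy number; `genotype_rows` is a name made up for this example.

```python
CELLS = 100  # tumor cells in the theoretical population, as in Table 2


def genotype_rows(copy_number: int):
    """Return (genotype, A count, B count, B ratio, abs B allele) tuples
    for every A/B combination at the given copy number (1 or more)."""
    if copy_number < 1:
        raise ValueError("copy_number must be at least 1")
    rows = []
    for b_copies in range(copy_number + 1):
        a_copies = copy_number - b_copies
        genotype = "A" * a_copies + "B" * b_copies
        a_count = CELLS * a_copies
        b_count = CELLS * b_copies
        b_ratio = b_count / (a_count + b_count)
        rows.append((genotype, a_count, b_count, b_ratio, abs(b_ratio - 0.5)))
    return rows


for cn in range(1, 6):
    for genotype, a_count, b_count, b_ratio, abs_b in genotype_rows(cn):
        print(f"{cn}\t{genotype}\t{a_count}\t{b_count}\t"
              f"{b_ratio:.3f}\t{abs_b:.3f}")
```

The printed values are rounded, so the thirds appear as 0.333/0.667 and 0.167 where Table 2 shows them truncated to 0.666 and 0.166.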

Fun with Heterogeneous Mixtures - Looking Forward to Single Cells!
Figure 4 - Representative 75/25 Mixtures of Tumor and Normal Cells at Different Tumor Copy Number States
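
As a rough sketch of what such a mixture does to the expected values, the calculation can be extended with a purity term. The code below makes several assumptions that are not spelled out in the post: that 75/25 means 75% tumor and 25% normal cells, that the event is present in every tumor cell, and that the contaminating normal cells are diploid and heterozygous (AB). The function names are hypothetical.

```python
import math


def mixed_log2(tumor_copies: int, purity: float) -> float:
    """Expected log2(T/N) when a fraction `purity` of cells are tumor.

    Note: purity == 1.0 with 0 tumor copies has no finite log2 value
    and raises ValueError, matching the true negative infinity."""
    mean_copies = purity * tumor_copies + (1 - purity) * 2
    return math.log2(mean_copies / 2)


def mixed_abs_baf(tumor_b_copies: int, tumor_copies: int, purity: float) -> float:
    """Expected abs(B allele frequency - 0.5) for the same mixture."""
    b_signal = purity * tumor_b_copies + (1 - purity) * 1
    total_signal = purity * tumor_copies + (1 - purity) * 2
    return abs(b_signal / total_signal - 0.5)


PURITY = 0.75  # assumed reading of a 75/25 tumor/normal mixture

for copies in range(0, 5):
    print(f"{copies} copies: log2 = {mixed_log2(copies, PURITY):.3f}")

# One copy deletion that removed the B allele (tumor genotype A).
print(f"abs BAF = {mixed_abs_baf(0, 1, PURITY):.3f}")
```

Under these assumptions a one copy deletion no longer sits at -1.00 but closer to -0.68, and a homozygous deletion lands at -2.00 because the normal cells still contribute signal.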

Figure 5 - Mixtures and the Effect on Deletions and LOH Events

Figure 6 - Gains

Allele Specific Copy Number - Totally Ignored right now

#### AMPure and not-so-simple

posted Jul 8, 2013, 5:48 PM by Kristi Stephenson   [ updated Jul 8, 2013, 5:50 PM ]

#### To TE or not to TE

posted Feb 8, 2013, 2:15 PM by Jonathan Keats   [ updated Feb 8, 2013, 2:17 PM ]

We run into this all the time: people throw around the word TE with no knowledge of the extensive variations that exist.  Also, most people don't realize that common solutions in various kits are just variations of TE.

1) Standard 1x TE = 10 mM Tris-HCl, 1.0 mM EDTA (pH 8.0, but this should not be assumed)
2) Buffer AE = 10 mM Tris-HCl, 0.5 mM EDTA (pH 9.0)
   NOTE: Apparently Qiagen puts buffer AE in different kits and they may NOT be the same composition
   Values From: DNeasy Blood & Tissue Kit (50) cat#69504
3) Buffer ATE = 10 mM Tris-HCl, 0.1 mM EDTA, 0.04% Sodium Azide (pH 8.3)
   NOTE: Apparently Qiagen puts buffer ATE in different kits and they may NOT be the same composition
   Values From: QiaSymphony DNA Midi Kit cat#931255
4) Buffer EB = 10 mM Tris-HCl (pH 8.5)
   * Common elution buffer in many Qiagen plasmid prep and sample clean-up kits
5) DNA Hydration Solution = 10 mM Tris-HCl, 1.0 mM EDTA (pH 7-8)
   * Provided in Gentra Puregene kits from Qiagen
6) DNA Suspension Buffer = 10 mM Tris-HCl, 0.1 mM EDTA (pH 8.0)
   * Recommended by Affymetrix for SNP arrays (basically in standard TE it will not work!)
   * Commonly called TElowE

A bit of ranting/suggestions

A) Never put DNA in water alone

B) The pH of the solution can affect things like 260/280 values; generally higher pH makes things look better

C) You want to watch the final concentration of EDTA in any enzymatic reaction.  Our recommendation is to use buffers with 0.1 mM EDTA

In our lab all DNA samples are stored in DNA Suspension Buffer (i.e. TElowE)
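
To put a number on the EDTA point, the final concentration in a reaction is just a dilution calculation. The sketch below uses the EDTA concentrations listed above; the 10 uL of DNA and the 50 uL reaction volume are hypothetical values chosen only for illustration.

```python
# EDTA concentration (mM) of each storage buffer, as listed in this post.
BUFFER_EDTA_MM = {
    "Standard 1x TE": 1.0,
    "Buffer AE": 0.5,
    "Buffer ATE": 0.1,
    "Buffer EB": 0.0,
    "DNA Hydration Solution": 1.0,
    "DNA Suspension Buffer (TElowE)": 0.1,
}


def final_edta_mm(buffer_edta_mm: float, dna_volume_ul: float,
                  reaction_volume_ul: float) -> float:
    """EDTA concentration contributed by the DNA sample after dilution
    into the full reaction volume (C1 * V1 / V2)."""
    return buffer_edta_mm * dna_volume_ul / reaction_volume_ul


# Hypothetical example: 10 uL of DNA added to a 50 uL reaction.
for name, edta_mm in BUFFER_EDTA_MM.items():
    print(f"{name}: {final_edta_mm(edta_mm, 10, 50):.3f} mM EDTA in the reaction")
```

With those example volumes, DNA stored in standard 1x TE carries 0.2 mM EDTA into the reaction, while the same volume in TElowE carries 0.02 mM.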

#### GATK - Fun - Confusion - Fear - Frustration - Things to Remember

posted Jan 9, 2013, 12:23 AM by Jonathan Keats

There are many great things about the GATK software, primarily their excellent level of support, but some things are a bit irritating, like "we change it all the time so check the --list function", which requires you to prep up a complete GATK command, and that can be difficult when you have no idea what you are doing.

So here are the lists I'm always wanting to have on my wall (The Genome Analysis Toolkit (GATK) v2.3-4-g57ea19f)

Variant Annotator List of Annotations

Standard annotations in the list below are marked with a '*'.

Available annotations for the VCF INFO field: AlleleBalance, BaseCounts, *BaseQualityRankSumTest, *ChromosomeCounts, ClippingRankSumTest, *DepthOfCoverage, *FisherStrand, GCContent, *HaplotypeScore, HardyWeinberg, HomopolymerRun, *InbreedingCoeff, IndelType, LowMQ, MVLikelihoodRatio, *MappingQualityRankSumTest, *MappingQualityZero, MappingQualityZeroFraction, NBaseCount, *QualByDepth, *RMSMappingQuality, *ReadPosRankSumTest, SampleList, SnpEff, *SpanningDeletions, *TandemRepeatAnnotator, TechnologyComposition, TransmissionDisequilibriumTest

Available annotations for the VCF FORMAT field: AlleleBalanceBySample, *DepthPerAlleleBySample, MappingQualityZeroBySample

Available classes/groups of annotations: ActiveRegionBasedAnnotation, ExperimentalAnnotation, RankSumTest, RodRequiringAnnotation, StandardAnnotation, WorkInProgressAnnotation

#### What type of Dynabead do I need?

posted Nov 27, 2012, 12:51 PM by Kristi Stephenson

If you're like most of us here in the lab, you can never keep track of the differences between the various types of Dynabeads. It seems like every new assay asks for a different "letter" of bead. There's T1, C1, M270, M280, and well... I think that's the end of the list for now. At any rate, I decided to post a quick cheat sheet so it's easy to look it up and remember. So for anyone else out there wondering the same thing... hopefully this will make your life a little easier! For more details, check out the link for the LifeTech product sheet also included below.

#### PCR Purification: AMPure and Simple

posted May 8, 2012, 10:16 PM by David Edwards   [ updated May 9, 2012, 8:55 AM ]

#### Effect of Heating Time on DNA Fragmentation

posted Feb 3, 2012, 1:50 PM by Kim Babos   [ updated Feb 3, 2012, 3:32 PM by Jonathan Keats ]

After performing a test of heated DNA (95 C for 10 minutes) versus non-heated DNA for Kristi using the ARD_SSC.C8 cell line, we found that heating DEFINITELY fragments the DNA, literally to smithereens.  We decided that I should try varying the heating time between 0 and 18 minutes and compare those samples to non-heated DNA.  I have attached an image of the gel results, which turned out beautiful, I am proud to say!

I used the KMS34 cell line for this experiment.  The non-heated DNA stayed at the top of the gel while the other 9 samples showed quite a variety of smear patterns.  As you can see, when the DNA was heated for 18 minutes, it was degraded to the point that it ran between about 500 and 100 base pairs.  On the other end of the spectrum, the DNA that was heated for 2 minutes shows a really nice, long smear from about the 12000 base pair marker down to about 650 or so.

After thinking about the effects heat has on DNA, there are both pros and cons to heat denaturing, and to heating for different lengths of time.  If you want to get small fragments of DNA, for instance for use in a CGH protocol, heating for 18 minutes seems to do the trick.  If you want to amplify long-length DNA via PCR, I would definitely say to avoid heating for 18 minutes and try to stick to a time between 1 and 4 minutes.  In talking with Jonathan, this image might explain why it tends to be difficult to amplify a DNA sequence of a large size, say 8 kb for example.  Because we are heating the DNA at 94 C for 5 minutes and then proceeding to heat for another 15 or so, it is no wonder that we have a hard time getting many 8 kb amplicons.  They are all shredded to pieces!

Just an interesting piece of data we figured out in the Keats lab today.  Ending the week with interesting results is always a treat!

Kim

#### Getting the Lab off the Ground

posted Jul 15, 2011, 4:42 PM by Jonathan Keats   [ updated Sep 25, 2011, 11:34 PM ]

So the lab is now over one year old and finally getting going.  I've had a number of people tell me that if you are going by six months and feel productive by one year you are doing well.  Based on that I'm a bit behind, as I only now feel like the lab is starting to really make progress, which is a bit past the one year mark.  But I really only started doing things in the lab in January when Kristi joined the group, so maybe we are ahead of schedule?  The good news is we have a decent bit of funding between my start-up and two different grants.  I actually had to hire another research associate recently as we just could not keep up.  We also have a big project ready to launch, and when the contract is finally signed I'll be able to add two post-docs to the group.
