DNA Copy Number and its Effects on Log2 Values and Allelic Ratios at Different Tumor Purities

posted Jan 27, 2018, 10:33 AM by Jonathan Keats   [ updated Jan 29, 2018, 9:20 AM by Keats Lab ]

This is a post that I've been meaning to put together for many years on what log2 values in DNA copy number analysis actually mean.  I'll start by admitting my own error as a post-doc, when I incorrectly assumed that a log2 of -1 = 1 copy state, 0 = 2 copy state, and 1 = 3 copy state.  Hopefully if you are reading this post you already know that a 3 copy state is actually a log2 of 0.58 and a log2 of 1.0 actually represents a 4 copy state.  Over the years I've made quick tables in Excel to convince myself, and show others, what log2 values you expect in theory for each copy number state of a tumor population compared to a normal diploid population.  I've also done it over and over again to establish the expected log2 value if an event was in 100% of the tumor cells but the purity of tumor cells in the tested population is not 100%.  So I've finally broken down and put it all together once and for all so I never have to do it in Excel again.  Thus this post is mostly for my own sanity and time management, but hopefully it's useful to someone else out there as well.

Outlining the Problem
Whenever we look at tumor samples and try to assess their copy number state, we usually compare the tumor sample to a normal sample. This might use any number of methods, from CGH or SNP microarrays to exome or genome sequencing assays, but for the most part we have a population of tumor cells and a population of normal cells to compare.  The two most common approaches are to compare the signal or counts from the tumor to the normal and then log2 transform the ratio (i.e. value = log2 ( Tumor / Normal ) ), or to look at the B, or alternate, allele frequency as an absolute allelic percentage.  The absolute allele frequency is calculated by first identifying the heterozygous positions in the normal, then subtracting the normal heterozygous frequency of 0.5 (50%) from the observed frequency at each matching position in the tumor and taking the absolute value (i.e. absolute allele frequency = abs ( Observed B allele frequency - 0.5 ) ).  The absolute value is used because the reference allele is distributed equally between the maternally and paternally derived chromosomes (haplotypes), so the B allele at any given position can sit on either one.  At copy number states without an equal mixture of A and B alleles this produces two different B allele frequencies mirrored around 0.5, and taking the absolute value collapses them into one (Figure 1).

Figure 1 - Example Copy Number States and Associated Allele Frequencies
Representative cells are shown for a normal diploid cell and tumor cells with copy number states from 0 to 4. For simplicity the two possible homozygous allele states for the 4 copy state are not shown.  In all cases the copy number of a region or allele frequency is represented by the vertical blue or purple bars.  Loss of heterozygosity cells are noted as is the unique situation of copy-neutral LOH that can be observed in the two copy number state.  Personally I would never call this uniparental disomy (UPD) as it is a somatic process.

Basics of Raw Copy Number Estimation
To establish the log2 copy number value for each potential state, my standard practice is to build a table assuming there are 100 cells in the tumor population and each chromosome counts for 1 signal, or count, value, so the normal diploid population has a value of 200.  Then for each chromosome lost or gained I alter the tumor signal/count proportionally. These theoretical tables and resulting distributions are outlined in Table 1 and Figure 2.

Table 1 - Theoretical Log2 Values for Distinct Copy Number States 
 CN State   Normal   Tumor   T/N Ratio   Log2
 0          200      1*      0.005*      -5*
 1          200      100     0.5         -1.00
 2          200      200     1.0         0.00
 3          200      300     1.5         0.58
 4          200      400     2.0         1.00
* The zero copy number state truly has a signal/count of 0, but since log2 of 0 is negative infinity, for presentation purposes I show a single signal/count value and hard code the log2 value to -5 (see Figure 2)
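
For anyone who would rather not rebuild Table 1 in a spreadsheet, here is a minimal Python sketch of the same bookkeeping. The function name and the constants are my own labels for illustration, and the -5 floor for the zero copy state simply mirrors the presentation choice in the table; it is not a computed value.

```python
import math

CELLS = 100                   # assumed population size, as in Table 1
NORMAL_SIGNAL = CELLS * 2     # diploid control: 2 counts per cell = 200
ZERO_COPY_LOG2 = -5.0         # display value only; the true log2 is -infinity


def theoretical_log2(copy_number, cells=CELLS):
    """Expected log2(Tumor / Normal) for a 100% pure tumor population."""
    tumor_signal = cells * copy_number  # 1 count per chromosome copy per cell
    if tumor_signal == 0:
        return ZERO_COPY_LOG2
    return math.log2(tumor_signal / NORMAL_SIGNAL)


for cn in range(5):
    print(cn, CELLS * cn, f"{theoretical_log2(cn):.2f}")
```

Raising the upper bound of the loop gives the values plotted for the higher copy number states.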

Figure 2 - Theoretical Log2 Values for Distinct Copy Number States in Pure Tumor Populations Compared to a Normal Diploid Control
Horizontal dotted dark mustard lines define our established mean+/-5SD cutoff used in the MMRF CoMMpass study and the vertical light green dotted line indicates the expected value for a normal diploid tumor population.  In both plots the homozygous (bi-allelic) deletion, or 0 copy number state, which actually has a log2 value of negative infinity, is hard coded to -5.  That sits within the -4 to -6 range I've personally encountered over the years with aCGH or NGS.  A) Theoretical log2 values for copy number states 0-50. B) Theoretical log2 values for copy number states 0-10 with the actual expected log2 value noted.

Basics of B Allele Frequency Assessment
When used alone, B allele frequency has limited utility, as the observed B allele ratio and absolute frequency can be associated with multiple copy number states (Table 2).  But when integrated with the known copy number state it can be very valuable in determining the allelic state, and it can be leveraged to calculate the purity of a sample. To establish the absolute B allele frequency I build a table assuming there are 100 tumor cells in the population and then assign 1 count to each allele present in the theoretical scenario (Table 2). In the most simplistic scenario you can use the absolute allele frequency to create an expected distribution of the allele frequency for each copy number state, assuming the copy number is only changing for one allele while the other allele stays constant at the normal state of 1 copy (Figure 3).  As illustrated in Table 2, this is a simplification, but it is also not uncommon at copy number states 1-4 in my personal experience, as I'd say the most common alleles observed are AO, OB, AB, AA, BB, AAB, ABB, AABB, AAAB, ABBB.

Table 2 - Absolute B Allele Frequencies for Distinct Copy Number States
 CN State   Genotype   A Allele   B Allele   B Allele Ratio   Abs B Allele
 1          A          100        0          0.00             0.50
 4          AAAA       400        0          0.00             0.50
 5          ABBBB      100        400        0.80             0.30
 5          BBBBB      0          500        1.00             0.50
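
The rows of Table 2 follow directly from the genotype string, so they can be sketched in a few lines of Python. The function name is hypothetical, and it assumes the same 100-cell population with 1 count per allele copy; an "O" in a genotype such as AO simply contributes no counts.

```python
def b_allele_stats(genotype, cells=100):
    """Return (A count, B count, B allele ratio, absolute B allele frequency)."""
    a_count = genotype.count("A") * cells
    b_count = genotype.count("B") * cells
    total = a_count + b_count
    if total == 0:
        # Homozygous deletion: there are no alleles left to measure.
        raise ValueError("genotype has no A or B alleles")
    ratio = b_count / total
    return a_count, b_count, ratio, abs(ratio - 0.5)


for genotype in ["AAAA", "ABBBB", "BBBBB"]:
    a, b, ratio, abs_baf = b_allele_stats(genotype)
    print(len(genotype), genotype, a, b, f"{ratio:.2f}", f"{abs_baf:.2f}")
```

Running it on AAB and ABB shows why the absolute value is handy: both collapse to the same absolute frequency even though their raw B allele ratios differ.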
Figure 3 - Theoretical B Allele Frequency Assuming the B Allele Count is the Only One Changing

Fun with Heterogeneous Mixtures - Looking Forward to Single Cells!
Figure 4 - Representative 75/25 Mixtures of Tumor and Normal Cells at Different Tumor Copy Number States

Figure 5 - Mixtures and the Effect on Deletions and LOH Events

Figure 6 - Gains
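
As a rough sketch of the arithmetic behind these mixture figures, the same count-based bookkeeping can be extended to a sample that is only partly tumor. The function names and the purity parameter are my own, and the code assumes the contaminating normal cells are diploid and heterozygous (AB) at the position in question and that the event is present in every tumor cell.

```python
import math


def mixed_log2(tumor_cn, purity):
    """Expected log2(T/N) when a fraction `purity` of cells carry tumor_cn copies."""
    signal = purity * tumor_cn + (1.0 - purity) * 2  # normal cells add 2 copies each
    if signal == 0:
        return float("-inf")  # pure sample with a homozygous deletion
    return math.log2(signal / 2)


def mixed_abs_baf(tumor_genotype, purity):
    """Expected abs(B allele frequency - 0.5) for a tumor/normal mixture."""
    tumor_b = tumor_genotype.count("B")
    tumor_total = tumor_genotype.count("A") + tumor_b
    b = purity * tumor_b + (1.0 - purity) * 1        # normal cells add 1 B copy
    total = purity * tumor_total + (1.0 - purity) * 2
    if total == 0:
        raise ValueError("no alleles present in the mixture")
    return abs(b / total - 0.5)


# A 75/25 tumor/normal mixture, as in Figure 4
for cn in range(5):
    print(cn, f"{mixed_log2(cn, 0.75):.2f}")
```

At 75% purity a one copy deletion lands near -0.68 rather than -1.00 and a homozygous deletion lands at -2.00 rather than negative infinity, which is the kind of compression the figures illustrate.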

Allele Specific Copy Number - Totally Ignored right now

AMPure and not-so-simple

posted Jul 8, 2013, 5:48 PM by Kristi Stephenson   [ updated Jul 8, 2013, 5:50 PM ]

As those in the lab will verify, I have an admitted obsession with the AMPure beads. I love to use them for any application possible. They are just so great, and the % recovery is excellent. And they are a lifesaver when it comes to eliminating adapter dimer from next gen sequencing libraries.

We have a ChIP-seq assay in development where a gel punch is required to eliminate the adapter dimers after PCR amplification, and also to narrow in on the size of the library we move forward with into sequencing (our target size is around 200-500 bp). When you are processing just a couple of samples, the gel punch and isolation is not such a big deal, though it does add at least an extra 3 hours, depending on the number of samples. So I got to thinking, what if we used the "double-cut" concept of the AMPure beads to: First, eliminate the really big stuff (top cut), and Second, eliminate the really small stuff (bottom cut). This way I could transfer my entire ChIP-seq library prep to a plate format and greatly increase the number of samples feasibly processed at one time. I did a little experimenting, and here's a summary of what I found.

My basic AMPure protocol goes as follows:

1) Add beads at the desired ratio to the sample and pipet 10x to mix
2) Incubate 15min at room temp
3) Place on magnet 2-5min, until supernatant is clear
4) Remove supernatant (save if needed)
5) Wash beads 2x with 200-500ul 80% EtOH
6) Remove as much EtOH as possible by pipet, then let bead pellet air dry (up to 15min depending on pellet size)
7) Remove from the magnet, add desired elution solution/volume and pipet 10x to mix
8) Incubate 2min at room temp
9) Return to magnet for 2-5min
10) Remove supernatant - this contains the size-selected DNA

First, here is a gel showing the DNA size fragments found in the supernatant (step 4) versus what is eluted off the beads (step 10), for varying ratios of beads to sample. I used a low molecular weight ladder (100bp-1031bp) as my DNA "sample."

Interesting, I thought. You can clearly see that as the bead:sample ratio decreases, the beads selectively bind larger fragment sizes. A nice step-wise illustration where the DNA size lost in the supernatant is mirrored by the size retained on the beads. Also note the beads selectively eliminate the 100bp band no matter what other sizes are retained. This is key for the adapter-dimer removal in sequencing libraries.

Second, based on the gel above I tested different ratios for top and bottom cut, trying to zero in on a combination that could work to select out my 200-500 bp target region for the ChIP libraries.
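
For planning the pipetting of a double cut, a small Python sketch can work out how much bead suspension goes in at each step. This assumes the common way of running a double-sided selection, which may not match the exact setup tested here: the top cut supernatant (step 4) is carried forward, and the bottom cut adds only enough extra beads to bring the cumulative bead:sample ratio up to the second target. The function name and the example ratios are placeholders, not results from this experiment.

```python
def double_cut_volumes(sample_ul, top_ratio, bottom_ratio):
    """Bead volumes (ul) for a top cut followed by a bottom cut.

    Assumes the second addition goes into the supernatant saved from the
    first, so the ratios are cumulative relative to the starting sample.
    """
    if not 0 < top_ratio < bottom_ratio:
        raise ValueError("expected 0 < top_ratio < bottom_ratio")
    first_add = top_ratio * sample_ul                     # binds the large fragments
    second_add = (bottom_ratio - top_ratio) * sample_ul   # binds the target range
    return first_add, second_add


# Placeholder ratios purely for illustration
first, second = double_cut_volumes(sample_ul=50, top_ratio=0.6, bottom_ratio=1.0)
print(f"top cut: {first:.1f} ul beads, bottom cut: {second:.1f} ul beads")
```

The beads from the first addition are discarded with the large fragments, and the beads from the second addition are washed and eluted as in steps 5-10.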


data to come :)

To TE or not to TE

posted Feb 8, 2013, 2:15 PM by Jonathan Keats   [ updated Feb 8, 2013, 2:17 PM ]

We run into this all the time: people throw around the term TE with no knowledge of the extensive variations that exist.  Also, most people don't realize that common solutions in various kits are just variations of TE.

1) Standard 1x TE  =  10 mM Tris-HCl, 1.0 mM EDTA  (pH 8.0, but this should not be assumed)

2) Buffer AE = 10 mM Tris-HCl, 0.5 mM EDTA (pH 9.0)
        NOTE: Apparently Qiagen puts buffer AE in different kits and they may NOT be the same composition
        Values from DNeasy Blood & Tissue Kit (50) cat#69504

3) Buffer ATE = 10 mM Tris-HCl, 0.1 mM EDTA, 0.04% Sodium Azide (pH 8.3)
        NOTE: Apparently Qiagen puts buffer ATE in different kits and they may NOT be the same composition
        Values from QiaSymphony DNA Midi Kit cat#931255

4) Buffer EB = 10 mM Tris-HCl (pH 8.5)
        * Common elution buffer in many Qiagen plasmid prep and sample clean-up kits

5) DNA Hydration Solution = 10 mM Tris-HCl, 1.0 mM EDTA (pH 7-8)
        * Provided in Gentra Puregene kits from Qiagen

6) DNA Suspension Buffer = 10 mM Tris-HCl, 0.1 mM EDTA (pH 8.0)
        * Recommended by Affymetrix for SNP arrays (basically in standard TE it will not work!)
        * Commonly called TElowE

A bit of ranting/suggestions

A) Never put DNA in water alone
B) The pH of the solution can affect things like 260/280 values; generally a higher pH makes things look better
C) You want to watch the final concentration of EDTA in any enzymatic reaction.  Our recommendation is to use buffers with 0.1 mM EDTA
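
To put point C in numbers, here is a small Python sketch of the dilution arithmetic. The function name and the example volumes are hypothetical; the EDTA concentrations are the ones listed above.

```python
# EDTA concentration (mM) of each storage buffer, taken from the list above
BUFFER_EDTA_MM = {
    "1x TE": 1.0,
    "Buffer AE": 0.5,
    "Buffer ATE": 0.1,
    "Buffer EB": 0.0,
    "DNA Hydration Solution": 1.0,
    "DNA Suspension Buffer": 0.1,
}


def final_edta_mm(buffer_edta_mm, dna_volume_ul, reaction_volume_ul):
    """EDTA concentration contributed by the DNA sample to the final reaction."""
    if not 0 <= dna_volume_ul <= reaction_volume_ul:
        raise ValueError("DNA volume must be between 0 and the reaction volume")
    return buffer_edta_mm * dna_volume_ul / reaction_volume_ul


# Example volumes are made up: 5 ul of DNA in a 25 ul reaction
for name, edta in BUFFER_EDTA_MM.items():
    print(f"{name}: {final_edta_mm(edta, 5, 25):.3f} mM EDTA in the reaction")
```

With these example volumes, DNA stored in standard 1x TE carries ten times more EDTA into the reaction than DNA stored in DNA Suspension Buffer.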

In our lab all DNA samples are stored in DNA Suspension Buffer (i.e. TElowE)


GATK - Fun - Confusion - Fear - Frustration - Things to Remember

posted Jan 9, 2013, 12:23 AM by Jonathan Keats

There are many great things about the GATK software, primarily their excellent level of support, but some things are a bit irritating, like "we change it all the time so check the --list function", which requires you to put together a complete GATK command.  That can be difficult when you have no idea what you are doing.

So here are the lists I'm always wanting to have on my wall (The Genome Analysis Toolkit (GATK) v2.3-4-g57ea19f)

Variant Annotator List of Annotations
Standard annotations in the list below are marked with a '*'.

Available annotations for the VCF INFO field:

Available annotations for the VCF FORMAT field:

Available classes/groups of annotations:

What type of Dynabead do I need?

posted Nov 27, 2012, 12:51 PM by Kristi Stephenson

If you're like most of us here in the lab, we can never keep track of what the difference is between the different types of Dynabeads. It seems like every new assay is asking for a different "letter" of bead. There's T1, C1, M270, M280, and well... I think that's the end of the list for now. At any rate, I decided to post a quick cheat sheet so it's easy to look it up and remember. So for anyone else out there wondering the same thing... hopefully this will make your life a little easier! For more details, check out the link for the LifeTech product sheet also included below.

PCR Purification: AMPure and Simple

posted May 8, 2012, 10:16 PM by David Edwards   [ updated May 9, 2012, 8:55 AM ]

I wanted to talk a little about the selection characteristics of Agencourt’s AMPure beads, a bead-reagent combination that purifies PCR reactions.

This stuff is incredible in terms of simplicity, efficiency, and high-throughput compatibility. I have a sneaking suspicion that AMPure, not unlike fire to Prometheus, was handed down from the gods to benefit humanity. You just dunk it into your sample, slosh it around, stick it to a magnet, wash, wash again, and elute in your favorite buffer. No muss, no fuss.

We were wondering, though, about its selection process. What size fragments are selected by the AMPure beads, specifically at which ratio of beads to sample? So, like diligent scientists, we rolled up the sleeves of our labcoats and… read the protocol.

The protocol recommends cleaning up your sample at a 1.8:1 ratio of beads to sample. It says that fragments less than 100bp will be omitted at this ratio, but it doesn’t say which sized fragments will be selected. We found this remarkably helpful technical bulletin, which describes calibrating each batch of AMPure beads with various ratios of DNA ladder.

So I did our very own calibration with AMPure beads using Fermentas’s GeneRuler™ Low Range DNA Ladder (25-700 bp). I added 30ul ladder to various concentrations of AMPure beads according to Agencourt’s instructions.

(Actually, if you’re looking for good AMPure instructions, I recommend looking at Illumina’s TruSeq™ Sample Preparation Guide. Honestly, their instructions are more comprehensive than Agencourt’s, and easier to read.) After purifying each sample, I bookended the various AMPure:ladder ratios with 10ul non-purified ladder on a 2% TBE gel for easy comparison.

Without any further ado, here are the results:

The results aren’t too surprising, I guess. Unless you’re looking to select 100-150bp fragments, or if you’re using an extremely low ratio of AMPure beads, the ratio differences aren’t that significant. Basically, barring the first exception, you’ll be just fine following Agencourt’s protocol and recommended ratio. 

From this one image, it’s difficult to quantitatively compare one ratio against another, so I plugged everything into ImageJ to give me some numbers to play around with. I followed ImageJ’s guidelines for analyzing gel images. Then, I averaged the band intensities for both non-purified ladder samples, multiplied them by three (knowing that I added three times more ladder for purification), and normalized the band intensities of the purified ladder by dividing them by their corresponding band intensities for the non-purified ladder.

If you didn’t follow the grammatical train wreck that was the previous sentence, don’t worry, you should just focus on the results:

Interestingly enough, according to ImageJ, the 1.6:1 ratio has slightly more intense bands, and apparently slightly more purified DNA, than the recommended 1.8:1 ratio.  (If you want to see my exact analysis process, you can view the attached Excel file.) While those values aren’t true percentages because the normalization isn’t exact, they do suggest that different ratios of AMPure beads to DNA can produce different results in terms of fragment size and amount retained.
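
For anyone who would rather see the normalization as code than as a sentence, here is a minimal Python sketch of it. The function name and the intensity numbers are invented for illustration; the real values are in the attached Excel file.

```python
def normalize_bands(purified, unpurified_left, unpurified_right, load_factor=3):
    """Normalize purified band intensities against the non-purified ladder.

    Each argument is a list of band intensities in the same band order.
    load_factor is 3 because 30ul of ladder went into each purification
    while only 10ul of non-purified ladder was run on the gel.
    """
    normalized = []
    for band, left, right in zip(purified, unpurified_left, unpurified_right):
        expected = load_factor * (left + right) / 2  # average the two bookend lanes
        normalized.append(band / expected)
    return normalized


# Made-up intensities for three bands, largest fragment first
print(normalize_bands([2400.0, 1500.0, 90.0],
                      [1000.0, 800.0, 600.0],
                      [1000.0, 700.0, 600.0]))
```

A value near 1 means essentially all of that band was retained, and a value near 0 means it was lost in the supernatant.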

And, when you really think about it, isn’t that what experimental PCR purification fragment analysis is all about?

Effect of Heating Time on DNA Fragmentation

posted Feb 3, 2012, 1:50 PM by Kim Babos   [ updated Feb 3, 2012, 3:32 PM by Jonathan Keats ]

After performing a test of heated DNA (95 C for 10 minutes) versus non-heated DNA for Kristi using the ARD_SSC.C8 cell line, we found that heating DEFINITELY fragments the DNA, literally to smithereens.  We decided that I should try varying the heating times between 0 and 18 minutes and compare those to non-heated DNA.  I have attached an image of the gel results, which turned out beautifully, I am proud to say!

I used the KMS34 cell line for this experiment.  The non-heated DNA stayed at the top of the gel while the other 9 samples showed quite a variety of smear patterns.  As you can see, when the DNA was heated for 18 minutes, it was fragmented to the point that it lies between about 500 and 100 base pairs.  On the other end of the spectrum, the DNA that was heated for 2 minutes shows a really nice, long smear from about the 12000 base pair marker down to about 650 or so.

After thinking about the effects heat has on DNA, there are both pros and cons to heating, and to heating for different lengths of time.  If you want to get small fragments of DNA, for instance for use in a CGH protocol, heating for 18 minutes seems to do the trick.  If you want to amplify long-length DNA via PCR, I would definitely say to avoid heating for 18 minutes and try to stick to a time between 1 and 4 minutes.  In talking with Jonathan, this image might explain why it tends to be difficult to amplify a DNA sequence of a large size, say 8 kb for example.  Because we are heating the DNA at 94 C for 5 minutes and then proceeding to heat for another 15 or so, it is no wonder that we have a hard time getting many 8 kb amplicons.  They are all shredded to pieces!

Just an interesting piece of data we figured out in the Keats lab today.  Ending the week with interesting results is always a treat!


Getting the Lab off the Ground

posted Jul 15, 2011, 4:42 PM by Jonathan Keats   [ updated Sep 25, 2011, 11:34 PM ]

So the lab is now over one year old and finally getting going.  I've had a number of people tell me that if you are going by six months and feel productive by one year you are doing well.  Based on that I'm a bit behind, as I only now feel like the lab is starting to really make progress, so that is a bit past the one year mark.  But I really only started doing things in the lab in January when Kristi joined the group, so maybe we are ahead of schedule?  The good news is we have a decent bit of funding between my start-up and two different grants.  I actually had to hire another research associate recently as we just could not keep up.  Then we have a big project ready to launch, and when the contract is finally signed I'll be able to add two post-docs to the group.
