Occasionally, we get questions from customers curious about how their dog’s DNA is processed into the genotype data that we analyze to understand breed ancestry and health risks.
Once we receive your dog’s cheek swab, the DNA in the sample is copied many times in a process called amplification. Next, all of those copies are “chopped up,” or fragmented, into small pieces. The fragmented DNA is then applied to Embark’s DNA chip, also known as a genotyping array. This chip contains hundreds of thousands of tiny silica beads, each covered with hundreds of thousands of identical short DNA probes that are specific to a single location (a marker) in the dog genome. Once the fragmented DNA has had time to stick to these probes, the probes are labeled with fluorescent stains (red and green) that are specific to each DNA variant, or allele, at a marker. The fluorescent signal can then be measured to generate a readout of the intensity of each color at each marker.
So at every marker, we expect the intensity readout to reflect one of three possible genotypes. For example, suppose a marker has two alleles; let’s call them the A allele and the B allele. Then the three possible genotypes are AA, AB, and BB. If intensity is expressed as the proportion of the signal coming from the B allele, we can infer genotypes from intensity values near 0 (AA), 0.5 (AB), and 1 (BB). Most samples go through the genotyping process smoothly, so their intensity readouts might look something like this (note that each dot in all of the following plots corresponds to a single marker, so the X-axis reflects position along the dog genome):
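For readers who like to see the logic spelled out, here is a minimal Python sketch of turning a two-color readout into a genotype call. It is an illustration under stated assumptions, not Embark’s actual pipeline: the function names, the arctangent convention used to combine the two color signals (one common choice for two-color arrays), and the 0.15 no-call cutoff are all hypothetical.

```python
import math

# Expected B-allele intensity for each genotype (values from the text above).
CLUSTERS = {"AA": 0.0, "AB": 0.5, "BB": 1.0}


def b_allele_intensity(a_signal: float, b_signal: float) -> float:
    """Map two color-channel signals to a 0..1 scale (0 = all A, 1 = all B).

    Assumption: uses the polar-angle convention theta = (2/pi) * atan2(B, A),
    one common way of combining the two channels of a two-color array.
    """
    return (2 / math.pi) * math.atan2(b_signal, a_signal)


def call_genotype(intensity: float, max_distance: float = 0.15) -> str:
    """Return the genotype with the closest expected intensity, or 'NC' (no call).

    max_distance is a hypothetical cutoff: readouts farther than this from
    every expected value are left uncalled rather than guessed.
    """
    genotype, center = min(CLUSTERS.items(), key=lambda kv: abs(intensity - kv[1]))
    return genotype if abs(intensity - center) <= max_distance else "NC"


# Hypothetical raw signals (A channel, B channel) for four markers.
for a, b in [(1.0, 0.02), (0.9, 1.0), (0.05, 1.2), (1.0, 0.45)]:
    theta = b_allele_intensity(a, b)
    print(f"A={a:.2f} B={b:.2f} -> intensity {theta:.2f} -> {call_genotype(theta)}")
```

The last marker lands roughly halfway between the AA and AB values, so the sketch declines to call it instead of forcing a genotype.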
But as some of you know, occasionally a swab fails genotyping. This can happen for a few reasons. Sometimes a genotyping array fails at random, or another error occurs during processing. Most commonly, however, genotyping fails because the swab contains too little DNA, which produces a very noisy signal in the intensity readouts, like this:
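One simple way to put a number on “noisy” is the call rate: the fraction of markers whose intensity lands close enough to one of the three expected values to be called. The sketch below simulates a clean sample and a noisy, low-DNA sample and compares their call rates. The noise levels, the 0.15 distance cutoff, and the 95% pass threshold are hypothetical values chosen for illustration, not Embark’s actual quality-control criteria.

```python
import random

CENTERS = (0.0, 0.5, 1.0)  # expected intensities for AA, AB, BB


def call_rate(intensities, max_distance=0.15):
    """Fraction of markers whose intensity lies close enough to a genotype value."""
    called = sum(
        1 for x in intensities
        if min(abs(x - c) for c in CENTERS) <= max_distance
    )
    return called / len(intensities)


def simulate_sample(n_markers, noise_sd, seed):
    """Simulate intensities: a random true genotype value plus Gaussian noise, clipped to [0, 1]."""
    rng = random.Random(seed)
    return [
        min(1.0, max(0.0, rng.choice(CENTERS) + rng.gauss(0.0, noise_sd)))
        for _ in range(n_markers)
    ]


good = simulate_sample(10_000, noise_sd=0.03, seed=1)   # hypothetical: plenty of DNA
noisy = simulate_sample(10_000, noise_sd=0.25, seed=2)  # hypothetical: too little DNA

MIN_CALL_RATE = 0.95  # hypothetical pass/fail cutoff
for name, sample in [("good", good), ("noisy", noisy)]:
    rate = call_rate(sample)
    status = "pass" if rate >= MIN_CALL_RATE else "fail"
    print(f"{name}: call rate {rate:.3f} -> {status}")
```

With these made-up settings, the clean sample calls essentially every marker, while the noisy sample’s call rate drops well below the cutoff because many of its intensities land between the expected values.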
Pretty different, right? If your dog fails genotyping on the first try, make sure to take a second look at our swabbing instructions. It may help to stimulate saliva production by letting your dog smell their favorite treat (but don’t let them eat it) prior to swabbing.
A few of our customers have expressed concern that sample contamination has led to surprising breed ancestry results. These examples illustrate why it is actually impossible for us to analyze a contaminated sample!
If you have a multi-dog household or a dog that goes on regular playdates, be careful not to swab your dog soon after they have shared food, water, or toys with their brothers, sisters, or friends. Contamination from another dog makes it impossible to accurately interpret the intensity readout from a genotyping array. When the sample contains DNA from two dogs instead of one, there are nine possible genotype combinations at each marker instead of three! You can think of these as every pairing of the first dog’s three possible genotypes with the second dog’s three (AA|AA, AA|AB, AA|BB, AB|AA, AB|AB, AB|BB, BB|AA, BB|AB, BB|BB). Depending on the amount of contamination, this can make it impossible to convert the intensity readout from the genotyping array into a single genotype call at each marker. In a contaminated swab, the intensity readout on a single bead is an average of the genotypes of two dogs, weighted by how much DNA each dog contributed, so intensity values between 0 and 0.5 and between 0.5 and 1 occur across the entire genome. For example, this is what an intensity readout from a contaminated sample might look like:
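To see where the in-between intensities come from, here is a minimal Python sketch that enumerates the nine genotype combinations and computes the intensity each would be expected to produce. It assumes, as a simplification, that the readout is an average of the two dogs’ B-allele dosages weighted by the fraction of DNA each contributed; the function names and the 30% contamination level are hypothetical.

```python
from itertools import product

# B-allele dosage of each genotype on the 0..1 intensity scale.
DOSAGE = {"AA": 0.0, "AB": 0.5, "BB": 1.0}


def mixed_intensity(g_main: str, g_other: str, other_fraction: float) -> float:
    """Expected intensity when `other_fraction` of the DNA comes from a second dog.

    Simplifying assumption: the readout is a DNA-weighted average of the
    two dogs' dosages.
    """
    return (1 - other_fraction) * DOSAGE[g_main] + other_fraction * DOSAGE[g_other]


def expected_clusters(other_fraction: float) -> dict:
    """All nine genotype combinations (dog 1 | dog 2) and their expected intensities."""
    return {
        f"{g1}|{g2}": round(mixed_intensity(g1, g2, other_fraction), 3)
        for g1, g2 in product(DOSAGE, repeat=2)
    }


for fraction in (0.0, 0.3):
    clusters = expected_clusters(fraction)
    print(f"contaminant fraction {fraction}: {sorted(set(clusters.values()))}")
```

With no contamination, the nine combinations collapse back to the usual 0, 0.5, and 1. At 30% contamination they spread into nine distinct values, including 0.15, 0.3, 0.7, and 0.85, none of which corresponds to a single-dog genotype. At exactly 50% some combinations overlap, leaving five distinct values, which is still more than the three a single dog can produce.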
Every so often we get a contaminated sample, and the follow-up swab is perfectly fine. But in a few exceedingly rare cases we see the same pattern of contamination in multiple follow-up swabs. This could reflect contamination by the same second dog. However, when the intensity readouts are similar multiple times in a row, it is more likely that the dog providing the sample is what’s known as a chimera. Take a look at these three samples from the same dog, for example:
A chimera is an individual whose body contains cells from two genetically distinct individuals! This happens occasionally in dogs and humans alike. Most commonly, it is the result of an individual absorbing a sibling very early in development. We’re not actually sure how common this is in dogs yet. Also, just because a dog has some chimeric cells in their body does not mean those cells will show up in their mouth or saliva.
I hope that this primer on DNA genotyping has helped you to understand how the data that we analyze is generated.
As always, thanks for Embarking with us!