At Embark, we often receive questions like this one: “How can you differentiate ancestry between closely related breeds?”

For example, American Bulldogs and modern English Bulldogs are the descendants of working English Bulldogs that made their way to America with working-class immigrants. So, how can we tell the difference between the genetics of American Bulldogs and English Bulldogs?

The short answer is that four evolutionary processes (meiotic recombination, mutation, genetic drift, and natural selection) lead to independent genetic changes in the two related breeds over time. This eventually leads to genetic divergence between the two breeds. Using a reference population of American and English Bulldogs, we identify genetic signatures that are shared between the two related breeds and, more importantly, genetic signatures unique to each breed. Thus, for any Embark dog, we can measure the fraction of the genome with ancestry from either the American or the English Bulldog.

That’s the short answer, but we know that some of you need to know more to understand dog DNA. For an extended discussion of these processes, read on.

Genetic barcodes

To set the stage, let’s begin by imagining the genetic variation in dog DNA sequences within a breed of dogs as a set of barcodes. Figure 1 shows that every breed or population contains a limited set of genetic barcodes. Each of these is present in that breed at a certain frequency.

Figure 1

An example dog breed with genetic barcodes at different frequencies
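To make the idea of barcode frequencies concrete, here is a minimal Python sketch. The barcode names and counts are invented for illustration (they are not Embark data), and `barcode_frequencies` is a hypothetical helper, not part of any real pipeline.

```python
from collections import Counter


def barcode_frequencies(barcodes):
    """Return the fraction of sampled chromosome copies carrying each barcode."""
    counts = Counter(barcodes)
    total = sum(counts.values())
    return {barcode: count / total for barcode, count in counts.items()}


# Hypothetical sample: each dog carries two barcodes (one from each parent),
# so 5 dogs contribute 10 barcode copies in total.
breed_sample = [
    "blue", "blue", "blue", "blue", "blue",
    "orange", "orange", "orange",
    "purple",
    "green",
]

frequencies = barcode_frequencies(breed_sample)
for barcode, frequency in frequencies.items():
    print(f"{barcode}: {frequency:.0%}")
```

In this made-up sample, the blue barcode is the most common and the purple and green barcodes are rare, which is the kind of picture Figure 1 describes.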

In each generation, the evolutionary processes of recombination, mutation, genetic drift, and natural selection (and artificial selection in the case of dogs) cause the genetic barcodes carried by two breeds descended from a single ancestral breed to diverge from those of their ancestral population. Recombination and mutation physically change the content of DNA barcodes, while drift and selection change the frequencies of the barcodes.

Mutation

Mutations are spontaneous changes to DNA. They can change the content of a genetic barcode over time by replacing one or more base pairs of DNA with different base pairs, removing sequence, or adding entirely new sequence. There are many different kinds of mutations. Point mutations, which are changes to single base pairs in a strand of DNA, are some of the simplest and most common. Insertions and deletions, known as indels for short, are another common class of mutations. They are the addition or removal, respectively, of segments of DNA from a chromosome.

A mutation can be passed on to offspring when it occurs in the germline (the sperm or eggs) of an individual. The offspring in turn may pass that mutation on to their offspring. Figure 2 illustrates how different kinds of mutations can change the content of a genetic barcode in a lineage over time.

Figure 2

Propagation of two mutations, a point mutation and a deletion, over time in a dog’s barcode
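As a rough illustration of how mutations change barcode content, the sketch below treats a barcode as a short string of DNA bases. The sequence and the helper functions (`point_mutation`, `deletion`, `insertion`) are hypothetical and chosen only to mirror the mutation types described above.

```python
def point_mutation(barcode, position, new_base):
    """Replace the single base at `position` with `new_base`."""
    return barcode[:position] + new_base + barcode[position + 1:]


def deletion(barcode, start, length):
    """Remove `length` bases beginning at index `start`."""
    return barcode[:start] + barcode[start + length:]


def insertion(barcode, position, new_sequence):
    """Insert `new_sequence` so that it begins at index `position`."""
    return barcode[:position] + new_sequence + barcode[position:]


# A made-up ancestral barcode, followed across two mutation events.
ancestral = "ACGTTGCAAT"
with_point = point_mutation(ancestral, 3, "C")  # T -> C at index 3
with_deletion = deletion(with_point, 5, 3)      # removes the bases "GCA"

print(ancestral)      # ACGTTGCAAT
print(with_point)     # ACGCTGCAAT
print(with_deletion)  # ACGCTAT
```

Each later barcode keeps the changes made before it, much as a lineage carries forward every mutation it has inherited.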


Recombination

Recombination is the shuffling of DNA content between the two copies of a chromosome carried in a single individual during the process of meiosis (formation of sperm and egg cells). This shuffling breaks an individual’s genetic barcodes into shorter pieces. It also creates new barcodes that are combinations of the two barcodes inherited from parents. As such, recombination can change the content of DNA barcodes passed to offspring, each and every generation. This continuous breakdown of barcodes through meiotic recombination is why some mixed-breed dogs have more Supermutt ancestry (ancestry that cannot be reliably assigned to a single breed). I’ll discuss this in more detail in a future post.

Figure 3 shows two parents, each carrying two barcodes (one from each of their parents). Because of recombination in the germline of these parents, their offspring receives one barcode from each parent, and each of those barcodes is a mixture of the barcodes carried by that individual’s grandparents (the blue, orange, purple, and green barcodes).

Figure 3

Illustration of recombination with barcodes
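The shuffling in Figure 3 can be sketched in a few lines of Python. This is a simplified model that assumes exactly one crossover per barcode at a random position; real meiosis can involve zero or several crossovers. The letters B, O, P, and G stand in for the blue, orange, purple, and green grandparental barcodes, and `recombine` is a hypothetical helper.

```python
import random


def recombine(barcode_a, barcode_b, rng):
    """Return one gamete barcode built from a single random crossover."""
    if len(barcode_a) != len(barcode_b):
        raise ValueError("barcodes must have the same length")
    # A crossover point strictly inside the barcode, so both sources contribute.
    crossover = rng.randrange(1, len(barcode_a))
    # Either parental barcode is equally likely to supply the first part.
    if rng.random() < 0.5:
        barcode_a, barcode_b = barcode_b, barcode_a
    return barcode_a[:crossover] + barcode_b[crossover:]


rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Each parent carries two barcodes, one from each grandparent.
mother = ("B" * 10, "O" * 10)
father = ("P" * 10, "G" * 10)

offspring = (recombine(*mother, rng), recombine(*father, rng))
print(offspring)
```

The offspring barcode from the mother is part blue and part orange, and the one from the father is part purple and part green, so neither matches any barcode the grandparents carried.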

In summary, the processes of mutation and recombination act (mostly) randomly to add/remove bars from the barcodes (mutations) and mix-and-match pieces of barcodes with each other (recombination) over time, leading to new content and combinations of genetic barcodes in a closed population (like a breed).

Other processes affect the frequency of specific barcodes (or single genetic variations carried on a barcode) over time.

Genetic drift

Genetic drift is the random fluctuation in the frequencies of genetic variants over time. If you were to track the frequency of a genetic variant in a closed, finite population, you would see it shift a little, at random, each generation until it ultimately reached 100% (fixation) or 0% (loss). Simply put, fixation is when every dog in the population carries two identical copies of that barcode (one from each parent). Loss is when no dog in the population has any copies of that barcode.

This process results from the random inheritance of gametes by the next generation in a population. The larger a population is, the longer it takes for this random drift to lead to fixation or loss.

We can clarify this concept fairly well with a simple randomization experiment. You can try this at home with 20 marbles (10 each of two different colors). Imagine a large bowl of marbles into which we place 5 red and 5 blue marbles. In this “population” the initial frequency of blue marbles is 50% (5/10).

We then choose a marble with our eyes closed (randomly), record its color (after opening our eyes, of course), and place it back into the bowl, repeating this 10 times (this process is called sampling with replacement). For each successive “generation” of our experiment, we replace the contents of the bowl with the random sample that we drew in the previous generation.

Figure 4 illustrates this simple experiment repeated 25 times. In this figure, you can see that in every case the blue marble is either lost or completely replaces the red marbles in the bowl. If we were to scale this experiment up to 100 or 1,000 marbles, we’d find that the more marbles we add to our experiment, the longer it takes for one color to replace the other.

Figure 4

Illustration of simple genetic drift
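If you don’t have marbles handy, the same experiment can be sketched in Python. This is a minimal simulation of the marble bowl described above, not the code used to draw Figure 4; the function names and the random seed are arbitrary choices for illustration.

```python
import random


def next_generation(blue_count, bowl_size, rng):
    """Draw `bowl_size` marbles with replacement and return the new blue count."""
    blue_fraction = blue_count / bowl_size
    return sum(1 for _ in range(bowl_size) if rng.random() < blue_fraction)


def generations_until_fixed(bowl_size, rng):
    """Start at 50% blue and resample until one color has replaced the other."""
    blue_count = bowl_size // 2
    generations = 0
    while 0 < blue_count < bowl_size:
        blue_count = next_generation(blue_count, bowl_size, rng)
        generations += 1
    return blue_count, generations


def mean_generations(bowl_size, replicates, rng):
    """Average number of generations until fixation or loss."""
    total = sum(generations_until_fixed(bowl_size, rng)[1] for _ in range(replicates))
    return total / replicates


rng = random.Random(2024)  # fixed seed so the sketch is repeatable

# Repeat the 10-marble experiment 25 times, as in Figure 4.
runs = [generations_until_fixed(10, rng) for _ in range(25)]
print(runs)

# Compare a small bowl with a larger one.
small_bowl = mean_generations(10, 100, rng)
large_bowl = mean_generations(100, 100, rng)
print(small_bowl, large_bowl)
```

Every run ends with either 0 or 10 blue marbles, and the larger bowl takes noticeably more generations on average, which matches the point that drift acts more slowly in larger populations.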

In a future post, we’ll discuss other types of genetic drift in more detail. Nonetheless, the take-home point here is that many parts of the dog genome, particularly those that do not influence phenotypic traits that are characteristic of a breed, are subject to random fluctuation in frequency. Furthermore, in breeds for which the population size is fairly small, the effect of genetic drift can be great.


Selection

The evolutionary process that most people are familiar with is selection. This is the deterministic or intentional change in the frequency of a genetic variant over time. Natural selection is when this occurs in natural populations due to the effect of a genetic variant on the ability of an individual in a population to survive and, more importantly, reproduce. However, in most domesticated species, most selection is directly imposed by humans controlling the reproduction of that species. In that case, the selection is referred to as artificial selection.

Imagine that the barcodes in Figure 1 represent genetic variation in a gene that affects muzzle length, a trait that varies between American and English Bulldogs. Dogs carrying the blue barcode tend to have slightly longer muzzles. However, dogs carrying the other three barcodes have shorter muzzles. Say we wanted to create a new breed from this population, for which the breed standard is an elongated muzzle. To do this, we might choose to allow longer muzzled dogs to have a few more litters each generation than shorter muzzled dogs. Through this process, longer muzzles would become more common in each successive generation. This is because we have “selected” dogs carrying the “long muzzle” barcode to contribute more offspring to the next generation. We could continue this process until all dogs in our new breed have longer muzzles.
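The muzzle-length example can be sketched as a simple calculation. The model below is a deliberate simplification: it treats each barcode copy independently, ignores drift, and assumes that dogs carrying the “long muzzle” barcode leave a fixed multiple of offspring relative to other dogs. The starting frequency (25%) and the litter advantage (1.5x) are invented numbers.

```python
def select_one_generation(long_freq, relative_litters):
    """Frequency of the long-muzzle barcode after one generation of selection.

    `relative_litters` is how many offspring long-muzzle carriers leave
    relative to other dogs (1.0 means no selection at all).
    """
    weighted_long = long_freq * relative_litters
    weighted_short = 1.0 - long_freq
    return weighted_long / (weighted_long + weighted_short)


def frequency_trajectory(start_freq, relative_litters, generations):
    """Track the barcode frequency across successive generations."""
    trajectory = [start_freq]
    for _ in range(generations):
        trajectory.append(select_one_generation(trajectory[-1], relative_litters))
    return trajectory


trajectory = frequency_trajectory(start_freq=0.25, relative_litters=1.5, generations=20)
for generation, frequency in enumerate(trajectory):
    print(f"generation {generation}: {frequency:.3f}")
```

Unlike drift, the frequency here moves in one direction every generation, climbing steadily toward 100% as long as the litter advantage is maintained.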

Genetic drift and selection

Thus, the processes of genetic drift and selection act to alter the frequencies of specific barcodes or subsets of barcodes (specific genetic variations) within a breed.

We’ve discussed and illustrated these processes separately in this post. However, it is important to remember that none of these processes act in isolation. In fact, all are constantly acting to alter the genetic composition of a population. In future posts we will discuss how these processes interact in populations. We will also touch on how they work to influence important concepts, like inbreeding and genetic diversity, in the maintenance of domesticated species.

A basic understanding of these processes allows us to answer our initial question. “How can Embark identify ancestry from a breed if that breed was derived from another related breed?”

Figure 5 illustrates this basic process. We first take a single ancestral population, like working English Bulldogs in the 1800s, and divide it into two populations. The set of genetic barcodes in that initial population diverges in the two descendant bulldog populations. This occurs through the processes of mutation, recombination, genetic drift, and artificial selection on desired traits in each breed.

Figure 5

Summarizing figure demonstrating four biological principles leading to divergence of the American and English Bulldog breeds

Early in the divergence process, the two populations share most of their genetic variation. Today, American and English Bulldogs still share some genetic variation because they diverged from the same ancestral population. However, most genetic barcodes are now shared more often among dogs within each breed than between the two breeds. This allows the science team at Embark to differentiate the two breeds.
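To show how within-breed barcode sharing could translate into an ancestry estimate, here is a toy Python sketch. It is not Embark’s actual method: the reference frequencies, barcode names, and the simple “most common in which breed” rule in the hypothetical `ancestry_fractions` function are all assumptions made for illustration.

```python
def ancestry_fractions(dog_barcodes, reference_frequencies):
    """Assign each genome segment to the breed where its barcode is most common."""
    if not dog_barcodes:
        raise ValueError("dog_barcodes must not be empty")
    tallies = {breed: 0 for breed in reference_frequencies}
    unassigned = 0
    for barcode in dog_barcodes:
        scores = {
            breed: frequencies.get(barcode, 0.0)
            for breed, frequencies in reference_frequencies.items()
        }
        best_breed = max(scores, key=scores.get)
        best_score = scores[best_breed]
        tied = list(scores.values()).count(best_score) > 1
        # Barcodes that are unseen, or equally common in both breeds,
        # carry no information about which breed they came from.
        if best_score == 0.0 or tied:
            unassigned += 1
        else:
            tallies[best_breed] += 1
    total = len(dog_barcodes)
    fractions = {breed: count / total for breed, count in tallies.items()}
    fractions["unassigned"] = unassigned / total
    return fractions


# Invented reference panel: frequency of each barcode within each breed.
reference = {
    "American Bulldog": {"A1": 0.60, "shared": 0.30, "E1": 0.10},
    "English Bulldog": {"E1": 0.65, "shared": 0.30, "A1": 0.05},
}

# One barcode per genome segment for a hypothetical mixed-breed dog.
dog = ["A1", "A1", "E1", "shared"]

print(ancestry_fractions(dog, reference))
```

For this made-up dog, half of the segments look American Bulldog, a quarter look English Bulldog, and the segment carrying the equally shared barcode cannot be assigned to either breed.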

We are always working to expand and improve the breed reference samples that we use to analyze dog breed ancestry, so if you are an owner or breeder of a breed that is not included in our list of breeds, let us know. Thanks for Embarking with us!

Want to know more about dog DNA? Follow @embarkvet on Facebook, Twitter, and Instagram.