Genetic Genealogy Question and Answer Session.

Clare Roots Society monthly meeting

8:00 p.m. Thursday 19 May 2016

Maguire Suite, Old Ground Hotel, Ennis

by Paddy Waldron

WWW version:

http://pwaldron.info/DNA2/

What is DNA?

Where does our DNA come from?

Source   Male offspring          Female offspring
sperm    Y chromosome            X chromosome
sperm    22 paternal autosomes   22 paternal autosomes
egg      X chromosome            X chromosome
egg      22 maternal autosomes   22 maternal autosomes
egg      mitochondria            mitochondria

Inheritance paths

Y chromosome
Only males have a Y chromosome.
The Y chromosome comes down the patrilineal line - from father, father's father, father's father's father, etc.
This is the same inheritance path as followed by surnames, grants of arms, peerages, etc.
X chromosome
Males have one X chromosome, females have two.
X DNA may come through any ancestral line that does not contain two consecutive males.
Blaine Bettinger's nice colour-coded blank fan-style pedigree charts show the ancestors from whom men and women can potentially inherit X-DNA.
Autosomes
Males and females all have 22 autosomes.
Exactly 50% of autosomal DNA comes from the father and exactly 50% comes from the mother.
Due to recombination (see below), on average 25% comes from each grandparent, on average 12.5% comes from each great-grandparent, and so on.
Siblings each inherit 50% of each parent's autosomal DNA, but not the same 50% (except for identical twins), so they do not have the same DNA matches.
Similarly, sisters each inherit 50% of their mother's X DNA, but not the same 50% (except for identical twins).
Mitochondria
Everyone, male and female, has mitochondrial DNA.
Mitochondrial DNA comes down the matrilineal line - from mother, mother's mother, mother's mother's mother, etc.
The surname typically changes with every generation in this line.
For genetic genealogy, beginners should start with autosomal DNA, or with Y DNA for one-name studies or surname projects.
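The X-DNA rule above (no two consecutive males in the line) can be checked mechanically. The following is a minimal sketch, not taken from the talk: the function name `x_dna_lines` and the 'M'/'F' string encoding are assumptions made for illustration.

```python
from itertools import product


def x_dna_lines(sex, generations):
    """List ancestral lines that can pass X-DNA to a person of the given sex.

    Each line is a string of sexes ('M' or 'F'), starting with the tested
    person and ending with the ancestor `generations` steps back.
    (Hypothetical helper, for illustration only.)
    """
    lines = []
    for ancestors in product("MF", repeat=generations):
        line = sex + "".join(ancestors)
        # A father does not pass his X chromosome to a son, so no line
        # may contain two consecutive males.
        if "MM" not in line:
            lines.append(line)
    return lines


for sex in "MF":
    counts = [len(x_dna_lines(sex, n)) for n in range(1, 6)]
    print(sex, counts)
```

For a man this gives 1, 2, 3, 5, 8 possible X-DNA ancestors in successive generations, and for a woman 2, 3, 5, 8, 13, which is the pattern shown on fan-style X-DNA pedigree charts.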

How much DNA do we have?

Billions of letters:
               Male                                 Female
               Length         Width  Total          Length         Width  Total
Autosomal      2,881,033,286  2      5,762,066,572  2,881,033,286  2      5,762,066,572
X              155,270,560    1      155,270,560    155,270,560    2      310,541,120
Y              59,373,566     1      59,373,566                           0
Mitochondrial  16,569         1      16,569         16,569         1      16,569
GRAND TOTAL    3,095,693,981         5,976,727,267  3,036,320,415         6,072,624,261
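The grand totals in the table follow from multiplying each length by its width (the number of copies carried) and adding up. A short sketch of that arithmetic, using only the figures in the table; the names `LENGTHS`, `COPIES` and `grand_totals` are assumptions made for illustration.

```python
# Lengths in letters, copied from the table above.
LENGTHS = {
    "Autosomal": 2_881_033_286,
    "X": 155_270_560,
    "Y": 59_373_566,
    "Mitochondrial": 16_569,
}

# Number of copies ("width") carried by each sex, as in the table.
COPIES = {
    "male": {"Autosomal": 2, "X": 1, "Y": 1, "Mitochondrial": 1},
    "female": {"Autosomal": 2, "X": 2, "Y": 0, "Mitochondrial": 1},
}


def grand_totals(sex):
    """Return (sum of lengths present, sum of length x width) for one sex."""
    copies = COPIES[sex]
    length_sum = sum(LENGTHS[name] for name in LENGTHS if copies[name] > 0)
    total_sum = sum(LENGTHS[name] * copies[name] for name in LENGTHS)
    return length_sum, total_sum


for sex in ("male", "female"):
    length_sum, total_sum = grand_totals(sex)
    print(f"{sex}: {length_sum:,} / {total_sum:,}")
```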

How much DNA do we observe?

The three major DNA companies sample different locations on the autosomes - about 0.01% of the total:
The overlap between FamilyTreeDNA and AncestryDNA is 652,462 (based on my personal results).
Current technology can observe two letters at each location, but cannot distinguish between the paternal and maternal letters.
So we have to be clever to distinguish between paternal and maternal DNA matches.
The vast majority of DNA is identical for all humans.

The random component

Most DNA is transcribed exactly from the relevant parent.

Two sources of randomness mean that one cannot always exactly infer the child's DNA from the parents' or vice versa:
  1. mutations
  2. recombination
Mutations are transcription errors at single locations, e.g. a single A in the parent may be replaced by a C in the child.

Some locations mutate very frequently, and can be used to identify individuals beyond reasonable doubt, e.g. in criminal cases.

Some locations mutate less frequently, and can be used to identify closely related individuals.

Special types of mutations:
The entry-level Y-DNA37 product looks at 37 STR (short tandem repeat) markers on the Y chromosome, e.g. U.S. Waldron project or O'Brien project.
Some SNPs (single-nucleotide polymorphisms) on the Y chromosome are once-in-the-history-of-mankind events and can be used to build a Y-DNA Haplogroup Tree.
STRs can predict Y haplogroups but a SNP product must then be purchased to confirm the Y haplogroup.
Surname-specific SNPs are now being discovered.

The other source of randomness is recombination, which is how, e.g., the father's paternal and maternal autosomes cross over to produce the child's paternal autosomes.
Paternal: gtacgatcgtagatcgatcatatccgtacgcatcatgactacatatcatcgatcgatcatcatatcgatcatcagcatcgatcgatcgatcgatcgat
Maternal: gggggggggggggggaccagtatgtatcagtcctattactacatctactataactatctactagctagcaatatcctactcatacatctacttactgt
Combined: gtacgatcgtagatcgatcatatctatcagtcctattactacatctactataacgatcatcatatcgatcatcacctactcatacatctacttactgt
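The crossover pattern in the three lines above can be imitated in a few lines of code. This is only a toy model, assuming crossover points are given as positions in the string; the function `recombine` and the short made-up sequences are hypothetical and are not real DNA data.

```python
def recombine(paternal, maternal, crossovers, start_with_paternal=True):
    """Build one combined chromosome by switching source at each crossover.

    `crossovers` lists the positions (counted in letters from the start)
    after which the copy switches from one parental chromosome to the other.
    """
    if len(paternal) != len(maternal):
        raise ValueError("both chromosomes must have the same length")
    sources = (paternal, maternal) if start_with_paternal else (maternal, paternal)
    pieces = []
    current = 0   # index into `sources`: which chromosome is being copied
    previous = 0  # position where the current piece starts
    for point in sorted(crossovers) + [len(paternal)]:
        pieces.append(sources[current][previous:point])
        current = 1 - current  # switch to the other chromosome
        previous = point
    return "".join(pieces)


# Made-up sequences chosen so that the source of each letter is obvious.
paternal = "aaaaaaaaaaaa"
maternal = "cccccccccccc"
print(recombine(paternal, maternal, [4, 9]))  # aaaacccccaaa
```

With different crossover points, or a different starting chromosome, the same two parental chromosomes give a different combined chromosome.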

Every sperm/egg is potentially unique.

Recombination of the paternal and maternal chromosomes is sometimes compared to shuffling two decks of playing cards.

Recombination rates vary markedly along the autosomes and X chromosome.

One recombination per generation is expected in each 100 centiMorgans (cM, not cm).

The longer the centiMorgan length of two identical DNA segments, the more recently one can expect to find the common ancestor from whom they were inherited.
On average, 798,852 total letters per cM or 190 letters observed per cM, or one recombination or crossover per generation per 19,000 observed letters.
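The figures above are linked by simple arithmetic: 190 observed letters per cM times 100 cM per expected crossover gives 19,000 observed letters per crossover. A small sketch, in which the constant and function names are assumptions made for illustration:

```python
TOTAL_LETTERS_PER_CM = 798_852  # average quoted above
OBSERVED_LETTERS_PER_CM = 190   # average quoted above
CM_PER_CROSSOVER = 100          # one expected crossover per generation per 100 cM


def observed_letters_per_crossover():
    """Observed letters spanned, on average, by one crossover per generation."""
    return OBSERVED_LETTERS_PER_CM * CM_PER_CROSSOVER


def expected_crossovers(segment_cm, generations=1):
    """Expected number of crossovers inside a segment over some generations."""
    return segment_cm / CM_PER_CROSSOVER * generations


print(observed_letters_per_crossover())         # 19000
print(expected_crossovers(50))                  # 0.5
print(expected_crossovers(25, generations=4))   # 1.0
```

The last line illustrates why long shared segments point to recent ancestors: a 25 cM segment is expected to be cut by a crossover about once in four generations.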

In practice, we cannot distinguish the paternal letter from the maternal letter, so we can only check whether two individuals' observed autosomal (or X) DNA is half-identical (or better) at specific locations, e.g.
Phasing and triangulation are like opposite sides of the same coin:
Rule of thumb for lengths of the longest half-identical region:
The aggregate length of all the half-identical regions above some arbitrary threshold is used to estimate the relationship:
Average autosomal DNA shared by pairs of relatives, in percentages and centiMorgans
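The idea of a half-identical region can be made concrete with a toy example. The sketch below assumes each person's data is a list of unordered letter pairs, one per observed location, and treats two people as half-identical at a location when the pairs share at least one letter. The function names and the sample data are made up for illustration; the matching algorithms actually used by the testing companies are not described here and are likely to differ (for example in how they handle errors and how they measure length in cM).

```python
def half_identical(pair_a, pair_b):
    """True if two unordered letter pairs share at least one letter."""
    return bool(set(pair_a) & set(pair_b))


def longest_half_identical_run(person_a, person_b):
    """Return (start index, length) of the longest half-identical run."""
    best_start, best_length = 0, 0
    run_start, run_length = 0, 0
    for index, (pair_a, pair_b) in enumerate(zip(person_a, person_b)):
        if half_identical(pair_a, pair_b):
            if run_length == 0:
                run_start = index
            run_length += 1
            if run_length > best_length:
                best_start, best_length = run_start, run_length
        else:
            run_length = 0  # a mismatch ends the current run
    return best_start, best_length


# Made-up observations at six consecutive locations.
person_a = ["AA", "AC", "GG", "CT", "TT", "AG"]
person_b = ["AC", "CC", "TT", "CC", "CT", "GG"]
print(longest_half_identical_run(person_a, person_b))  # (3, 3)
```

Here the two people fail to match at the third location (GG against TT), so the longest half-identical run covers the last three locations.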

What can DNA tell us?

The big 3 DNA companies

AncestryDNA
Part of ancestry.com
Autosomal DNA only
Very limited analysis tools
Overcharges non-U.S. customers
Internal messaging system
About 1 million samples
Most people use pseudonyms or initials
My results
23andMe
Concentrates on medical aspects of DNA
Autosomal DNA plus predicted Y-DNA and mtDNA haplogroups
Doubled prices in late 2015
Overcharges non-U.S. customers
Optional internal messaging system
About 1 million samples
Most people anonymous
Analysis tools for non-anonymous matches
Results
Surname View
FamilyTreeDNA (FTDNA)
Dedicated to genetic genealogy
Autosomal DNA (Family Finder) plus various Y-DNA and mtDNA products
Good analysis tools
Single worldwide price
No U.S. bias
Simple e-mail communications
400,000+ samples (perhaps 250,000 Family Finder and 150,000 Y-DNA only or mtDNA only)
Most people use real names, but married women are recommended to use their maiden surnames
Projects - e.g. Clare Roots, Munster Irish, Pre-Great Famine Munster Ireland Project
My results

What do you get for your money?

The third-party sites

GEDmatch.com

DNAgedcom.com

Levels of involvement

Lists of names
A black box algorithm can be used to list the names of those in a database whose autosomal DNA is closest to yours. 
You can look at your matches' own names, their ancestral surnames, ancestral placenames and family trees, if they have made these available.
Anybody can do this.
Lengths of half-identical regions
To get full value from one's investment in DNA analysis, one should move on from the purely qualitative approach of looking at names and take a more quantitative approach.
The first step is to look at the percentages of the length of the genome on which one is half-identical with a potential relative.
The higher this percentage, the closer the relationship is likely to be.
Some basic arithmetic skills are required for this.
Locations of half-identical regions
If three or more people are all half-identical to one another on the same region, and if two or more of them are known relatives, then it becomes far more likely that they are all descended from a common ancestral couple.
Furthermore, it can be inferred that the DNA in the half-identical region has been inherited from either the male or the female of that common ancestral couple.
This may exercise your brain cells a little more than the first two approaches.
Raw data
Sooner or later, the only answer to a particular DNA puzzle will be to look at one's raw data, in the form of long sequences of pairs of As, Cs, Gs and Ts, in order to work out exactly how and why something happened.
This is for the specialist.
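The arithmetic behind the second and third levels is modest. The sketch below shows one way to compute the percentage of the genome that is half-identical and to find the region common to several matching segments on one chromosome. Both function names are hypothetical, and the genome length of 6,800 cM in the example is an assumed round figure for illustration only; each company uses its own total.

```python
def percent_shared(segment_lengths_cm, genome_length_cm):
    """Percentage of the genome covered by the half-identical segments."""
    return 100 * sum(segment_lengths_cm) / genome_length_cm


def common_region(segments):
    """Overlap of several (start, end) segments on one chromosome, or None."""
    start = max(segment_start for segment_start, _ in segments)
    end = min(segment_end for _, segment_end in segments)
    return (start, end) if start < end else None


# Three made-up segments shared with one match; 6,800 cM is an assumption.
print(percent_shared([40.0, 27.5, 17.5], 6800.0))  # 1.25

# Made-up pairwise segments for three people (A-B, A-C, B-C), in any
# consistent position unit, all on the same chromosome.
print(common_region([(10, 50), (30, 70), (20, 60)]))  # (30, 50)
print(common_region([(10, 20), (30, 40)]))            # None
```

A non-empty common region among all three pairwise comparisons is the starting point for the inference described under "Locations of half-identical regions"; it does not by itself prove a common ancestral couple.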
Whatever level of involvement you choose, you have a responsibility to provide your DNA matches with at least an outline pedigree chart showing your direct ancestors (FamilyTreeDNA, AncestryDNA).

The easiest way to do this is to upload a GEDCOM file from your desktop genealogy software.

Why you should submit your DNA

Further reading

Questions???