Adventures in DNA-based family history research:
How DNA found two more generations of my 18th century Limerick ancestors
Granary Library History Lecture Series
8:00 p.m. Tuesday 15 December 2015
Granary Library, Michael Street, Limerick
What is DNA?
- short for deoxyribonucleic acid
- found in chromosomes and mitochondria, each containing long molecules built from four nucleotides named adenine (A), cytosine (C), guanine (G) and thymine (T)
- represented by strings of the letters A, C, G and T
Where does our DNA come from?
- When a sperm fertilises an egg, each brings DNA, which is
replicated in every cell of the resulting person.
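| | male offspring | female offspring |
|---|---|---|
| sperm | Y chromosome | X chromosome |
| sperm | 22 paternal autosomes | 22 paternal autosomes |
| egg | X chromosome | X chromosome |
| egg | 22 maternal autosomes | 22 maternal autosomes |
| egg | mitochondria | mitochondria |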
- autosome is short for autosomal chromosome.
Inheritance paths
- Y chromosome
  - Only males have a Y chromosome.
  - The Y chromosome comes down the patrilineal line - from father, father's father, father's father's father, etc.
  - This is the same inheritance path as followed by surnames, grants of arms, peerages, etc.
- X chromosome
  - Males have one X chromosome, females have two.
  - X DNA may come through any ancestral line that does not contain two consecutive males.
  - Blaine Bettinger's nice colour-coded blank fan-style pedigree charts show the ancestors from whom men and women can potentially inherit X-DNA.
- Autosomes
  - Exactly 50% of autosomal DNA comes from the father and exactly 50% comes from the mother.
  - Due to recombination (see below), on average 25% comes from each grandparent, on average 12.5% comes from each great-grandparent, and so on.
  - Siblings each inherit 50% of each parent's autosomal DNA, but not the same 50% (except for identical twins).
  - Similarly, siblings each inherit 50% of their mother's X DNA, but not the same 50% (except for identical twins).
  - Sisters also inherit 100% of their father's X DNA.
- Mitochondria
  - Everyone has mitochondrial DNA.
  - Mitochondrial DNA comes down the matrilineal line - from mother, mother's mother, mother's mother's mother, etc.
  - The surname typically changes with every generation in this line.
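The X-DNA rule above (no two consecutive males in the ancestral line) can be illustrated with a minimal sketch. The function name `x_ancestral_lines` and the "mother"/"father" step labels are assumptions made for this illustration only; the code simply applies the rule that a male receives his X chromosome from his mother alone, while a female receives one from each parent.

```python
def x_ancestral_lines(sex, generations):
    """List the ancestral lines that can contribute X DNA.

    sex is "M" or "F" for the person tested; each returned line is a tuple
    of "mother"/"father" steps, read from the person back in time.
    """
    frontier = [((), sex)]
    for _ in range(generations):
        next_frontier = []
        for path, current_sex in frontier:
            # Everyone receives an X chromosome from their mother.
            next_frontier.append((path + ("mother",), "F"))
            # Only females receive an X chromosome from their father.
            if current_sex == "F":
                next_frontier.append((path + ("father",), "M"))
        frontier = next_frontier
    return [path for path, _ in frontier]


if __name__ == "__main__":
    for generation in range(1, 6):
        men = len(x_ancestral_lines("M", generation))
        women = len(x_ancestral_lines("F", generation))
        print(generation, men, women)
```

The number of possible X-contributing ancestors grows much more slowly than the full pedigree, which doubles with every generation.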
For genetic genealogy, beginners should start with autosomal DNA, or Y
DNA for one name studies or surname projects.
How much DNA do we have?
Billions of letters:
| | Male: Length | Male: Width (copies) | Male: Total | Female: Length | Female: Width (copies) | Female: Total |
|---|---|---|---|---|---|---|
| Autosomal | 2,881,033,286 | 2 | 5,762,066,572 | 2,881,033,286 | 2 | 5,762,066,572 |
| X | 155,270,560 | 1 | 155,270,560 | 155,270,560 | 2 | 310,541,120 |
| Y | 59,373,566 | 1 | 59,373,566 | | | 0 |
| Mitochondrial | 16,569 | 1 | 16,569 | 16,569 | 1 | 16,569 |
| GRAND TOTAL | 3,095,693,981 | | 5,976,727,267 | 3,036,320,415 | | 6,072,624,261 |
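The grand totals in the table can be checked with a few lines of arithmetic. This is only a sketch: the dictionary names `LENGTHS` and `COPIES` and the function `grand_totals` are invented for the illustration, and the figures are those quoted in the table.

```python
# Lengths in letters, as quoted in the table above.
LENGTHS = {
    "autosomal": 2_881_033_286,
    "X": 155_270_560,
    "Y": 59_373_566,
    "mitochondrial": 16_569,
}

# "Width": how many copies of each type a male or a female carries.
COPIES = {
    "male": {"autosomal": 2, "X": 1, "Y": 1, "mitochondrial": 1},
    "female": {"autosomal": 2, "X": 2, "Y": 0, "mitochondrial": 1},
}


def grand_totals(sex):
    """Return (total length, total letters) for the given sex."""
    copies = COPIES[sex]
    length = sum(LENGTHS[kind] for kind, n in copies.items() if n > 0)
    total = sum(LENGTHS[kind] * n for kind, n in copies.items())
    return length, total


if __name__ == "__main__":
    for sex in ("male", "female"):
        print(sex, grand_totals(sex))
```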
How much DNA do we observe?
The three major DNA companies sample different locations on the
autosomes - about 0.01% of the total:
- FamilyTreeDNA: 682,608
- AncestryDNA: 658,780 (reduced to 441,447 from May 2016)
- 23andMe: 577,382
The overlap between FamilyTreeDNA and the original AncestryDNA set is 652,462 (based on
my personal results).
Current technology can observe two letters at each location, but cannot
distinguish between the paternal and maternal letters.
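To put the sampling figures in proportion, the sketch below divides each company's number of observed locations by the autosomal figures quoted earlier. The variable and function names are assumptions for this illustration. Relative to both copies of the autosomes the share is about 0.01%; relative to a single copy it is about 0.02%.

```python
AUTOSOMAL_LENGTH = 2_881_033_286      # letters in one copy of the autosomes
AUTOSOMAL_TOTAL = 2 * AUTOSOMAL_LENGTH  # both copies

OBSERVED_LOCATIONS = {
    "FamilyTreeDNA": 682_608,
    "AncestryDNA": 658_780,
    "23andMe": 577_382,
}


def percent_observed(locations, denominator):
    """Observed locations as a percentage of the given number of letters."""
    return 100 * locations / denominator


if __name__ == "__main__":
    for company, locations in OBSERVED_LOCATIONS.items():
        of_total = percent_observed(locations, AUTOSOMAL_TOTAL)
        of_length = percent_observed(locations, AUTOSOMAL_LENGTH)
        print(f"{company}: {of_total:.4f}% of total, {of_length:.4f}% of length")
```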
Mutation
Most DNA is copied exactly from the relevant parent.
Two sources of randomness mean that one cannot always exactly infer the child's DNA from the parents' or vice versa.
Mutations are copying errors at single locations, e.g. a single A in the parent may be replaced by a C in the child.
Some locations mutate very frequently, and can be used to identify
individuals beyond reasonable doubt, e.g. in criminal cases.
Some locations mutate less frequently, and can be used to identify
closely related individuals.
The vast majority of DNA is identical for all humans, and even for many of the apes.
Special types of mutations:
- Short Tandem Repeat (STR): a string of letters consisting of the same short substring repeated several times, for example CCTGCCTGCCTGCCTGCCTGCCTGCCTG is CCTG repeated seven times; it may be repeated fewer or more times in other individuals.
- Single Nucleotide Polymorphism (SNP): a single location
where two or more different letters are observed in different
individuals.
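An STR value is just a repeat count, which the following minimal sketch makes concrete. The function name `count_tandem_repeats` is an assumption for this illustration; real STR markers are read by the testing laboratory, not from text strings like this.

```python
def count_tandem_repeats(sequence, motif):
    """Return the longest run of back-to-back copies of motif in sequence."""
    best = 0
    for start in range(len(sequence)):
        count = 0
        position = start
        # Keep stepping forward one motif at a time while it repeats.
        while sequence.startswith(motif, position):
            count += 1
            position += len(motif)
        best = max(best, count)
    return best


if __name__ == "__main__":
    example = "CCTGCCTGCCTGCCTGCCTGCCTGCCTG"
    print(count_tandem_repeats(example, "CCTG"))
```

Two men whose counts differ at a marker have a mutation separating them somewhere on their paternal lines.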
Y-DNA Mutations
The entry-level Y-DNA37 product looks at 37 STR markers on the Y
chromosome, e.g. Waldron project or O'Brien project.
Some SNPs on the Y chromosome are once-in-the-history-of-mankind events
and can be used to build a Y-DNA Haplogroup Tree.
STRs can predict
Y haplogroups but a SNP product must then be purchased to confirm the
Y haplogroup.
Surname-specific SNPs are now being discovered.
mtDNA Mutations
There is also an mtDNA Haplogroup Tree.
Autosomal DNA Mutations
The 0.02% or so of autosomal locations that are observed are known SNPs.
There may be other SNPs that have not yet been discovered, so two
individuals whose DNA looks alike may later be found to be different.
All observed SNPs are still weighted equally in relationship
calculations, but two individuals who share a mutation observed in only
1% of the population are likely to be far more closely
related than two individuals who share a mutation observed in 50% of the population.
Recombination
The other source of randomness is recombination, which is how, e.g., the father's paternal and maternal autosomes cross over to produce the child's paternal autosomes.
Paternal: gtacgatcgtagatcgatcatatccgtacgcatcatgactacatatcatcgatcgatcatcatatcgatcatcagcatcgatcgatcgatcgatcgat
Maternal: gggggggggggggggaccagtatgtatcagtcctattactacatctactataactatctactagctagcaatatcctactcatacatctacttactgt
Combined: gtacgatcgtagatcgatcatatctatcagtcctattactacatctactataacgatcatcatatcgatcatcacctactcatacatctacttactgt
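The three strings above can be used to see where the crossovers fall. The sketch below is a simple greedy walk, not a real phasing algorithm: it follows one parental strand until the combined strand disagrees with it, then switches to the other. The function name `infer_segments` is an assumption for this illustration, and the reported boundaries are only the first letters at which a switch becomes visible.

```python
PATERNAL = "gtacgatcgtagatcgatcatatccgtacgcatcatgactacatatcatcgatcgatcatcatatcgatcatcagcatcgatcgatcgatcgatcgat"
MATERNAL = "gggggggggggggggaccagtatgtatcagtcctattactacatctactataactatctactagctagcaatatcctactcatacatctacttactgt"
COMBINED = "gtacgatcgtagatcgatcatatctatcagtcctattactacatctactataacgatcatcatatcgatcatcacctactcatacatctacttactgt"


def infer_segments(paternal, maternal, combined):
    """Split combined into (source, start, end) pieces, end exclusive."""
    sources = {"paternal": paternal, "maternal": maternal}
    # Assumption: start on the paternal strand if it agrees at the first letter.
    current = "paternal" if paternal[0] == combined[0] else "maternal"
    segments = []
    start = 0
    for i, letter in enumerate(combined):
        if sources[current][i] != letter:
            other = "maternal" if current == "paternal" else "paternal"
            if sources[other][i] != letter:
                raise ValueError(f"letter {i} matches neither parental strand")
            segments.append((current, start, i))
            current, start = other, i
    segments.append((current, start, len(combined)))
    return segments


if __name__ == "__main__":
    for source, start, end in infer_segments(PATERNAL, MATERNAL, COMBINED):
        print(source, start, end)
```

In this example the combined strand switches source three times, giving four alternating segments.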
Every sperm/egg is potentially unique.
Recombination of the paternal and maternal chromosomes is
sometimes compared to shuffling two decks of playing cards.
Recombination rates vary markedly along the autosomes and X chromosome.
One recombination per generation is expected in each 100 centiMorgans (cM,
not cm).
The longer the centiMorgan length of two identical DNA segments, the
more recently one can expect to find the common ancestor from whom they
were inherited.
On average, there are 798,852 letters in total per cM, of which about 190 are observed, so one recombination (crossover) is expected per generation per 19,000 observed letters.
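These averages can be reproduced approximately from figures quoted elsewhere in this handout. The sketch assumes that the "letters observed" figure is based on FamilyTreeDNA's 682,608 autosomal locations; that is a guess that happens to fit the rounded numbers, not something stated above.

```python
AUTOSOMAL_LENGTH = 2_881_033_286   # letters in one copy of the autosomes
LETTERS_PER_CM = 798_852           # average quoted above
OBSERVED_LOCATIONS = 682_608       # assumption: FamilyTreeDNA's figure

# Implied genetic length of the autosomes, in centiMorgans.
total_cm = AUTOSOMAL_LENGTH / LETTERS_PER_CM

# Observed letters per cM, and per expected crossover (one per 100 cM).
observed_per_cm = OBSERVED_LOCATIONS / total_cm
observed_per_crossover = 100 * observed_per_cm

if __name__ == "__main__":
    print(f"{total_cm:.0f} cM in total")
    print(f"{observed_per_cm:.0f} observed letters per cM")
    print(f"{observed_per_crossover:.0f} observed letters per crossover")
```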
Fully identical and half-identical regions
In theory, full-siblings inherit:
- the same autosomal DNA from both parents at 1/4 of all
locations
- different autosomal DNA from each parent at 1/4 of all
locations
- the same autosomal DNA from exactly one parent at the
remaining 1/2 of all locations
Hence, full-siblings are:
- fully identical at 1/4 of all locations
- half-identical at 1/2 of all locations
- not identical at 1/4 of all locations
Similarly, double cousins can have a mix of fully identical regions
(FIRs: same DNA inherited from both parents) and half-identical regions (HIRs: same
DNA inherited from exactly one parent).
Normal cousins share half-identical regions only.
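The 1/4, 1/2, 1/4 split for full siblings follows from counting equally likely cases at a single location, as the sketch below does. The function name `sibling_sharing` is an assumption for this illustration, and the calculation ignores the fact that neighbouring locations are inherited together.

```python
from fractions import Fraction
from itertools import product


def sibling_sharing():
    """Enumerate which parental copy each of two siblings receives."""
    counts = {"fully identical": 0, "half-identical": 0, "not identical": 0}
    # Each sibling receives copy 0 or 1 from the father and from the mother.
    for father_1, mother_1, father_2, mother_2 in product((0, 1), repeat=4):
        shared = (father_1 == father_2) + (mother_1 == mother_2)
        if shared == 2:
            counts["fully identical"] += 1
        elif shared == 1:
            counts["half-identical"] += 1
        else:
            counts["not identical"] += 1
    total = sum(counts.values())
    return {kind: Fraction(n, total) for kind, n in counts.items()}


if __name__ == "__main__":
    for kind, share in sibling_sharing().items():
        print(kind, share)
```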
In practice, we cannot distinguish the paternal letter from the maternal letter, so we can only check whether two individuals' observed autosomal (or X) DNA is half-identical (or better) at specific locations, e.g.
- AA in one person is half-identical (or better) to AA, AC,
AG, AT in another person
- AG is half-identical (or better) to AA, AC, AG, AT, CG, GG,
GT
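The check described above amounts to asking whether the two unordered pairs of letters have at least one letter in common. A minimal sketch, with the function name `half_identical` assumed for this illustration:

```python
def half_identical(genotype_a, genotype_b):
    """True if two observed letter pairs share at least one letter.

    Each genotype is a two-letter string such as "AG"; the order of the
    letters carries no meaning.
    """
    return bool(set(genotype_a) & set(genotype_b))


if __name__ == "__main__":
    print(half_identical("AA", "AG"))
    print(half_identical("AA", "GG"))
```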
In practice, almost all locations are bi-allelic - i.e. only two
letters have been observed.
At a bi-allelic location (e.g. where only A and G are observed), AG (heterozygous) is half-identical to everyone (AA, AG or GG), so only homozygous observations (AA or GG) can tell us anything new about relationships.
Observed half-identical regions are bounded by locations where both
individuals are opposite homozygous (one AA, the other GG).
If As and Gs are equally likely, then one location in eight is on
average opposite homozygous.
If one letter is more likely than the other, then opposite homozygous
locations are less frequent than one-in-eight.
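The one-in-eight figure can be derived under two assumptions that are made here only for the sketch: the two individuals are unrelated, and each person's two letters are drawn independently with frequency p for one letter and 1 - p for the other. The chance that one person is AA and the other GG, in either order, is then 2 x p^2 x (1 - p)^2.

```python
def opposite_homozygous_probability(p):
    """Chance that two unrelated people are opposite homozygous.

    p is the frequency of one of the two letters at a bi-allelic location.
    """
    q = 1.0 - p
    # One person is AA (p*p) and the other GG (q*q), in either order.
    return 2 * (p * p) * (q * q)


if __name__ == "__main__":
    for p in (0.5, 0.7, 0.9):
        print(p, opposite_homozygous_probability(p))
```

The value peaks at p = 0.5 and falls away as one letter becomes more common than the other.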
A run of thousands of half-identical locations (with no opposite homozygous location) is very unlikely to occur by chance, and indicates a high probability of identical segments on one chromosome in each individual, and thus a relationship.
A long genetic length (centiMorgans) indicates a high probability of a
recent common ancestor.
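Finding such runs is a simple scan along the chromosome. The sketch below assumes that each person's data is a list of two-letter genotypes at the same ordered locations; the function name `half_identical_runs` is invented for this illustration. Converting a run into centiMorgans would need a genetic map, which is not included here.

```python
def half_identical_runs(person_a, person_b):
    """Return (start, end) index ranges, end exclusive, of unbroken runs
    of locations at which the two people share at least one letter."""
    runs = []
    start = None
    compared = 0
    for i, (genotype_a, genotype_b) in enumerate(zip(person_a, person_b)):
        compared = i + 1
        if set(genotype_a) & set(genotype_b):
            if start is None:
                start = i          # a new run begins here
        elif start is not None:
            runs.append((start, i))  # an opposite homozygous location ends it
            start = None
    if start is not None:
        runs.append((start, compared))
    return runs


if __name__ == "__main__":
    a = ["AA", "AG", "GG", "AA", "AA"]
    b = ["AG", "GG", "AA", "AA", "AG"]
    print(half_identical_runs(a, b))
```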
False positives can occur - a half-identical region can by chance consist of short
overlapping (or zig-zagging) paternal/paternal, paternal/maternal,
maternal/paternal and maternal/maternal segments.
Phasing and triangulation are like opposite sides of the same coin:
- If you are half-identical to two people on the same region, where they are not half-identical to each other, then one must match your paternal chromosome and the other must match your maternal chromosome: phasing is
the term used for separating the observed letters into paternal and
maternal chromosomes, or separating the observed matches into paternal
and maternal relatives.
- If you are half-identical to two people on the same region, where they are half-identical to each other, then all three of you must have inherited DNA in that region from a single common ancestor: triangulation
is the term used for identifying three people descended from a common
ancestral couple. (More information is usually required to determine
whether the shared DNA came from husband or wife in that couple.)
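The two cases above differ only in whether the two matches also match each other on the overlapping region. A minimal sketch of that decision, with the function name `classify_overlap` and the (start, end) segment format assumed for this illustration:

```python
def classify_overlap(segment_with_b, segment_with_c, b_and_c_half_identical):
    """Interpret two matches, B and C, that overlap on one of your chromosomes.

    Each segment is a (start, end) pair of positions on the same chromosome.
    b_and_c_half_identical says whether B and C are half-identical to each
    other on the overlapping part.
    """
    start = max(segment_with_b[0], segment_with_c[0])
    end = min(segment_with_b[1], segment_with_c[1])
    if start >= end:
        return "no overlap"
    if b_and_c_half_identical:
        return "triangulation: all three share a common ancestor"
    return "phasing: B and C match opposite parental chromosomes"


if __name__ == "__main__":
    print(classify_overlap((10, 50), (30, 80), True))
    print(classify_overlap((10, 50), (30, 80), False))
```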
Rule of thumb for lengths of the longest half-identical region:
- 30 cM+: good chance of finding the common ancestor
- 20 cM+: worth investigating
- 10 cM+: probably too distant to trace in surviving Irish
genealogical records
- under 10 cM: possibly half-identical by chance
- reduce these thresholds if there is other solid evidence of
a relationship
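The rule of thumb can be written as a small lookup. This is only a restatement of the thresholds listed above; the function name `rule_of_thumb` is an assumption for this illustration, and the thresholds should be lowered by hand where other solid evidence exists.

```python
def rule_of_thumb(longest_region_cm):
    """Comment on a match, given its longest half-identical region in cM."""
    if longest_region_cm >= 30:
        return "good chance of finding the common ancestor"
    if longest_region_cm >= 20:
        return "worth investigating"
    if longest_region_cm >= 10:
        return "probably too distant to trace in surviving Irish genealogical records"
    return "possibly half-identical by chance"


if __name__ == "__main__":
    for cm in (45.0, 22.5, 12.0, 6.3):
        print(cm, rule_of_thumb(cm))
```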
The aggregate length of all the half-identical regions above some arbitrary threshold is used to estimate the relationship:
- 1 cM threshold for FamilyTreeDNA (provided one segment is 7 cM or more)
- 7 cM threshold for GEDmatch
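The effect of the two thresholds can be seen by totalling the same list of segments in both ways. The sketch below is not either company's actual algorithm: the function name `total_shared_cm`, its parameters and the sample segment lengths are all invented for illustration, and it assumes that a segment exactly at the threshold is counted.

```python
def total_shared_cm(segments_cm, threshold_cm, min_longest_cm=0.0):
    """Sum the segments at or above threshold_cm.

    Returns 0.0 if no segment reaches min_longest_cm.
    """
    if not segments_cm or max(segments_cm) < min_longest_cm:
        return 0.0
    return sum(length for length in segments_cm if length >= threshold_cm)


if __name__ == "__main__":
    segments = [12.5, 8.0, 3.2, 1.5, 0.6]  # hypothetical match, in cM
    print("1 cM threshold, one segment of 7 cM or more:",
          total_shared_cm(segments, 1.0, min_longest_cm=7.0))
    print("7 cM threshold:", total_shared_cm(segments, 7.0))
```

The same pair of people can therefore show quite different totals on different sites.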
Average autosomal DNA shared by pairs of
relatives, in percentages and centiMorgans
What can DNA tell us?
- With one sample, you can fish for long-lost cousins in
three commercial pools and one non-commercial pool (GEDmatch.com).
- With two samples, you can confirm or disprove theories
about relationships (e.g. Richard III).
- It can potentially break through all sorts of genealogical
brick walls.
- For people with adoptions (or foundlings) in their family
history, it may be the only option to find their genetic family; such
people are disproportionately represented in all the databases, even
compared to the 2% adoption rate in the past.
- Those whose genetic ancestry has been
concealed from them have a right, and usually a great desire, to know
it.
- On the other hand, if there is a family secret that you, or
others
in your family, would like to remain a secret, then genetic genealogy
may not be for you.
Apart from genealogy?
- Medical risks
- Medical certainties (Cystic Fibrosis, Huntington's Disease,
haemochromatosis)
- Medically informative locations may not be the same as
ancestrally informative locations
- GINA (Genetic Information Non-discrimination Act)
- Ethnicity
- sensitive to the (unspecified) reference population
- distorted by recombination (e.g. 33% rather than 25%
from one grandparent)
- "genetic astrology"
- most Irish people are 100% Irish (or 100% British Isles according to one company)
- great marketing value in the U.S. melting pot
- Chimera - very rare instances of one person formed from two embryos that would otherwise have been twins, so that parent/child relationships can look like uncle/nephew
- Chromosomal disorders
- Data-mining, lies, damned lies and statistics
The big 3 DNA companies
- AncestryDNA
  - Part of ancestry.com
  - Autosomal DNA only
  - Very limited analysis tools
  - Overcharges non-U.S. customers
  - Internal messaging system
  - Reached 1 million samples in July 2015
  - Most people use pseudonyms or initials
  - My results
- 23andMe
  - Concentrates on medical aspects of DNA
  - Autosomal DNA plus predicted Y-DNA and mtDNA haplogroups
  - Recently doubled prices
  - Overcharges non-U.S. customers
  - Optional internal messaging system
  - About 1 million samples
  - Most people anonymous
  - Analysis tools for non-anonymous matches
  - "Preparation for the transition to the new 23andMe experience" started in October 2015 and still ongoing as of November 2016
  - Results
  - Surname View
- FamilyTreeDNA (FTDNA)
  - Dedicated to genetic genealogy
  - Autosomal DNA (Family Finder) plus various Y-DNA and mtDNA products
  - Good analysis tools
  - Single worldwide price
  - No U.S. bias
  - Simple e-mail communications
  - 400,000+ samples (?250,000 Family Finder and 150,000 Y-DNA only or mtDNA only)
  - Most people use real names, but married women are recommended to use maiden surnames
  - Projects - e.g. Clare Roots, Munster Irish, Pre-Great Famine Munster Ireland Project
  - My results
  - My cousin's results
What do you get for your money?
- An online list of DNA matches, ordered by closeness of
relationship
- Regular updates as the database grows
- Ethnicity estimates
- Raw data to use at third-party websites
- Various tools for analysing matches and data
- Surname projects (FTDNA only)
- Other projects (FTDNA only)
The third-party sites
GEDmatch.com
DNAgedcom.com
Levels of involvement
- Lists of names
  - A black box algorithm can be used to list the names of those in a database whose autosomal DNA is closest to yours.
  - You can look at your matches' own names, their ancestral surnames, ancestral placenames and family trees, if they have made these available.
  - Anybody can do this.
- Lengths of half-identical regions
  - To get full value from one's investment in DNA analysis, one should move on from the purely qualitative approach of looking at names and take a more quantitative approach.
  - The first step is to look at the percentage of the length of the genome on which one is half-identical with a potential relative.
  - The higher this percentage, the closer the relationship is likely to be.
  - Some basic arithmetic skills are required for this.
- Locations of half-identical regions
  - If three or more people are all half-identical to the others on the same region, and if two or more of them are known relatives, then it becomes far more likely that they are all descended from a common ancestral couple.
  - Furthermore, it can be inferred that the DNA in the half-identical region has been inherited from either the male or the female of that common ancestral couple.
  - This may exercise your brain cells a little more than the first two approaches.
- Raw data
  - Sooner or later, the only answer to a particular DNA puzzle will be to look at one's raw data, in the form of long sequences of pairs of As, Cs, Gs and Ts, in order to work out exactly how and why something happened.
  - This is for the specialist.
Whatever level of involvement you choose, you have a responsibility to
provide your DNA matches with at least an outline pedigree chart
showing your direct ancestors (FamilyTreeDNA, AncestryDNA).
The easiest way to do this is to upload a GEDCOM file from your desktop
genealogy software.
Examples
Using autosomal DNA
O'Deas of Newtown
- My Family Finder results are dated 15 Nov 2013.
- My paternal and maternal first cousins' results are dated Apr 2014.
- My paternal first cousin's closest match had an e-mail address at a Texas law firm.
- I mentioned this to his Limerick namesake on 2 Sep 2014 and
discovered that they were the same person!
- We compared what we knew and found that we both had
ancestors in Ballybrown and were both somehow related to John Smith
(1849-1909) of Adare.
- My GGGGgrandfather John Keas (c1777-1845) farmed first in
Conigar (now part of the Irish Cement site) and then in Ballyveloge,
where he first leased a 145-acre farm in 1819 (Registry of Deeds, book
840 page 259 deed 563759).
- John Keas was grandfather of John Smith.
- The match's GGgrandparents John Ryan and Bridget O'Dea married in Ballybrown, Lurriga
& Patrickswell Catholic parish on 21 Feb 1821 (no. 469).
- John Smith (who married into the business in 1877) employed
the young Denis Ryan (1858-1928), grandson of John Ryan and Bridget O'Dea, and is
reported to have later said to Denis Ryan: "Why didn't you tell me you
were related to me?"
- Lease of Ballyveloge "for and during and untill the full
end and term of the natural life and lives of Edward Keas, 2nd son, of
the said lessee and John Keas, 3rd son of said lessee and William Keas,
6th son of the said lessee" (Registry of Deeds, Book 858 page
327 deed 572827)
- Edward Keas remained in Conigar, but the rest of the Keas family moved around 1819 from a parish
with no surviving baptismal records today (Mungret) to an adjoining
parish with surviving baptismal records (Ballybrown, Lurriga &
Patrickswell).
- John Keas and John Keas Jnr., although apparently Catholic,
signed the minutes of a Vestry of the Established Church
legally called and held at Kilkeedy Church (now in ruins) on 17 Dec
1832.
- William Keas was baptised
in Ballybrown, Lurriga & Patrickswell on 31 Oct 1821
"ex Joanne Keas et Maria O'Dea".
- So John Smith and Denis Ryan each had an O'Dea grandmother!
- Were they sisters? Or is the age difference too large? Maria's last child was born the year that Bridget married.
- Almost certainly both were daughters of Edward O'Dea, after
whom they named sons.
- Relationship diagram
- But whatever became of William Keas?
Using mitochondrial DNA
- Which of three baptisms matches a marriage in the registers
of Ballybrown, Lurriga & Patrickswell Catholic parish?
- Margaret daughter of William Keas and Bridget Fitzgerald,
chr. 20 Jul 1833, mitochondrial descendant's DNA in the system
- Margaret daughter of Michael Keas and Elizabeth Hiffle,
chr. 23 Jan 1836
- Margaret daughter of John Keas Jnr. and Catherine Dundon,
chr. 27 Feb 1837
- Margaret Keas m. Patrick Heffernan, 1 Dec 1855,
mitochondrial descendant's results available
- mitochondrial cousins sought for Elizabeth Hiffle and
Catherine Dundon
- Similarly, Maria O'Dea has known mitochondrial descendants.
- Does Bridget O'Dea have any?
Conclusion: Why you should submit your DNA
- The value of DNA "testing" to genealogists increases dramatically with the number of people from the relevant geographical area and relevant extended family group already in the DNA databases used.
- Submitting your DNA to a database has significant positive externalities for existing and future researchers.
- We need to persuade more Limerick people to submit DNA samples to the databases for purely genealogical purposes.
- Your descendants will be eternally grateful to you for leaving them your DNA.
Further reading