Genetic Genealogy
6 p.m. Tuesday 2 February 2021 and
Tuesday 9 February 2021
Recordings:
2 February (Passcode required); 9 February (TBA)
Outline
Review of beginners' session
| male offspring | female offspring |
sperm | Y chromosome; 22 paternal autosomal chromosomes | X chromosome; 22 paternal autosomal chromosomes |
egg | X chromosome; 22 maternal autosomal chromosomes; mitochondria | X chromosome; 22 maternal autosomal chromosomes; mitochondria |
DNA component | Inheritance path | Inherited by |
Y chromosome | From father only (and only if male) | males only |
autosomal chromosomes (autosomes) | Equally from both parents | everyone |
X chromosome(s) | Unequally from both parents | males x1, females x2 |
mitochondrial DNA | From mother only | everyone |
- Without recombination and mutations, all of us would have
identical DNA.
- Autosomal DNA is the cheapest component to analyse and has
rapidly become the most widely used in genealogy:
- matches with larger shared centiMorgans (c.8cM-c.3600cM) are (on average) more closely related;
- cousin matching using autosomal DNA will identify
relationships
out to third cousin on all sides of your family, and may identify more
distant relationships;
- the technology cannot separate the paternal and maternal
autosomes;
- identical twins, parent/child relationships and most
full-sibling relationships can be identified unambiguously;
- for more distant relationships, probabilities can be assigned to
the various possibilities;
- this tool shows the distribution of possible relationships for a given shared centiMorgan value.
- Y-DNA has been widely used for one-name studies or surname projects for
much longer, but has also seen rapid recent scientific advances:
- matches with smaller genetic distance are (on average) more closely related;
- cousin matching using Y-DNA will identify relationships with
men of the same surname and same genetic origin, but may identify
surname/DNA switches and/or more distant
relationships, predating the era of surnames (1014+);
- many surnames have multiple genetic origins, for example
occupational surnames (Smith, Miller, Potter, etc.).
- Targeted mitochondrial DNA and X-DNA comparisons can be
used to solve more specialised problems.
- The DNA companies will turn your spit or swabs into a data
file, which can be compared with data files from the DNA of other
individuals.
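As a rough illustration of why larger shared centiMorgan totals point to closer relationships, the sketch below computes the average total expected for full cousins of a given degree. It assumes a parent/child total of 3600 cM (the top of the range quoted above) and that each extra generation halves the expected share. The function name and the constant are illustrative only, and real matches vary widely around these averages.

```python
# Assumption: a parent and child share about 3600 cM (top of the range quoted above).
PARENT_CHILD_CM = 3600.0


def expected_cousin_cm(degree: int) -> float:
    """Average cM expected between full cousins of the given degree (1 = first cousins)."""
    if degree < 1:
        raise ValueError("degree must be 1 or more")
    # Full nth cousins share on average (1/2)**(2n + 1) of their autosomal DNA.
    # A parent and child share 1/2, so scale relative to that.
    fraction = 0.5 ** (2 * degree + 1)
    return PARENT_CHILD_CM * fraction / 0.5


for n in (1, 2, 3, 4):
    print(f"cousin degree {n}: about {expected_cousin_cm(n):.0f} cM on average")
```

The steep fall-off with each generation is one reason why matching becomes unreliable beyond third cousins: the expected total drops towards the smallest segments that the websites report.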
Identity v. Anonymity
The trade-off
- There is a trade-off between:
- increasing your chances of finding
long-lost cousins and ancestors (and being found by long-lost cousins);
and
- maintaining the privacy of your family history research
and DNA results.
- If you keep your DNA
results or known family tree private, then nobody will be able to find
you and you will not
be able to find any DNA matches.
- If you want to be found, then you must
let your potential cousins see your DNA results and your known
ancestors.
- If you give your matches no information, then they cannot
help you.
- Some customers of the DNA companies appear to wish to
maintain a
certain degree of privacy and anonymity.
- Others find it paradoxical that
those trying to identify their anonymous ancestors can be so concerned
about anonymising their own identity.
- FamilyTreeDNA.com (for customers who have not opted out) and GEDmatch.com (for
customers who have opted in) explicitly allow familial
comparisons of DNA recovered from a crime scene or unidentified human remains:
- to identify the
perpetrator of a violent crime against another individual; or
- to identify human remains.
- Using the MyHeritage DNA Services for law
enforcement purposes ... is currently "strictly prohibited, unless a court order is obtained".
- 23andMe's registration process assures customers that "We will
not provide information to law enforcement or regulatory authorities
unless required by law to comply with a valid court order, subpoena, or
search warrant for genetic or Personal Information" and links to a Transparency Report.
Basic rules
The basic rules for successful use of the DNA websites include the
following:
- Reveal the DNA subject's birth surname:
- Most people inherit DNA with
their birth surname, so identify yourself as a minimum by
your birth surname with an initial or a title, e.g., P Waldron or Mr
Waldron or Miss Durkan.
- Reveal the gender of the person who provided the DNA sample:
- A woman does not have a Y chromosome, so may ask a male
relative with the relevant surname to swab:
- a father, brother, nephew, cousin, etc., if her interest is in her maiden surname; or
- a husband, son, brother-in-law, father-in-law, etc., if her interest is in her married surname.
- Valuable additional inferences can potentially be drawn once it is known whether two X
chromosomes (female) or one X chromosome and one Y chromosome (male)
are available for comparison.
- You must NOT attach a female name or a female's pedigree chart to a male DNA sample
(or vice versa), as this causes untold confusion.
- Be especially careful not to inadvertently link a male's Y-DNA results
with a female's autosomal DNA upload at FamilyTreeDNA.com, where
error-checking does not look for this.
- Avoid providing irrelevant information:
- Your first name, married
surname, adopted
surname or marital status reveal nothing about your DNA, so you may
keep these private if you wish.
- Avoid pseudonyms:
- They reduce the chances that your matches will bother to
look at your family
tree, contact you or share the information about your ancestry that
they have and that you do not have.
- Use a photograph:
- If you upload a photograph (or any image) to your AncestryDNA account
before you receive your initial results, then the photograph
(hyperlinked to the match details) will appear on the AncestryDNA Insights page of all
your matches for as long as you remain in their eight newest matches with
photographs.
- Be consistent and avoid unnecessary confusion:
- A real example (further anonymised):
- Ancestry username: tara1234
- AncestryDNA samples from mother and daughter (per
e-mail
exchange)
- linked to pedigree charts of an aunt and niece
- appear to matches as M.R. (managed by tara1234) and
D.C. (managed by tara1234)
- neither of these is the real initials
- the daughter is an AncestryDNA match to her mother's
probable 4th cousin, but the mother is not (false negative? fuzzy
boundaries?)
- only one of the two kits is at GEDmatch
- GEDmatch alias and e-mail address both begin with Molly
- Molly is the dog's name
- it took me 300 days after the upload to GEDmatch to
associate the AncestryDNA and GEDmatch identities
- Keep all your DNA-related correspondence in a single
searchable e-mail archive.
- Use the internal messaging systems of
AncestryDNA/MyHeritage/23andMe/LivingDNA, or Facebook messages, only to exchange
e-mail addresses.
Managing your matches
- Our objectives are:
- to identify the most recent common ancestor or ancestral
couple shared with each DNA match,
- starting with the closest matches and those with shared
surnames and/or shared locations,
- thereby confirming that the DNA match is definitely a
documented cousin or
closer relative,
- enabling either or both cousins to learn more about their
shared ancestors, and
- confirming or refuting (NPE) the archival and oral
evidence about each cousin's ancestry.
- In practice, this means
- assigning each DNA match and each DNA segment to the most
distant known ancestor through whom you inherited the shared DNA;
- identifying the most recent common ancestral couple
shared with each DNA match;
- converting DNA matches into known relatives;
- using
- the tools provided by the DNA comparison websites;
- the tools provided by third parties; and
- your own genealogy database
to manage this process.
- My chromosome map
- You will eventually learn to distinguish between genealogically useful, misattributed and false DNA matches.
Fishing in all the gene pools
There are a growing number of DNA comparison websites and
those interested in finding long-lost relatives should be in all of
them, especially the largest ones.
Remember Murphy's
Law of Genetic Genealogy, coined
while I was helping an adoptee who is married to a Murphy:
If there are N DNA comparison websites and your DNA
is in N-1 of them, then your most important match will be in the Nth.
In the words of another widely used metaphor, there
are many
online gene pools out there and there are many people who are in only
one or two of them; for maximum effect, particularly if you are trying
to find an unknown ancestor who has left no paper trail, you must fish
in all of these
pools.
You must spit for the websites which do not allow data uploads:
You must download your data file from the website of whichever
laboratory you use and upload it to the websites which
do allow data uploads:
You must link your DNA
match list and
your pedigree chart
and share them on the major autosomal DNA comparison websites:
- As of 2 February 2021, I have:
- 16,745 MyHeritage matches
- 15,516 AncestryDNA matches
- exactly 5,000 FTDNA Family Finder matches
- 3,000 GEDmatch matches (fixed)
- 585 LivingDNA matches
Both larger databases and less strict matching criteria result in more matches.
- Add DNA information to your genealogy database:
- Record the ancestors and cousins confirmed by your DNA
in your genealogy
database.
- Use an event field or note tag in your database to
track
people who are in both your own database and the DNA databases.
- Add genealogy information to the online DNA databases:
- Export a GEDCOM file containing at least the ancestors
of each DNA subject and upload it to all the DNA websites so that
matches can see a pedigree chart.
- Examples of pedigree charts: from Ancestral
Quest, AncestryDNA, ancestry.com,
FamilyTreeDNA
and GEDmatch.
- For
FamilyTreeDNA.com, include in the GEDCOM file any known relatives
already at FTDNA and the shared ancestors; FTDNA will use these
"linked relationships" to assign other matches to the most recent
common ancestral couple of the DNA subject and the linked relationship,
and will display
paternal and
maternal icons as appropriate in the match list (example).
- Mark deceased ancestors as such, even if you do not
know the date of death, otherwise they may be deemed living,
privatised, and hidden from DNA matches who are also descended from
them.
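The two GEDCOM points above (include at least the ancestors of each DNA subject, and mark dead ancestors as deceased) can be checked before exporting. The sketch below is illustrative only: the `Person` record, the `deceased` flag and the sample names are hypothetical stand-ins for whatever your own genealogy database stores, not the data model of any particular program.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Person:
    name: str
    father: Optional[str] = None  # key of the father in the database, if known
    mother: Optional[str] = None  # key of the mother in the database, if known
    deceased: bool = False        # should be True for dead ancestors, even without a date


def ancestors(db: Dict[str, Person], subject: str) -> List[str]:
    """Return the keys of all known ancestors of the DNA subject."""
    found: List[str] = []
    queue = [subject]
    while queue:
        person = db[queue.pop(0)]
        for parent in (person.father, person.mother):
            if parent is not None and parent in db and parent not in found:
                found.append(parent)
                queue.append(parent)
    return found


def possibly_hidden(db: Dict[str, Person], subject: str) -> List[str]:
    """Ancestors not marked deceased, who may be treated as living and privatised."""
    return [key for key in ancestors(db, subject) if not db[key].deceased]


# Hypothetical miniature database.
db = {
    "subject": Person("DNA subject", father="f", mother="m"),
    "f": Person("Father", father="ff", deceased=True),
    "m": Person("Mother", deceased=True),
    "ff": Person("Paternal grandfather"),  # death not recorded
}
print(ancestors(db, "subject"))        # everyone to include in the export
print(possibly_hidden(db, "subject"))  # ancestors to review before uploading
```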
AncestryDNA tools
- Every new kit must be associated with a different e-mail
address.
- Test Settings
- Tree Link
- Download Raw DNA Data
- Sharing Preferences (+ Add a person)
- Match list
- Three different, sometimes conflicting, "Predicted relationship" displays!
- Four binary filters:
- Unviewed (by you or any of your collaborators)
- Common ancestors (speculative hints, to be treated
with
caution)
- Messaged
- Notes
- Three dropdown filters:
- Private/public linked/unlinked trees
- Shared DNA
- Groups
- Shows the number of matches in each Group
- My suggestions for using the 25
custom groups (gold star and 24 coloured dots)
- e.g. starred matches = known relatives
- Search Matches
- by Match name or by Surname in matches'
trees or by Birth location in matches' trees, if
the location is available on the dropdown
- search results appear incomplete for new kits
- Sort by "Relationship" (i.e. by shared centiMorgans) or by
"Date" (actual match date invisible!)
- Match page
- Shared Matches (>20cM with both parties)
MyHeritage.com tools
FamilyTreeDNA.com tools
Using autosomal DNA shared matches, triangulation and phasing
A counter-example
Three marriages in the United Church of England and Ireland in Kilkeedy, County Limerick:
- My GGGgrandfather Thomas
Parker married my GGGgrandmother Mary Keas on 14 Sep 1831
- Thomas's brother Francis Parker married Margaret Smith on 3 Mar 1840
- Mary's sister Ellen Keas married Joseph Smith on 26 May 1841
- The
two Smiths were also siblings.
- These three couples produced three families, each made up of
first cousins to the other two, but with no single ancestral couple
common to all three!
- If one took DNA from one member of each of these
three families, two Parkers and a Smith, then each of them as first
cousins would share many half-identical regions with
both of the others.
- There would be some regions in which:
- the two Parkers would have an identical segment from a Parker grandparent,
- one of the Parkers and the Smith would have an identical segment from a Smith grandparent, and
- the other Parker and the Smith would have an identical segment from a Keas grandparent.
- In this example, the three cousins
will all "match" each other genetically, but on closer examination it
will be found that there is no common ancestral couple of all three.
However, these triangular marriage patterns are very rare.
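The counter-example can be checked mechanically by listing the grandparent couples of one cousin from each family. The labels below are simply shorthand for the three marriages above, and this is only a sketch of the set logic, not a DNA comparison.

```python
from itertools import combinations

# Grandparent couples of one cousin from each of the three families.
grandparents = {
    "child of Thomas Parker and Mary Keas": {"Parker grandparents", "Keas grandparents"},
    "child of Francis Parker and Margaret Smith": {"Parker grandparents", "Smith grandparents"},
    "child of Joseph Smith and Ellen Keas": {"Smith grandparents", "Keas grandparents"},
}

# Every pair of cousins shares exactly one grandparent couple ...
pairwise = {
    (a, b): grandparents[a] & grandparents[b]
    for a, b in combinations(grandparents, 2)
}
for pair, shared in pairwise.items():
    print(pair, "share", shared)

# ... but no couple is common to all three.
common_to_all = set.intersection(*grandparents.values())
print("common to all three:", common_to_all)
```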
Shared, or In Common With (ICW), matches
A group of three or more individuals who all meet the relevant matching
criteria with each other are likely to share a recent common ancestor (or,
more often, ancestral couple).
When I find a new match, I am usually anxious to identify
the most distant known ancestor through whom I am related to the new match.
The matches that I share with the new match are usually the first clue to
solving this puzzle.
All of the DNA comparison websites allow one to identify the shared matches of
two individuals in some form or another.
The matching criteria vary from one DNA comparison website to another.
The stricter the matching criteria, the more significant the
shared matches.
FTDNA Family Finder
To find the shared matches of two individuals who match each
other:
- go to the match list of one of the
individuals
- tick the box opposite the other individual
- "Reset filter" if necessary
- click the In Common With button on the 5th line from the
top of the
window
- matches are sorted by closeness of relationship to the
logged-in individual
- A can see that B matches C but cannot see the cM shared by
B and C.
- Many of the hidden shared cM figures will be as low as 8cM, which
even FTDNA does not use for family matching based on linked
relationships.
- A's shared matches with B will be the same as B's shared
matches with A, but in a different order.
You will eventually identify a group of individuals, all of
whom you
suspect descend from a single common ancestor (or ancestral couple).
To see whether up to 10 individuals who match you also match
each other:
- Add them to the Selected Matches box on the Family Finder - Matrix page
- To find the desired surname in the Matches box, click on
any match and start typing the surname
- After finding the surname, ctrl-click on the desired
individuals
- Click the Add>> button to move all the
selected individuals to the Selected Matches box
To find the shared matches and shared cM of two or more individuals who
belong to the same project, whether or not they match each other:
AncestryDNA
On each match page, there is a Shared Matches link. (Screenshot.)
The Shared Matches are those with Shared DNA of 20 cM or more
with both individuals.
So C can appear in the shared matches of A and B even if B
does not appear in the shared matches of A and C (if C shares 20cM or
more with both A and B, but B shares less than 20cM with A).
Matches are sorted by closeness of relationship to the
logged-in individual.
A can see that B matches C but cannot see the cM shared by B
and C.
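The asymmetry caused by the 20 cM threshold can be reproduced with a toy example. The sketch below uses made-up cM values for three testers A, B and C; only the threshold comes from the description above, and the function is an illustration of the rule rather than AncestryDNA's own code.

```python
THRESHOLD_CM = 20.0  # shared-match threshold described above

# Hypothetical shared cM between pairs of testers.
shared_cm = {
    frozenset({"A", "B"}): 15.0,  # A and B match, but below the threshold
    frozenset({"A", "C"}): 45.0,
    frozenset({"B", "C"}): 30.0,
}


def cm(x: str, y: str) -> float:
    """Shared cM between two testers (0 if they do not match)."""
    return shared_cm.get(frozenset({x, y}), 0.0)


def shared_matches(viewer: str, match: str, people: list) -> list:
    """People sharing at least the threshold with both the viewer and the match."""
    return [
        p for p in people
        if p not in (viewer, match)
        and cm(viewer, p) >= THRESHOLD_CM
        and cm(match, p) >= THRESHOLD_CM
    ]


people = ["A", "B", "C"]
print(shared_matches("A", "B", people))  # C is listed
print(shared_matches("A", "C", people))  # B is not listed, as A and B share under 20 cM
```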
MyHeritage
When the Review DNA Match page eventually
completes loading, the Shared DNA Matches section:
- reveals that "you share the following 2,052 DNA Matches"
- lists the top 10 shared matches
- allows further shared matches to be slowly loaded, 10 at a
time
- may demand money, depending on when you uploaded or what
subscriptions you have purchased
Matches are sorted by the sum
of the centiMorgans shared
with the two individuals.
A can see not only that B matches C but also the cM shared by B and C.
A's shared matches with B will be exactly the same as B's shared
matches with A, in the same order.
GEDmatch
To find the shared matches of two individuals, whether or not
they match each
other, use the "People who match both, or 1 of 2 kits" tool on the main
menu.
This lists shared
matches of Kit 1 and Kit 2, but the user sets the cM threshold of
largest segment and cM threshold of total matching segments.
Matches are sorted by closeness of relationship to Kit 1.
To sort by closeness of relationship to Kit 2, re-use the tool with the
kits in reverse order.
If you log in in one browser tab, and open
the Multi Kit Analysis menu in another
tab (via a hyperlink or bookmark), then you can run an Autosomal Matrix
Comparison on up to 100 kits.
While the FTDNA user matrix shows only whether or not kits are deemed
to match, the
FTDNA administrator matrix and the GEDmatch matrix show the shared
centiMorgans.
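The difference between the two kinds of matrix can be sketched from a table of pairwise shared cM. The kit labels and cM values below are invented, and the layout is only an illustration, not the output of either website.

```python
# Hypothetical kits and pairwise shared cM (pairs not listed do not match).
kits = ["K1", "K2", "K3"]
shared_cm = {("K1", "K2"): 120.5, ("K2", "K3"): 35.0}


def lookup(a: str, b: str) -> float:
    """Shared cM for a pair of kits, whichever way round the pair was stored."""
    return shared_cm.get((a, b), shared_cm.get((b, a), 0.0))


# Matrix of shared cM; the diagonal (a kit against itself) is left empty.
cm_matrix = [[None if a == b else lookup(a, b) for b in kits] for a in kits]

# Matrix showing only whether each pair is deemed to match.
match_matrix = [[None if v is None else v > 0 for v in row] for row in cm_matrix]

for kit, cm_row, match_row in zip(kits, cm_matrix, match_matrix):
    print(kit, cm_row, match_row)
```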
Triangulated matches
Triangulation and phasing are
really opposite sides of the same coin. If V is
half-identical on the same region with W and Z, then there are two
possibilities:
- W and Z are half-identical to each other on this region, in
which case V, W and Z probably inherited an identical segment in this
region from a single common ancestor and the relationship can be
described as triangulated;
or
- W and Z are not
half-identical to each other on this region, in which case V is
probably
related to W on V's paternal side and V is probably related to Z on V's
maternal side, or vice
versa, and V's autosomal DNA in this region can be phased.
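The two possibilities can be expressed as a small decision rule. In the sketch below, a segment is a chromosome number with start and end positions. The field names, the labels returned and the sample positions are all assumptions made for illustration, and a real comparison would also apply minimum segment sizes.

```python
from typing import List, NamedTuple, Optional


class Segment(NamedTuple):
    chromosome: int
    start: float  # position units are arbitrary here (e.g. Mbp)
    end: float


def overlap(a: Segment, b: Segment) -> Optional[Segment]:
    """Return the region common to two segments, or None if they do not overlap."""
    if a.chromosome != b.chromosome:
        return None
    start, end = max(a.start, b.start), min(a.end, b.end)
    return Segment(a.chromosome, start, end) if start < end else None


def classify(v_w: Segment, v_z: Segment, w_z_segments: List[Segment]) -> str:
    """Classify V's matches with W and Z on the region where both overlap."""
    region = overlap(v_w, v_z)
    if region is None:
        return "no overlap"
    # If W and Z also share a segment overlapping this region, all three triangulate.
    if any(overlap(region, seg) is not None for seg in w_z_segments):
        return "triangulated"
    # Otherwise W and Z are probably on opposite parental sides of V.
    return "phased"


v_w = Segment(7, 10.0, 40.0)
v_z = Segment(7, 25.0, 60.0)
print(classify(v_w, v_z, [Segment(7, 20.0, 45.0)]))  # W and Z match each other here
print(classify(v_w, v_z, []))                        # W and Z do not match each other
```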
The ADSA tool by Don Worth at DNAGedcom
provides a graphical
representation of triangulation and phasing.
Triangulation groups
The ultimate objective is to collect DNA matches into triangulation groups.
A triangulation group is a set of three or more people who are all
half-identical to each of the
other group members on overlapping regions.
The more individuals who are added to the triangulation group,
the smaller the overlap may become.
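The shrinking overlap is easy to see numerically. In this sketch each pair of numbers is the (hypothetical) region on one chromosome that a group member shares with the rest of the group; the region common to everyone can only stay the same or shrink as members are added.

```python
from typing import List, Optional, Tuple


def common_overlap(segments: List[Tuple[float, float]]) -> Optional[Tuple[float, float]]:
    """Return the region common to all (start, end) segments, or None if there is none."""
    start = max(s for s, _ in segments)
    end = min(e for _, e in segments)
    return (start, end) if start < end else None


# Hypothetical segments on the same chromosome, one per group member.
segments = [(10.0, 60.0), (20.0, 55.0), (30.0, 50.0), (35.0, 42.0)]

for size in range(2, len(segments) + 1):
    print(f"{size} members: common region {common_overlap(segments[:size])}")
```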
The members of a triangulation group of three or more individuals are very likely to share
a recent common ancestor (or,
more often, ancestral couple).
The triangulated matches that I share with a new match are
usually the second clue to identifying through which of my most distant
known ancestors I am related to the new match.
Some of the DNA comparison websites allow one to identify the triangulated matches
of two individuals in some form or another.
FTDNA Family Finder
One had to be a little devious to find triangulated matches
directly at FTDNA.
The Linked Relationship feature was designed to identify matches who
triangulate with known relatives, and then dump them all
together again in paternal and maternal buckets.
You may want to have two Family Finder
kits, e.g. a kit based on swabs sent to FTDNA with a full pedigree
chart for paternal/maternal phasing; and a kit based on an autosomal
transfer from another laboratory with a minimal pedigree chart for
identifying triangulated matches (e.g. my B95575).
AncestryDNA
AncestryDNA refuses to provide any way of identifying triangulated
matches.
MyHeritage
MyHeritage shared match lists include a
symbol identifying
which of these matches are triangulated.
Make sure that you have not opted out of showing shared segments.
There appears to be no way to filter the list of shared
matches to show only the triangulated matches.
GEDmatch
- The Tier 1 Multi Kit Analysis (MKA) menu
allows you
to:
- generate an autosomal DNA comparison matrix
- search for triangulations involving Kit 1 and any two
or more of Kits 2,3,4,...
- anyone who logs
in in one browser tab can open MKA in another tab.
The basic tools are free to all users:
- User Registration
- Generic Uploads (23andme, FTDNA, AncestryDNA, most others)
- Upload GEDCOM (Fast)
- One-To-Many DNA Comparison Result
- One-to-One Autosomal DNA Comparison
- One-to-One X-DNA Comparison (see a tale of two X-chromosomes).
- People who match both or 1 of 2 kits
- Are your parents related? (e.g. T409076, whose parents were second cousins)
Access to these Tier 1 tools costs USD10/month:
- the Triangulation
tool allows you to find all triangulation groups for the selected kit
at the selected thresholds
- the Segment Search tool allows more
flexible manual investigation of phasing and triangulation
- the Lazarus tool attempts to resurrect
the DNA of a deceased ancestor
Other third party tools and websites
The major DNA websites do not like the load imposed on their servers by
the more powerful tools designed by third party developers.
- Autosomal tools and websites
- Y-DNA tools and websites
Y-DNA and surname projects
- Y-DNA projects can be
- Once you have your initial Y-DNA
results (or a known male-line relative's Y-DNA results), you can join
appropriate haplogroup projects.
- Some older project member and project administrator
features have been
disabled because of numerous changes prompted by GDPR fears:
- You must Opt in to Sharing on the PROJECT PREFERENCES page or your
pseudonymized DNA results and ancestor information will be missing from
the public results pages.
- You can also choose from that page whether to give each
project
administrator Minimum, Limited or Advanced access to your kit; reducing
access to Minimum pretty much eliminates all the benefits of project
membership.
- It is also recommended that you set Y-DNA Match Levels to
All Levels on
the PRIVACY & SHARING page.
- If there is no surname project for your surname and you are
happy to
deal with the spam risk, then you can apply to
set up your own project by following a simple five-step application process (which
actually consists of only four steps!).
- Every project has an activity feed for discussions
between
members and administrators, which can be used by administrators to
avoid having to answer the same frequently asked questions repeatedly
via individual e-mails.
- Project administrators have valuable tools, including:
- a subgroup editor to arrange members
on the Y-DNA results
pages, e.g. Clare Roots or Clancy surname
- subgroups are sorted alphabetically on the results
pages, so bear this in mind when choosing names
- to force subgroups into a desired order, number them with leading zeroes, 001, 002, 003, etc.
- criteria for grouping can include:
- surname
- geography
- haplotree position, whether
- confirmed by FTDNA
- predicted by FTDNA
- predicted by project administrator
- desire to see STR differences highlighted
- Subgroup Names (which are visible on the results pages)
appear to be
truncated at 161 characters, without warning. So keep these names as
short as possible with no unnecessary spacing or punctuation.
- Subgroup Descriptions (which are visible to the project
administrator(s) only) appear to be truncated at 973 characters,
without warning, and despite the false assurance of scroll bars in the
editor.
- a Y-DNA genetic distance calculator:
- this has greater thresholds than the matching algorithm:
7/37 instead of 4/37; 25/67 instead of 7/67; and 40/111 instead of 10/111
- examples: R-M222 for a man with one Y-DNA37
match with no SNP test; R-FGC29367 for a man with no
Y-DNA111 match.
- a public website editor to publish information under any
or all of the following headings:
- Background
- Goals
- News
- Updates
- Bulletin
- Results
- Code of Conduct
- FAQ
- Project members can be recruited in many ways:
- FTDNA will send an e-mail on behalf of an
administrator, no
more than once every six months, to all customers with the relevant
surname who have opted to receive such e-mails.
- Administrators can see project members' matches and can
e-mail them directly to invite them to join.
- A clan or surname organisation or
one-name-study is ideally positioned to run
online and offline recruitment drives.
- See here for all the technical details
of how and why to
upload your DNA data and pedigree charts to the various websites.
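The Y-DNA matching thresholds quoted above (4/37, 7/67 and 10/111) can be illustrated with a simple genetic distance calculation. This sketch assumes a plain stepwise model, in which the distance is the sum of the absolute differences at each marker; the company's own calculation may treat some markers differently, and the haplotypes below are invented.

```python
# Matching thresholds quoted above: markers tested -> maximum genetic distance.
MATCH_THRESHOLDS = {37: 4, 67: 7, 111: 10}


def genetic_distance(a: list, b: list) -> int:
    """Sum of absolute differences in repeat counts (simple stepwise model)."""
    if len(a) != len(b):
        raise ValueError("haplotypes must have the same number of markers")
    return sum(abs(x - y) for x, y in zip(a, b))


def is_match(a: list, b: list) -> bool:
    """True if the distance is within the threshold for this number of markers."""
    # Only 37, 67 and 111 markers are covered; other lengths raise KeyError.
    return genetic_distance(a, b) <= MATCH_THRESHOLDS[len(a)]


# Hypothetical 37-marker haplotypes.
man_a = [13] * 37
man_b = list(man_a)
man_b[0], man_b[5] = 14, 11  # differs by 1 step and by 2 steps: distance 3
man_c = list(man_a)
man_c[1] = 18                # differs by 5 steps at one marker: distance 5

print(genetic_distance(man_a, man_b), is_match(man_a, man_b))
print(genetic_distance(man_a, man_c), is_match(man_a, man_c))
```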