Genetic Genealogy
6 p.m. Tuesday 8 February 2022 and Tuesday 15 February 2022
Recordings:
8 February (TBA); 15 February (TBA)
Outline
Review of beginners' session
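| | male offspring | female offspring |
| sperm | Y chromosome + 22 paternal autosomal chromosomes | X chromosome + 22 paternal autosomal chromosomes |
| egg | X chromosome + 22 maternal autosomal chromosomes + mitochondria | X chromosome + 22 maternal autosomal chromosomes + mitochondria |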
| DNA component | Inheritance path | Inherited by |
| Y chromosome | From father only (and only if male) | males only |
| autosomal chromosomes (autosomes) | Equally from both parents | everyone |
| X chromosome(s) | Unequally from both parents | males x1, females x2 |
| mitochondrial DNA | From mother only | everyone |
- Without recombination and mutations, all of us would have
identical DNA.
- Autosomal DNA is the cheapest component to analyse and has
rapidly become the most widely used in genealogy:
- matches with larger
shared centiMorgans (c.8cM-c.3600cM) are (on average) more
closely related;
- cousin matching using autosomal DNA will identify
relationships out to third cousin on all sides of your family,
and may identify more distant relationships;
- the technology cannot separate the paternal and maternal
autosomes;
- identical twins, parent/child relationships and most
full-sibling relationships can be identified unambiguously;
- for more distant relationships, probabilities can be assigned to the
various possibilities;
- this tool shows the distribution of
possible relationships for a given shared centiMorgan value.
- Y-DNA has been widely used for one-name studies or surname
projects for much longer, but has also seen rapid recent
scientific advances:
- matches with smaller
genetic distance are (on average) more closely related;
- cousin matching using Y-DNA will identify relationships with
men of the same surname and same genetic origin, and may also
identify surname/DNA switches and/or more distant
relationships, predating the era of surnames (1014+);
- many surnames have multiple genetic origins, for example
occupational surnames (Smith, Miller, Potter, etc.).
- Targeted mitochondrial DNA and X-DNA comparisons can be used
to solve more specialised problems.
- The DNA companies will turn your spit or swabs into a data
file, which can be compared with data files from the DNA of
other individuals.
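To illustrate what comparing two data files involves, here is a minimal Python sketch of the idea of half-identical regions. It assumes each kit has already been read into a dictionary mapping (chromosome, position) to an unphased genotype such as "AG", with "--" for a no-call. At a single SNP, two people are half-identical if their genotypes share at least one allele; a candidate segment is a long run of consecutive such SNPs, ended by opposite homozygotes (e.g. "AA" v. "GG"). Real matching algorithms measure segments in centiMorgans using a genetic map and tolerate genotyping errors, so the `min_snps` threshold, the function names and the toy data here are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory form of a raw data file:
# (chromosome, position) -> unphased genotype such as "AG" ("--" = no call)
Kit = Dict[Tuple[str, int], str]


def half_identical(g1: str, g2: str) -> bool:
    """True if two unphased genotypes could share an allele."""
    if "-" in g1 or "-" in g2:
        # A no-call cannot refute a match; this tolerance can itself
        # contribute to false matches
        return True
    return bool(set(g1) & set(g2))


def half_identical_runs(kit_a: Kit, kit_b: Kit, chromosome: str,
                        min_snps: int) -> List[Tuple[int, int, int]]:
    """Return (first position, last position, SNP count) for each run of
    consecutive SNPs, tested in both kits, on which the kits are
    half-identical and which contains at least min_snps SNPs."""
    # Only SNPs observed in both kits can be compared
    positions = sorted(pos for (chrom, pos) in kit_a
                       if chrom == chromosome and (chrom, pos) in kit_b)
    runs: List[Tuple[int, int, int]] = []
    start: Optional[int] = None
    last = 0
    count = 0
    for pos in positions:
        if half_identical(kit_a[(chromosome, pos)], kit_b[(chromosome, pos)]):
            if start is None:
                start, count = pos, 0
            count += 1
            last = pos
        else:
            # Opposite homozygotes end the current run
            if start is not None and count >= min_snps:
                runs.append((start, last, count))
            start = None
    if start is not None and count >= min_snps:
        runs.append((start, last, count))
    return runs


# Toy example: only the SNP at position 300 has opposite homozygotes
a = {("1", 100): "AG", ("1", 200): "CC", ("1", 300): "TT",
     ("1", 400): "AG", ("1", 500): "GG"}
b = {("1", 100): "AA", ("1", 200): "CT", ("1", 300): "CC",
     ("1", 400): "GG", ("1", 500): "AG"}
print(half_identical_runs(a, b, "1", 2))  # [(100, 200, 2), (400, 500, 2)]
```

Raising `min_snps` to 3 in this toy example removes both runs, which mirrors, in a very crude way, how stricter matching criteria produce fewer matches.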
Identity v. Anonymity
The trade-off
- There is a trade-off between:
- increasing your chances of finding long-lost cousins and
ancestors (and being found by long-lost cousins); and
- maintaining the privacy of your family history research and
DNA results.
- If you keep your DNA results or known family tree private,
then nobody will be able to find you and you will not be able to
find any DNA matches.
- If you want to be found, then you must let your potential
cousins see your DNA results and your known ancestors.
- If you give your matches no information, then they cannot
help you.
- Some customers of the DNA companies appear to wish to maintain
a certain degree of privacy and anonymity.
- Others find it paradoxical that those trying to identify their
anonymous ancestors can be so concerned about anonymising their
own identity.
- FamilyTreeDNA.com (for customers who have not opted out) and
GEDmatch.com (for customers who have opted in) explicitly
allow familial comparisons of DNA recovered from a crime
scene or unidentified human remains:
- to identify the perpetrator of a violent crime against
another individual; or
- to identify human remains.
- Using the MyHeritage DNA Services for law
enforcement purposes ... is currently "strictly prohibited,
unless a court order is obtained".
- 23andMe's registration process assures customers that "We will
not provide information to law enforcement or regulatory
authorities unless required by law to comply with a valid court
order, subpoena, or search warrant for genetic or Personal
Information" and links to a Transparency Report.
Basic guidelines
The basic guidelines for successful use of the DNA websites include
the following:
- Reveal the DNA subject's birth surname:
- Most people inherit DNA with their birth surname, so identify
yourself as a minimum by your birth surname with an initial or a
title, e.g., P Waldron or Mr Waldron or Miss Durkan.
- Reveal the gender of the person who provided the DNA sample:
- A woman does not have a Y chromosome, so she may ask a male
relative with the relevant surname to swab:
- a father, brother, nephew, cousin, etc., if her interest is in
her maiden surname; or
- a husband, son, brother-in-law, father-in-law, etc., if her
interest is in her married surname.
Valuable additional inferences can be drawn once it is known
whether two X chromosomes (female) or one X chromosome and one
Y chromosome (male) are potentially available for comparison.
You must NOT attach a female name or a female's pedigree chart
to a male DNA sample (or vice versa), as this causes untold
confusion.
Be especially careful not to inadvertently link a male's Y-DNA
results with a female's autosomal DNA upload at
FamilyTreeDNA.com, where error-checking does not look for this.
- Avoid providing irrelevant information:
- Your first name, married surname, adopted surname or marital
status reveal nothing about your DNA, so you may keep these
private if you wish.
- Avoid pseudonyms:
- They reduce the chances that your matches will bother to look
at your family tree, contact you or share the information about
your ancestry that they have and that you do not have.
- Use a photograph:
- If you upload a photograph (or any image) to your AncestryDNA
account before you receive your initial results, then the
photograph (hyperlinked to the match details) will appear on the
AncestryDNA Insights page of all your
matches for as long as you remain in their eight newest matches
with photographs.
- Be consistent and avoid unnecessary confusion:
- A real example (further anonymised):
- Ancestry username: tara1234
- AncestryDNA samples from mother and daughter (per e-mail
exchange)
- linked to pedigree charts of an aunt and niece
- appear to matches as M.R. (managed by tara1234) and D.C.
(managed by tara1234)
- neither pair of initials is real
- the daughter is an AncestryDNA match to her mother's
probable 4th cousin, but the mother is not (false negative?
fuzzy boundaries?)
- only one of the two kits is at GEDmatch
- GEDmatch alias and e-mail address both begin with Molly
- Molly is the dog's name
- it took me 300 days after the upload to GEDmatch to
associate the AncestryDNA and GEDmatch identities
- Keep all your DNA-related correspondence in a single
searchable e-mail archive
- Use the internal messaging systems of
AncestryDNA/MyHeritage/23andMe/LivingDNA, or Facebook messages,
only to exchange e-mail addresses.
Managing your matches
- Our objectives are:
- to identify the most recent common ancestor or ancestral
couple shared with each DNA match,
- starting with the closest matches and those with shared
surnames and/or shared locations,
- thereby confirming that the DNA match is definitely a
documented cousin or closer relative,
- enabling either or both cousins to learn more about their
shared ancestors, and
- confirming or refuting the archival and oral evidence
about each cousin's ancestry (a refutation may reveal an NPE,
or non-paternity event).
- In practice, this means
- assigning each DNA match and each DNA segment to the most
distant known ancestor through whom you inherited the shared
DNA;
- converting DNA matches into known relatives;
- using
- the tools provided by the DNA comparison websites;
- the tools provided by third parties; and
- your own genealogy database
in order to manage this process.
- My chromosome map
- You will eventually learn to distinguish between genealogically useful, misattributed and false
DNA matches.
Fishing in all the gene pools
There are a growing number of DNA comparison websites and those
interested in finding long-lost relatives should be in all of
them, especially the largest ones.
Remember Murphy's Law of
Genetic Genealogy, coined while I was helping
an adoptee who is married to a Murphy:
If there are N DNA comparison websites and your DNA is
in N-1 of them, then your most important match will be in the Nth.
In the words of another widely used metaphor, there are many
online gene pools out there and there are many people who are in
only one or two of them; for maximum effect, particularly if you
are trying to find an unknown ancestor who has left no paper
trail, you must fish in all of these pools.
You must spit for the websites which do not allow data uploads.
You must download your data file from the website of whichever
laboratory you use and upload it to the websites which do allow
data uploads:
- GEDmatch.com
- If you have spat or swabbed for more
than one laboratory and if the laboratories use different
chips (i.e. observe different sets of SNPs), then you must:
- upload the data from all the laboratories to GEDmatch;
- sign up for Tier 1 for at least one month (USD10/month);
- use the "Combine multiple kits into 1 superkit NEW!"
option on the Tier 1 menu to create a "Combined" kit and
obtain more accurate results; and
- use the pencil icon beside the component kits on your home
page to set them to "Private" or "Research" so that they do
not clutter up your match list and those of others.
- Comparing the "Overlap" column in the one-to-many results
for the individual and combined kits will show how many more
SNPs are being compared, and hence how much more accurate the
matches are.
- Some half-identical-by-chance and half-identical-by-omission
false matches will disappear (see here).
- FamilyTreeDNA.com
- MyHeritage.com
- LivingDNA.com
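The "Overlap" column mentioned above can be understood as a count of SNPs tested in both kits. The following Python sketch, using made-up rsids, shows why a combined kit can overlap with more of a match's SNPs than either of its component kits: the combined kit covers the union of the SNPs observed by the different chips. It illustrates only the set arithmetic; it is not how GEDmatch actually builds a superkit (for example, how conflicting calls are resolved is not modelled), and the function names are hypothetical.

```python
from typing import Set


def overlap(kit_a: Set[str], kit_b: Set[str]) -> int:
    """Number of SNPs (identified by rsid) observed in both kits."""
    return len(kit_a & kit_b)


def combine(*kits: Set[str]) -> Set[str]:
    """A hypothetical 'superkit': every SNP observed in any component kit."""
    combined: Set[str] = set()
    for kit in kits:
        combined |= kit
    return combined


# Made-up rsids standing in for two different chips and one match
chip_1 = {"rs1", "rs2", "rs3", "rs4"}
chip_2 = {"rs3", "rs4", "rs5", "rs6"}
match = {"rs2", "rs3", "rs5", "rs6", "rs7"}

superkit = combine(chip_1, chip_2)
print(overlap(chip_1, match), overlap(chip_2, match), overlap(superkit, match))
# 2 3 4
```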
You must link your DNA match list and your pedigree chart and share them on
the major autosomal DNA comparison websites:
- As of 8 February 2022, I have:
- 19,170 MyHeritage matches
- 17,417 AncestryDNA matches
- 6,554 FTDNA Family Finder matches
- 3,000 GEDmatch matches (the list is capped at 3,000)
- 1,504 23andMe matches (initially capped at 1,500, with the
option to mark matches for retention)
- 770 LivingDNA matches
Both larger databases and less strict matching criteria result in
more matches.
- Add DNA information to your genealogy database:
- Record the ancestors and cousins confirmed by your DNA in
your genealogy database.
- Use an event field or note tag in your database to track
people who are in both your own database and the DNA
databases.
- Add genealogy information to the online DNA databases:
- Export a GEDCOM file containing at least the ancestors of
each DNA subject and upload it to all the DNA websites so
that matches can see a pedigree chart.
- Examples of pedigree charts: from Ancestral
Quest, AncestryDNA, ancestry.com,
FamilyTreeDNA and GEDmatch.
- For FamilyTreeDNA.com, include in the GEDCOM file any
known relatives already at FTDNA and the shared ancestors;
FTDNA will use these "linked relationships" to assign other
matches to the most recent common ancestral couple of the
DNA subject and the linked relationship, and will display
paternal and maternal icons as appropriate in the match list
(example).
- Mark deceased ancestors as such, even if you do not know
the date of death; otherwise they may be deemed living,
privatised, and hidden from DNA matches who are also
descended from them.
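A minimal Python sketch of the advice on marking deceased ancestors, assuming GEDCOM 5.5.1, in which an event line with the value `Y` (e.g. `1 DEAT Y`) asserts that the event occurred even though no date or place is known. The `Person` class, the helper functions and the individuals are hypothetical, the header is simplified, and family (FAM) records, which actually link ancestors into a pedigree, are omitted; a real export from your genealogy program should be preferred, and whether a particular DNA website honours `DEAT Y` is worth checking.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Person:
    xref: str                         # GEDCOM cross-reference ID, e.g. "@I1@"
    given: str
    surname: str
    sex: str                          # "M" or "F"
    deceased: bool
    death_date: Optional[str] = None  # GEDCOM date such as "1 JAN 1900", if known


def individual_record(p: Person) -> List[str]:
    """GEDCOM lines for one individual (no family links)."""
    lines = [f"0 {p.xref} INDI",
             f"1 NAME {p.given} /{p.surname}/",
             f"1 SEX {p.sex}"]
    if p.deceased:
        if p.death_date:
            lines += ["1 DEAT", f"2 DATE {p.death_date}"]
        else:
            # "Y" asserts that the death occurred although no details are
            # known, so the person should not be treated as living
            lines.append("1 DEAT Y")
    return lines


def minimal_gedcom(people: List[Person]) -> str:
    """A simplified GEDCOM 5.5.1 file (header trimmed for illustration)."""
    lines = ["0 HEAD", "1 GEDC", "2 VERS 5.5.1",
             "2 FORM LINEAGE-LINKED", "1 CHAR UTF-8"]
    for p in people:
        lines += individual_record(p)
    lines.append("0 TRLR")
    return "\n".join(lines) + "\n"


# Hypothetical individuals: one with an unknown date of death
print(minimal_gedcom([
    Person("@I1@", "John", "Example", "M", deceased=True),
    Person("@I2@", "Mary", "Example", "F", deceased=True, death_date="1 JAN 1900"),
]))
```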
AncestryDNA tools
- Every new kit must be associated with a different e-mail
address.
- Test Settings
- Tree Link
- Download Raw DNA Data
- Sharing Preferences (+ Add a person)
- Match list
- Different, sometimes conflicting, "Predicted relationship"
displays!
- Four binary filters:
- Unviewed (by you or any of your collaborators)
- Common ancestors
(speculative hints based on trees donated by users who are
encouraged to guess anything they don't know, and so to be
treated with caution)
- Messaged
- Notes
- Three dropdown filters:
- Private/public linked/unlinked trees
- Shared DNA
- Groups
- Shows the number of matches in each Group
- My suggestions for using the 25
custom groups (gold star and 24 coloured dots)
- e.g. starred matches = known relatives
- Search Matches
- by Match name or by Surname in matches' trees or by Birth
location in matches' trees, if the location is available on
the dropdown
- search results appear incomplete for new kits
- Sort by "Relationship" (i.e. by shared centiMorgans) or by
"Date" (actual match date invisible!)
- Match page
- Shared Matches (20cM or more with both parties)
MyHeritage.com tools
FamilyTreeDNA.com tools
Using autosomal DNA shared matches, triangulation and phasing
For every autosomal DNA match, and for every autosomal DNA
segment, one would like to assign both to an ancestor:
- a match who is a known relative can be assigned to the most
distant known individual ancestor through whom you know
that you are related to the match;
- a match who is not a known relative can be assigned to the
most distant known individual ancestor through whom you are
likely to be related to the match;
- a segment shared with a match can initially be assigned to the
ancestor associated with the match; but
- segments shared with multiple matches can potentially be
assigned to more distant ancestors (e.g. a segment shared with a
third cousin as well as a second cousin can be assigned to a
greatgrandfather as well as to a grandfather).
Some genetic genealogists find it more intuitive to assign
matches and segments to ancestral couples (the most recent common
ancestral couple shared with the match) rather than individuals,
but:
- this approach ceases to be equivalent if there are half
relationships or double relationships; and
- each segment shared with the match is shared with only one
member of the most recent common ancestral couple.
For example:
- you share two grandparents with your paternal first cousins;
but
- all of the DNA segments that you share with your paternal
first cousins descended to you through your father:
- some from your paternal grandfather; and
- the rest from your paternal grandmother.
My own family tree has numerous recent complications which force
me to think in terms of individuals rather than couples:
- my paternal grandfather was an identical twin;
- the identical twins married two sisters;
- the sisters' father married twice; and
- his two wives were first cousins.
Matches who are not known relatives can be tentatively assigned
to ancestors (or predicted) based on
- shared matches (on all the main DNA comparison websites);
and/or
- triangulated matches (on all the main DNA comparison websites
except ancestry.com).
I recommend using:
- stars (AncestryDNA, 23andMe, MyHeritage) to distinguish
- matches who are known relatives from
- matches who are not known relatives;
- coloured dots for ancestors:
- MyHeritage provides:
- 30 coloured dots (Labels), exactly enough for:
- 2 parents;
- 4 grandparents;
- 8 greatgrandparents; and
- 16 GGgrandparents.
- filtering by multiple labels selects matches with label A
OR label B
- AncestryDNA provides:
- built-in groups for "Father's side" and "Mother's side"
- 24 coloured dots (Groups), exactly enough for
- no grandparents;
- 8 greatgrandparents; and
- 16 GGgrandparents.
- filtering by multiple groups selects matches in group A
AND group B
- DNApainter provides:
- an unlimited number of colour-coded groups
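The "exactly enough" arithmetic above, and the difference between OR and AND filtering, can be sketched in Python. The match names, label names and functions below are hypothetical; the sketch models only the filtering behaviour described above for MyHeritage (OR) and AncestryDNA (AND), not either website's interface.

```python
from typing import Dict, List, Set


def ancestor_slots(first_generation: int, last_generation: int) -> int:
    """Number of ancestors from first_generation to last_generation
    inclusive (1 = parents, 2 = grandparents, 3 = greatgrandparents, ...)."""
    return sum(2 ** g for g in range(first_generation, last_generation + 1))


MYHERITAGE_SLOTS = ancestor_slots(1, 4)  # 2 + 4 + 8 + 16 = 30 labels
ANCESTRY_SLOTS = ancestor_slots(3, 4)    # 8 + 16 = 24 coloured groups

# Hypothetical matches and the labels/groups assigned to each
assigned: Dict[str, Set[str]] = {
    "match 1": {"GGgrandfather A"},
    "match 2": {"GGgrandfather A", "GGgrandmother B"},
    "match 3": {"GGgrandmother B"},
}


def filter_or(labels_by_match: Dict[str, Set[str]], wanted: Set[str]) -> List[str]:
    """Matches carrying ANY of the wanted labels (MyHeritage, as described)."""
    return sorted(m for m, labels in labels_by_match.items() if labels & wanted)


def filter_and(labels_by_match: Dict[str, Set[str]], wanted: Set[str]) -> List[str]:
    """Matches in ALL of the wanted groups (AncestryDNA, as described)."""
    return sorted(m for m, labels in labels_by_match.items() if wanted <= labels)


both = {"GGgrandfather A", "GGgrandmother B"}
print(filter_or(assigned, both))   # ['match 1', 'match 2', 'match 3']
print(filter_and(assigned, both))  # ['match 2']
```

In other words, an OR filter widens the selection as labels are added, while an AND filter narrows it, which is one reason the two websites call for different labelling methodologies.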
So I use different methodologies:
- on MyHeritage:
- every known relative has a star and one dot (for the most
distant individual ancestor through whom I am related)
- other matches have a dot for every ancestor with whose
descendant there is a triangulated match
- but I remove more recent dots when I find triangulated
matches with more distant ancestors
- matches attributed to more distant ancestors than a
GGgrandparent also have notes
- on AncestryDNA:
- every known relative has a star and one dot for each
GGgrandparent shared or through whom I am related:
- fourth cousins (and more distant) have one dot
- third cousins (etc.) have two dots
- second cousins (etc.) have four dots
- first cousins (etc.) have eight dots
- siblings (etc.) have sixteen dots
- other matches
- are in the groups for "Father's side" and "Mother's side"
if I cannot attribute shared matches to an ancestor
beyond my parents
- just have a note if I cannot attribute shared
matches to an ancestor beyond my grandparents
- have a greatgrandparent dot if they have shared
matches attributed to a greatgrandparent
- have a GGgrandparent dot if they have shared
matches attributed to a GGgrandparent
- have a GGgrandparent dot and a note if they have shared
matches attributed to a more distant ancestor
- but I remove more recent dots when I find shared matches
with more distant ancestors
- on DNApainter:
- I use a similar philosophy, but have not come up with a good
methodology for distinguishing between confirmed and predicted
relationships.
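The numbers of AncestryDNA dots listed above follow a simple pattern: full nth cousins share an ancestral couple n+1 generations back, and that couple's ancestors occupy 2^(4-n) of the 16 GGgrandparent slots, with n = 0 standing for siblings; from fourth cousins outward the relationship runs through a single GGgrandparent. A small Python sketch of that rule, assuming full (not half or double) relationships and ignoring "removed" cousins; the function name is hypothetical.

```python
def ancestry_dots(cousin_degree: int) -> int:
    """Dots for a known relative who is a full nth cousin
    (0 = sibling, 1 = first cousin, 2 = second cousin, ...):
    one dot per GGgrandparent shared or through whom the
    relationship runs."""
    if cousin_degree < 0:
        raise ValueError("cousin_degree must be 0 (sibling) or more")
    # The shared couple sits cousin_degree + 1 generations back;
    # fourth and more distant cousins all get a single dot
    return 2 ** (4 - min(cousin_degree, 4))


print([ancestry_dots(n) for n in range(6)])  # [16, 8, 4, 2, 1, 1]
```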
A counter-example
Matches who end up with multiple dots (e.g. four GGgrandparents):
- may be quite closely related to you; or
- may be from the same geographical area and related to several
of your ancestors just by coincidence.
There is an exception to every rule: not only shared but even
triangulated matches can sometimes arise by coincidence.
Consider these three marriages in the United Church of England and
Ireland (the established church) in Kilkeedy, County Limerick:
- My GGGgrandfather Thomas Parker married my GGGgrandmother Mary
Keas on 14 Sep 1831
- Thomas's brother Francis Parker married Margaret Smith on 3
Mar 1840
- Mary's sister Ellen Keas married Joseph Smith on 26 May 1841
- The two Smiths were also siblings.
- I imagine these three couples sitting around a circular dinner
table, each person with a spouse to the left and a sibling to
the right.
- These three couples produced three families whose children
were first cousins of the children of both other families,
but no single ancestral couple was common to all three!
- If one took DNA from one member of each of these three
families, two Parkers and a Smith, then each of them as first
cousins would share many half-identical regions with both of the
others.
- There would be some regions in which:
- the two Parkers had an identical segment from a Parker
grandparent,
- one of the Parkers and the Smith would have an identical
segment from a Smith grandparent, and
- the other Parker and the Smith would have an identical
segment from a Keas grandparent.
- In this example, the three cousins will all "match" each other
genetically, but on closer examination it will be found that
there is no common ancestral couple of all three.
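The dinner-table example can be checked mechanically. In this Python sketch each family is represented by the set of its four grandparents; labels such as "Parker grandfather" are hypothetical placeholders for the unnamed parents of the six spouses. Every pair of families shares a grandparental couple, yet the intersection across all three families is empty.

```python
from itertools import combinations

# Hypothetical placeholders for the parents of the six spouses
PARKERS = {"Parker grandfather", "Parker grandmother"}
SMITHS = {"Smith grandfather", "Smith grandmother"}
KEASES = {"Keas grandfather", "Keas grandmother"}

# Each family's grandparents: the father's parents plus the mother's parents
families = {
    "Thomas Parker x Mary Keas": PARKERS | KEASES,
    "Francis Parker x Margaret Smith": PARKERS | SMITHS,
    "Joseph Smith x Ellen Keas": SMITHS | KEASES,
}

# Every pair of families shares one grandparental couple...
for (f1, g1), (f2, g2) in combinations(families.items(), 2):
    print(f"{f1} / {f2}: share {sorted(g1 & g2)}")

# ...but no grandparent is common to all three families
common_to_all = set.intersection(*families.values())
print("Common to all three:", common_to_all or "none")
```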
However, these close triangular marriage patterns are very rare.
You may still find two distant relatives, related to you through
different ancestors, appearing as shared matches, because they are
related to each other through an ancestral couple whom they share
with each other, but whom neither shares with you.
Shared, or In Common With (ICW), matches
A group of three or more individuals who all meet the relevant
matching criteria with each other is likely to share a recent
common ancestor (or, more often, ancestral couple).
The matches shared with a new match are usually the first clue to solving this
puzzle of assigning the new match to an ancestor.
All of the DNA comparison
websites allow one to identify the shared matches of two individuals in some form or
another.
The matching criteria vary from one DNA comparison website to
another.
The stricter the matching criteria, the more significant the
shared matches.
FTDNA Family Finder
To find the shared matches of two individuals who match each
other:
- go to the match list of one of the individuals
- find the In Common/Not In Common icon opposite the name on the
right of the screen
- select In Common With from the dropdown
- matches are sorted by closeness of relationship to the
logged-in individual
- A can see that B matches C but cannot see the cM shared by B
and C.
- Many of the hidden shared cM figures will be as low as 8cM,
which even FTDNA does not use for family matching based on
assigned relationships.
- A's shared matches with B will be the same as B's shared
matches with A, but in a different order.
You will eventually identify a group of individuals, all of whom
you suspect descend from a single common ancestor (or ancestral
couple).
To see whether up to 10 individuals who match you also match each
other:
- Add them to the Selected Matches box on the Family Finder - Matrix page
- To find the desired surname in the Matches box, click on any
match and start typing the surname
- After finding the surname, ctrl-click on the desired
individuals
- Click the Add>> button to move all the selected
individuals to the Selected Matches box
To find the shared matches and shared cM of two or more individuals
who belong to the same project, whether or not they match each
other, it was formerly possible to:
However, the shared matches option was removed from this menu in
2021 and it is unclear whether or when it may be restored.
AncestryDNA
On each match page, there is a Shared Matches link. (Screenshot.)
The Shared Matches are those with Shared DNA of 20 cM or more
with both individuals.
So C can appear in the shared matches of A and B even if B does
not appear in the shared matches of A and C (if C shares 20cM or
more with both A and B, but B shares less than 20cM with A).
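The asymmetry just described follows from the rule that a shared match must share at least 20cM with both parties. A Python sketch with hypothetical cM values: the 20cM threshold and the sorting by closeness to the logged-in individual are taken from the text, while the function names and data are illustrative only.

```python
from typing import Dict, FrozenSet, List

THRESHOLD_CM = 20  # AncestryDNA shared-match threshold, as described above

# Hypothetical shared cM between pairs of testers
shared: Dict[FrozenSet[str], float] = {
    frozenset({"A", "B"}): 15,
    frozenset({"A", "C"}): 45,
    frozenset({"B", "C"}): 60,
}


def cm(x: str, y: str) -> float:
    """Shared cM between two testers (0 if they do not match)."""
    return shared.get(frozenset({x, y}), 0.0)


def shared_matches(me: str, match: str, everyone: List[str]) -> List[str]:
    """Testers sharing at least THRESHOLD_CM with both `me` and `match`,
    sorted by closeness of relationship to `me` (shared cM, descending)."""
    hits = [x for x in everyone
            if x not in (me, match)
            and cm(me, x) >= THRESHOLD_CM and cm(match, x) >= THRESHOLD_CM]
    return sorted(hits, key=lambda x: cm(me, x), reverse=True)


people = ["A", "B", "C"]
print(shared_matches("A", "B", people))  # ['C']
print(shared_matches("A", "C", people))  # [] because B shares only 15cM with A
```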
Matches are sorted by closeness of relationship to the logged-in
individual.
A can see that B matches C but cannot see the cM shared by B and
C.
MyHeritage
When the Review DNA Match page
eventually completes loading, the Shared DNA Matches section:
- reveals that "you share the following ... DNA Matches"
- lists the top 10 shared matches
- allows further shared matches to be slowly loaded, 10 at a
time
- may demand money, depending on when you uploaded or what
subscriptions you have purchased
Matches are sorted by the sum
of the centiMorgans shared with the two individuals.
A can see not only that B matches C but also the cM shared by B and
C.
A's shared matches with B will be exactly the same as B's shared
matches with A, in the same order.
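Sorting by the sum of the centiMorgans shared with the two individuals explains why A's list for B and B's list for A come out identical: the sum does not depend on which of the two is logged in. A Python sketch with hypothetical values; any minimum threshold or paywall that MyHeritage applies is not modelled, and the function name is an assumption.

```python
from typing import Dict, List, Tuple


def sum_ordered_shared_matches(cm_with_a: Dict[str, float],
                               cm_with_b: Dict[str, float]) -> List[Tuple[str, float]]:
    """Shared matches of A and B, ordered by cM shared with A plus
    cM shared with B (largest first)."""
    common = cm_with_a.keys() & cm_with_b.keys()
    scored = [(x, cm_with_a[x] + cm_with_b[x]) for x in common]
    # Ties are broken by name so that the order is fully deterministic
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical matches of A and of B (A and B themselves excluded)
a = {"C": 40, "D": 100, "E": 25}
b = {"C": 90, "D": 20, "F": 70}
print(sum_ordered_shared_matches(a, b))  # [('C', 130), ('D', 120)]
print(sum_ordered_shared_matches(b, a))  # same list, same order
```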
GEDmatch
To find the shared matches of two individuals, whether or not
they match each other, use the "People who match both, or 1 of 2
kits" tool on the main menu.
This lists shared matches of Kit 1 and Kit 2, but the user sets
the cM threshold of largest segment and cM threshold of
total matching segments.
Matches are sorted by closeness of relationship to Kit 1.
To sort by closeness of relationship to Kit 2, re-use the tool with
the kits in reverse order.
If you log in using one browser tab, and open the Multi Kit Analysis menu
in another tab (via a hyperlink or bookmark), then you can run an
Autosomal Matrix Comparison on up to 100 kits.
While the FTDNA user matrix shows only whether or not kits are
deemed to match, the former FTDNA administrator matrix and the
GEDmatch matrix both show the shared centiMorgans.
Triangulated matches
Triangulation and phasing are really opposite
sides of the same coin. If V is half-identical on the same
region with W and Z, then there are two possibilities:
- W and Z are half-identical to each other on this region, in
which case V, W and Z probably inherited an identical segment in
this region from a single common ancestor and the relationship
can be described as triangulated;
or
- W and Z are not
half-identical to each other on this region, in which case V is
probably related to W on V's paternal side and V is probably
related to Z on V's maternal side, or vice versa, and V's autosomal DNA in this
region can be phased.
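The two possibilities can be expressed as a small decision procedure. In this Python sketch, segments are hypothetical (start, end) intervals in megabases on a single chromosome, and "half-identical on this region" is reduced to "the reported segments overlap". Real tools also apply minimum segment sizes and have to cope with false segments, so this is only an outline of the logic, and the function names are assumptions.

```python
from typing import Optional, Tuple

Segment = Tuple[float, float]  # (start, end) on one chromosome, e.g. in Mb


def overlap(s1: Segment, s2: Segment) -> Optional[Segment]:
    """The region covered by both segments, or None."""
    start, end = max(s1[0], s2[0]), min(s1[1], s2[1])
    return (start, end) if start < end else None


def classify(v_w: Segment, v_z: Segment, w_z: Optional[Segment]) -> str:
    """V shares v_w with W and v_z with Z; w_z is the segment (if any)
    that W and Z share with each other on this chromosome."""
    region = overlap(v_w, v_z)
    if region is None:
        return "no overlap"
    if w_z is not None and overlap(region, w_z) is not None:
        # V, W and Z probably inherited this region from one common ancestor
        return "triangulated"
    # W and Z are probably on opposite sides of V's family here
    return "phasable"


print(classify((10, 30), (20, 40), (15, 35)))  # triangulated
print(classify((10, 30), (20, 40), None))      # phasable
print(classify((10, 20), (25, 40), None))      # no overlap
```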
The ADSA tool by Don Worth at DNAGedcom
provides a graphical representation of triangulation
and phasing, similar to DNApainter.
Triangulation groups
The ultimate objective is to collect DNA matches into triangulation groups. A
triangulation group is a set of three or more people who are all
half-identical to each of the other group members on overlapping
regions.
The more individuals who are added to the triangulation group,
the smaller the overlap may become.
A triangulation group of three or more individuals is very likely to share a recent
common ancestor (or, more often, ancestral couple).
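Why the overlap can only shrink as members are added: the region common to the whole group is the intersection of all the pairwise shared segments, and each new member adds more segments to intersect. A Python sketch with hypothetical intervals in megabases; for simplicity it assumes one already-identified segment per pair on the chromosome in question, and the function name is an assumption.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional, Tuple

Segment = Tuple[float, float]  # (start, end) on one chromosome, e.g. in Mb


def group_overlap(pair_segments: Dict[FrozenSet[str], Segment],
                  members: List[str]) -> Optional[Segment]:
    """Region shared by every pair in `members`, or None if some pair
    has no segment or the segments do not all overlap."""
    if len(members) < 3:
        raise ValueError("a triangulation group needs three or more members")
    start, end = float("-inf"), float("inf")
    for x, y in combinations(members, 2):
        seg = pair_segments.get(frozenset({x, y}))
        if seg is None:
            return None
        start, end = max(start, seg[0]), min(end, seg[1])
    return (start, end) if start < end else None


# Hypothetical pairwise segments on one chromosome
pairs = {
    frozenset({"V", "W"}): (10, 40),
    frozenset({"V", "Z"}): (15, 45),
    frozenset({"W", "Z"}): (12, 38),
    frozenset({"V", "Y"}): (20, 50),
    frozenset({"W", "Y"}): (18, 36),
    frozenset({"Z", "Y"}): (22, 44),
}
print(group_overlap(pairs, ["V", "W", "Z"]))       # (15, 38)
print(group_overlap(pairs, ["V", "W", "Z", "Y"]))  # (22, 36)
```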
The triangulated matches that I share with a new match are
usually the second clue
to identifying through which of my most distant known ancestors I
am related to the new match. Some
of the DNA comparison websites allow one to identify the triangulated matches of two
individuals in some form or another.
FTDNA Family Finder
One had to be a little devious to find triangulated matches
directly at FTDNA.
The Assigned Relationship (previously known as Linked Relationship) feature was designed to
identify matches who triangulate with known relatives, assign them
to the most distant individual ancestor through whom they are
related to the DNA subject, and then dump them all together again in
paternal and maternal buckets, even if they have been assigned to an
ancestor more distant than the father or mother.
You may want to have two Family Finder kits, e.g. a kit based on
swabs sent to FTDNA with a full pedigree chart for paternal/maternal
phasing; and a kit based on an autosomal transfer from another
laboratory with a minimal pedigree chart for identifying
triangulated matches (e.g. my B95575).
AncestryDNA
AncestryDNA refuses to provide any way of identifying triangulated
matches.
MyHeritage
MyHeritage shared match lists
include a symbol identifying which of these matches are
triangulated.
Make sure that you have not opted out of showing shared segments.
There appears to be no way to filter the list of shared matches
to show only the triangulated matches.
GEDmatch
- The Tier 1 Multi Kit Analysis
(MKA) menu allows you to:
- generate an autosomal DNA comparison matrix
- search for triangulations involving Kit 1 and any two or
more of Kits 2,3,4,...
- anyone who logs in in one browser tab can open MKA
in another tab.
Other third-party tools and websites
The major DNA websites do not like the load imposed on their servers
by the more powerful tools designed by third party developers.
Y-DNA and surname projects
- Y-DNA tools and websites
- Y-DNA projects can be
- Once you have your initial Y-DNA results (or a known male-line
relative's Y-DNA results), you can join appropriate haplogroup
projects.
- Some older project member and project administrator features
have been disabled because of numerous changes prompted by GDPR
fears:
- You must Opt in to Sharing on the PROJECT PREFERENCES page or your
pseudonymized DNA results and ancestor information will be
missing from the public results pages.
- You can also choose from that page whether to give each
project administrator Minimum, Limited or Advanced access to
your kit; reducing access to Minimum pretty much eliminates
all the benefits of project membership.
- It is also recommended that you set Y-DNA Match Levels to
All Levels on the PRIVACY & SHARING page.
- If there is no surname project for your surname and you are
happy to deal with the spam risk, then you can apply to set up
your own project by following a simple five-step application process (which actually
consists of only four steps!).
- Every project has an activity feed for discussions between
members and administrators, which can be used by administrators
to avoid having to answer the same frequently asked questions
repeatedly via individual e-mails.
- Project administrators have valuable tools, including:
- a subgroup editor to arrange members on
the Y-DNA results pages, e.g. Clare Roots or Clancy surname
- subgroups are sorted alphabetically on the results pages,
so bear this in mind when choosing names
- to force subgroups into a desired order, number them with
leading zeroes, 001, 002, 003, etc.
- criteria for grouping can include:
- surname
- geography
- haplotree position, whether
- confirmed by FTDNA
- predicted by FTDNA
- predicted by project administrator
- desire to see STR differences highlighted
- Subgroup Names (which are visible on the results pages)
were originally truncated at 161 characters, without
warning. In November 2021, project administrators were
informed that the character limit for the Subgroup Name
field is now 200 characters. So keep these names as short as
possible with no unnecessary spacing or punctuation.
- Subgroup Descriptions (which are visible to the project
administrator(s) only) appear to be truncated at 973
characters, without warning, and despite the false assurance
of scroll bars in the editor.
- a Y-DNA genetic distance calculator:
- this has higher thresholds than the matching algorithm:
7/37 instead of 4/37; 25/67 instead of 7/67; and 40/111
instead of 10/111
- examples: R-M222 for a man with one Y-DNA37
match with no SNP test; R-FGC29367 for a man with no Y-DNA111
match.
- a public website editor to publish information under any or
all of the following headings:
- Background
- Goals
- News
- Updates
- Bulletin
- Results
- Code of Conduct
- FAQ
- Project members can be recruited in many ways:
- FTDNA will send an e-mail on behalf of an administrator,
no more than once every six months, to all customers with
the relevant surname who have opted to receive such e-mails.
- Administrators can see project members' matches and can
e-mail them directly to invite them to join.
- A clan or surname organisation or
one-name-study is ideally positioned to run online and
offline recruitment drives.
- See here for all the technical details
of how and why to upload your DNA data and pedigree charts to
the various websites.
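The Y-DNA matching thresholds quoted above (a genetic distance of 4/37, 7/67 and 10/111) can be illustrated with a deliberately simplified calculation. This Python sketch counts genetic distance with a pure stepwise model, i.e. the sum of the absolute differences in repeat counts over the markers both men have tested. FTDNA's own figures are understood to treat some markers (such as multi-copy markers) differently, so they may not agree exactly with this sketch; the marker values and the function names are invented for illustration, and a real comparison would use the whole panel rather than four markers.

```python
from typing import Dict

# Matching thresholds quoted in the text: panel size -> maximum genetic distance
MATCH_THRESHOLDS = {37: 4, 67: 7, 111: 10}


def stepwise_distance(a: Dict[str, int], b: Dict[str, int]) -> int:
    """Simplified genetic distance: sum of |difference| in repeat counts
    over single-copy markers present in both haplotypes."""
    return sum(abs(a[m] - b[m]) for m in a.keys() & b.keys())


def would_match(a: Dict[str, int], b: Dict[str, int], panel: int) -> bool:
    """True if the simplified distance is within the quoted threshold."""
    return stepwise_distance(a, b) <= MATCH_THRESHOLDS[panel]


# Invented repeat counts for two men (real panels have 37, 67 or 111 markers)
man_1 = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11}
man_2 = {"DYS393": 13, "DYS390": 25, "DYS19": 15, "DYS391": 10}
print(stepwise_distance(man_1, man_2))  # 3
print(would_match(man_1, man_2, 37))    # True
```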