Getting the most from your DNA kits
2:00 p.m. GMT Saturday 23 November 2024
Croke Park Hotel, Dublin
WWW version:
YouTube version: TBA
Outline
The basics: kits, tests, results and matches
- The term DNA kit can refer to three things:
- a spit kit, such as used by AncestryDNA and 23andMe to
collect DNA samples; or
- a swab kit, such as used by FamilyTreeDNA and MyHeritage to
collect DNA samples; or
- a computer file with a unique alphanumeric identifier
produced by these companies from the spit or swab kit.
- The term DNA test can refer to:
- testing a genealogical hypothesis using statistical
techniques and DNA evidence from two or more individuals; or
- something purchased from a DNA company, usually comprising a
single DNA kit and follow-up services, but usually not
including hypothesis testing.
- The customer must do his or her own hypothesis testing after
receiving DNA results, which usually include:
- a list of DNA matches, who are other
customers of the DNA company, ranked by closeness of estimated
relationship, out to some arbitrary threshold (designed by the
genealogy department);
- estimated ethnicity percentages, whatever that may mean
(designed by the marketing department):
- Does ethnicity refer to:
- where people lived,
- what language they spoke, and/or
- what religion they practised?
- Does ethnicity refer to:
- today's geographical boundaries, or
- the geographical boundaries of distant ancestors, who
didn't face today's strict border controls?
- Does ethnicity refer to:
- parents?
- grandparents?
- GGGGGGgrandparents?
- our most distant human ancestors?
- Percentages of what?
- Are we not all 100% African if we go back far enough?
- Due both to ambiguous definitions and to recombination of
paternal and maternal DNA, these estimated ethnicity
percentages have far bigger margins of error (and far more
dramatic fluctuations over time) than opinion polls.
- A customer who wants to get the most out of his or her
purchase must:
- attach a pedigree chart to each DNA kit to aid his or her
DNA matches;
- one by one, assign his or her own DNA matches and DNA
segments to the most distant possible ancestor, e.g. using the
coloured dots on the DNA website; and
- turn each DNA match into a known relative,
e.g. using the gold star (⭐) on the DNA website or here.
- The best hypothesis-testing website is WATO.
- DNA evidence can support or reject hypotheses and confirm
relationships, but it can rarely prove anything definitively
beyond the closest relationships.
- The DNA companies love to exaggerate the power and the
scientific rigour of what they do for their customers, e.g. in this undated article
brought to my attention in September 2024, AncestryDNA
announced:
We felt the word “estimate” didn’t best reflect the
scientific rigor we applied to determining your results and
that it left people less certain about their results.
- My reaction.
Components of DNA and inheritance paths
- Humans have mitochondrial DNA and 23 pairs of chromosomes, all
represented by strings of the letters ACGT.
- The chromosomes comprise 22 pairs of autosomes and one pair of
sex chromosomes.
Every child inherits DNA from his or her parents, but with
occasional mutations.
That DNA comprises the following components:
- Sex chromosomes
- Everyone has two sex chromosomes: males XY, females XX.
- Y chromosome
- Only males have a Y chromosome.
Y-DNA comes down the patrilineal line - from father, father's
father, father's father's father, etc.
This is the same inheritance path as followed by surnames,
grants of arms, peerages, etc., so Y-DNA is used for surname
projects.
- X chromosome
- Males have one X chromosome, females have two.
X-DNA may come through any ancestral path that does not contain
two consecutive males.
Blaine Bettinger's colour-coded blank fan-style pedigree
charts show the ancestors from whom men and women can potentially inherit X-DNA.
- Autosomes
- Short for autosomal chromosomes
- Exactly 50% of autosomal DNA (atDNA) comes from the father and
exactly 50% comes from the mother.
Due to recombination, on
average 25% comes from each grandparent, on average 12.5% comes from
each great-grandparent, and so on.
- In extreme cases, an individual can inherit up to 35% from one
paternal grandparent and, hence, as little as 15% from the other
paternal grandparent.
Siblings each inherit 50% of each parent's autosomal DNA, but
not the same 50% (except for identical twins).
Similarly, siblings each inherit 50% of their mother's X-DNA,
but not the same 50% (except for identical twins).
Sisters each inherit 100% of their father's X-DNA.
Hence, autosomal DNA is used to produce estimated ethnicity
percentages.
- Mitochondria
- Everyone has mitochondrial DNA (mtDNA).
- Mitochondrial DNA comes down the matrilineal line - from
mother, mother's mother, mother's mother's mother, etc.
The surname typically changes with every generation in this
line.
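The autosomal averages and the X-DNA rule above can be turned into a few lines of arithmetic. What follows is a minimal sketch, not anybody's official method: the function names `expected_autosomal_share` and `x_dna_paths` are made up for this illustration, and the autosomal figure is only the average, not what any one person actually inherits.

```python
from fractions import Fraction


def expected_autosomal_share(generations_back: int) -> Fraction:
    """Average autosomal share from ONE ancestor n generations back (1 = parent)."""
    return Fraction(1, 2 ** generations_back)


def x_dna_paths(sex: str, generations_back: int) -> list[tuple[str, ...]]:
    """Ancestral paths that can carry X-DNA to a person of the given sex.

    A path is a tuple of steps, e.g. ("mother", "father") = mother's father.
    A male receives X-DNA only from his mother; a female from both parents.
    This is the same as forbidding two consecutive males on the path.
    """
    paths = [((), sex)]  # (steps so far, sex of the person at the end of the path)
    for _ in range(generations_back):
        longer = []
        for steps, current_sex in paths:
            if current_sex == "female":
                longer.append((steps + ("father",), "male"))
            longer.append((steps + ("mother",), "female"))
        paths = longer
    return [steps for steps, _ in paths]


for n in range(1, 5):
    share = expected_autosomal_share(n)
    print(
        f"generation {n}: {2 ** n} ancestors, "
        f"average autosomal share {float(share):.2%} each, "
        f"X-DNA possible from {len(x_dna_paths('male', n))} (man) "
        f"or {len(x_dna_paths('female', n))} (woman)"
    )
```

The number of potential X-DNA ancestors grows much more slowly than the total number of ancestors, which is why X-DNA matches narrow down the possible paths so usefully.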
The following table summarises these critical distinctions:
DNA component | Inheritance path | Inherited by |
Y chromosome | From father only (and only if male) | males only |
autosomal chromosomes | Equally from both parents | everyone |
X chromosome(s) | Unequally from both parents | males x1, females x2 |
mitochondrial DNA | From mother only | everyone |
Two special types of mutations are used in DNA comparisons:
- Short Tandem Repeat (STR): a string of letters consisting of
the same short substring repeated several times
- for example, CCTGCCTGCCTGCCTGCCTGCCTGCCTG is CCTG repeated
seven times; it may be repeated fewer or more times in other
individuals
- used in early Y-DNA comparisons
- Single Nucleotide Polymorphism (SNP): a single location where
two (or occasionally more than two) different letters are
observed in different individuals
- for example, a single A in the parent may be replaced by a C
in the child
- a SNP mutation on the Y chromosome is inherited by all of
the son's future male-line descendants, until the next
mutation at the same location
- more powerful but more expensive tool for Y-DNA comparisons
- also the basis of autosomal DNA comparisons
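Both kinds of mutation are easy to see in a toy example. The sketch below is for illustration only: the helper names `count_tandem_repeats` and `snp_differences` are invented here, the parent and child sequences are hypothetical, and real laboratory pipelines are far more involved.

```python
def count_tandem_repeats(sequence: str, motif: str) -> int:
    """Longest run of back-to-back copies of `motif` anywhere in `sequence`."""
    if not motif:
        raise ValueError("motif must not be empty")
    best = 0
    for start in range(len(sequence)):
        run = 0
        position = start
        while sequence.startswith(motif, position):
            run += 1
            position += len(motif)
        best = max(best, run)
    return best


def snp_differences(first: str, second: str) -> list[tuple[int, str, str]]:
    """Positions (0-based) where two aligned sequences differ by a single letter."""
    if len(first) != len(second):
        raise ValueError("sequences must be aligned to the same length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(first, second)) if a != b]


example = "CCTG" * 7  # the STR example from the text
print(count_tandem_repeats(example, "CCTG"))  # 7

parent = "GATTACA"  # hypothetical sequences, for illustration only
child = "GCTTACA"   # the A at position 1 has been replaced by a C
print(snp_differences(parent, child))  # [(1, 'A', 'C')]
```

An STR result is therefore a count per marker, while a SNP result is a letter per location.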
Fishing in all the gene pools
The DNA companies market what they do as more a fishing trip than
a hypothesis-testing exercise.
As autosomal DNA has the widest reach, it is best suited to the
fishing-trip model.
As Y-DNA, mtDNA and X-DNA have a narrower but deeper reach, they
are best suited to testing specific hypotheses.
Remember Murphy's Law of
Genetic Genealogy, coined while I was helping an
adoptee who is married to a Murphy:
If there are N DNA comparison websites and your DNA is
in N-1 of them, then your most important match will be in the Nth.
In the words of another widely used metaphor, there are many
online gene pools out there and there are many people who are in
only one or two of them; for maximum effect, particularly if you
are trying to find descendants of an unknown ancestor who has left
no paper trail, you must fish in all of these pools.
- For autosomal DNA:
- Spit for
- Upload the resulting data files to the other websites.
- For detailed guidelines on uploading and downloading DNA
data and GEDCOM files, see here.
- For Y-DNA:
- if you are male or can find a willing male relative or
in-law with the surname of interest:
- swab for Big Y-700 from FamilyTreeDNA.com
- or you may be happy with the ancient Y-SNPs detected by
FTDNA Family Finder, 23andMe and LivingDNA
- For mtDNA:
- swab for FamilyTreeDNA.com
- or you may be happy with the ancient information detected by
FTDNA Family Finder (coming soon), 23andMe and LivingDNA
How closely are you related to your DNA matches?
- centiMorgans
- larger centiMorgan values represent closer autosomal
matches
- beyond parent/child and full-sibling relationships, the
centiMorgan value will give only a range of possible
relationship groups
- the DNA companies perceive and encourage a demand for more
precise point estimates of relationships
- thednageek.com
- dnapainter.com
- genetic distance
- smaller genetic distances represent closer Y-STR
matches
- the block tree of Y-DNA SNPs
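Two of the measures above can be sketched in code. The first part groups some standard relationships by their expected (average) share of autosomal DNA, which shows why one amount of shared DNA points to a group of relationships and not to a single one. The second part computes a Y-STR genetic distance under a simple stepwise assumption, i.e. the sum of the differences in repeat counts; the companies' own counting rules may differ. The function names and the two haplotypes are hypothetical; only the marker names are real.

```python
from collections import defaultdict
from fractions import Fraction

# Expected (average) share of autosomal DNA for some standard relationships.
EXPECTED_SHARE = {
    "grandparent": Fraction(1, 4),
    "half sibling": Fraction(1, 4),
    "aunt or uncle": Fraction(1, 4),
    "great-grandparent": Fraction(1, 8),
    "first cousin": Fraction(1, 8),
    "half aunt or uncle": Fraction(1, 8),
    "first cousin once removed": Fraction(1, 16),
    "second cousin": Fraction(1, 32),
}


def relationship_groups() -> dict[Fraction, list[str]]:
    """Group relationships that share the same expected amount of DNA."""
    groups = defaultdict(list)
    for relationship, share in EXPECTED_SHARE.items():
        groups[share].append(relationship)
    return dict(groups)


def genetic_distance(first: dict[str, int], second: dict[str, int]) -> int:
    """Sum of repeat-count differences over the markers both men have tested.

    Assumption: a simple stepwise count; real company rules may differ.
    """
    shared_markers = first.keys() & second.keys()
    return sum(abs(first[marker] - second[marker]) for marker in shared_markers)


for share, relationships in relationship_groups().items():
    print(f"{float(share):.2%}: {', '.join(relationships)}")

# Hypothetical repeat counts for two men at three Y-STR markers.
man_a = {"DYS393": 13, "DYS390": 24, "DYS19": 14}
man_b = {"DYS393": 13, "DYS390": 25, "DYS19": 14}
print(genetic_distance(man_a, man_b))  # 1
```

Actual shared amounts vary around these averages, so neighbouring groups overlap in practice.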
The big 5 DNA laboratories
Third party DNA websites
Surnames and DNA
- segments of autosomal DNA never follow surnames for more than
a small number of generations, eventually and inevitably coming
from a female ancestor with a different birth surname if one
goes back far enough.
- segments of X-DNA cannot normally follow the birth surname for
more than two generations (two individuals, one link), as they
cannot be passed from father to son.
- mitochondrial DNA normally never follows the birth surname, as
it is only ever passed from mother to child.
- Y-DNA in principle always follows the surname, apart from
surname/DNA switches.
Hence, when grouping autosomal matches, you should assign them to
specific ancestors, not to surnames.
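The four rules above come down to what can cross each parent-to-child link. Here is a minimal sketch, with invented function names `components_passed` and `components_along`; it says only what is possible along a line of descent, not what any individual actually inherited, and it ignores surname/DNA switches.

```python
def components_passed(parent_sex: str, child_sex: str) -> set[str]:
    """DNA components a parent of `parent_sex` can pass to a child of `child_sex`."""
    passed = {"autosomal"}  # every parent passes autosomal DNA to every child
    if parent_sex == "male":
        if child_sex == "male":
            passed.add("Y")  # father to son only
        else:
            passed.add("X")  # father to daughter only
    else:
        passed.update({"X", "mitochondrial"})  # mother to every child
    return passed


def components_along(line_of_descent: list[str]) -> set[str]:
    """Components that can cross every link of a line, oldest person first."""
    surviving = {"autosomal", "X", "Y", "mitochondrial"}
    for parent_sex, child_sex in zip(line_of_descent, line_of_descent[1:]):
        surviving &= components_passed(parent_sex, child_sex)
    return surviving


# The surname line: father, son, grandson. Only Y-DNA is specific to it.
print(sorted(components_along(["male", "male", "male"])))  # ['Y', 'autosomal']

# Father to daughter: X-DNA follows the birth surname for this one link only.
print(sorted(components_along(["male", "female"])))  # ['X', 'autosomal']

# The matrilineal line, where the surname typically changes every generation.
print(sorted(components_along(["female", "female", "male"])))
```

Autosomal DNA can cross every link, which is exactly why an autosomal match on its own says nothing about which surname line it came down.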
- Marrinan: a successful surname/Y-DNA project
- There have been major updates this week
to the FTDNA project interface, with numerous bugs still to be
ironed out.
Advantages and disadvantages of the various DNA companies and websites
- potential number of matches
- raw data
- available from all the laboratories
- segment data, currently available in bulk from
- FamilyTreeDNA
- GEDmatch
(download links available in DNApainter.com settings)
- shared matches
- available from all the laboratories
- triangulated matches and chromosome browser
- not available from AncestryDNA
- login timeouts:
- infinite at AncestryDNA and MyHeritage
- a month at WikiTree.com
- 7 days at DNApainter.com
- at most a few hours at the other websites
- data sharing, data protection and hacking
- the "ownership" of DNA data is a deep philosophical question:
who owns your DNA?
- you?
- the first degree relatives who may share up to 50% of your
DNA (100% if you are an identical twin)?
- the company which extracts the data from your DNA sample?
- AncestryDNA's patronising statement:
I understand that after I download my DNA
Data, it will no longer be protected by Ancestry and I
will assume all risk of storing, securing, and
protecting this information.
- do I or the brother whom I have not seen for decades own
the roughly 50% of our DNA that we share?
- does either of us have the right to stop the other from
doing as we please with that data?
- is either of us obliged to seek the other's permission to
do so?
- Opinions may differ
- using email addresses as usernames has proven dangerous
- 23andMe got a lot of flak when information which it had
discretion to share with anyone it deemed a match was shared
with a hacker who had obtained, from other websites, the
passwords associated with the email addresses of matches.
- two-factor authentication
- username and password always constituted two-factor
authentication, until users were forced to use an email
address as their username
- allowing both Google and Facebook as a second factor
allows some collaboration
- sharing match lists:
- FTDNA projects can be set up by anyone and allow project
administrators access to all the kits in a project under a
single login
- different projects for surnames, geographical areas and
Y-SNPs are encouraged, but
- GDPR is cited for making life awkward for administrators
of multiple projects.
- MyHeritage allows a match list to be shared with exactly
one collaborator.
- AncestryDNA allows a match list to be shared with an
unlimited number of collaborators.
- some genetic genealogists are reluctant to show information
on their matches in a talk such as this, even if it could be
shown to everyone in the room by sending them individual
sharing/viewing invitations.
What does it cost?
- manufacturers of printers use the printer as a loss leader and
make profits from ink;
- suppliers of mobile telephone services use the handset as a
loss leader and make profits from monthly subscriptions and
top-ups; and
- DNA laboratories are now moving towards using data extraction
in the laboratory as a loss leader and making profits from
monthly or annual website subscriptions.
- Price discrimination:
- the same company's prices can vary depending on the
customer's location;
- reinforced by including return postage to a local depot in
the upfront cost;
- Aer Lingus and An Post have been blamed for widespread
reports of Irish DNA samples not reaching American
laboratories;
- some people prefer to find a friendly returning American to
put the kit in his or her baggage and mail it on arrival;
- you should include FTDNA's explanatory letter
when you mail your package.
- Price competition:
- there is strong competition in the autosomal DNA matching
market and up-front prices are low;
- there is a virtual monopoly in the Y-DNA matching market;
- is it just coincidence that analysing one Y chromosome costs
many times more than analysing 22 pairs of autosomal
chromosomes?
- The latest AncestryDNA premium feature is "Shared Matches PRO"
and I already cannot imagine using AncestryDNA without it.