A Sceptic's Adventures in Genetic Genealogy
Chapter 1:
Thoughts and questions on DNA sampling, testing and marketing
by Paddy Waldron
Last updated: 11 Dec 2013
Background
Having been increasingly addicted to genealogy from the age of 12 or earlier
and having a degree in mathematical sciences with a particular interest in
probability and statistics, it was inevitable that I would develop an interest
in DNA and in genetic genealogy.
I have attended various lectures on these subjects over the past several
years, and have read lots of explanations, often ending up more confused rather
than less confused after an effort to improve my understanding. I have still
not found the inspirational book or inspirational teacher that suddenly fits
everything into place within the context of my prior knowledge, such as
happened with probability and statistics when I took Adrian Raftery's course
(251) as a third-year undergraduate at Trinity College Dublin back in 1983/4. (In the
genetic genealogy field, my brief exposure to lectures by Maurice Gleeson and Dan
Bradley has, however, helped a lot.)
On the third day of the joint Back To Our Past (BTOP) and Genetic Genealogy Ireland
2013 shows at the Royal Dublin Society (20 Oct 2013), Kathy Borges of the
International Society of Genetic Genealogy (ISOGG) persuaded me to submit my DNA
to FamilyTreeDNA and to purchase Y-DNA and autosomal DNA products. E-mail
notifications that my results were available arrived on 16 Nov 2013 (autosomal
DNA) and on 21 Nov 2013 (Y-DNA).
I should probably try to weave my thoughts and the answers to my questions
into the ISOGG Wiki, but for now
I have more questions than answers and it is much quicker and easier to post
them all together here in these new web pages on my own personal web site
documenting my adventures as a sceptic in genetic genealogy. I hope that this
first chapter will help to dispel some myths, in
particular about the need for a little jargon; will help
customers and FamilyTreeDNA respectively to get more out of the FamilyTreeDNA.com website and to fix some of its
shortcomings; and will get me some feedback about interpreting my own autosomal DNA results, or lack thereof. First,
some definitions will help.
Definitions
If you are reading this page, you hopefully have some basic understanding of
DNA and of genetic genealogy. For those who don't, I had better begin by
outlining some basics.
DNA is material contained within human cells and inherited by
children from their parents. Genetic genealogy is the use of DNA to
assist genealogical research. For the purposes of genetic genealogy, DNA is
represented by long strings of the letters A, C, G and T. These long strings
are divided naturally into shorter strings called chromosomes.
There are four main types of DNA, each of which has a very different
inheritance path:
- Y-DNA
- Males have one Y chromosome containing Y-DNA and one X chromosome
containing X-DNA. Females have two X chromosomes, but do not have a Y
chromosome. Y-DNA is inherited patrilineally by
sons from their fathers, and so on, "back to Adam". (Most geneticists are
not creationists, but the concept of "Adam" is still useful and used!)
Some people are actually confused by the simple concept that Y-DNA
follows the male line, and even by the simpler concept that in most
cultures the surname follows the same male line. If you belong to (or
join) the relevant Facebook groups, you can read about examples of this
confusion in discussions in the County Clare Ireland Genealogy group and
The Waldron Clan Association group.
- X-DNA
- Every male inherits his single X chromosome from his mother. Every
female inherits two X chromosomes, one each from her father and from her
mother.
- mtDNA
- Similarly to Y-DNA, mitochondrial DNA (or mtDNA for short) is inherited
matrilineally,
but by both sons and daughters, from their mothers, and so on, "back to
Eve".
- Autosomal DNA
- Autosomal DNA (or atDNA for short) is inherited by everyone in 22 pairs
of chromosomes. One chromosome in each pair comes from the father and the
other from the mother. These chromosomes are further broken down, because
of the random process of recombination, into many smaller
segments represented by shorter strings of A, C, G and T (fewer
than 1,000 segments of genealogical value per individual). The segments in
the paternal chromosomes come roughly equally from both paternal
grandparents, and those in the maternal chromosomes likewise come roughly
equally from both maternal grandparents. Segments come ultimately from
all ancestors in recent generations, but those large enough to be of
genealogical value can be traced back to a vanishingly small proportion
of the exponentially increasing number of ancestors in earlier
generations.
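The "exponentially increasing number of ancestors" can be made concrete with a minimal sketch. It assumes no pedigree collapse (every ancestral slot in a generation is filled by a distinct person), and the function names are purely illustrative.

```python
def ancestors_in_generation(g: int) -> int:
    """Number of ancestral slots g generations back (1 = parents, 2 = grandparents)."""
    if g < 1:
        raise ValueError("g must be at least 1")
    return 2 ** g


def shared_generation_for_cousins(degree: int) -> int:
    """Generation of the most recent common ancestors of full cousins of a given degree.

    First cousins share grandparents (generation 2), second cousins share
    great-grandparents (generation 3), and so on.
    """
    if degree < 1:
        raise ValueError("degree must be at least 1")
    return degree + 1


for degree in range(1, 7):
    g = shared_generation_for_cousins(degree)
    print(f"cousin degree {degree}: common ancestors {g} generations back, "
          f"{ancestors_in_generation(g)} ancestral slots in that generation")
```

Under these assumptions a sixth cousin connects through one couple among 128 slots, which is why only a vanishingly small proportion of earlier ancestors can be reached through segments of useful size.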
The word test is used with many different meanings in many different
fields. To a scientist or a medic, it may be a deterministic test with a
definite positive or negative outcome. To a statistician, it is a hypothesis
test which can accept or reject (but not prove or disprove) a hypothesis
based on the observed outcome of one or more random experiments. The word is
used loosely by genetic genealogists with other meanings, but I will try to
stick to the rigorous statistical meaning.
To a statistician, a sample is the set of data collected from the
random experiments on the basis of which a hypothesis is tested. So a DNA
sample comprises the strings of letters returned by a DNA company
from the cells collected from its customers. The relevant random
experiment is not the collection of cells (which is deterministic) but
the act of reproduction in which the random processes of mutation and
recombination produce the child's DNA from the parents'.
The various competing DNA companies market various products
which comprise both raw DNA samples and interpretation of both the
genealogical and medical implications of those samples.
Genetic genealogy has been very poorly explained to the public.
The results of DNA testing are frequently combined with the history and
mythology of human migration. The connection between genetics and the history
of human migration is generally extremely poorly explained. Is it based on DNA
extracted from prehistoric human remains, on other evidence from excavation of
prehistoric settlements, or on pure guesswork based on the geographical spread
of DNA in today's living people?
DNA tests CAN provide estimates of the probability that an individual
living in place A and an individual living in place B had a common ancestor,
either at any time, or within a specified number of generations. DNA from
living people on its own CANNOT provide any information as to whether
any such common ancestor lived in place A, lived in place B or lived in some
other place C, or moved between places A, B and C.
Consider the extreme example of a family of two brothers, one of whom
continued to live in his birthplace and fathered 10 daughters and no sons, the
other of whom emigrated and fathered 10 sons. Their shared Y-DNA (passed from
father to son) disappeared in one generation from their birthplace, but
increased and multiplied in the emigrant's destination. The present location of
the Y-DNA is therefore far away from the location where the common ancestor
lived. (The initial brothers could of course have had male line cousins who
passed on the same Y-DNA, perhaps in yet another different location.)
The units in which DNA testing (Y-DNA testing in particular) measures the
genetic distance between two individuals are numbers of mutations or
recombinations, i.e. rare (small probability) differences in DNA between a
child and the parent from whom the child inherits the DNA. By studying the
frequency distribution of mutations per reproduction or recombinations
per reproduction (for autosomal DNA), we can begin to understand the
significance of this genetic distance. With some knowledge of the number of
reproductions per generation (i.e. the average number of children fathered by
each male) and its variation over centuries and millennia, ESTIMATES of
the average number of mutations per generation or recombinations per
generation can be derived. These can then be used to provide further
ESTIMATES of the number of generations between the two individuals. By
studying the frequency distribution of the age of parents at reproduction (i.e.
years per generation) and its variation over centuries and millennia,
estimated numbers of years for variables like the time to the most recent
common ancestor can be derived. As stated by Dan Bradley of Trinity College
Dublin at BTOP, the error bars for such time estimates are typically of the
order of +/-50% of the point estimate. (I presume that "error bar" is
geneticists' jargon for what statisticians' jargon calls "confidence
interval".)
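The last step of that chain, converting an estimated number of generations into years with an error bar, can be sketched as follows. This is only an illustration: the 10 generations and 30 years per generation are hypothetical inputs, and treating the error bar as a symmetric +/-50% band around the point estimate is my own simplifying assumption.

```python
def years_to_common_ancestor(generations, years_per_generation, relative_error=0.5):
    """Return (low, point, high) estimates in years.

    relative_error=0.5 encodes the "+/-50% of the point estimate" rule of
    thumb; it is applied symmetrically, which is an assumption.
    """
    if generations < 0 or years_per_generation <= 0:
        raise ValueError("generations must be >= 0 and years_per_generation > 0")
    if not 0 <= relative_error < 1:
        raise ValueError("relative_error must be in [0, 1)")
    point = generations * years_per_generation
    return (point * (1 - relative_error), point, point * (1 + relative_error))


# Hypothetical inputs, chosen only to show the width of the interval.
low, point, high = years_to_common_ancestor(generations=10, years_per_generation=30)
print(f"point estimate {point} years, plausible range {low} to {high} years")
```

Even with these tidy inputs the range spans three centuries, which shows why such estimates can guide conventional research but cannot replace it.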
Genetics is a branch of applied probability and statistics in exactly the
same way as insurance, gambling, investment, lots of sports, medicine and many
other aspects of everyday life are. The highly educated population of the 21st
century are well capable of understanding it if it is explained clearly in this
context. Indeed, as Kelly Wheaton says,
"a statistics course is more important than a genetics one for genetic
genealogists".
Genetic genealogy is a branch of genealogy which likewise has its place
alongside traditional genealogical methods. Statistics prove nothing and
likewise genetic genealogy alone proves nothing. Both, however, can be of great
help in telling researchers where to look for the desired proof, and in
disproving wrong hypotheses.
Some DNA testing companies (ancestry.com in particular) have employed
marketing people to sell their products by promising not to use jargon. In
other words, they admit that they want to sell only to people who don't know
what they are buying. Consumer protection authorities should look into this:
the better regulated financial sector would never be allowed to get away with
it! See Roberta Estes's blog for a
further critique
of the ancestry.com product.
Any new science requires a new vocabulary to explain it. However, an attempt
to reconcile the geneticists' vocabulary, the genealogists' vocabulary and the
statisticians' vocabulary is urgently required. Scientists and marketers should
agree on the vocabulary, minimise the number of different synonyms used for
each concept, avoid mentioning concepts which are not directly relevant to
their audience, and define all new words clearly and precisely, with whatever
diagrams and mathematical models are necessary to help the understanding of
those who prefer verbal, spatial or quantitative approaches respectively. The
problem is epitomised both by the looseness of FTDNA's glossary and by
AncestryDNA's refusal to even use what it terms "jargon" to make its statements
intelligible to multiple audiences.
For example, at familytreedna.com, the words "block" and "segment" appear to
mean exactly the same thing and to be used interchangeably on the same page,
unnecessarily confusing the company's customers. (If there is a subtle
difference that I have missed, please let me know.)
As genetics is a branch of applied probability and statistics, it cannot be
explained clearly without using the basic vocabulary of those subjects, i.e.
words and phrases like probability, estimate, confidence interval and
hypothesis test. Beware of anyone who tries to persuade you otherwise.
Like any sophisticated and rapidly developing website, FamilyTreeDNA.com is
bound to take some getting used to.
It appears that every visit has to start with a login screen even if one
ticks the apparently useless "Remember me" checkbox. One must also remember to
click the small dark "LOG IN" button towards the middle left, not the larger
and brighter "Login" button at the top right, which merely reloads the login
screen. There are regular annoying pop-ups saying things like: "You have been
idle for 120 minutes. Your session may have timed out. The page will be
reloaded and you may need to log in again." or "Your session will expire on Sun
Nov 17 2013 13:19:52 GMT+0000 (GMT Standard Time). You have 5 minutes remaining
until your session times out. Click OK to keep this session." If facebook.com
can keep its billions of users permanently logged in, there is no excuse for
any smaller website such as FamilyTreeDNA.com not to provide this option. At
least the timeout was increased from 30 minutes to 120 minutes soon after I
started to use the website.
My initial autosomal DNA results are presented in the form of 36 pages of
matches, with 10 matches per page, sortable by 3 fields. I have yet to find
a formal statistical definition of the word "match".
The nearest to a definition that I can find in the FAQs is:
The Family Finder program has calculated all of your matches to be your
relatives within the relationship range. Family Tree DNA uses stringent
standards for the relationship range and for the degree of relatedness. Thus,
only those determined with high confidence to be your actual genetic
relatives are included.
Where are the "stringent standards" published? How high is "high"?
Every statistical inference is subject to two types of error. For no
particular reason, they are known as Type I and Type
II errors:
- A Type II error is a false negative; in the context of genetic genealogy,
it means failing to list a known relative from the database of test
subjects among my matches. The FAQs
give these surprisingly small probabilities, at least beyond third cousins,
of NOT making Type II errors (i.e., in statistical jargon, this table shows
that the power of the tests is surprisingly low):
Chances of finding a match:
Relationship | Match Probability
2nd cousins or closer | > 99%
3rd cousin | > 90%
4th cousin | > 50%
5th cousin | > 10%
6th cousin and more distant | Remote (typically less than 2%)
- It is not clear to me whether these lower bounds come from some
mathematical (probabilistic) model or from experimental sample data.
Neither is it clear why lower bounds rather than point estimates are given.
Nor is it clear what other unspecified factors might cause the power of the
test to be greater than the given lower bounds.
- Since the power of the test is so low for those who are not closely
related, my first idea when trying to place someone I don't know who shows
up as a close match would be to look at all of his or her matches (not just
our "in common" matches) to see if I recognise any of them as a known
distant relative who by chance doesn't match me. Under present access
rules, this would ideally require my close match to share his or her
password, or as a next best alternative to download and forward to me the
entire spreadsheet of all of his or her matches.
- A Type I error is a false positive; in the context of genetic genealogy,
it means listing as a match someone who can be proven not to be a relative
within the Relationship Range. I have so far not found any statement of the
probability of Type I errors. This probability is the significance level
used for the hypothesis test, so is most likely the industry standard 5%.
The probability that two people are related is, of course, much larger than
the probability that that relationship can be traced and proven by
conventional genealogical means. Having reviewed my own 36 matches for
which the Relationship Range does not include "Remote Cousin", my
genealogist's instinct is that the proportion to whom I have any hope of
proving a relationship is far less than the theoretical 95%.
Back to the website layout:
There seems to be no means of viewing all 350+ matches in the same web
browser window. I can, however, see all 350+ matches, sortable by all fields,
in a single Microsoft Excel window by clicking the Excel button at the bottom
right of the browser window. This causes Mozilla Firefox to offer to open an
XML Document in XML Editor. I am not familiar with either of these, but
clicking OK then opens a normal Excel window. The file downloaded is not a
properly formatted Excel file and is probably just a CSV file: column widths
are not set to match the content; panes are not frozen; autofilter is not
turned on; dates are not in my preferred Microsoft Windows date format; e-mail
addresses are not hyperlinked; long lists of surnames and placenames are not
set to wrap in a readable manner; etc. As I will be re-downloading this file, I
had to record this macro in order to
make it usable in Excel 2010. Hopefully the macro will be of use to other
FamilyTreeDNA customers. If you know how to use a macro in Excel, hopefully you
know how to copy and paste someone else's macro into the appropriate place (and
how to back up your macros, which Excel stupidly insists on storing in the
Program Files directory hierarchy).
For each match, I can see the following fields either in the web browser
window or in the Excel window or in both windows or somewhere else:
- Longest block length:
- This is the field by which results are sorted in Excel (Column F). It
is not immediately visible in the web browser, but can be revealed by
expanding a tiny, almost invisible, dropdown menu under any match's
mugshot. I initially thought it was seven clicks away: click Family
Finder, Chromosome Browser, Filter Matches By ..., Name, [type name,
don't hit <Enter> key], Find, checkbox, View this data in a table,
scan the centiMorgans column
for the largest value. See my separate chapter on the Chromosome Browser for more details and some
examples.
- Shared segments:
- The number of shared segments is not visible in either the web browser
window or the Excel window, but appears after the sixth click in the
above alternative path to the longest block.
- Full Name:
- The full name may include a title (Mr. or Mrs.) in the web browser, but
not in Excel (Column A). In the case of Mrs., the Full Name does not
include the maiden surname, the only one of interest to the genealogist.
Sortable by first name in Excel, but not sortable by surname! One of my
matches (Mr. Robert M. Elliott) appears to have entered the "Mr." as part
of his first names as it does appear in Excel.
- Mugshot:
- On a randomly selected page, 2 of the 10 matches have uploaded
mugshots. Understandably not visible in Excel.
- E-mail address:
- A hyperlinked icon in the web browser; not hyperlinked in Excel (Column
H). It took a lot of googling to find this
page which taught me how to add appropriate hyperlinks using my Excel
macro!
- Note icon:
- Allows notes to be added on the website. These are then included in
Column L of the next Excel download.
- Family tree icon:
- Green indicates that the match has uploaded a GEDCOM. Grey (red on
mouseover) indicates that the match has not uploaded a GEDCOM. This is
not part of the Excel download, where it would be extremely useful. Why
is mouseover required to show the red? A surprisingly and disappointingly
small proportion of users have uploaded a GEDCOM - as few as one out of
ten on a randomly selected page of matches.
- Run Triangulate icon:
- Allows filtering on "In Common" matches (which can then be downloaded
into a smaller spreadsheet; for some bizarre reason this smaller
spreadsheet does not include any information about the match with whom
those included are in common!).
- Match Date:
- One of the three fields on which the web results can be sorted. In
North American all-numeric mm/dd/yyyy format on the web page. All
genealogists know that the use of ambiguous all-numeric dates is a mortal
sin, guaranteed to lead to the hell of confused months and days for
events in the first 12 days of any month. My macro, inter
alia, converts the Excel date (Column B) to my own preferred
yyyy-mm-dd format.
- Relationship Range:
- This is the primary field by which results are sorted by default in the
web browser, but not in the Excel download (Column C). Where is the
algorithm explaining how it is calculated from the numeric results?
- Suggested Relationship:
- Column D in the Excel spreadsheet, but not shown in web browser. Does
"Suggested" mean what statisticians call "Estimated"? If so, then is it a
maximum likelihood estimate? I can't think of any other estimation method
for a discrete parameter.
- Known Relationship:
- User-entered field, which must then be confirmed by the other Match.
There is a limited dropdown menu of possibilities with the vague "Distant
Cousin" hidden in the middle to cover any omitted relationships. "Distant
Cousin" is NOT a "Known Relationship"! There should be submenus or
numeric fields to allow any degree of cousin and any degree of remove to
be selected. What will I do if I find one of my 3rd Cousins 3R? If and
when I find and enter known relatives, this field will be downloaded as
Column G in the Excel spreadsheet.
- Shared cM:
- Presumably the sum in centiMorgans of block lengths for all shared
segments longer than some unspecified minimum length, probably 1cM. Also
represented by a graphical icon, which wastes a lot of valuable real
estate in the web browser window, forcing other variables into the hidden
dropdown underneath. For distant relatives, the graphical icons are so
similar that they are much less informative than the numerical
representation (in which trailing zeroes are properly included here).
Column E in the Excel spreadsheet.
- Ancestral Surnames:
- This appears to be a free-format text field, combining surnames,
placenames and the punctuation marks chosen by the customer entering the
text. The entry of ancestral surnames appears to be completely
independent of the uploading of a GEDCOM. Entries are delimited by " / ".
The order of entries is unclear: should it be alphabetical, or ahnentafel
order, or perhaps whatever random order the user entered the names in?
The first few words are visible in the web browser window, and the
remainder are revealed by mouseover. There appears to be no limit on the
number of entries allowed. If known relationships are limited to 6th
cousins, then should ancestral surnames not be limited to those of
GGGGGgrandparents, i.e. 128 surnames (not allowing for spelling changes or
surnames not conventionally inherited)? Even that is too many to be of use
in free form unsortable text. Surnames which I have entered as ancestral
surnames and which I share with my matches are apparently bolded. It is
too tedious to check these by clicking on all ten matches on all 36
pages. In Excel 2010 (Column I), to find matches using a particular word
in this field, click the auto-filter dropdown in Cell I1, type F then A
then the word of interest then <Enter>.
- Ancestral Placenames:
- Hidden in with ancestral surnames. This should surely be a separate
field. It should be (but is not) possible to download matches into a
spreadsheet with one line per ancestral surname and/or one line per
ancestral placename for further analysis.
- Y-DNA Haplogroup:
- Shown, if applicable (i.e. for males) and available (i.e. customer has
paid for it and order has come to the head of the queue) on the tiny,
almost invisible, dropdown menu under each match's mugshot and in Column
J of the Excel spreadsheet.
- mtDNA Haplogroup:
- Shown, if available (i.e. customer has paid for it and order has come
to the head of the queue) on the tiny, almost invisible, dropdown menu
under each match's mugshot and in Column K of the Excel spreadsheet.
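Two of the complaints in the field list above, the ambiguous mm/dd/yyyy Match Date and the inability to sort by surname, are easy to work around once the matches have been exported. The following is a minimal sketch in Python rather than my Excel macro. The sample rows are invented, and taking the last word of Full Name as the surname is a crude assumption that will fail for suffixes such as "Jr." and for multi-word surnames.

```python
from datetime import datetime


def to_iso_date(us_date: str) -> str:
    """Convert an mm/dd/yyyy string to the unambiguous yyyy-mm-dd form."""
    return datetime.strptime(us_date, "%m/%d/%Y").strftime("%Y-%m-%d")


def surname_key(full_name: str) -> str:
    """Sort key: the last word of the name, compared case-insensitively."""
    parts = full_name.split()
    return parts[-1].casefold() if parts else ""


# Hypothetical rows: (Full Name, Match Date as shown on the web page).
rows = [
    ("Mary Anne Example", "11/03/2013"),
    ("John Doe", "03/11/2013"),
]

for name, match_date in sorted(rows, key=lambda row: surname_key(row[0])):
    print(name, to_iso_date(match_date))
```

The two sample dates differ only in the order of day and month, which is exactly the confusion that the all-numeric format invites.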
What was I expecting to find?
- As of 6 Nov 2013, the offline version of my genealogy
database contained 7,217 known relatives who were either my
GGGGgrandparents or known to be descended from my GGGGgrandparents. All of
these are my fifth cousins or more closely related (possibly some number of
times removed). A lot of them are dead, but many of them are still alive. I
did expect to find that at least one of them had done the autosomal test
with FamilyTreeDNA and would show up as a match, but no such luck.
- No known relative at all appeared among my initial 354 matches.
- Until I found the table above, I didn't realise
that I can't expect to find one of these 7,217 among my matches until a lot
more than one of them have tested.
- I have discussed DNA testing with a number of known relatives, but didn't
feel that I could push them to test until I had done so myself.
- My friends and relatives probably know by now that I hate Christmas. Not
being a materialistic person and having a dread of shopping, I particularly
hate being asked "What do you want for Christmas?" and the pressure to
reciprocate. This year, as Christmas lights start to go up everywhere while
I reflect on my newly received initial results, I suddenly for once have an
answer for any relative who asks that dreaded question: "I want you to sign
up for the autosomal DNA Family Finder test at FamilyTreeDNA.com!" In
return, I can hopefully at least give anyone who obliges a more complete
family tree than would otherwise have been possible.
- So if you know (or even suspect) that you are related to me and you have
already tested, then please get in touch so that we can compare
results.
- And if you know (or even suspect) that you are related to me and you have
NOT already tested, then PLEASE consider signing
up for the test! The more remote our known connection, the more
interesting I think the results will be.
- I have a number of known fifth cousins and even sixth cousins with whom I
have proven relationships using a wide variety of sources, ranging from the
Registry
of Deeds, to inheritance cases, to the combination
of oral tradition and the writings
of Sir Hal Blackall, to a little helpful note in the
Kilkeedy burial register. However, surviving Irish church and civil and
census records generally do not go back far enough to confirm most of these
relationships. Additional DNA confirmation would be nice to have.
- I also have a number of known fifth cousins based on notes in my county
Clare grandmother's diaries and letters about meetings with various sets of
her third cousins, in particular the Nolans of Kilkee, the Houlihans of
Killard and the O'Connells of Moveen. While the family friendship with all
these families remains strong, nobody remembers any longer who our common
ancestors were. Then there are the Clancys of Cranny who remained the
closest of friends with the Clancys of Killard long after the details of
their relationship were forgotten. If you belong to one of these families,
I particularly urge you to please consider signing
up for the test!
- In the long term, if all my known relatives test, it should be possible
to reconstruct the DNA of all of our ancestors - possibly not exactly, but
certainly in the form of some sort of posterior probability distribution.
If I find someone who is both a match and a known relative, we can
presumably deduce that our longest shared blocks were almost certainly
shared by one of our most recent ancestral couple. Every match will add
another piece to the jigsaw puzzle.
What did I actually find?
- I should probably concentrate first on contacting the four closest
matches (Relationship Range 2nd Cousin - 4th Cousin; Suggested Relationship
3rd Cousin); then on the 36 matches for which the Relationship Range does
not include "Remote Cousin"; and probably dismiss the other 314 matches for
which the Relationship Range does include "Remote Cousin". Some sort of
colour coding (say green, amber and red) to distinguish between these three
groups might have helped to highlight at first glance the significant
differences between these groups.
- Only one of my four closest matches has uploaded a GEDCOM.
- 124 of my initial 354 matches have left the Ancestral Surnames column
completely blank, frustratingly including the top 3 matches in both the web
browser and Excel orderings, and 13 of the 36 matches for whom Relationship
Range does not contain Remote Cousin.
- Only when I established contact with my closest match did I realise that
many of those with missing GEDCOMs and blank Ancestral Surnames columns
could be adoptees. Could the percentage of adoptees testing be as high as
implied by the above ratios? See Example 3 at the end of my chromosome browser page for details of this
case.
- My own surname Waldron does not appear in the spreadsheet, per se.
However, there are two matches (same surname, same e-mail address, probably
father and daughter) listing Waldronde, in Berkshire, England. My earliest
known Waldron ancestor, my GGgrandfather, was born in county Roscommon,
Ireland in the 1820s.
- My mother's surname Durkan/Durkin (which was the maiden surname of three
of her four grandparents, so is potentially the source of up to 37.5% of my
autosomal DNA) appears as the surname of one match, but not among his
ancestral surnames, or those of any other match. There is one person with
Durkee, a new variant to me, among ancestral surnames.
- Despite this, there are at least two people that I know on the list:
- Gerard
Corcoran (4th Cousin - Remote Cousin), whose talks I attended at
the afore-mentioned BTOP show: We both have Walsh ancestors, but they
are in different parts of Ireland, and mine now has a big question mark
as I suspect sloppy research by the cousin who gave me the information
many years ago.
- Terry
Fitzgerald (it is actually her mother's DNA which suggests 5th
Cousin - Remote Cousin): See my chromosome
browser page for details, in particular Example 2 at the end of the
page.
- I once had a student named Carleen Doherty, and there is an e-mail
address on the list (5th Cousin - Remote Cousin) with username
carleendoherty. It's not the commonest combination of names in the
world, but perhaps not unique.
- Terry's is just one of many e-mail addresses which are associated with
multiple DNA samples of different family members.
- My top ten matches, which are the same both by Relationship Range and
Longest Block, include no fewer than five members of the Dengen family,
sharing an e-mail address - a mother and four of her children. See Example
1 at the end of my chromosome browser page
for details of this case.
- My mother and all her known ancestors lived in County Mayo. Filtering the
Ancestral Surnames column of the spreadsheet shows that 12 of my initial
matches use the word Mayo in the Ancestral Surnames field. For four of them
it is a surname, for eight of them it is a placename.
- That got me wondering: how common is Mayo as a surname? Might I actually
have relatives named Mayo? Is the surname derived relatively recently from
the placename?
- Then I got wondering about other common surnames: 12 of my initial 354
matches have Sullivan in the Ancestral Surnames field. Is that more or less
than the proportion of the entire FamilyTreeDNA database who have Sullivan
in the Ancestral Surnames field? How can I find the answer to that
question? (I have no known Sullivan ancestor.)
- Another common Irish surname, which does appear among my ancestors,
appears in the Ancestral Surnames column 13 times, variously rendered as
oconnor, Connor, Connors or O'Connor. Are these 13 matches any more
significant than the 12 Sullivan matches?
- Can the Ancestral Surnames column (and ancestral placenames, when
separated) be populated automatically from an uploaded GEDCOM?
- I can filter by any specific matching Ancestral Surname, but how
can I filter by the number of matching Ancestral Surnames?
- I had to do this to extract and analyse the Ancestral Surnames column:
- copy it to a text editor
- change every " / " to a line break
- change every " (" to a tab
- remove every ")"
- paste into a new spreadsheet
- insert a row of headers
- Alt D P to create a pivot table
- I managed to extract the (quite large) frequency table here from the pivot
table
- Apart from "Unknown", Parker with 10 matches is the most common of my
own ancestral surnames; 15 of my ancestral surnames were not shared by
any of my initial 354 matches.
- This would work better if the characters used as delimiters between
surnames and placenames (brackets) were not also allowed as parts of
surnames or placenames!
- I ended up with 5,942 surname lines, implying that my 354 matches have
entered an average of nearly 17 ancestral surnames each (5,942 / 354 = 16.8).
- I've spotted only one Irish townland where I have known relatives
(Barnacahoge, county Mayo), but most placenames are at county, country or
state level only.
- On the Keas/Keyes side, I did find one person whose Keas ancestor I
recognised as having lived in the same Catholic parish as mine
(Patrickswell, county Limerick). However, fifteen years of conventional
genealogical research have failed to prove the precise relationship between
the two neighbouring Keas families, and the DNA evidence appears unlikely
to advance the search for proof.
- There are two other Keas/Keyes matches, one in Germany back to 1813 and
the other in the USA back to 1671, surely just coincidence as my
GGGGgrandfather John Keas lived in Ireland and was born in the 1770s.
- I have e-mailed a few matches to see what happens, and Cindy Wood, the
Keas expert, will e-mail a few others.
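The manual steps described above for extracting and analysing the Ancestral Surnames column (text editor, delimiter replacement, pivot table) can also be scripted. The following is a minimal sketch, assuming that entries are separated by " / " and that any placename follows the surname in brackets. The sample cells are invented, and counting each surname once per match is my own choice, not necessarily what the pivot table counted.

```python
from collections import Counter


def parse_ancestral_surnames(field: str):
    """Split one Ancestral Surnames cell into (surname, place) pairs.

    Entries are assumed to be separated by " / ", with an optional
    placename in brackets after the surname, e.g. "Keas (Limerick)".
    """
    pairs = []
    for entry in field.split(" / "):
        entry = entry.strip()
        if not entry:
            continue
        surname, _, place = entry.partition(" (")
        pairs.append((surname.strip(), place.replace(")", "").strip()))
    return pairs


def surname_frequencies(cells):
    """Count how many matches list each surname (case-insensitive)."""
    counts = Counter()
    for cell in cells:
        # A set ensures a surname repeated within one cell is counted once.
        counts.update({s.casefold() for s, _ in parse_ancestral_surnames(cell)})
    return counts


# Hypothetical cells standing in for Column I of the download.
cells = ["Keas (Limerick) / Walsh (Mayo)", "Walsh / O'Connor (Clare)", ""]
for surname, n in surname_frequencies(cells).most_common():
    print(surname, n)
```

As noted above, this parsing breaks down whenever a customer uses brackets inside a surname or placename, so the output still needs checking by eye.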
Continue to Chapter 2
My own online family tree is at http://pwaldron.info/tng/index.php
but to see it you will have to Register for a New TNG User
Account.
Comments about this page can be left on facebook.