Genealogically useful, misattributed and false DNA matches

Last updated: 5 September 2020

URL: http://pwaldron.info/DNA/GenealogicallyUseful.html

The changes to AncestryDNA's matching algorithm announced on 14 July 2020 and implemented throughout August 2020 sparked off a variety of reactions and got me thinking again about how to identify genealogically useful autosomal DNA matches.

When I first got involved in genetic genealogy back in 2013, I was very confused by a variety of poorly defined jargon which is used to dismiss matches that are not considered genealogically useful. This jargon includes terms such as "IBS", "identical-by-state", "pseudo-segment", "false positive", "false match", etc. I soon realised that many of these terms are generally just as subjective as the thresholds built into the matching algorithms used in the various DNA comparison databases.

Several years of experience in studying the DNA matches of myself and others on many different websites, and my background in mathematical sciences including statistics, have subsequently led me to my own subjective opinions on criteria for assessing the genealogical usefulness of DNA matches, which can be divided into four broad categories:

SNP-based criteria for genealogical usefulness
centiMorgan-based criteria for genealogical usefulness
Base-pair-based criteria for genealogical usefulness
Other criteria for genealogical usefulness and a wish list

I will deal with each of these in turn, starting with the most technical.

SNP-based criteria for genealogical usefulness

Until such time as a technology is developed to read our paternal chromosomes and maternal chromosomes separately, autosomal DNA matching is based on identifying half-identical regions bounded by opposite homozygous locations. At every location read along the pairs of autosomal chromosomes, the current technology reads an unordered pair of letters, e.g. AA, AT, AC, etc. At every location within a half-identical region, the two individuals being compared have at least one letter in common. For example, at a location where A or C can be observed, two half-identical individuals can have AA and AA, AA and AC, AC and AC, CC and AC, or CC and CC. On the boundaries of half-identical regions, one of the individuals has AA and the other has CC. The vast majority of locations observed on the autosomal chromosomes are bi-allelic, so that only two of the four letters (A,C,G,T) are observed at each location.
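
The single-location rule just described can be sketched in a few lines of Python. This is only an illustration, assuming that a genotype is recorded as an unordered pair of letters; the function names half_identical and opposite_homozygous are illustrative labels and are not taken from any company's matching algorithm.

```python
from itertools import combinations_with_replacement


def half_identical(g1, g2):
    """True if the two unordered genotypes have at least one letter in common."""
    return bool(set(g1) & set(g2))


def opposite_homozygous(g1, g2):
    """True if both genotypes are homozygous, but for different letters."""
    return g1[0] == g1[1] and g2[0] == g2[1] and g1[0] != g2[0]


# The three unordered genotypes at a bi-allelic location where A or C can be observed.
genotypes = ["".join(p) for p in combinations_with_replacement("AC", 2)]

# Every unordered pairing of two individuals' genotypes at that location.
pairs = list(combinations_with_replacement(genotypes, 2))
matching = [(a, b) for a, b in pairs if half_identical(a, b)]
boundary = [(a, b) for a, b in pairs if opposite_homozygous(a, b)]

print("half-identical pairings:", matching)
print("boundary pairings:", boundary)
```

Of the six possible pairings, five are half-identical and only AA against CC marks a boundary, which agrees with the list given above.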

The first principle of DNA comparison is that within a long half-identical region there is a long, but not directly observable, identical segment (e.g. ACGTAAGTTGGAC ...) which is common to, for example, one individual's paternal chromosome and the other individual's maternal chromosome. The second principle of DNA comparison is that this identical segment, or most of it, came to both parties from a (probably deceased) common ancestor, although that common ancestor is likewise generally not directly observable (barring exhumation).

Something which is not directly observable can in general neither be proven true nor proven false.

It is often possible to estimate the likelihood or probability that a proposition is true or false. Unless the estimated probability that something is false is 100%, then it can be described at best as probably false and must not be described as false without this qualification.

One reason to dismiss a half-identical region as not genealogically useful is that it may be

half-identical by chance

This refers to regions in which there is a sequence of overlapping short paternal/paternal, paternal/maternal, maternal/paternal and maternal/maternal matches. The fewer the SNPs being compared within the region, the more likely it is to be half-identical by chance. The more that different laboratories introduce different chips with different sets of SNPs, the more the matching algorithms have to guard against half-identical by chance matching, and the more half-identical by chance regions may slip through the net. As of 16 July 2020, the number of SNPs available for comparison with my top 3,000 matches at GEDmatch, called the overlap, ranged from 47,557 to 345,972. In the early days, when most comparisons were based on the same underlying chip and an overlap of hundreds of thousands of SNPs, I was dubious of any half-identical region containing fewer than 1,000 SNPs. The present diversity of chips in use means that one can no longer afford to be so fussy about SNP density.
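
Why fewer SNPs make a chance match more likely can be illustrated with a deliberately crude toy model. The sketch below assumes two unrelated individuals, bi-allelic SNPs in Hardy-Weinberg proportions, the same allele frequency p at every SNP, and independence between neighbouring SNPs (which ignores linkage). None of these assumptions holds exactly for real chips, so the numbers indicate only the direction of the effect, and the function names are hypothetical.

```python
def p_opposite_homozygous(p):
    """Chance that two unrelated people are opposite homozygous at one SNP
    (one AA and the other CC, in either order) when allele A has frequency p."""
    q = 1.0 - p
    return 2.0 * p**2 * q**2


def p_half_identical_by_chance(n_snps, p=0.5):
    """Chance that n_snps consecutive SNPs contain no opposite homozygous
    location, under the toy-model assumptions stated in the lead-in."""
    return (1.0 - p_opposite_homozygous(p)) ** n_snps


# The assumed allele frequency of 0.2 is purely illustrative.
for n in (10, 100, 1000):
    print(n, p_half_identical_by_chance(n, p=0.2))
```

Under these assumptions the probability of a run surviving by chance falls steadily as the number of SNPs compared rises, which is why sparse overlaps deserve more suspicion.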

Another reason to dismiss a half-identical region as not genealogically useful is that it may be

half-identical by omission

This refers to the fact that the SNPs which the DNA companies examine are generally not all the SNPs. In other words, the locations not examined are not necessarily locations at which all humans are identical. Thus, it is possible that two people match at a long sequence of consecutive observed SNPs, but that there are unobserved SNPs between the observed SNPs at which the two people do not match. Dave Nicolson has written a paper about this.

A half-identical by chance or half-identical by omission region can be considered a false match.

Half-identical regions generally have fuzzy boundaries at each end of the identical segment which they contain. In other words, there will generally be a gap between the last location in the identical segment and the next opposite homozygous location, e.g.

individual V paternal: AAAAACCCCC
individual V maternal: CCCCCCCCCC
individual W paternal: AAAAAAAAAA
individual W maternal: ACACACCCCA

In this example, the identical segment ends with five As, but V and W are half-identical at the next four locations (where each has at least one C) until the half-identical region eventually ends at the last location shown, where one has two As and the other has two Cs.
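
The example can be checked mechanically. The following sketch assumes, as the example itself does, that the paternal and maternal chromosomes of V and W are known separately, which current technology does not deliver; the helper names are illustrative.

```python
v_pat = "AAAAACCCCC"
v_mat = "CCCCCCCCCC"
w_pat = "AAAAAAAAAA"
w_mat = "ACACACCCCA"


def genotype(pat, mat, i):
    """Unordered pair of letters observed at location i."""
    return frozenset((pat[i], mat[i]))


def first_opposite_homozygous(a_pat, a_mat, b_pat, b_mat):
    """0-based index of the first location where the two individuals
    have no letter in common, or None if there is no such location."""
    for i in range(len(a_pat)):
        if not (genotype(a_pat, a_mat, i) & genotype(b_pat, b_mat, i)):
            return i
    return None


def identical_run(x, y):
    """Number of leading locations at which two chromosomes are identical."""
    n = 0
    for a, b in zip(x, y):
        if a != b:
            break
        n += 1
    return n


core = identical_run(v_pat, w_pat)                          # length of the identical segment
end = first_opposite_homozygous(v_pat, v_mat, w_pat, w_mat)  # where the region ends
fuzzy = end - core                                           # length of the fuzzy boundary

print(core, end, fuzzy)
```

The identical segment covers the first five locations, the first opposite homozygous location is the tenth, and the four locations in between form the fuzzy boundary.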

For more on these concepts, see Ann Turner's "Identity Crisis" article in the Journal of Genetic Genealogy (Volume 7, Fall, 2011).

In general, an identical segment shared by individuals V and W will have descended to both individuals in its entirety from a single common ancestor. The only other possibility is that there has been a crossover within the identical segment in a recent generation, so that, for example, V inherited the first part of it from one member of an ancestral couple and the second part of it from the other member of that ancestral couple. If the crossover was near either end of the identical segment, then it represents another type of fuzzy boundary. If the crossover was in the middle of a long identical segment, then, by the same logic, V must ultimately share one of the ancestors from whom he inherited each end of the identical segment with W. As one climbs up the family tree, fuzzy boundaries can be shaved off each end of the initial segment, so that it can shrink considerably, or even disappear, before the common ancestor is reached.

Another reason to dismiss a half-identical region as not genealogically useful is that it may

contain long fuzzy boundaries around a small core segment

Such a segment could be called identical by chance and can be considered a false match.

The prevalence of fuzzy boundaries means that the lengths of identical segments can only be estimated and that these estimates incorporate substantial measurement error. I have not seen any formal study of the distribution of this measurement error.

A consequence of this measurement error is that when three individuals share the same identical segment, the estimated lengths of the three corresponding half-identical regions can be slightly different. It can often be the case that one or two of the half-identical regions are longer than the threshold being used for a particular comparison, and the other two or one are shorter than the threshold. If two of the three individuals are parent and child, then there is a strong (and justifiable) temptation to dismiss these half-identical regions as not being genealogically useful. This was the subject of Debbie Kennett's 2017 study and of other parent/child studies which she cites. Those studies concentrated on the 6cM threshold then used by AncestryDNA for identifying matches, but exactly the same phenomenon is observed around the 20cM threshold used by AncestryDNA for identifying shared matches.

When AncestryDNA announced its intention to raise its threshold from 6 centiMorgans (cM) to 8cM, I would like to have seen one of these parent/child studies extended, as a matter of urgency, to examine how many of the shared matches between parent and child are estimated to share over 8cM (and over 20cM) with one and under 8cM (and under 20cM) with the other, so that, in the 8cM case, they originally appeared "true", but then appeared "false" when one of the matches had disappeared under the new regime. As I have no living parent and no child and do not have access to any parents-and-child trio of AncestryDNA kits, I could not carry out this research myself, and I am not aware of anyone else who did it.

A match of which the estimated length is within a normal margin of error of some arbitrary threshold must not be considered a false match.
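
One way to put this principle into practice is to label near-threshold matches as borderline instead of discarding them. The sketch below is only an illustration: the margin of 1cM and the example lengths are hypothetical, since, as noted above, I have not seen any formal study of the size of the measurement error.

```python
def classify(estimated_cm, threshold_cm, margin_cm):
    """Label a match by where its estimated length falls relative to a threshold.

    margin_cm is a hypothetical allowance for measurement error, not a
    published figure.
    """
    if estimated_cm >= threshold_cm:
        return "reported"
    if estimated_cm >= threshold_cm - margin_cm:
        return "borderline"
    return "below threshold"


# Hypothetical parent and child who share the same identical segment with a match.
print(classify(8.3, threshold_cm=8.0, margin_cm=1.0))  # parent's estimate
print(classify(7.6, threshold_cm=8.0, margin_cm=1.0))  # child's estimate
```

With these illustrative figures the parent's match is reported and the child's is borderline, so the child's missing match is not evidence that the parent's match is false.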

centiMorgan-based criteria for genealogical usefulness

Only in the last three paragraphs above have I mentioned the centiMorgan length of a half-identical region or of an identical segment, which is another indicator of whether or not it is genealogically useful. Before returning to centiMorgans, let us consider who our ancestors were. I have estimated that in somewhere like Ireland, where the population is small and there was little inward migration in recent centuries, it is unlikely that any two randomly selected people with no tradition of recent immigrant ancestors are more distantly related than about twelfth cousins. Going back about three times as far, Mark Humphrys argues that we Irish are all descended from Brian Ború, the High King of Ireland who was killed in battle in 1014.

This simple observation has profound implications:

if your parents came from the same population, then they were probably twelfth cousins or closer;
if your match's parents came from the same population, then they too were probably twelfth cousins or closer;
so you and your match are probably related in at least four different ways: paternal/paternal, paternal/maternal, maternal/paternal, maternal/maternal;
extending the analysis backwards, you and your match are probably related in many more ways;
if you and your match have a known recent common ancestral couple, then one of that couple is probably the source of any "long" identical segment (in centiMorgan terms) that you share with your match;
the source of a "short" segment that you share with your match could be a more distant common ancestor of whom you are currently unaware.

If your ancestors come from a smaller population (e.g. an offshore island or a coastal peninsula), then your parents were probably even more closely related than twelfth cousins. If your ancestors come from a larger population, then your parents might not be as closely related as twelfth cousins. However, the other principles above remain valid, whether the 95th percentile relationship is sixth cousin, twelfth cousin or eighteenth cousin.

Hence, a "long" identical segment shared by two known cousins is generally assumed to have come from one of their most recent known common ancestral couple (and a "long" identical segment shared by two known half-cousins is generally assumed to have come from their most recent known common ancestor), while a "short" identical segment shared by two known relatives may, however, have come from some other, less recent, and still unidentified, common ancestor.

"Long" and "short" in this case are subjective terms, and how to interpret them also depends both on the quality of surviving genealogical records for the areas where your ancestor lived and on how geographically diverse your ancestry is.

If surviving genealogical records are good, then you should be able to identify common ancestors with DNA matches with whom you share long identical segments and should also be able to identify the other (not shared) ancestors of yourself and your match on the relevant generation.

If surviving genealogical records are good, and particularly if your ancestors did not move around very much, then you will find that you are multiply related to many of your DNA matches.

If surviving genealogical records are bad, then you will struggle to identify your relationships to any of your DNA matches beyond immediate family members.

The identical segments which you share with a DNA match who is a known relative will cease to be genealogically useful when you can not reliably assign them to one particular ancestor or ancestral couple among those that you share with the match.

In summary, other reasons to dismiss half-identical regions (besides those which are half-identical by chance, etc.) as not genealogically useful are that they may be

shared with a stranger and inherited from a common ancestor too distant to be identified from surviving genealogical records; or
shared with a known relative, but inherited from a common ancestor other than one of the known most recent common ancestral couple.

A match attributed to the wrong common ancestor can be described as a misattributed match; this alone does not make it a false match.

True (and false) matches can become misattributed matches in all sorts of ways, including:

misattribution to someone who is not a common ancestor;
misattribution to a common ancestor who is not the source of the shared DNA;
automated misattribution:

using unvalidated family trees (e.g. trees with children born before their parents!);
using wrong trees (e.g. trees which have confused two namesakes or which are linked to the wrong DNA kit);
using incomplete trees (e.g. attributing a shared DNA segment to an ancestor in the tree when it comes from an ancestor omitted from the tree);
using correct trees (e.g. attributing a shared DNA segment to a more recent common ancestor when it comes from a more distant common ancestor);

manual misattribution;
etc.

I ran into the problem of misattributed matches when experimenting with DNA PAINTER. I started out with a 7cM threshold and found no fewer than four regions of between 7cM and 8cM in length which I shared with known relatives, but which I could not reliably assign to a particular ancestor. In other words, I appear to be doubly related to at least one of the matches whom I match in each of these regions. So I decided that for anything under 8cM the risk that the shared DNA came from a common ancestor other than the known common ancestor is unacceptably high, and for anything over 8cM, the corresponding risk is acceptably low. FamilyTreeDNA has long had an 8cM threshold in its matching algorithm (although it annoyingly reports and counts much smaller half-identical regions once one of them exceeds 8cM). I am delighted to see that AncestryDNA is now doing something similar.

I also ran into the problem of misattributed matches when a region (of slightly over 8cM) in which many of my kits triangulated with each other turned out to be a pile-up region (details here).

Base-pair-based criteria for genealogical usefulness

The ISOGG Wiki cites a relatively old (2014) study by Speed and Balding which used computer simulations based on base pairs and going back for 50 generations. They showed that, for example:

over 50% of 5 Mb (5,000,000 base pair) segments date back over 20 generations;
over 60% of 10 Mb segments date back over 10 generations; and
around 40% of 20 Mb segments date back over 10 generations.

The relationship between base pairs and centiMorgans is highly non-linear, which is the very reason for using the centiMorgan scale rather than the base-pair scale to assess the genealogical usefulness of a match. Hence, the Speed and Balding results can NOT be used directly to draw reliable inferences about the age of segments measured in centiMorgans. I am not aware of any similar study calibrated in centiMorgans, but one is badly needed. Its results will undoubtedly be similar to, but less extreme than, those of Speed and Balding.

In the absence of such a study, the editors of the ISOGG Wiki propose using the Speed and Balding results, in conjunction with a rule of thumb that on average one centiMorgan equals one megabase.

To see where this rule of thumb comes from, I did a one-to-one comparison of my own kit with my own kit at GEDmatch and added the end locations of the 22 segments to get the aggregate length of the autosomes in base pairs (2,865,561,399); then added the cM lengths of the 22 segments to get the aggregate length of the autosomes in cMs (3587.1), which gives an average of 798,852 base pairs per cM, while the rule of thumb rounds this to the nearest Mb per cM and proposes 1,000,000 base pairs per cM.
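
The arithmetic in the previous paragraph can be reproduced as follows. The two totals are the figures quoted above; the conversion helpers cm_to_mb and mb_to_cm are illustrative names, and they apply a single genome-wide average, which is exactly the kind of linear approximation whose limits are discussed in this section.

```python
TOTAL_BASE_PAIRS = 2_865_561_399  # aggregate end locations of the 22 autosomes
TOTAL_CM = 3587.1                 # aggregate cM length of the 22 autosomes

bp_per_cm = TOTAL_BASE_PAIRS / TOTAL_CM
print(round(bp_per_cm))  # 798852

RULE_OF_THUMB_BP_PER_CM = 1_000_000


def cm_to_mb(cm, ratio=bp_per_cm):
    """Convert cM to Mb using an average base-pairs-per-cM ratio."""
    return cm * ratio / 1_000_000


def mb_to_cm(mb, ratio=bp_per_cm):
    """Convert Mb to cM using an average base-pairs-per-cM ratio."""
    return mb * 1_000_000 / ratio


# The same 5 Mb segment under the observed average and under the rule of thumb.
print(mb_to_cm(5.0), mb_to_cm(5.0, RULE_OF_THUMB_BP_PER_CM))
```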

Thus this rule of thumb introduces two opposite biases:

because the relationship between the cM and Mb scales is highly non-linear, assuming any sort of direct proportionality or linear relationship causes a bias which undoubtedly makes small segments look worse than they really are;
because the actual average ratio between the cM and Mb scales is just under 0.8Mb=1cM, using a one-to-one ratio causes an opposite bias which undoubtedly makes small segments look better than they really are.

Without further research (see below), it is very unclear which of these two biases will be greater. Nevertheless, the ISOGG rule of thumb is still often (mis)used in combination with the Speed and Balding results (and other more justifiable reasons) to warn against the dangers of over-reliance on segments which are short in centiMorgan terms, e.g. here. No theory can be either proven or disproven by a biased methodology. A biased methodology may demonstrate a relationship in the correct direction, but, depending on the direction of the bias, will either exaggerate or understate the strength of the relationship.

By definition, if a set of segments is sorted by megabase length (longest-to-shortest) and then re-sorted by centiMorgan length, segments which descend from very distant ancestors will generally move down the list after re-sorting and segments which descend from more recent ancestors will generally move up the list after re-sorting.

I have been challenged to illustrate the biases introduced by using Mb as a proxy for cM. To do this quickly, I ran the GEDmatch Tier 1 Segment Search on my own kit (VA864386C1) with the minimum thresholds. This gave a sample of exactly 10,000, mostly small, half-identical regions (HIRs) with both cM and Mb length for each HIR. In the absence of any better data, I assessed the genealogical usefulness of each HIR by whether or not I have established my relationship to the other party. In the table below, the columns headed "known" show the half-identical regions that I share with individuals to whom I have established my relationship; the columns headed "unknown" show the half-identical regions that I share with individuals to whom I have NOT established my relationship. All of these relationships are closer than sixth cousin. I divided the HIRs into groups using the Mb ranges from the oft-cited Speed and Balding Figure 2, which I converted into cM equivalents using the true average of just under 0.8Mb=1cM. Each row represents one of the Speed and Balding ranges. Here are the results:


Mb ranges	known	unknown	Grand Total	%known
0.2-0.5Mb	6	74	80	7.5%
0.5-1.0Mb	56	995	1051	5.3%
1-2Mb	182	3112	3294	5.5%	cM ranges	known	unknown	Grand Total	%known
2-5Mb	206	3579	3785	5.4%	3.0-6.2cM	408	7160	7568	5.4%
5-10Mb	106	1025	1131	9.4%	6.3-12.5cM	113	1599	1712	6.6%
10-20Mb	92	406	498	18.5%	12.6-25.0cM	115	469	584	19.7%
20-30Mb	40	69	109	36.7%	25.1-37.5cM	49	34	83	59.0%
30-40Mb	27	7	34	79.4%	37.6-50.0cM	27	4	31	87.1%
40-50Mb	8		8	100.0%	50.1-62.5cM	9	1	10	90.0%
50-60Mb	4		4	100.0%	62.6-75.1cM	8		8	100.0%
60-80Mb	4		4	100.0%	75.2-100.1cM	3		3	100.0%
Over 80Mb	2		2	100.0%	over 100.1cM	1		1	100.0%
Grand Total	733	9267	10000	7.3%	Grand Total	733	9267	10000	7.3%
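
The totals and the "%known" columns can be re-derived from the counts in the table. The sketch below simply re-enters those counts (blank "unknown" cells are treated as zero) and recomputes the percentages; the variable and function names are illustrative.

```python
# (range label, known, unknown) copied from the table; blank cells entered as 0.
mb_rows = [
    ("0.2-0.5Mb", 6, 74), ("0.5-1.0Mb", 56, 995), ("1-2Mb", 182, 3112),
    ("2-5Mb", 206, 3579), ("5-10Mb", 106, 1025), ("10-20Mb", 92, 406),
    ("20-30Mb", 40, 69), ("30-40Mb", 27, 7), ("40-50Mb", 8, 0),
    ("50-60Mb", 4, 0), ("60-80Mb", 4, 0), ("Over 80Mb", 2, 0),
]
cm_rows = [
    ("3.0-6.2cM", 408, 7160), ("6.3-12.5cM", 113, 1599), ("12.6-25.0cM", 115, 469),
    ("25.1-37.5cM", 49, 34), ("37.6-50.0cM", 27, 4), ("50.1-62.5cM", 9, 1),
    ("62.6-75.1cM", 8, 0), ("75.2-100.1cM", 3, 0), ("over 100.1cM", 1, 0),
]


def pct_known(rows):
    """Map each range label to the percentage of its HIRs with a known relationship."""
    return {label: round(100 * known / (known + unknown), 1)
            for label, known, unknown in rows}


def totals(rows):
    """Return (known, unknown) totals over all ranges."""
    return sum(r[1] for r in rows), sum(r[2] for r in rows)


print(totals(mb_rows), totals(cm_rows))
print(pct_known(mb_rows)["20-30Mb"], pct_known(cm_rows)["25.1-37.5cM"])
```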

The two principal objectives for this exercise were to demonstrate:

that the Mb and cM scales are very different, which is illustrated by the differences between the two columns headed "Grand Total" (the fourth and ninth columns); and
that assuming equivalence of Mb and cM scales introduces systematic biases to subsequent calculations, which is illustrated by the differences between the two columns headed "%known" (the fifth and last columns).

Interesting points from this table and the underlying data include the following:

The distribution on the cM scale looks very different from the distribution on the Mb scale:

GEDmatch filters out all HIRs under 3.0cM, regardless of their Mb length.
If the 1cM=1Mb rule of thumb was appropriate, there would be no HIR under 3.0Mb, but 63.0% of the HIRs are below this threshold.
If the 0.8Mb=1cM rule of thumb was appropriate, there would be no HIR under 2.4Mb, but 52.8% of the HIRs are below this threshold.
If this was a random sample, then the average ratio would be around 798,852 base pairs per cM.
As this is not a random sample but has been filtered by GEDmatch for genealogical usefulness, the average ratio is lower.
Because the Mb/cM relationship is non-linear, the average of the base-pair/cM ratios for the individual HIRs (613,148) is different from the ratio of the aggregate base-pair length to the aggregate cM length (635,709).
If the relationship between the Mb and cM scales was linear, then the correlation between the two measures would be 1.00.
Because the relationship between the Mb and cM scales is non-linear, the actual correlation between the two measures is 0.86.

In the five Mb-based groups above 30Mb and in the six cM-based groups above 25cM, I have identified a common ancestor for most of the HIRs.
The widely cited result that "40% of 20 Mb segments date back beyond 10 generations" is far more pessimistic than this table, which shows that I have found a much more recent common ancestor for 36.7% of the HIRs in the corresponding 20-30Mb group, and an even more reassuring 59.0% of those in the equivalent cM-based group. Of course, without digging up the common ancestral couples I cannot prove to the doubters that the known common ancestors were the sources of the relevant HIRs.
There is one outlier of 55.0cM/28.4Mb for which I have not been able to identify the common ancestor; the relatively short Mb length seems to give a better estimate of the age of this particular segment.
Ignoring this single outlier, my success rate in finding common ancestors is as good or better when grouping by cM as when grouping by Mb for every group above 10Mb/12.6cM.
The results on the right of the table look a little less depressing than those on the left because of the use of the cM scale in place of the Mb scale.
The results on the left of the table in turn look a little less depressing than those of Speed and Balding because the GEDmatch matching algorithm, like those of the other DNA companies and the efforts of all good genetic genealogists, has already filtered out to the best of its ability many segments from extremely distant ancestors.
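
The distinction drawn above between the average of the individual ratios and the ratio of the aggregates, and the effect of non-proportionality on the correlation, can be illustrated with a small sketch. The five segments below are entirely hypothetical and are NOT taken from my GEDmatch data; they are chosen only to show that the two summary ratios coincide when base pairs are exactly proportional to cM and diverge otherwise.

```python
from math import sqrt


def mean_of_ratios(segments):
    """Average of the individual base-pairs-per-cM ratios."""
    return sum(bp / cm for cm, bp in segments) / len(segments)


def ratio_of_aggregates(segments):
    """Aggregate base-pair length divided by aggregate cM length."""
    return sum(bp for _, bp in segments) / sum(cm for cm, _ in segments)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


# Hypothetical (cM, base pairs) segments, for illustration only.
hypothetical = [(3.0, 1_200_000), (5.0, 6_000_000), (8.0, 4_000_000),
                (12.0, 9_000_000), (30.0, 18_000_000)]
# The same cM lengths with base pairs exactly proportional to cM.
proportional = [(cm, cm * 800_000) for cm, _ in hypothetical]

for name, segs in (("hypothetical", hypothetical), ("proportional", proportional)):
    cms = [cm for cm, _ in segs]
    bps = [bp for _, bp in segs]
    print(name, mean_of_ratios(segs), ratio_of_aggregates(segs), pearson(cms, bps))
```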

I still agree with Blaine Bettinger and others that small segments potentially poison our genealogical research, but the poison is not as deadly as some would have us believe.

In summary, a match sharing a segment inherited from a very distant ancestor can be described as an ancient match and is not genealogically useful, but must not be considered a false match.

Other criteria for genealogical usefulness and a wish list

The cM lengths of DNA matches are strongly correlated with the degree of relationship for close relationships, out to about first cousins.

The cM lengths of DNA matches are only very weakly correlated with the degree of relationship for distant relationships, certainly beyond third cousins.

We need other and better criteria to identify the dozens of genealogically useful matches almost certainly lurking among the many thousands of useless, irrelevant and even false small matches which disappeared under the new AncestryDNA matching algorithm. There is no point in throwing the baby out with the bathwater.

There are three almost equally important criteria which can be used to assess the genealogical usefulness of a half-identical region between yourself and a DNA match:

the length of the half-identical region, whether measured in cM, SNPs or Mb;
the number of other similar and longer half-identical regions which you and your known relatives share with the match and his or her known relatives; and
the number of other individuals who are half-identical to you and your match on this region.

The first of these criteria is widely observable and widely used (and misused). The second criterion is observable for individual-to-individual comparisons, but takes a little more effort to observe for family-to-family comparisons. The third criterion is generally observable only by using advanced third-party tools such as the Tier 1 Segment Search at GEDmatch.

While a single small DNA segment shared by two individuals is not genealogically useful in isolation, a pattern of large and small DNA segments shared by multiple descendants of one ancestor with multiple descendants of another ancestor can be extremely useful (in combination with archival evidence and family traditions) in establishing the relationship between the two ancestors. We need better ways of finding these patterns. A good start is to fish in all the online gene pools and to share your DNA match lists with your known relatives and with your other close matches.

The thresholds at which matches cease to be genealogically useful are different for different purposes:

For chromosome mapping purposes, as already stated, my experience is that 8cM is a sensible threshold for determining what is genealogically useful.
For triangulation purposes, much smaller half-identical regions can still be very useful. The more individuals one adds to a triangulation group, the smaller the overlap shared by everyone in the group, but the more confidently one can predict that the overlap was inherited from one of the common ancestral couple shared by everyone in the group. The overlap may be small, but will be part of much longer segments unquestionably shared by different subgroups of the triangulation group. Sometimes, however, I feel obliged to omit a known relative from a triangulation group in order to make my conclusion more convincing to those with a stronger bias than my own against small triangulated segments.
For purposes of identifying matches at AncestryDNA with whom I can trace common ancestors and validate common ancestor hints, particularly those far down my match list, I find that the best criterion for assessing genealogical usefulness is currently not the shared cMs but the number of shared matches.
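
The way in which the overlap shrinks as a triangulation group grows can be sketched as an interval intersection. The start and end positions below are hypothetical, and the function name is illustrative. The sketch computes only the region common to every listed half-identical region on one chromosome; it does not check that the members of the group also match each other, which genuine triangulation requires.

```python
def common_overlap(regions):
    """Intersection of (start, end) regions on the same chromosome,
    or None if they have no location in common."""
    start = max(s for s, _ in regions)
    end = min(e for _, e in regions)
    return (start, end) if start < end else None


# Hypothetical half-identical regions (base-pair positions) shared with three matches.
group = [(10_000_000, 50_000_000), (20_000_000, 60_000_000), (30_000_000, 40_000_000)]

for size in range(1, len(group) + 1):
    print(size, common_overlap(group[:size]))
```

Each additional member can only leave the common overlap unchanged or make it smaller, which is why the overlap for a large group is often short even though different subgroups share much longer segments.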

The points above inspire part of my wish list for the next round of improvements to the DNA websites:

I wish that AncestryDNA would divert some of the resources devoted to tweaking its quite acceptable DNA-matching algorithm to tweaking, or to completely redesigning, its unacceptably poor tree-matching algorithm, which generates many hints that any sentient human can see are complete nonsense.
I wish that GEDmatch would drop the 7cM minimum threshold used in its Tier 1 triangulation tool to the minimum threshold available for one-to-one comparisons, currently 3cM, and that it would include the SNP overlap as well as the centiMorgan length of each triangulated segment which it reports.
I wish that AncestryDNA would show the number of shared matches for each kit on my match list, rather than forcing me to click twice and then potentially scroll down repeatedly for every single kit in order to count the shared matches. I would also like to be able to sort and filter my match list by the number of shared matches, as I can by the number of shared centiMorgans.
I wish that FTDNA would improve its in-common-with lists to give some indication of how much DNA the other two parties share, as the other DNA comparison websites do: GEDmatch and MyHeritage show the estimated shared cM, while AncestryDNA shows only shared matches where the estimated shared cM exceeds 20 (not counting the tiny segments that FTDNA counts).
I wish that AncestryDNA, FTDNA and GEDmatch would allow their shared match lists to be sorted by the average (or equivalently the total) cM shared with the two kits being compared:

MyHeritage already does this;
GEDmatch sorts by the cM shared with whichever kit is entered first in the web form;
FTDNA and AncestryDNA sort by the cM shared with whichever kit is logged in.

I wish that both AncestryDNA and FTDNA would improve their shared/in-common-with lists to indicate whether the matches are triangulated, as MyHeritage does and as GEDmatch allows via its additional display and processing options.
I wish that MyHeritage would allow users to filter triangulated and untriangulated shared matches, rather than forcing the user to click and scroll down repeatedly for every single match in order to find and/or count the triangulated shared matches.
I wish that all the companies would extend the ability to see matches shared:
- by two kits which are not deemed to be matches, such as two known third or fourth cousins who don't meet the relevant matching threshold (currently possible only at GEDmatch; and at FTDNA, but only by administrators of projects of which both kits are members); and
- by three or more kits (currently possible only at FTDNA and only by administrators of projects of which all the kits are members).
For example, I would like to see a list of the matches shared by all of the descendants of one of my ancestors whose DNA is linked to my online family tree, as the more descendants of the relevant ancestor that an individual matches, the more likely he or she is to be also descended from or closely related to that ancestor.
GEDmatch could add the shared match list to its Multi Kit Analysis in the same way as FTDNA's GAP interface for project administrators adds it to its own autosomal matrix comparisons.
I wish that users had more control over the shared match thresholds for one-to-one comparisons:
- FTDNA gives the user no control and their built-in thresholds are so low and their shared match lists are now so long that they are genealogically almost useless;
- GEDmatch proposes a 10cM default threshold, which is still too low, but which can be adjusted upwards by the user;
- AncestryDNA's fixed 20cM threshold has proven enormously useful in my own research;
- the more sensible sort order used by MyHeritage partly compensates for its low built-in threshold.
Some users might even want to set an upper threshold to eliminate shared close cousins who do not share ancestors, for example if one party's great-aunt was married to the other party's great-uncle.
I wish that AncestryDNA would provide a chromosome browser, but I know that I am wasting my breath in adding my voice to those of many thousands of AncestryDNA customers who have been campaigning for this basic and essential tool for many years.
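
Several of the wishes above amount to simple list operations. The sketch below shows one possible way to find the matches shared by three or more kits, to sort them by the total cM shared with all of the kits, and to apply a user-chosen minimum threshold. The kit names, match names and cM values are hypothetical, and none of the DNA websites is known to expose its data in this form.

```python
def shared_matches(match_lists, min_cm=0.0):
    """Matches common to every kit, sorted by total shared cM (largest first).

    match_lists maps each kit to a dict of {match name: shared cM}.
    A match is kept only if it shares at least min_cm with every kit.
    """
    kits = list(match_lists)
    common = set(match_lists[kits[0]])
    for kit in kits[1:]:
        common &= set(match_lists[kit])
    rows = [
        (m, sum(match_lists[k][m] for k in kits))
        for m in common
        if all(match_lists[k][m] >= min_cm for k in kits)
    ]
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


# Hypothetical match lists for three kits descended from the same ancestor.
match_lists = {
    "kit_a": {"m1": 25.0, "m2": 9.0, "m3": 40.0},
    "kit_b": {"m1": 30.0, "m2": 21.0},
    "kit_c": {"m1": 12.0, "m2": 50.0, "m4": 8.0},
}

print(shared_matches(match_lists))
print(shared_matches(match_lists, min_cm=10.0))
```

Sorting by the total is equivalent to sorting by the average, since every retained match is compared with the same number of kits.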

I will conclude with some statistics as of 16 July 2020 on the matches in danger of removal from my own AncestryDNA match list:

I have identified (and starred) 142 known relatives among my 36,571 AncestryDNA matches (0.39%).
AncestryDNA does not appear to report the total number of common ancestor hints for a kit.
Only 11 of my known relatives are among the countless matches with whom I am estimated to share 6cM or 7cM. The numbers of shared matches for these bottom 11 known relatives are 8, 16, 2, 4, 4, 1, 2, 15, 3, 2 and 1 respectively (average 5.3).
Only 12 of my common ancestor hints are among the countless matches with whom I am estimated to share 6cM or 7cM. I consider six of these 12 hints to be wrong, spurious or plain nonsense (and far more dangerous in the hands of inexperienced genealogists than small DNA segments). The numbers of shared matches for the six hints with which I agree are 8, 2, 4, 1, 2 and 2 (average 3.2). The numbers of shared matches for the six hints with which I disagree are 5, 0, 0, 2, 2 and 0 (average 1.5).
As a control group, I looked at the 12 most recent matches (as of 18 July 2020) with whom I am estimated to share 6cM or 7cM. The numbers of shared matches are 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0 and 0 (average 0.1).
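
The averages quoted in this list can be re-derived from the counts given. The sketch below re-enters those counts; the variable names are illustrative.

```python
from statistics import mean

# Shared-match counts as listed above.
groups = {
    "known relatives": [8, 16, 2, 4, 4, 1, 2, 15, 3, 2, 1],
    "hints I agree with": [8, 2, 4, 1, 2, 2],
    "hints I disagree with": [5, 0, 0, 2, 2, 0],
    "control group": [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
}

averages = {name: round(mean(counts), 1) for name, counts in groups.items()}
for name, avg in averages.items():
    print(f"{name}: {avg}")

# Share of known relatives among all AncestryDNA matches, as a percentage.
pct_known_relatives = round(100 * 142 / 36_571, 2)
print(pct_known_relatives)
```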

Based on these statistics, I would much rather lose matches with more centiMorgans and fewer shared matches than those with many shared matches and fewer centiMorgans.

I will update these statistics after the changes have been fully implemented.

As of 1 September 2020, there was still some confusion as to what AncestryDNA's new matching criteria were and as to whether they had been fully implemented:

matches under 8cM which had not had notes or coloured dots added by either party and had not been the subject of messages between the parties had been deleted from the perspective of both parties;
matches under 8cM which had notes or coloured dots added by one party remained visible to that party only;
conflicting information had been provided by AncestryDNA to different customers as to whether matches under 8cM which had notes or coloured dots added by one party would become visible again to the other party (see this Facebook discussion).

I hope that my arguments and examples have convinced readers that, as DNA comparison databases grow to tens of millions of individuals and as pile-up regions are identified and eliminated, a well-defined count of shared matches will cease to be just a random artifact of the matching process, but will converge to a very useful measure of the relative genealogical usefulness of small matches and even of non-matches.

Many thanks to those who provided useful feedback via Facebook on previous versions of this page.