Friday, January 14, 2022

Study: Mutation Rate Variability across Human Y-Chromosome Haplogroups

A short post about a relevant study that was first published online in 2020.

Qiliang Ding, Ya Hu, Amnon Koren, Andrew G Clark, Mutation Rate Variability across Human Y-Chromosome Haplogroups, Molecular Biology and Evolution, Volume 38, Issue 3, March 2021, Pages 1000–1005, https://doi.org/10.1093/molbev/msaa268

A common assumption in dating patrilineal events using Y-chromosome sequencing data is that the Y-chromosome mutation rate is invariant across haplogroups. Previous studies revealed interhaplogroup heterogeneity in phylogenetic branch length. Whether this heterogeneity is caused by interhaplogroup mutation rate variation or nongenetic confounders remains unknown. Here, we analyzed whole-genome sequences from cultured cells derived from >1,700 males. We confirmed the presence of branch length heterogeneity. We demonstrate that sex-chromosome mutations that appear within cell lines, which likely occurred somatically or in vitro (and are thus not influenced by nongenetic confounders) are informative for germline mutational processes. Using within-cell-line mutations, we computed a relative Y-chromosome somatic mutation rate, and uncovered substantial variation (up to 83.3%) in this proxy for germline mutation rate among haplogroups. This rate positively correlates with phylogenetic branch length, indicating that interhaplogroup mutation rate variation is a likely cause of branch length heterogeneity.

From the supplement:

For the Xue et al. (2009) rate, we identified the following haplogroups having significantly lower relative somatic mutation rate: E1b, R1a, and R1b. For these haplogroups, their actual mutation rate may be lower than the Xue et al. (2009) rate, and thus divergence times may be underestimated.

 

Figure: branch length by Y haplogroup. R1b, E1b, and R1a have the shortest branch lengths.

 This is YFull's age calculation method:

The second formula uses an assumed mutation rate of 144.41 years (0.8178 × 10^-9), which is the average of the mutation rates of the ancient Anzick-1 sample and of a group of known genealogies, and an assumed age of 60 years for living providers of YFull samples.
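To make the arithmetic behind those two numbers concrete, here is a minimal sketch. It assumes the rate 0.8178 × 10^-9 is per base pair per year and that it is applied over a callable region of 8,467,165 bp; that region length is my assumption and is not stated in the quote above. The function names are hypothetical, and the real YFull pipeline also corrects SNP counts for each sample's coverage, which is left out here.

```python
# Simplified sketch of a YFull-style SNP age estimate (not YFull's actual code).
MUTATION_RATE = 0.8178e-9  # per bp per year (value from the quoted text; unit assumed)
REGION_BP = 8_467_165      # ASSUMPTION: length of the callable region, not in the post
SAMPLE_AGE = 60            # assumed age of a living sample provider (from the quote)


def years_per_snp(rate=MUTATION_RATE, region_bp=REGION_BP):
    """Average number of years for one SNP to arise in the region."""
    return 1.0 / (rate * region_bp)


def snp_age(n_snps, rate=MUTATION_RATE, region_bp=REGION_BP, sample_age=SAMPLE_AGE):
    """Age in years implied by n_snps mutations below a node, plus the sample's age."""
    return n_snps * years_per_snp(rate, region_bp) + sample_age


if __name__ == "__main__":
    print(f"years per SNP: {years_per_snp():.2f}")
    print(f"age for 10 SNPs: {snp_age(10):.0f} years")
```

Under these assumptions the rate works out to roughly 144.4 years per SNP, which matches the quoted figure. The key point for this post is that `rate` is a single constant applied to every haplogroup.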

YFull has the most accurate Y haplogroup tree because it uses the largest number of SNPs. However, it also applies a single common mutation rate across all haplogroups.

Summary: 

R1b, E1b, and R1a Y-chromosomes have lower relative somatic mutation rates than other haplogroups, which is the likely cause of their shorter branch lengths. Because YFull applies a common mutation rate across Y haplogroups, its TMRCA and formation dates for R1b, E1b, and R1a (and their subclades) are most likely underestimated.
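A minimal sketch of why a lower true rate means an underestimated date: if a clade's real mutation rate is some fraction of the assumed common rate, the same SNP count took proportionally longer to accumulate. The function name and the relative rate of 0.9 below are hypothetical illustrations, not values from the study, and the 60-year sample-age offset is ignored for simplicity.

```python
def rescale_age(estimated_age, relative_rate):
    """Rescale an age computed with a common rate to a haplogroup-specific rate.

    relative_rate = (true rate for the haplogroup) / (assumed common rate).
    A value below 1 means the haplogroup mutates more slowly than assumed,
    so the rescaled age is older than the original estimate.
    """
    if relative_rate <= 0:
        raise ValueError("relative_rate must be positive")
    return estimated_age / relative_rate


if __name__ == "__main__":
    # HYPOTHETICAL numbers: a 4,500-year estimate and a rate 10% below the common rate.
    print(rescale_age(4500, 0.9))
```

With these made-up inputs, a clade dated at 4,500 years would be about 5,000 years old. The actual size of the correction depends on the true germline rate for each haplogroup, which the study only approximates through a somatic proxy.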






Friday, January 7, 2022

Steppe-->Brahmin? Final evidence against Harvard's claim

I received a reply from Harvard's Dr. David Reich to the email I had sent (see the comments on the previous post).

Hi Ashish,

 

The Z-scores are formal tests for model fit – appropriately calibrated by a block jackknife – and the tail of large Z-scores are simply reflecting the reality of model failure.

 

What the analyses are showing is that the “Indian Cline” is in fact best modeled as a two-dimensional cloud not a one-dimensional gradient: a mixture of three source populations that cannot be simplified to a mixture of two source populations as we had done in previous modeling for example in Reich et al. Nature 2019.

 

As such, what our enrichment analyses are showing is simply that the groups of traditionally priestly status tend to be enriched for the Steppe-related source population after controlling for the proportions from the other two.

           Yours, David 


I will address these points in this analysis post. This will be the final evidence against the confident Steppe-->Brahmin causality claim made in the study.
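For readers unfamiliar with the block jackknife mentioned in the reply, here is a minimal sketch of how a Z-score is calibrated this way. It is a simplified, equally weighted delete-one-block version applied to the mean of a list of per-site values; the function name is hypothetical, and this is not the code used in the paper.

```python
import math


def block_jackknife_z(values, n_blocks):
    """Return (estimate, standard_error, z) for the mean of `values`.

    The data are cut into contiguous blocks, the mean is recomputed with each
    block left out in turn, and the spread of those leave-one-out means gives
    the standard error. Blocks are used instead of single sites so that
    correlation between neighbouring sites does not shrink the error bar.
    """
    n = len(values)
    if not 2 <= n_blocks <= n:
        raise ValueError("need 2 <= n_blocks <= len(values)")
    total = sum(values)
    estimate = total / n

    leave_one_out = []
    for j in range(n_blocks):
        start, stop = j * n // n_blocks, (j + 1) * n // n_blocks
        block = values[start:stop]
        leave_one_out.append((total - sum(block)) / (n - len(block)))

    loo_mean = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum(
        (x - loo_mean) ** 2 for x in leave_one_out
    )
    se = math.sqrt(variance)
    z = estimate / se if se > 0 else float("nan")
    return estimate, se, z


if __name__ == "__main__":
    est, se, z = block_jackknife_z([1, 2, 3, 4, 5], n_blocks=5)
    print(est, se, z)
```

A large absolute Z means the statistic is many standard errors away from zero, which is what "model failure" refers to when the statistic is a residual that should be zero under the fitted model.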

Monday, January 3, 2022

Scratching my head! - Pt 2 - Major statistical analysis gap in Narasimhan paper undermines steppe->Brahmin theory

In a previous post, I critiqued Narasimhan et al. 2019 for inferring that, because steppe ancestry is significantly associated with modern Indian Brahmin groups, this steppe ancestry must be the source of the Indo-Aryan languages. The paper states:
Nevertheless, the fact that traditional custodians of liturgy in Sanskrit (Brahmins) tend to have more Steppe ancestry than is predicted by a simple ASI-ANI mixture model provides an independent line of evidence—beyond the distinctive ancestry profile shared between South Asia and Bronze Age Eastern Europe mirroring the shared features of Indo-Iranian and Balto-Slavic languages (58)—for a Bronze Age Steppe origin for South Asia’s Indo-European languages.
I gave three main reasons why this reasoning and analysis were faulty.

1. Correlation is not causation. Steppe ancestry in Brahmins could have been enriched through intermarriage with steppe-rich brides, so it may have nothing to do with steppe ancestry causing the formation of Brahmin jAtis.