Saturday, November 27, 2021

A Reply to the Anthrogenica clique's criticism of my Steppe Eneolithic Post

I recently wrote a post in which my analysis showed that the same ancestors who provided Iranian-like ancestry to Irula tribals also provided ancestry to Steppe Eneolithic as well as to South Central Asia (Sarazm aDNA etc.). One can read the post here.

For feedback, as well as to spread the post, I posted the link on a popular DNA & population genomics forum, Anthrogenica. Quite unexpectedly, immediately afterwards, I was suspended from Anthrogenica with no reason or message whatsoever. I always knew it was a Kurganist bastion, but never quite expected this level of censorship of opposing ideas. I am glad though; starting this blog is now worth it. Also, credit to Davidski at the Eurogenes blog for allowing me in his blog's comments section even with my opposing views.

I came to know that the moderator who banned me goes by the handle Coldmountains, whom I have in the past ridiculed quite a lot (in the Eurogenes comments section) for not finding a single R-L657 Indian Y haplogroup in the steppe since 2015. Poor guy comes up empty-handed after each successive paper when new samples from the steppe are published. His search still goes on. Meanwhile, the only L657+ sample we have so far in aDNA is from Roopkund Lake, India, 800 CE.

The link to my Anthrogenica thread is here. Please register and show Anthrogenica some love in this thread and elsewhere. The moderator clique there is in an echo chamber and needs some awakening.

Anyway, let's move on to the criticism of my post. There is just one, and sadly I couldn't reply because I was banned. Hence this post.

Kale on 25-Nov-2021 wrote

Kotias is a pseudo-haploid sample > That means rather than having two different sets of chromosomes like a real person, it is treated as having two exactly identical sets > That means the drift going to itself is going to be crazy high > If you have an edge coming out of an artificially crazy high drift, the percentage contribution has to be artificially crazy small to avoid overfitting.

This graph is completely uninformative until structured properly.

Kale is absolutely wrong here. Pseudo-haploid* samples do not by themselves cause artificially high drift edges. Rather, the artificially high drift arises when a label contains just 1 sample, because heterozygosity cannot then be computed for that label. The problem is solved by using 2 samples in the label, even if both are pseudo-haploid. It does not arise for .DG samples, as these are diploid genomes and allow for heterozygous calls.

This is exactly what I have done in the graph below (later). I lumped Satsurblia & Kotias into 1 label known as CHG. I will show that my conclusion does not change.

Proof of my claim comes from the programmers of Admixtools, in their qpGraph readme, pasted below. Should have been basic reading, right?

Genotypes are expected to be pseudo-haploid -- 2 samples at least per population or drift lengths on leaves are not meaningful.  
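To see why a single pseudo-haploid sample is the real problem, here is a minimal Python sketch of a per-SNP heterozygosity estimate computed from haploid allele calls. It is only an illustration of the standard "probability that two distinct sampled chromosomes differ" estimator, not the actual Admixtools code; the function name and the 0/1 allele coding are my own assumptions for the example.

```python
from typing import Optional, Sequence


def heterozygosity_estimate(haploid_calls: Sequence[int]) -> Optional[float]:
    """Estimate heterozygosity at one SNP from haploid calls (0 = ref, 1 = alt).

    Hypothetical helper for illustration only. It returns the fraction of
    distinct pairs of sampled chromosomes that carry different alleles.
    With fewer than two calls there are no pairs, so the estimate is undefined.
    """
    n = len(haploid_calls)
    if n < 2:
        return None  # a single pseudo-haploid sample: nothing to compare
    alt = sum(haploid_calls)
    # alt * (n - alt) discordant pairs out of n * (n - 1) / 2 total pairs
    return 2.0 * alt * (n - alt) / (n * (n - 1))


# One pseudo-haploid sample in the label: no estimate is possible.
print(heterozygosity_estimate([1]))
# Two pseudo-haploid samples in the label: the estimate is defined.
print(heterozygosity_estimate([0, 1]))
```

With one call the function has to give up, which is the situation of a label containing only Kotias. With two calls (for example Kotias plus Satsurblia) the estimate exists, even though each individual sample is still pseudo-haploid.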

As far as edges coming out of artificially high drifts are concerned, using sister clades of Kotias did not help Kale's case either. See, I spent weeks on the model trying every possibility. Them not being able to read the graph is not my problem.

Below I will paste my new qpGraph for Steppe Eneolithic which follows these principles and should be acceptable to the Kurganists as well.

  1. Worst residual ZScore below 3.
  2. CHG label now has 2 samples and therefore allows for heterozygous calls.
  3. Each admixture node is followed by a drift edge rather than immediately by another admixture edge. One or two admixture edges in my graph do directly follow an admixture node, because the drift edge there had length 0 (hence I omitted those drift edges).
  4. Non-zero drift edges, implying that each edge is a true one. (This is not a strict need. qpGraph disallows an immediate admixture edge if the admixed node is labeled with a number, but it allows the admixture node to be a source for another node if its label is alphanumeric. This is useful if multiple admixtures are to be modeled together.)

Please click on the graph for a high-res mobile view. On desktop, download the image and zoom in using a zoomable picture viewer.

Steppe Eneolithic qpGraph



DISCUSSION

After addressing all the criticisms, the need for an IndiaN component in Steppe Eneolithic does not go away. I again prove that the same ancestors who ultimately provided ancestry to Steppe Eneolithic in the 5th millennium BCE also provided ancestry to the Irula tribe (and by extension to most Indian groups). The minute criticisms which Kurganists come up with are immaterial now, because of course they will come up with them. So far, they have been busy denying even Iranian inflow into the steppe (it's a matter of purity, of course!), so accepting a South/SC Asian origin is a different matter altogether.

I conclude that Steppe_En is 
60% EHG + 12% CHG + 22% IndiaN-related + 6% IranN-related ± std errors

As I stated in my previous post:

Where this IndiaN source lived in the 6th millennium BCE is not yet known, but the vicinity of NW South Asia and SC Asia would be a good guess. It could be either the early Mehrgarh culture or the Jeitun culture, but we need ancient samples from these cultures.


*Pseudo-haploid genome = randomly sampling 1 allele at each biallelic SNP (single nucleotide polymorphism) marker.
Diploid genome = sampling both alleles at each biallelic SNP.

Heterozygosity = measure of genetic variability in a population (cannot be computed with single haploid genome)
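
As a rough illustration of the pseudo-haploid definition above, the Python sketch below draws one allele at random from a diploid genotype. This is a simplification: in real aDNA pipelines the random draw is usually made from the sequencing reads covering the site rather than from a known diploid genotype. The function name and the 0/1/2 genotype coding (number of alternative alleles) are assumptions made for this example.

```python
import random


def pseudo_haploid_call(genotype: int, rng: random.Random) -> int:
    """Return one randomly sampled allele (0 = ref, 1 = alt) for a biallelic SNP.

    Hypothetical helper for illustration only.
    genotype is the number of alt alleles carried: 0, 1 or 2.
    """
    if genotype not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1 or 2")
    # Build the two alleles the individual carries, then pick one of them.
    alleles = [1] * genotype + [0] * (2 - genotype)
    return rng.choice(alleles)


rng = random.Random(42)  # fixed seed so the sketch is repeatable
# A heterozygous site yields ref or alt with equal chance;
# homozygous sites always return the allele they carry.
calls = [pseudo_haploid_call(g, rng) for g in (0, 1, 2)]
print(calls)
```

The point of the sketch is that a heterozygous site can never show up as heterozygous in a single pseudo-haploid genome, which is why the variability has to come from a second sample in the same label.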

22 comments:

Unknown said...

It was not criticism, they were just seething. Not a single one provided an alternative qpGraph iirc. Seems like steppe having anything to do with India is blasphemous.
Once again thank you for making white bois seethe.

vAsiSTha said...

Would be nice if you pick a name for comments.
"Seems like steppe having anything to do with India is blasphemous."
Only if it's South Asia ---> steppe. The other way round is Anthony & Harvard approved.

Unknown said...

AG is tame but Eurogenes on the other hand.
https://eurogenes.blogspot.com/2021/11/when-it-seems-like-whole-world-has-gone.html?m=1

These people won't even consider your suggestions; instead they resort to name calling. I guess that being a skeptic in archaeogenetics is no longer acceptable.





Unknown said...

Seems like my text formatting is all messed up.(I'm phoneposting)

3rdacc said...

Sorry, this may be a bit too much for my understanding. But how exactly did your modification of the graph allow for more CHG admixture?

vAsiSTha said...

@3rdacc

Some of the extra CHG would be from std errors.
Some is because instead of only Kotias under the CHG label, I now used Kotias and Satsurblia.
The 3rd change is that each admixture node is now followed by a drift edge before another admixture. That would cause some change.

vAsiSTha said...

"AG is tame but Eurogenes on the other hand."
I don't have a problem with Davidski. We are adults, we can handle name calling. Banning on Anthrogenica is another matter though, very immature.

vAsiSTha said...

Ah, the Pashtun Coldmountains and the Afghan Pegasus banning me and then badmouthing me like big men. Wrt this topic, their origin and their rage at me correlate quite well :)

Anonymous said...

How much AASI is in modern South Asian ethnic groups, let's say Jatts, Rors, Kamboj, Sindhis, Khatris etc.? Most models give them around 17 to 25% AASI. I'm curious how much you would give them.

vAsiSTha said...

Irula is 55% AASI. Possibly one of the most AASI-shifted groups that I know of, apart from Paniya. NW Indians will probably have 20-30% AASI.

dosas said...

Great post, very interesting findings, please keep up your good work; voices like yours are sorely needed. As you have discovered, those forums only allow opinions that mirror their echo chamber, under the guise of their own pseudo-scientific verbiage.

tim drake said...

Well, afaik, that Pegasus guy is half Kashmiri Pandit and half Tajik :)

My question is this: what are the implications of this India_N ancestry in Steppe Eneolithic? If it is the obvious connection to the spread of IE languages, shouldn't South India be IE-ised too? (I am aware of the Sanskrit loanwords in southern languages, but those are the results of historical-period interactions and strong Brahmin influence. Classical Sanskrit itself is 'relatively' recent.)

Also, if you consider the myths, then Sage Agastya is considered the father of the Tamil language and is considered to be the first to have compiled a grammar for Tamil. There is also this legend of Agastya migrating to the south along with the Velir chiefs.

tim drake said...

Actually, afaik, it's even less. For Jats, Rors and Khatris/Aroras, AASI is somewhere around 15%-20%.
@vasistha, have you checked the AASI of Gonds, Bhils etc.? Imo, Gonds will have higher AASI than even the Paniyas (at least on G25, they did).

tim drake said...

Oh, wanted to add that vAsiSTha's models give less AASI for Irulas than the qpAdm models in earlier papers (where Irula were deemed to be around ~70% AASI, iirc). So, using vAsiSTha's models, I think NW castes like Jats, Rors, Khatris/Aroras will get as little as 5%-10% AASI ancestry :)

vAsiSTha said...

"My question is this - What are the implications of this India_N ancestry in steppe eneolithic ? If it is the obvious connection of spread of IE languages, shouldn't south india be IEised too ?"

Language change is also a function of population sizes. IndiaN ancestry probably reached the south after 2000-1500 BCE, whereas the Steppe_En connection is 5000 BCE. Population sizes must have been drastically smaller back in 5000 BCE. But that's just a guess.

vAsiSTha said...

Yes, I'm indeed getting lower AASI for Irula than with the qpAdm IranN + Onge model, but I think 55-60% AASI is correct. This % didn't change across multiple qpGraph iterations.

gamerz_J said...

I don't use Anthrogenica, and even though I disagree with aspects of your model, it is still a shame you were banned from a forum just for an opinion.

I was wondering what exactly the connection between Onge and AASI is, because I have seen it mentioned that Onge is an imperfect proxy for AASI, and it seems this discussion hints at that too? Is it a drift difference?

vAsiSTha said...

Onge and AASI share a common ancestor.
But the two lineages diverged due to isolation and genetic drift.

Indosphere said...

Hello Vasistha,
Let me first say that I very much admire the efforts you are making and I think the conclusions you point to are ultimately correct.

However, I do think that if the Kotias aDNA sample published in Wang et al 2019 is in fact pseudohaploid, it is a serious criticism that cannot be dismissed lightly. The issue with pseudohaploid aDNA genomes is that, because of degradation, your recoverable samples have less than 1X sequence coverage for both sets of homologous chromosomes. Therefore you do not know if many of your SNPs (how many would depend on the sample) are in fact homozygous or heterozygous for reference vs. alternative allele at any given locus.

This can make a huge difference to Fst of the population being considered (because heterozygosity is after all a core component of Fst). Admixtools (see Patterson et al 2012) encodes a SNP at any given locus as 0,1, or 2 based on whether the locus is homozygous for the reference allele, or the locus is heterozygous (one reference and one alternative allele), or the locus is homozygous for the alternative allele. If you do not have at least 1X sequence coverage at that locus (i.e. pseudohaploid) you can't score this accurately. In genomes from larger populations, probabilistic techniques like Markov Chain Monte Carlo are used to predict what the homologous allele would have been on the missing chromosomal segment. But of course this isn't possible with aDNA when you don't have a big enough reference population size for probabilistic determination. So what usually happens is that you assume the locus is homozygous for whichever allele of the SNP pair is readable from recovered data.

I think this is their objection: how can you be sure your f4 statistical operations (and hence your drift edge readout) are reliable because the Kotias genome is pseudohaploid and we don't have crucial information about its heterozygosity, which could impact Admixtools calculations.

vAsiSTha said...

@indosphere

The CHG label I have used has both Kotias and Satsurblia. So pseudo-haploid data causing artificially high drifts is no longer an issue.

Apart from that, there are more pseudo-haploid labels. ONG.SG is one. There is no issue if there are multiple samples in the label.

vAsiSTha said...

Any .SG sample is pseudo-haploid; .DG is diploid. That doesn't mean .SG can't be used for qpGraph. Almost all Paleolithic samples are .SG. Is the suggestion then to not use them at all? That can't be.

I have built a model with low residual f-stats, no true 0 drift edges, and a drift edge immediately following each admixed node (not a strict requirement anyway).

This does not mean that it is the only viable model; however, this model is indeed viable.

Indosphere said...

@vasistha. Please contact me rudradev DOT brf ATT g mail DOT com. My background is in molecular biology & I would like to discuss a few things. Thanks.