Sunday, August 14, 2022

Update: IVC & Swat Valley Genetics, Bonus - Kashmiri Pandit Ancestry

SECTION A: THE INDUS PERIPHERY SAMPLES


In the Narasimhan et al 2019 paper, 13 outlier samples were published - 10 from the Iranian site of Shahr-i-Sokhta (abbrv. SiS, various dates 3100-2000 BCE) and 3 from the Bactria Margiana Archaeological Complex (abbrv. BMAC) site of Gonur (2400-2000 BCE). These samples showed an elevated ancestry component found in the Onge Andamanese and modern Indians, which was not found in the main samples from SiS and BMAC. Based on this, the hypothesis was made that these people were migrants from one or more sites of the Indus Valley Civilization (abbrv. IVC) and were labeled as 'Indus Periphery Samples'.

Various Site Locations



A.1 The ancestry composition of the Indus Periphery Samples


Narasimhan et al 2019 models Indus Periphery distally* as GanjDareh_N (Iran_Neolithic) + Onge Andamanese (AASI) + WestSiberianHG (ANE related). The paper categorically denies the presence of Anatolian farmer ancestry in the Indus Periphery samples, and as a result rules out BMAC and SiS as sources of ancestry.

*Distal models break up the ancestry using distal sources, these may not be the actual mixing populations. Proximal models further refine the possible sources by using only those sources that could have realistically admixed with each other considering geography and time. For example, if Anatolian farmer is proven to be a distal source for Indians, it doesn't mean that the farmers walked to India and mixed here without mixing with anyone else. It would mean that a nearby Iranian or Central Asian source with Anatolian admixture is the true source of the Anatolian admixture in Indians.

Excerpt from Narasimhan 2019 relevant to the Indus Periphery samples


Recently, Maier et al 2022 (preprint) has challenged this model and asserted that a 4th ancestry from Anatolian farmers is indeed required to model these samples.

..a model that was shown to be fitting for all Indus Periphery individuals modeled one by one by Narasimhan et al. (Ganj Dareh Neolithic + Onge (ASI) + West Siberian hunter-gatherers (WSHG)) was rejected for the grouped individuals with a p-value = 0.0044. In contrast, a model “Indus Periphery = Ganj Dareh Neolithic + Onge (ASI) + WSHG + Anatolia Neolithic” was not rejected based on the p>0.01 threshold used in Narasimhan et al. (p-value was marginal but passing at 0.03) and produced plausible admixture proportions for all four sources that are confidently above zero: 53.2 ± 5.3%, 28.7 ± 2.1%, 10.5 ± 1.3%, 7.7 ± 2.9%, respectively.
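One way to read "confidently above zero" in the quote above is to divide each admixture proportion by its standard error. The sketch below does this for the four values quoted from Maier et al. The |Z| > 2 rule of thumb is an assumption on my part, not something the preprint states.

```python
# Admixture proportions and standard errors (in %) as quoted from Maier et al 2022
estimates = {
    "GanjDareh_N": (53.2, 5.3),
    "Onge": (28.7, 2.1),
    "WSHG": (10.5, 1.3),
    "Anatolia_N": (7.7, 2.9),
}

# Z = estimate / standard error: how many SEs the proportion sits above zero
z_scores = {src: est / se for src, (est, se) in estimates.items()}

for src, z in z_scores.items():
    print(f"{src:12s} Z = {z:.2f}")
```

The Anatolia_N term has the smallest Z of the four, which is consistent with the preprint calling the fit marginal.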

I tried to replicate Narasimhan's model for each sample (8 of them; the others are too low quality and/or contaminated) with qpAdm using my own set of reference populations*, and found that the model only passes with p-value >> 0.05 for the samples with the lowest SNP counts (indicating low power). In the original tables, acceptable p-values are marked green and rejected ones red.

*Reference: Mbuti.DG, Cpaleo, WSHG, EHG, IG_Serbia, PPN, Shamanka_En, CHG, Turkey_N (Cpaleo clubs Tianyuan & China_Amur_Paleo together)

Sample ID Label SNPs (out of 1240K) P-value GanjDareh % Tarim_EMBA1 % Onge %
I11466 SiSBA2 140282 0.389 70% 9% 21%
I11459 SiSBA2 320497 0.046 63% 10% 27%
I11456 SiSBA2 633215 0.075 64% 9% 27%
I2123 Gonur2 314707 0.028 73% 12% 15%
I11041 Gonur2 125977 0.940 78% 7% 15%
I10409 Gonur2 105846 0.650 42% 10% 48%
I8728 SiSBA2 645800 0.048 54% 9% 37%
I8726 SiSBA2 604344 0.036 80% 12% 8%
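The link between coverage and p-value is easier to see when the rows are sorted by SNP count. This is a small sketch using the sample IDs, SNP counts and p-values from the table above. The 0.05 cutoff is the usual convention I am assuming here, and the variable names are only illustrative.

```python
# (sample, SNP count, p-value) copied from the 3-source table above
rows = [
    ("I11466", 140282, 0.389),
    ("I11459", 320497, 0.046),
    ("I11456", 633215, 0.075),
    ("I2123", 314707, 0.028),
    ("I11041", 125977, 0.940),
    ("I10409", 105846, 0.650),
    ("I8728", 645800, 0.048),
    ("I8726", 604344, 0.036),
]

P_CUTOFF = 0.05  # assumed pass threshold

# Sort from lowest to highest coverage and flag each model
for sample, snps, p in sorted(rows, key=lambda r: r[1]):
    status = "pass" if p > P_CUTOFF else "fail"
    print(f"{sample:7s} {snps:7d} {p:.3f} {status}")

# Samples that pass at the cutoff, and those that pass comfortably (p > 0.1)
passing = {s for s, _, p in rows if p > P_CUTOFF}
comfortable = {s for s, _, p in rows if p > 0.1}
low_coverage = {s for s, n, _ in rows if n < 150000}
```

The samples that pass comfortably are exactly the three with fewer than 150,000 SNPs, which is what one expects if the test simply lacks power at low coverage.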

On the other hand, including PPN (Levant Pre-Pottery Neolithic culture samples) or Anatolia_N samples as a 4th source improves the p-values significantly.


Sample ID Label SNPs (out of 1240K) P-value GanjDareh % Tarim_EMBA % Onge % PPN %
I11466 SiSBA2 140282 0.468 54% 11% 26% 9%
I11459 SiSBA2 320497 0.242 51% 12% 30% 7%
I11456 SiSBA2 633215 0.355 54% 11% 28% 7%
I2123 Gonur2 314707 0.113 60% 14% 18% 8%
I11041 Gonur2 125977 0.993 69% 8% 17% 6%
I10409 Gonur2 105846 0.556 35% 11% 50% 4%
I8728 SiSBA2 645800 0.288 49% 9% 39% 3%
I8726 SiSBA2 604344 0.540 72% 12% 11% 5%

The need for PPN/Anatolian farmer ancestry holds when all Indus Periphery samples are analyzed under a single label, IVCp.


Label P-value GanjDareh % Tarim_EMBA % Onge % PPN % Anatolia_N %
IVCp 0.0010 66% 10% 24% - -
IVCp 0.3817 56% 12% 25% 7% -
IVCp 0.3336 58% 11% 25% - 6%

So with this, we must conclude that IVCp samples did receive geneflow indirectly from the Levant/Anatolian region. 6-7% in this distal model might seem small, but this potentially means that it entered IVCp through some Iranian or SC Asian source which could have given from 30-60% ancestry to IVCp.
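The 30-60% range can be sanity-checked with simple dilution arithmetic. This is a back-of-the-envelope sketch, assuming the PPN/Anatolian-related ancestry reached IVCp through a single intermediate source and nothing else. The source fractions are the PPN and Anatolia_N values from the passing distal models for the SC Asian sources reported later in this post.

```python
# PPN / Anatolia_N fraction in IVCp, from the grouped 4-source distal models
ivcp = {"PPN": 0.07, "Anatolia_N": 0.06}

# Fraction of the same component in candidate SC Asian sources
# (only the distal models that pass are used)
sources = {
    ("Turkmenistan_C_Geoksyur", "PPN"): 0.23,
    ("Turkmenistan_C_TepeAnau", "PPN"): 0.17,
    ("Turkmenistan_C_TepeAnau", "Anatolia_N"): 0.12,
    ("Bustan_Eneolithic", "PPN"): 0.17,
    ("Bustan_Eneolithic", "Anatolia_N"): 0.13,
}

# Implied source contribution = fraction in target / fraction in source
implied = {key: ivcp[key[1]] / frac for key, frac in sources.items()}

for (pop, comp), share in implied.items():
    print(f"{pop:26s} via {comp:10s} -> {share:.0%} of IVCp")
```

Under this simplification, every candidate lands inside the 30-60% range.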

PROXIMAL MODELS FOR IVCp

Source 1 P-value Source 1 % Onge % Tarim_EMBA1 % Reason for Failure As Source
Tajikistan_Sarazm 0.04 75% 29% -4% Too much tarim in Sarazm
Turkmenistan_C_TepeAnau 0.73 64% 29% 7%
Turkmenistan_C_Namazga.SG 2.50E-09 71% 24% 5% IranN/CHG ratio is low
Bustan_Eneolithic 0.17 63% 28% 9%
Turkmenistan_C_Geoksyur 0.31 67% 30% 3%
Iran_BA1_ShahrISokhta 0.03 68% 26% 6% Anatolian/Iran ratio too high

Tepe_Anau, Geoksyur_En & Bustan_En turn out to be viable sources of ancestry for the IVCp samples.


DISTAL MODELS FOR THE SC ASIAN SOURCES


Target P-value CHG_Kotias GanjDareh Tarim_EMBA1 PPN Anatolia_N
Turkmenistan_C_Geoksyur 0.18 21% 40% 16% 23% -
Turkmenistan_C_Geoksyur 7.50E-10 11% 59% 14% - 16%
Turkmenistan_C_TepeAnau 0.45 11% 68% 9% - 12%
Turkmenistan_C_TepeAnau 0.62 17% 56% 10% 17% -
Bustan_Eneolithic 0.71 20% 55% 8% 17% -
Bustan_Eneolithic 0.78 14% 67% 6% - 13%

*Reference Populations: Mbuti.DG, EHG, China_YR_MN, Serbia_IronGates_Mesolithic, CHG_Satsurblia, WSHG, Turkey_Boncuklu_N, ONG.SG, PPN, Turkey_N
** I had to remove Shamanka_En as a reference for this analysis, as every model consistently failed due to excess affinity to Shamanka (relative to the other reference pops) in the model as compared to the actual target. The same problem did not occur with Onge as a reference, even though it is also East Asian. This removal is not recommended, and the problem did not occur with the other targets in the previous tables. There is something mysterious about this excess affinity to this specific East Asian pop in these models that I will have to figure out some other time. So, I won't consider the above table as final.
*** Because CHG & GanjDareh are similar, their standard errors are higher at ~ +-5% for all models.

We can see a sizeable impact of Anatolian/PPN ancestry in these 4th millennium BCE samples from SC Asia. Allentoft et al 2022 (preprint) has 1 sample from Monjukli Depe, Turkmenistan, dated to 4600 BCE, and the preprint reports it to have 0 Anatolian ancestry. If that is correct, it would seem that this ancestry arrived in SC Asia between 4600-4000 BCE.

A.2 CONCLUSIONS & CAVEATS


1. There is definite presence of Anatolian/PPN ancestry in the IVCp samples.

2. There is a good chance that the IVCp samples descend from some SC Asian population who admixed with AASI like population between 4500-3800 BCE (Admixture date from Narasimhan et al 2019, I take these dates with a pinch of salt as having used the same tools I find that the results differ dramatically depending on sources used).

3. It is also possible that IVCp only got a later admixture from an Anatolian admixed SC Asian source, the earlier population being IranN + Onge admixed. We need proper quality samples from actual IVC sites to confirm or reject this.

4. The SC Asian populations are an admixture between GanjDarehN & CHG like ancestries. Anatolian ancestry admixed with them later, possibly after 4600BCE. (Caveat: Since GanjDareh & CHG are similar ancestries with differing internal proportions, we may possibly be dealing with a different population rather than a recent admixture between CHG & GanjDareh, ie. an old population formed in the same process that gave birth to CHG & Ganj Dareh pre 10k BCE)

5. It may be that once we get actual IVC samples of good quality, they show no Anatolian ancestry, in which case we would conclude that the IVCp samples weren't actually representative.

A.3 IMPLICATIONS



The above conclusions are extremely important because they overturn one important conclusion (quoted below) of the Shinde et al 2019 paper, which is also coauthored by Narasimhan.

Our results also have linguistic implications. One theory for the origins of the now-widespread Indo-European languages in South Asia is the “Anatolian hypothesis,” which posits that the spread of these languages was propelled by movements of people from Anatolia across the Iranian plateau and into South Asia associated with the spread of farming. However, we have shown that the ancient South Asian farmers represented in the IVC Cline had negligible ancestry related to ancient Anatolian farmers as well as an Iranian-related ancestry component distinct from sampled ancient farmers and herders in Iran. Since language proxy spreads in pre-state societies are often accompanied by large-scale movements of people (Bellwood, 2013), these results argue against the model (Heggarty, 2019) of a trans-Iranian-plateau route for Indo-European language spread into South Asia. However, a natural route for Indo-European languages to have spread into South Asia is from Eastern Europe via Central Asia in the first half of the 2nd millennium BCE, a chain of transmission that did occur as has been documented in detail with ancient DNA.

Maier et al 2022 concludes similarly:

These results show that at least with regard to the admixture graph analysis, a key historical conclusion of the study (that the predominant genetic component in the Indus Periphery lineage diverged from the Iranian clade prior to the date of the Ganj Dareh Neolithic group at ca. 10 kya and thus prior to the arrival of West Asian crops and Anatolian genetics in Iran) depends on the parsimony assumption, but the preference for three admixture events instead of four is hard to justify based on archaeological or other arguments.

With this, the trans-Iranian-plateau route for Indo-European languages into Neolithic India reopens, and gains much more importance as the Proto-Indo-European homeland is being shifted into the Iran/Armenia region by the Harvard and Max Planck labs.

The above results also reopen the case about whether genetic exchange from the Anatolian region spread farming to NW India. 

But there are other lines of evidence which show that Agriculture in South and Central Asia might not have had any influence genetically from Anatolia. These are:

1. Allentoft et al 2022 (preprint) shows that the Anatolian ancestry did not exist in the Monjukli Depe sample dated to 4600 BCE.

2. However, the earliest layers of Monjukli Depe dated to 6200 BCE belong to the Jeitun Culture and the people there farmed barley and 2 types of wheat, and also reared sheep and goat. 

3. If the assertion in #1 is true, it would mean that the Anatolian ancestry in SC Asia arrived much later than the practice of agriculture.


SECTION B: SWAT VALLEY IRON AGE


A huge number of samples from multiple Swat valley iron age sites were published in Narasimhan et al 2019. However, I don't believe the paper did full justice to those samples.

Swat PCA

The above 3D PCA is important, and conveys most of the information related to the Swat samples. This is how you should read it.

1. The white cline is the Paniya-IVCp-SiS cline. Paniya have the highest AASI, various IVCp samples with differing AASI % (red and green) fall on the cline, and the other end has the IranN-heavy SiS cluster.

2. Among the Swat samples, there are 2 clusters. 1 cluster is very close to the previous Paniya-IVCp-SiS cline indicating that it has low amount of external ancestry as compared to IVCp samples. This cluster has almost 0 steppe ancestry and I label it Swat_nosteppe.

3. The other Swat_IA cluster is clearly pulled away from Swat_nosteppe, but towards no particular cluster. Rather, it is pulled towards an unsampled population which is an admixture between BMAC & Steppe (the intersection between the red and yellow clines). Either such a population existed or Swat received 2 separate inflows from BMAC & Steppe.

Our next step is to prove what we see on the PCA using formal stats.

B.1 Ancestry Composition of Swat_nosteppe cluster


Our Swat_nosteppe cluster consists of 7 samples: 3 samples from the site of Aligrama dated to around 900 BCE, 3 samples from Katelai (I5399, I12460, I12446) and 1 from Loebanr (I12981). From the PCA, we can see that only the AASI-heavy IVCp samples are needed to model this cluster, so we only use I8728 and I11459 and label them as IVCp1. We shall use Dzharkutan_BA1 samples as the BMAC proxy.


We can assume that the BMAC ancestry had already started making an impact on Swat valley before Steppe populations made an impact. This connection has already been noted by archaeologists like Viktor Sarianidi (2001).

B.2 Ancestry composition of the rest of the Swat_IA cluster


We shall test various qpAdm rotating models for 3 Swat_IA populations: Loebanr_IA, Katelai_IA, Udegram_IA modeled as Swat_nosteppe + BMAC + Steppe.  Dzharkutan will be the BMAC proxy, and 5 populations from steppe will be checked as potential sources. These will be:

1. Sintashta_MLBA
2. Dali_MLBA
3. Kashkarchi_BA
4. Kokcha_BA
5. CentralSteppeMLBA (Krasnoyarsk, OyDzhaylau, KazakhMys)

The model will be a rotating one to compare steppe sources vs one another. When 1 is used as source, the other 4 will be in the reference group.
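The rotation described above can be sketched as follows: for each steppe candidate, that candidate joins the sources and the other 4 are appended to the reference set. This is only an illustration of the bookkeeping, not the actual script used. The function name is hypothetical, and `base_right` is a placeholder, since the full reference set for this run is not listed here.

```python
def rotating_models(target, fixed_sources, candidates, base_right):
    """Build one qpAdm setup per candidate: the candidate joins the sources (left),
    and the remaining candidates are appended to the reference set (right)."""
    models = []
    for cand in candidates:
        others = [c for c in candidates if c != cand]
        models.append({
            "target": target,
            "left": fixed_sources + [cand],
            "right": base_right + others,
        })
    return models


steppe = ["Sintashta_MLBA", "Dali_MLBA", "Kashkarchi_BA", "Kokcha_BA", "CentralSteppeMLBA"]
# Placeholder reference pops (assumption): substitute the real reference set
base_right = ["Mbuti.DG", "EHG", "WSHG"]

models = rotating_models("Loebanr_IA", ["Swat_nosteppe", "Dzharkutan_BA1"], steppe, base_right)
for m in models:
    print(m["left"][-1], "->", m["right"])
```

Because the competing steppe sources sit in the reference set, a model only passes if its chosen source explains the target's affinity to the other four as well.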


Target Steppe Source P-Value Swat_nosteppe % BMAC % Steppe Source %
Loebanr_IA Kashkarchi 0.0021
Loebanr_IA Dali_MLBA 0.17 64% 22% 14%
Loebanr_IA Sintashta 5.30E-05
Loebanr_IA Kokcha 0.0039
Loebanr_IA CentralSteppeMLBA 0.001
Katelai_IA Kashkarchi 0.0018
Katelai_IA Dali_MLBA 0.0009 60% 26% 14%
Katelai_IA Sintashta 2.55E-06
Katelai_IA Kokcha 0.0001
Katelai_IA CentralSteppeMLBA 0.0100 64% 23% 13%
Udegram_IA Kashkarchi 0.15 65% 20% 15%
Udegram_IA Dali_MLBA 0.63 61% 24% 15%
Udegram_IA Sintashta 0.03
Udegram_IA Kokcha 0.18 65% 20% 15%
Udegram_IA CentralSteppeMLBA 0.09 64% 22% 14%

We see average 15% Steppe ancestry in the Swat_IA cluster. Dali_MLBA works for 2 of the 3 labels, so it is deemed to be the most likely source of the steppe ancestry in the Swat region. One of the outliers from Bustan (post BMAC), sample I11520 dated to 1550BCE, has a very similar ancestry profile to one of the Swat_IA cluster. Therefore, we can assume that the steppe related ancestry entered Swat around or before that time.

Dali_MLBA site has 3 published samples dated to 1900-1300BCE. It derives 80% of its ancestry from Sintashta and 20% from the previous local ancestry which is heavy in the Tarim like ANE component.

Modern road from Sintashta - Dali - Swat

Interestingly, the modern highway from Sintashta to Dali to Swat region passes through Dzharkutan. This route is the Inner Asian Mountain Corridor (IAMC).

B.3 Other details about Swat IA


1. The steppe ancestry in Swat_IA was mediated by steppe females as per Narasimhan et al 2019. I concur with this assessment, as the Swat samples showed only 5% of males had R1a+ Y haplogroup but much higher steppe admixture on the X chromosome.

2. However, Narasimhan et al 2019 reported that the steppe admixture in modern Indian men was male mediated, the opposite of the trend seen in Swat_IA.

Using previously reported calls on 1000 Genomes Project Y chromosomes (223), we observe that 62 out of the 221 South Asian males have an R1a Y chromosome corresponding to a ninety-five percent binomial confidence interval of 22-34% for Steppe MLBA ancestry on the entirely male line, which is significantly higher than the ninety-five percent confidence interval of 9-14% on the autosomes in the same set of individuals. These results shows the process of admixture of Central_Steppe_MLBA into the ancestors of the ANI was malebiased, and reveal that the directionality of sex bias was opposite to the pattern observed for the contribution of Central_Steppe_MLBA to SPGT.

I believe this conclusion is premature and based on incomplete logic. Most of the South Asian males are positive for R1a1a1b2a1a+ (or R1a-L657+) a mutation whose birth is around 2000BCE according to Yfull. This date is relevant because it is before the arrival of steppe ancestry into India. None of the 95 steppe bronze age males have this mutation, rather they fall on the brother clade R1a1a1b2a2+ (R1a-Z2124+). L657+ is also absent in modern Europeans, but present in China, Indian Subcontinent and Middle east. In fact, on YFull, the basal subclades of R1a-Y3 and R1a-Y2, the ancestors of L657, are found in modern men from Kuwait and Oman. 

So, if out of the 62/221 males we remove the L657+ males, the 22-34% (as reported by Narasimhan et al) will drop to around 5-10%, and the new conclusion will be that of 'no male biased admixture of steppe_mlba into modern Indians'. This conclusion is congruent with what we actually see in the Swat samples, the only good quality dataset that we have from the Indian subcontinent.
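The 22-34% interval for 62 out of 221 can be reproduced approximately with a standard binomial confidence interval. The paper does not say which method it used, so the Wilson score interval below is an assumption. To redo the calculation without the L657+ males, substitute the count of remaining R1a males for 62; that count is not given here, so it is left to the reader.

```python
import math


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for a binomial proportion k/n (z=1.96 gives ~95%)."""
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# 62 R1a carriers out of 221 South Asian males, as quoted from Narasimhan et al 2019
low, high = wilson_interval(62, 221)
print(f"R1a in 62/221 males: {low:.1%} to {high:.1%}")
```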


SECTION C: About IndiaN & Yamnaya


In earlier posts I had detailed using qpGraphs how the IndiaN ancestry which originates somewhere in the SC Asian/Iran/NW India region provides ancestry to both Indians as well as Steppe through Steppe_Eneolithic/Yamnaya. 

Considering the conclusions of Section A above, the probability of the origin being to the west of India has increased. If IVC did have Anatolian ancestry, it makes more sense for it to be a recipient of the IndiaN ancestry (along with Anatolian) from SC Asia rather than a donor.

However, the conclusion that IVC and Steppe share a common ancestor and that ancestor is most likely to be the vector of IE languages does not change.

Yamnaya receives geneflow from SC Asia, via Steppe Eneolithic

SintashtaMLBA receives ancestry from SC Asia, via Yamnaya

SC Ancestry is also seen in Anatolia via Kura Araxes culture in Armenia.


BONUS: Kashmiri Pandit Ancestry

We have a single good quality Pandit sample in the database. We will try to model his ancestry as Irula + Local source (IVCp, Loebanr_IA or Swat_nosteppe) + Steppe source (the 5 sources used above, plus the Kangju samples).

Steppe Source India Source1 India Source 2 P-Value Steppe Source% India Source1% India Source 2 %
Kangju IVCp Irula 0.06 39% 39% 22%
Dali_MLBA IVCp Irula 1.80E-06
CentralSteppe IVCp Irula 2.10E-08
Sintashta_MLBA IVCp Irula 6.30E-08
Kashkarchi IVCp Irula 6.12E-09
Kokcha IVCp Irula 3.20E-08
Kangju Loebanr_IA Irula 0.16 28% 42% 30%
Dali_MLBA Loebanr_IA Irula 2.32E-05
CentralSteppe Loebanr_IA Irula 5.50E-06
Sintashta_MLBA Loebanr_IA Irula 8.10E-06
Kashkarchi Loebanr_IA Irula 4.70E-07
Kokcha Loebanr_IA Irula 3.00E-06
Kangju Swat_nosteppe Irula 0.05 35% 45% 20%
Dali_MLBA Swat_nosteppe Irula 1.20E-06
CentralSteppe Swat_nosteppe Irula 6.90E-08
Sintashta_MLBA Swat_nosteppe Irula 3.96E-07
Kashkarchi Swat_nosteppe Irula 5.90E-09
Kokcha Swat_nosteppe Irula 1.88E-07


Turns out that Kaz_Kangju samples from 200CE are the best and only match for the excess steppe ancestry in this Kashmiri Pandit. Kangju ancestry profile is seen in the data from 300BCE onwards, so this steppe admixture in Pandit seems rather late.

Edited to add:
Some very interesting information which I did not know before, but which lines up perfectly.

Tweet link

Update:

Ran 495 rotating qpAdm models on IVCp, 12 candidates, 4 sources at a time. I made a new python script to collate all the results in one file. 

Only 1 model out of 495, which we have discussed above, passes.
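The figure of 495 is just the number of ways to pick 4 sources out of 12 candidates. The sketch below enumerates those combinations and shows one possible pass/fail filter. It is not my actual collation script: the candidate names are placeholders, and the two result rows are made-up toy values used only to exercise the filter.

```python
from itertools import combinations
from math import comb

# Placeholder names (assumption): substitute the 12 real candidate populations
candidates = [f"cand_{i:02d}" for i in range(1, 13)]

# Each model uses 4 candidates as sources; the other 8 rotate into the references
models = [
    {"left": list(left), "right_extra": [c for c in candidates if c not in left]}
    for left in combinations(candidates, 4)
]
print(len(models), comb(12, 4))


def feasible(result, p_cutoff=0.05):
    """A model passes if p exceeds the cutoff and every weight lies in [0, 1]."""
    return result["p"] > p_cutoff and all(0.0 <= w <= 1.0 for w in result["weights"])


# Toy rows, not real qpAdm output
toy_results = [
    {"p": 0.40, "weights": [0.55, 0.10, 0.25, 0.10]},
    {"p": 0.20, "weights": [0.80, 0.30, -0.10, 0.00]},
]
kept = [r for r in toy_results if feasible(r)]
```

The second toy row is dropped despite its p-value because one of its weights is negative, which is the usual reason to reject an otherwise fitting model.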

Google sheet link to the model output here, collated in one place. Easily filterable.




References:

1. Narasimhan VM, Patterson N, Moorjani P, et al. The formation of human populations in South and Central Asia. Science. 2019;365(6457):eaat7487. doi:10.1126/science.aat7487

2. Shinde V, Narasimhan VM, Rohland N, et al. An Ancient Harappan Genome Lacks Ancestry from Steppe Pastoralists or Iranian Farmers. Cell. 2019;179(3):729-735.e10. doi:10.1016/j.cell.2019.08.048

3. Robert Maier, Pavel Flegontov, Olga Flegontova, Piya Changmai, David Reich. On the limits of fitting complex models of population history to genetic data. bioRxiv 2022.05.08.491072; doi: https://doi.org/10.1101/2022.05.08.491072

4. Morten E. Allentoft, Martin Sikora, Alba Refoyo-Martínez, et al. Population Genomics of Stone Age Eurasia. bioRxiv 2022.05.04.490594; doi: https://doi.org/10.1101/2022.05.04.490594


27 comments:

Assuwatama said...

>12000bce split of Indian variant of IranN, did this happen or not?

+

Was the IranN in IVC from west with Anatolian admixture, if so shouldn't it be around 7500bce when Mehargarh shifted to agriculture?

Did this Anatolia admixed IranN mixed with older IranN that was already mixed with AASI in India before 4500bce.

Contradictions?

vAsiSTha said...

">12000bce split of Indian variant of IranN, did this happen or not?"

It did not happen if you allow 4 admixtures in Shinde et al graph (as shown by Maier), but if you constrain it to 3 admixtures, then it could have happened.

I think there's good reason to believe it did not happen.

Assuwatama said...

Thanks.

So the next doubt in my mind is if there is Anatolian in IVCp, how they relate to modern samples especially non-steppe dravids. Let's take Irula for example; is there any anatolian in them?

Assuwatama said...

Or has it disappeared after mixing with AASI? Post IVC.

mzp1 said...

Hey, where did you get the Kashmiri Pandit sample from? I really need that to delve deeper into South Asian genetic structure.

vAsiSTha said...

It's part of the Harvard Allen dataset

Daniel de França MTd2 said...

What is the reasoning for including Levantine samples? Unless you are thinking about Nostratic, I am not sure how that would help. That's one of the original places for agriculture, so, it would improve the signal anyway.

vAsiSTha said...

Levant PPN is a standard part of my reference populations, it turned out that if Anatolia/ppn wasn't included the model dstats with ppn would be high. So ppn (maybe via anatoliaN, as anatoliaN has ppn admixture) is a potential source.

vAsiSTha said...

"So the next doubt in my mind is if there is Anatolian in IVCp, how they relate to modern samples especially non-steppe dravids. Let's take Irula for example; is there any anatolian in them?"

Haven't modeled Irula distally, but proximally it has IVCp as a source and maybe 3-4% Steppe, so some Anatolian will be there.

Tigran said...

Are Oase, Bacho Kiro, Ust Ishim east eurasian or in between west and east eurasians according to your models?

Also do you think parts of South Asia were settled by Basal Eurasians, ANE or West Eurasians and had no AASI?

Thanks.

vAsiSTha said...

https://twitter.com/agenetics1/status/1549323911988256768?t=umjtMCDKvpk1CmAWLN2iHg&s=19

I had made a graph for the paleo populations. It's there on my twitter.

vAsiSTha said...

"Also do you think parts of South Asia were settled by Basal Eurasians, ANE or West Eurasians and had no AASI?"

Only in the north and north west there is a chance of 0 aasi.
Rest of India would have aasi.

vAsiSTha said...

Added this to the blogpost. It seems relevant and important.

But there are other lines of evidence which show that Agriculture in South and Central Asia might not have had any influence genetically from Anatolia. These are:

1. Allentoft et al 2022 (preprint) shows that the Anatolian ancestry did not exist in the Monjukli Depe sample dated to 4600 BCE.

2. However, the earliest layers of Monjukli Depe dated to 6200 BCE belong to the Jeitun Culture and the people there farmed barley and 2 types of wheat, and also reared sheep and goat.

3. If the assertion in #1 is true, it would mean that the Anatolian ancestry in SC Asia arrived much later than the practice of agriculture.

Tigran said...

Your twitter link isn't working.

Is it this?

https://pbs.twimg.com/media/FX_eaovUUAEgC8g?format=png&name=900x900

Showing Ust Ishim as West Eurasian?

vAsiSTha said...

Yes, that's the one. It is an automated output from find_graph() in admixtools2. Multiple models will be viable, of which this is one.

Ust Ishim is shown as WE but the drift units from the common nodes are very small, as compared to Sunghir or Kostenki. So it is quite close to Basal Eurasian/zlatykun.

Tigran said...

This is a different type of Basal Eurasian than the one found in ancient West Asians?

Also UI being close to WE would be interesting. He was K2a which is about the same age as K2b which would throw a bit of doubt on K2b being an ENA lineage.

vAsiSTha said...

"This is a different type of Basal Eurasian than the one found in ancient West Asians?"

Cant comment on that without explicitly introducing said West asians into the graph. The most basal sample yet is from Zlaty Kun, Czechia.

"Also UI being close to WE would be interesting. He was K2a which is about the same age as K2b which would throw a bit of doubt on K2b being an ENA lineage."

Vallini et al 2022, models the graph a bit differently wrt Ust-Ishim. So i wouldnt jump to conclusions yet.

Vallini qpGraph

Singh said...

@vAsiSTha

You seem to be confusing Crown Eurasian (Zlaty kun, UI, K14, ENA etc) with Basal Eurasian. That yellow label in your graph should be 'crown eurasian' and you should include sample with BasalEU admixture.

Basal Eurasian in published studies mentions it's only present in West Eurasians samples from like Dzudzuana, Iran_N, CHG, EEF, Natufian etc.

All of this is also explained in Zlaty Kun study.

vAsiSTha said...

@Singh

This is very funny to me. I could not find the word 'crown' in the Prufer et al 2021 study on Zlaty Kun. Its not there.

Meanwhile, Basal eurasian is mentioned a lot of times. Let me enumerate.

1. "This suggests that Zlatý kůň falls basal to the split of the European and Asian populations."

2. "To date, only two ancient Eurasian genomes have been produced from individuals who, like Zlatý kůň, appear to fall basal to the split of Europeans and Asians: Ust’-Ishim and Oase 1."

3. "Extended Data Fig. 6 Comparable signal of Basal Eurasian ancestry in Zlatý kůň and Ust’-Ishim."

Now from Vallini et al 2022

1. "Before these events, we also confirm Zlatý Kůň as the most basal human lineage sequenced to date OoA, potentially representing an earlier wave of expansion out of the Hub."

2. "Taken together, these studies show that sometimes before 45 ka, Europe was inhabited by a lineage basal to all other Eurasians (Zlatý Kůň),"

3. "The best-fitting tree we obtained running OrientAGraph/Treemix matches the one we identified as most supported using qpGraph in Supplementary fig. S2B with Zlatý Kůň as the most basal Eurasian lineage, Bacho Kiro and Tianyuan as sisters, and Ust’Ishim splitting early from that branch (IUP), just after the split from the branch that would lead to Kostenki14 and Tianyuan (UP)."

Hope that satisfies your concerns.

vAsiSTha said...

The basal Eurasian node which provides inputs to IranN and CHG should split somewhere from near Zlaty Kun in the yellow region of my graph.

Singh said...

Not the same thing. Yeah If you add something from Caucasus or Iran to that graph, it will form a new branch which contributes to those samples.

"To gain insight into the genetic relationship of Zlatý kůň to present-day and ancient individuals, we calculated summary statistics based on the sharing of alleles (f3, f4 and D statistics20) with our capture and shotgun datasets. We first compared Zlatý kůň with present-day European and Asian individuals using an African population (Mbuti) as an outgroup and found that Zlatý kůň shares more alleles with Asians than with Europeans (Extended Data Fig. 6). A closer relationship to Asians has also been observed for other Upper Palaeolithic and Mesolithic European hunter-gatherers compared with present-day Europeans and can be explained by ancestry in present-day Europeans from a deeply divergent out-of-Africa lineage referred to as basal Eurasian21. European hunter-gatherers generally do not carry basal Eurasian ancestry, whereas such ancestry is widespread among ancient hunter-gatherers from the Caucasus, Levant and Anatolia22,23,24. When we tested European hunter-gatherers without basal Eurasian ancestry against ancient and present-day Asians, we found that none of these comparisons indicate a closer relationship of Zlatý kůň with either group (Supplementary Sections 5 and 9 and Extended Data Fig. 7). This suggests that Zlatý kůň falls basal to the split of the European and Asian populations."

https://www.nature.com/articles/s41559-021-01443-x

vAsiSTha said...

Both Zlaty Kun and the hypothetical population which gave ancestry to CHG/Iran are basal to the east/west Eurasian split and both are non african ie eurasian.

Therefore, both of them are Basal Eurasian. There is nothing which prevents anyone from Labeling Zlaty Kun as basal eurasian and Vallini et al 2022 does call Zlaty Kun 'the most basal eurasian lineage'.

Piyush said...

Ashish,
Regarding Kangju profile being pretty late and thus, steppe ancestry in KP being later, aren't there some late bronze age samples from western xinjiang which are similar to kangju ? As pegasus says in the comment here https://anthrogenica.com/showthread.php?4492-Split-The-Jatts-amp-Their-Genetic-Origins&p=867393&viewfull=1#post867393

Thus, the kangju getting selected because it may not be truly ancestral but rather it is proxying for similar but older ancient cluster which we don't have due to lack of relevant samples from Iron age ?

Anyways, have you tried using that CHN_Jirentaigoukou_LBA1 as a source for KP ?

vAsiSTha said...

I haven't tried that sample, and yes any samples with similar profile can be valid sources (only if it works in qpAdm rotation) . I have maintained that Kangju means anything with similar profile and that this ancestry possibly existed as early as 500bce. With that said, it is not impossible that 200ce people came and admixed into Kashmiris. Some northern Indian populations have a really late admixture date as per Narasimhan et al 2019.

vAsiSTha said...

As per Niraj Rai, Burzahom site in Kashmir saw steppe ancestry 500bce onwards, so i guess that date makes most sense for Pandits as well.

Piyush said...

"As per Niraj Rai, Burzahom site in Kashmir saw steppe ancestry 500bce onwards, so i guess that date makes most sense for Pandits as well."

Well Dr Rai only claims, doesn't publish lol. Anyways, on a serious note, from what I have heard, he doesn't have data from all the relevant periods from Burzahom site. There is data from bronze age and then directly from the historical period with no iron age data in between!