Saturday, July 29, 2023

The Hybrid Model for Indo-European languages

New paper: Paul Heggarty et al., Language trees with sampled ancestors support a hybrid model for the origin of Indo-European languages. Science 381, eabg0818 (2023). DOI: 10.1126/science.abg0818

*Note: in the paper, the authors use dates in BP (before present), where present = 2000 CE. While I appreciate the strictly secular nomenclature, I prefer BCE dates because my brain finds them easier to comprehend (for now). So I convert these into BCE by subtracting 2000, e.g. 6000 BP = 4000 BCE. Passages quoted directly from the paper keep their original BP dates. Do keep this in mind while reading. Thank you.
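The conversion in this note is plain arithmetic, but it is easy to slip with credible intervals, because the larger BP bound becomes the earlier BCE bound. Below is a minimal Python sketch of the rule as stated above. `bp_to_calendar` and `bp_range_to_calendar` are hypothetical helper names, and the optional `present` parameter (defaulting to 2000, as used here) is my own addition for switching to other baselines, such as the 1950 CE convention used in radiocarbon dating. Like the rule above, it ignores the fact that there is no year 0, which does not matter at these resolutions.

```python
def bp_to_calendar(bp: int, present: int = 2000) -> str:
    """Convert a 'years before present' date into a BCE/CE label.

    Applies the simple rule from the note: calendar year = present - bp.
    Like that rule, it ignores the missing year 0 (a one-year offset).
    """
    year = present - bp
    return f"{-year} BCE" if year < 0 else f"{year} CE"


def bp_range_to_calendar(low_bp: int, high_bp: int, present: int = 2000) -> str:
    """Convert a BP interval; the larger BP value becomes the earlier bound."""
    return f"{bp_to_calendar(high_bp, present)} to {bp_to_calendar(low_bp, present)}"


print(bp_to_calendar(6000))                # 4000 BCE
print(bp_to_calendar(5520))                # 3520 BCE (Indic vs Iranic split)
print(bp_range_to_calendar(4540, 6800))    # 4800 BCE to 2540 BCE
print(bp_to_calendar(6000, present=1950))  # 4050 BCE under a 1950 baseline
```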

Brief Overview of the Method


The authors stress that their method, Bayesian phylogenetic inference, is different from both lexicostatistics and glottochronology, which they consider deeply flawed.

This paper's Bayesian phylogenetic analysis is based on a new, improved database, IE-CoR 1.0. IE-CoR 1.0 contains data on relationships of cognacy (shared word origin) between 161 Indo-European languages for a reference set of 170 basic meanings. The newly added languages include the Nuristani branch, extinct Iranic languages from Central Asia, and Gaulish, a representative of a Celtic sub-branch that was missing from previous databases. The coverage prioritizes non-modern languages, providing a deeper phylogenetic signal and better chronological estimation. The data were contributed by 80 experts on the different language sub-families, to maximize accuracy.

The authors state that they improved the cognate encoding by keeping, as far as possible, one lexeme per meaning rather than the many synonyms used in previous databases, which created multiple cognate sets per meaning. This, for example, artificially lengthened the branch of modern Greek and inflated the age of older stages of Greek. The IE-CoR dataset has highly consistent counts of cognate sets across all languages, very close to the target of one cognate set per meaning, per language. They also removed the constraints previously placed on ancient languages to be directly ancestral to modern languages, which need not be the case. Such a constraint forced a branch length of 0 (and therefore no divergence) onto the ancient language, which simply pushed the changes onto the next branch and artificially lengthened it.
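To make the "one cognate set per meaning, per language" target concrete: Bayesian lexical phylogenies are commonly run on a presence/absence matrix in which each cognate set becomes one binary character. I have not checked the exact coding scheme in the paper's supplement, so treat the following as a minimal Python sketch under that assumption. The languages (`LangA` to `LangC`) and the cognate-set labels are invented placeholders, not IE-CoR data. `LangC` records two synonyms for 'water', which adds an extra character and pushes it above one cognate set per meaning, the kind of inflation described above.

```python
from typing import Dict, List, Tuple

# Hypothetical data: language -> meaning -> cognate-set labels of its recorded lexemes.
Data = Dict[str, Dict[str, List[str]]]

toy: Data = {
    "LangA": {"water": ["1"], "hand": ["1"]},
    "LangB": {"water": ["1"], "hand": ["2"]},
    "LangC": {"water": ["2", "3"], "hand": ["2"]},  # two synonyms for 'water'
}


def binary_characters(data: Data) -> Tuple[List[Tuple[str, str]], Dict[str, str]]:
    """One 0/1 character per (meaning, cognate set); '?' where a meaning is unrecorded."""
    characters = sorted(
        {(m, s) for meanings in data.values() for m, labels in meanings.items() for s in labels}
    )
    matrix = {
        lang: "".join(
            "?" if m not in meanings else str(int(s in meanings[m])) for m, s in characters
        )
        for lang, meanings in data.items()
    }
    return characters, matrix


def sets_per_meaning(data: Data) -> Dict[str, float]:
    """Average number of distinct cognate sets a language contributes per recorded meaning."""
    return {
        lang: sum(len(set(labels)) for labels in meanings.values()) / len(meanings)
        for lang, meanings in data.items()
    }


chars, matrix = binary_characters(toy)
print(chars)   # [('hand', '1'), ('hand', '2'), ('water', '1'), ('water', '2'), ('water', '3')]
print(matrix)  # {'LangA': '10100', 'LangB': '01100', 'LangC': '01011'}
print(sets_per_meaning(toy))  # {'LangA': 1.0, 'LangB': 1.0, 'LangC': 1.5}
```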

The database also solves the loanword problem in computational cladistics. "IE-CoR introduced the concept of loanword event, through which it has become possible to encode correctly both non-cognacy to the source lexeme, and subsequent cognacy between vertical descendants of that lexeme, once borrowed and fully integrated into the borrower language."

The IE-CoR database is available at https://iecor.clld.org/


Important Discussion and Conclusions from the Paper


Heggarty et al reaffirm that the earliest Indo-European speakers were located south of the Caucasus, around 6100 BCE. They support a hybrid model in which the steppe was a secondary staging ground for the European languages. Notably, the beginning of the split of Indo-Iranian into Indo-Aryan and Iranic is dated to ~3500 BCE, a finding wholly incompatible with the Andronovo hypothesis.

DensiTree from the paper, showing the probability distribution of the various topologies of the final IE tree. Orphan branches are sampled ancient languages in the database (some examples marked with red and yellow boxes).



The authors make clear that Proto-Indo-European should include the stage before the Anatolian and Tocharian splits. That is, they reject the nomenclature, used for example in Lazaridis et al 2022 (the Southern Arc paper), that places PIE on the steppe and thereby excludes at least the Anatolian branch.

Similarly, recasting the question as if a search for the homeland of ‘Late Proto-Indo-European’ serves the same effect, by excluding the same two branches taken to have already diverged by the ‘Late’ stage, as if Anatolian and Tocharian were not relevant to the homeland question. There is in fact considerable inconsistency in linguistic nomenclature about different stages in the family’s diversification. Strictly, ‘proto-’ in any case refers by definition to the last stage of the common ancestor language of a family, immediately before any branches diverged. Proto-Indo-European should thus include Anatolian and Tocharian, since their relatedness with the other branches of Indo-European, in the same family, is not in doubt.

The authors argue for a non-steppe route through Asia for the Armenian, Greek and Albanian branches, based on their very early divergence dates. This is one of their conclusions I am sceptical about, since steppe ancestry does reach these regions at some point. However, I am not entirely sure about either theory.

Further south, the expansion of CHG/Iranian ancestry into the eastern Mediterranean and Balkans, without passing through the Steppe and thus without major admixture with EHG ancestry, presents a closer chronological and geographical match with the remaining, deeper European branches in our topology, namely Albanian, Greek and Armenian.

On the route taken by Iran Neolithic ancestry towards the steppe, they remain open to a route east of the Caspian but favour the direct route north through the Caucasus. Given my finding of Sarazm Eneolithic/Tutkaul Neolithic-related ancestry in Steppe Eneolithic and Khvalynsk individuals, I remain open to the route east of the Caspian.

On the route by which this CHG/Iranian ancestry reached the Steppe, aDNA evidence is not yet sufficient to exclude a route via Central Asia, i.e. first eastwards and then north and westwards, counter-clockwise around the Caspian, as hypothesized in (54). Nonetheless, the more parsimonious explanation, also given the aDNA record for time-transects through the Caucasus (48), would be a far shorter route directly northwards through the Caucasus, in line with corresponding expansions in material culture in the archaeological record.

 

On Indo-Iranian


It has commonly been asserted that Avestan and Vedic are so similar that they demonstrate a recent split of Indo-Aryan and Iranic in the 2nd millennium BCE. The authors reject that view. Their reasoning is as follows
The inference of an Indo-Iranic split at ~5520 yr B.P. (4540 to 6800 yr B.P.) may, at first glance, seem surprising. Established expectations are for a more recent date, based on the perceived level of similarity between Vedic Sanskrit and Avestan— the earliest known ancient languages in the Indic and Iranic branches, respectively. However, these judgments of linguistic similarity have been largely impressionistic (36) rather than quantified. In the precisely defined IE-CoR meanings, Early Vedic and Younger Avestan share only 58.7% cognacy (37). This matches the level of cognacy that survives between the most divergent sublineages within the Romance clade, for instance, after roughly two millennia since the spread of the Roman Empire. Early Vedic and Younger Avestan themselves date back to at least the mid-fourth and mid-third millennia before present, respectively. A time depth two millennia earlier (~5520 yr B.P.) for the split between their lineages (Indic versus Iranic) is thus consistent with the 58.7% cognacy overlap between them. More widely, ancient Indo-European languages show close similarities in some aspects of their inflectional morphology (noun declension and verb conjugation) and phonology. These similarities have often been assumed to imply a relatively short time span of divergence since their common ancestor language, but these impressions are also unquantified. Our time-depth estimate implies a long period of relative stability in these aspects, while early Indo-European diverged faster in other respects.
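The 58.7% figure in this passage is a shared-cognacy percentage. The minimal Python sketch below shows one straightforward way such a figure can be computed: among the meanings recorded for both languages, count those for which the two share at least one cognate set. This definition is my assumption; IE-CoR's exact treatment of synonyms and missing meanings may differ, and the two toy lexicons are hypothetical.

```python
from typing import Dict, Set

# Hypothetical data: meaning -> set of cognate-set labels recorded for that meaning.
Lexicon = Dict[str, Set[str]]


def shared_cognacy(a: Lexicon, b: Lexicon) -> float:
    """Percentage of meanings recorded in both languages that share a cognate set."""
    common = [m for m in a if m in b]
    if not common:
        raise ValueError("the two languages have no recorded meanings in common")
    shared = sum(1 for m in common if a[m] & b[m])
    return 100 * shared / len(common)


lang_x: Lexicon = {"water": {"1"}, "hand": {"1"}, "fire": {"2"}, "eye": {"1", "3"}}
lang_y: Lexicon = {"water": {"1"}, "hand": {"2"}, "fire": {"2"}, "eye": {"3"}, "stone": {"1"}}

print(shared_cognacy(lang_x, lang_y))  # 75.0 ('stone' is skipped; only 'hand' differs)
```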

Overall, the authors endorse Iran Neolithic ancestry as a 'tracer dye' for IE languages, including Indo-Iranian. In the supplement, they write

From Iran to India, Steppe ancestry is present only in low proportions, and only from a relatively late date, c. 3500 BP (49). This is significantly later than standard expositions of the Steppe hypothesis have proposed, associating Indo-Iranic with the earlier Bactria-Margiana Archaeological Complex (BMAC) culture. Dates for first incursions southwards from Central Asia as late as 3500 BP also leave little scope for the Indo-Iranic superstrate assumed to be present as far south and west as northern Syria and southeast Anatolia, already by the time of the Mitanni kingdom there. 
Unlike the major and relatively sudden incursion of (Forest?) Steppe ancestry into Central Europe with Corded Ware c. 5000 BP, or Yamnaya into the Carpathian Basin around the same time, the weaker and much later signal in south-central Asia does not represent a strong prima facie explanation for the origins and first expansion of Indo-European languages here.

In the context of the Iran Neolithic ancestry, they write

It is found in eastern Iran, and in the Indus Valley (roughly the dividing line between the Iranic and Indic branches today), at the approximate time-depth when the two branches separate from each other in our analysis. This separation could correspond with an eastward expansion along the Ganges valley of what would become the Indic branch, picking up some of its distinctive linguistic characteristics from contact with local populations. This makes for a more straightforward scenario for the chronology, distribution and dominance of Indo-Iranic languages right across this region than a much later and genetically much less significant contribution from Central Asia.

On the steppe hypothesis for Indo-Iranian, they correctly point out the abysmal archaeological evidence (i.e. no steppe artefacts at all in India and Iran)

It has long remained a recognized weakness of the Steppe hypothesis (pp. 177-181 in (80); pp. 212-217 in (59); (90)) that the archaeological record lacks any obvious impacts out of the Steppe in a time-frame early enough to fit well with the scale of linguistic divergence within Indo-Iranic. Advocates of the Steppe hypothesis have widely assumed that the Andronovo culture ‘must have’ been Indo-Iranic-speaking, but even Mallory “find[s] it extraordinarily difficult to make a case for expansions from this northern region to northern India”, and more generally finds no obvious connection to “the seats of the Medes, Persians or Indo-Aryans” (pp. 191-192 in (90)). The urban culture of the Bactria-Margiana Archaeological Complex (BMAC) was originally widely taken to offer the least bad candidate (7, 89, 90). Samples of aDNA from BMAC contexts, however, lack the expected Steppe ancestry, found only later.

To this, I add these golden words from CC Lamberg-Karlovsky (2004)

There is absolutely NO archaeological evidence for any variant of the Andronovo culture either reaching or influencing the cultures of Iran or northern India in the second millennium. Not a single artifact of identifiable Andronovo type has been recovered from the Iranian Plateau, northern India, or Pakistan.


In the main paper, they reject the steppe hypothesis for Indo-Iranian

In particular, in this hypothesis, Indo-Iranic, the major eastern branch of Indo-European, was one of the last two main branches to emerge, out of a final major clade with Balto-Slavic. Our results contradict this in both chronology and tree topology. Indo-Iranic branches off early, ~6980 yr B.P. (5650 to 8400 yr B.P.), and support for a common clade with Balto-Slavic is minimal, with a posterior probability of only 12.3%. Recent aDNA data from Central and South Asia have sought to trace movements of people into Western and South Asia by migrations southward from the steppe. However, for the period 4300–3700 yr B.P., samples from the Bactria-Margiana Archaeological Complex (BMAC) do not yet attest to any such southward migration (49). Steppe ancestry is not found until ~3500 yr B.P., in the Gandhara Grave Culture in northern Pakistan, and only at limited proportions (49). The interpretation that this ancestry can be identified with the first Indo-Iranic dispersal into South Asia (49) is not straightforwardly compatible with our earlier date for the separation of Indo-Iranic from the rest of Indo-European (~6980 yr B.P.). We also find that Indic and Iranic had diverged from each other already by ~5520 yr B.P. (4540 to 6800 yr B.P.). To reconcile this with a steppe origin would require an alternative scenario in which Indic and Iranic split from each other approximately two millennia before entering South Asia and Western Asia.

They conclude in favour of a route across the Iranian Plateau for Indo-Iranian

Our hybrid hypothesis posits that out of this homeland south of the Caucasus, from ~8120 yr B.P., PIE began to diverge as early migrations split it into multiple early branches. One of these branches could have taken Indo-Iranic eastward far earlier than the Steppe hypothesis presumes, but in line with the linguistic chronology in Fig. 3, in which Indo-Iranic emerged as a distinct branch in the early phases of Indo-European divergence. Another main branch reached the steppe directly northward through the Caucasus ~7000 to 6500 yr B.P., compatible with one current interpretation of the aDNA record.

Here, I would add that there is genetic evidence for such a migration into South-Central (SC) Asia. In my previous post, I showed that Tutkaul Neolithic ancestry from Tajikistan (~6200 BCE) was ~75% Siberian-related, whereas by 3600 BCE the Sarazm Eneolithic individuals in Tajikistan carry majority Iran Neolithic ancestry and only 20-25% Siberian-related ancestry, indicating gene flow from Iran between those two periods. The spread of bread wheat from Iran to both India and SC Asia around 4000 BCE also hints at some migration (Zhao et al 2023). The admixture date between Iran Neolithic and Andamanese-like ancestry in the Indus Periphery individuals is also around 4000 BCE (Narasimhan et al 2019), and the presence of significantly non-zero Anatolian ancestry in the Indus Periphery individuals (Maier et al 2023) implies that this ancestry entered after 6000 BCE via Iran, along with the Iran Neolithic ancestry.


Accuracy of predictions

There are two figures to consult here. One is the maximum clade credibility (MCC) tree, which presents just one 'least bad' topology out of the many possible ones, with single estimates of split dates, their ranges, and posterior probabilities. It is important to remember that the split dates in this Bayesian method are only estimates of the earliest divergence within a proto-branch. The other is the DensiTree, which presents the probability distribution over the various topologies. In my opinion, DensiTrees are ideal for gauging when the breakup of a proto-branch into two is complete. Below I use both of these insights.
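For readers unfamiliar with these summaries, here is a minimal Python sketch of two standard ideas: a clade's posterior probability is the share of sampled trees that contain it, and an MCC tree is the sampled tree whose clades have the highest product of those probabilities. This is not the authors' pipeline (tools such as BEAST's TreeAnnotator work on full trees with branch lengths and dates); here each tree is reduced to its set of clades, and the four "sampled" trees are invented toy data, not the paper's output.

```python
from collections import Counter
from math import log
from typing import Dict, FrozenSet, List

Clade = FrozenSet[str]
Tree = FrozenSet[Clade]  # a rooted tree reduced to its non-trivial clades


def clade_posteriors(trees: List[Tree]) -> Dict[Clade, float]:
    """Posterior probability of each clade = fraction of sampled trees containing it."""
    counts = Counter(c for tree in trees for c in tree)
    return {c: n / len(trees) for c, n in counts.items()}


def mcc_tree(trees: List[Tree]) -> Tree:
    """Pick the sampled tree maximising the product of its clade posteriors (as a sum of logs)."""
    post = clade_posteriors(trees)
    return max(trees, key=lambda t: sum(log(post[c]) for c in t))


# Toy posterior sample over five taxa (hypothetical).
II = frozenset({"Indic", "Iranic"})
BS = frozenset({"Baltic", "Slavic"})
IIBS = II | BS  # an Indo-Iranic + Balto-Slavic clade
BSG = BS | {"Greek"}
SG = frozenset({"Slavic", "Greek"})

samples: List[Tree] = [
    frozenset({II, BS, IIBS}),  # (((Indic,Iranic),(Baltic,Slavic)),Greek)
    frozenset({II, BS, BSG}),   # ((Indic,Iranic),((Baltic,Slavic),Greek))
    frozenset({II, BS, BSG}),
    frozenset({II, SG, BSG}),   # ((Indic,Iranic),(Baltic,(Slavic,Greek)))
]

post = clade_posteriors(samples)
print(post[IIBS])  # 0.25
print(post[BS])    # 0.75
print(mcc_tree(samples) == frozenset({II, BS, BSG}))  # True
```

The MCC tree here contains no Indo-Iranic + Balto-Slavic clade, even though a quarter of the samples do. A DensiTree, which overlays every sampled tree, would still show that minority topology as faint lines, which is why the two figures complement each other.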

1. The breakup of Celtic into Goidelic (Scottish Gaelic, Irish, Manx) and Brittonic (Welsh, Breton, Cornish) begins around 1200 BCE (95% CI range of 500 - 1900 BCE) according to the authors' model. This agrees with Patterson et al 2022, who show ~50% ancestry from France entering Britain around 1000 BCE, bringing early Celtic languages. These languages must have entered Ireland, Scotland and the Isle of Man soon after and started diverging from Brittonic.

2. The Scandinavian sub-branch of Germanic started to split into Icelandic, Faroese, Swedish, Norwegian etc. after ~750 CE. This matches well with the Viking expansion from 750 CE onwards, when Scandinavians settled Iceland, the Faroe Islands, Orkney etc. and spread the Scandinavian languages.

3. The West Germanic branch of Germanic starts splitting into Anglic (English, Old English) and Frisian around 300 CE (range not shown in the paper), only slightly before the start of the large-scale Anglo-Saxon settlement of Britain from NW Europe across the North Sea around 450 CE (Gretzinger et al 2022).

4. The split of Proto-Romance, within the Italic family, into Romanian, Sardinian, Italian, Portuguese, Spanish etc. commenced between 200 and 500 CE, in line with other estimates (Goldstein 2023) and with the consensus.

5. Balto-Slavic is shown to form a clade (posterior probability 0.63) with Italo-Celto-Germanic as part of a larger North-West Indo-European grouping. The split of Balto-Slavic from the rest of the IE tree is dated to ~4500 BCE [95% CI range 3000-6000 BCE]. I definitely prefer the more recent end of this range; a 4500 BCE date is too early.

The Balto-Slavic split into Baltic and Slavic is dated to begin ~1663 BCE (95% CI range of 531 - 3034 BCE). Modern Baltic- and Slavic-speaking populations both require a source carrying the specific Balto-Slavic drift (a population bottleneck can cause large random drift) found in Baltic_BA individuals (1200-500 BCE, Estonia and Latvia), so I feel that the more recent end of that range is preferable. That said, the drift could have been in place well before 1200 BCE, since we have no Baltic samples from the preceding period. This specific Baltic_BA drift is not required to model other European IE-speaking groups such as the Germans, British and Irish.

6. According to the final tree, the breakup of Proto-Slavic begins around 500 CE (95% CI range of ~200-800 CE), in line with the mainstream consensus.

7. Anatolian and Tocharian are the first two branches to split off, again in line with the consensus.

8. Armenian and Greek form a clade with a high posterior probability of 0.86, which is also a mainstream view.

9. Within Indo-Iranian, the Indic and Iranic branches begin separating by ~3500 BCE and are fully separate by 3000-2500 BCE (based on the DensiTree). This would correspond archaeologically to the Indus Valley Civilization (Indo-Aryan) and the Bactria-Margiana Archaeological Complex (Iranic). IVC ancestry is present in all Indo-Aryan speakers from NW India to Bangladesh and Sri Lanka. Similarly, BMAC-related ancestry is present in all Iranic speakers, from the ancient Scythians and Sarmatians to modern Tajiks, Yaghnobis, Afghans and plateau Iranians.

The branching of Indo-Aryan into Dardo-Nuristani and Middle Indo-Aryan seems to be complete by around 800-600 BCE (judging visually from the DensiTree), which seems acceptable, although I haven't seen much research on this specific split.

Within the Iranic branch, the separation of West Iranic from East Iranic seems to have been completed by ~900 BCE, just in time for the Assyrians and Urartians to record Persians and Medes in western Iran. Records of King Shalmaneser III, dated to ~840 BCE, mention Parsua and the Medes to the east of Assyria, the earliest direct references to Iranic-sounding polities (https://www.iranicaonline.org/articles/media).

The database includes 7 ancient Iranic languages and 2 ancient Indic languages, more than any used before (usually only Vedic and Avestan were included before this paper), which significantly increases the power to estimate the chronology within this grouping accurately.


Some Problems


1. The tree groups Albanian with Armeno-Greek in a Palaeo-Balkan sub-group (posterior probability 0.49). While such a sub-group is supported by many (e.g. Hyllested & Joseph 2022), it is unclear whether Greek is closer to Armenian or to Albanian within it. Overall, this is not a major problem.

The problem is the early split of this group in their tree. The Albanian split from Armeno-Greek began ~4500-4000 BCE, and Armeno-Greek broke up ~3300 BCE [2000-4900 BCE]. How can this be explained by an early westward push from the Zagros into Greece and Albania? While such a movement is borne out by aDNA at least as far as Greece, why would Armeno-Greek have remained together for another 1000 years in that case? Under such a scenario, Albanian and Greek should have formed a sub-group rather than Armenian and Greek. The authors would have to posit two separate waves: one into Albania first, and then one into Greece. Seems wonky if you ask me. At the same time, such an early branch via Anatolia into Greece at least explains why the Greek and Anatolian branches don't form a clade.

In my opinion, an origin of this grouping in the steppe_eneolithic > Yamnaya/Catacomb line is more likely. This steppe ancestry reaches the Balkans earlier than it reaches either Greece or Armenia (Lazaridis et al 2022), which can explain the early branching of Albanian.

2. Their tree shows a Celto-Germanic clade (PP 0.87). The mainstream view has supported an Italo-Celtic clade, though with many dissenters (see Weiss 2022). The paper's tree instead shows an Italo-(Celto-Germanic) grouping with a PP of 1.0. Because the proto-Celto-Germanic branch is relatively short, these three can be seen as a near-trifurcation from proto-Italo-Celto-Germanic. Extending this further, Balto-Slavic (BS) forms a clade with these three with a PP of 0.63; again the branch is short compared with the long proto-family branch, so this is essentially a four-way split from a North-West Indo-European (NWIE) stage into the 4 sub-branches. This is pretty much a consensus view. The interaction between steppe pastoralists and Cucuteni-Trypillia farmers around 4000 BCE could be responsible for the shared agricultural vocabulary in NWIE (see Kroonen et al 2022; Penske et al 2023), which provides additional evidence for a proto-NWIE stage still existing around 4000 BCE.


Maximum Clade Credibility (MCC) tree of NWIE (one of many 'least bad' topologies, as chosen by the authors)


NWIE DensiTree



3. Their analysis places the Nuristani branch within the Dardic branch of Indo-Aryan. This is not the mainstream view, which either treats Nuristani as a third branch of Indo-Iranian or assumes a complicated history in which the Nuristanis were Dardic speakers who came into intense contact with Iranic speakers, so that their language became intermediate between Dardic and Iranic (Degener 2002). The authors address this mismatch in the supplement, viz.

a) The tree shows that Nuristani shares cognacy with Dardic, but the Iranic elements lie in the phonology, which their system does not capture; i.e. the words are Indo-Aryan but the sounds are Iranic.
b) Some loanwords from neighbouring Dardic languages into Nuristani are not marked as such in their system, because the languages of that region are poorly studied.

Overall, I don't think this is a big deal; the case of Nuristani is known to be complicated, with a lot of horizontal contact involved. But what this paper does tell us about Nuristani is that, in the basic 170-meaning dataset, it shares the highest cognacy with Dardic languages, not Iranic.


4. Position of Tocharian:
The very early split of Tocharian (~5000 BCE, 95% CI range of 3400-6600 BCE) is as problematic as previous claims of an early Tocharian split right after Anatolian, irrespective of timeframe. If Afanasievo or Andronovo were the source of this language, Tocharian should have formed a clade with one of the NWIE branches or with Armenian/Greek (if the source is on the steppe). If these early dates are correct, the only source I see fit for Tocharian is a population in SC Asia or on the Caspian coast that survived as an isolated tribe and at some point made its way into the Tarim Basin.

I am not convinced that Tocharian is definitely a very early branch of PIE. Eric Hamp (2013) suggested that Tocharian is deeply nested within the NWIE cluster, close to Italic/Celtic.
I am open to that idea, but can't comment further since it's not my area of study. In terms of archaeology and genetics, I prefer an Andronovo source for Tocharian: the Andronovo archaeological impact in Xinjiang (starting 1600 BCE) is infinitely greater than its impact in South Asia (zero Andronovo sites or materials), and the genetic impact in Xinjiang is also much higher, with >30 R1a-Z2124 individuals genotyped (Kumar et al, 2022). Eric Hamp's nested topology for Tocharian is consistent with an Andronovo source.

Some other reasons for this Andronovo-Tocharian connection:
1. The Andronovans were not Indo-Iranians, but given their steppe ancestry, the majority of them still spoke some kind of IE language. What other IE language in the region is unaccounted for? Right: Tocharian!
2. A direct archaeological and genetic link between Andronovo and Xinjiang around 1500 BCE is attested (Kumar et al, 2022).
3. Andronovan chariots reached Shang dynasty China by 1200 BCE, and Lubotsky (1998) notes Tocharian loanwords in Old Chinese related to chariots, wheels and chariot gear. A Tocharian-related language spoken by the Andronovans could explain these loanwords.


Final Verdict


This is a remarkable, well-researched paper, and each potential objection has been addressed either in the main text or in the supplement. I suspect this will be one of the high-impact papers of this decade. It recognizes various flaws of the steppe theory, most notably the Andronovo Indo-Iranian theory, which posits that steppe pastoralists changed the linguistic and cultural landscape of the most populous region of the world (the Indian subcontinent and Iran) without leaving a single material trace in the archaeological record, unlike in Europe, where the Corded Ware, Bell Beaker and Battle Axe cultures attest such a steppe expansion. Its solution to these problems, a spread of the Indo-Iranian branch across the Iranian Plateau, is quite acceptable. The idea of CHG/Iran Neolithic ancestry as the 'tracer dye' marking the IE spread has now gained a lot of currency, following similar conclusions by Lazaridis et al 2022, Wang et al 2019, Reich (2018) and Krause & Trappe (2022).

At the same time, some of the findings need a better explanation, especially those for Tocharian and the Armenian/Greek/Albanian branches. The deep dates for these branches imply that their speakers were already in their respective regions long before these languages were first attested. It just feels like a tight case has not yet been made for these in terms of genetics and archaeology.


REFERENCES

Hamp, E. (2013). The Expansion of the Indo-European Languages. Sino-Platonic Papers 239 (August 2013).

Hyllested, A., & Joseph, B. (2022). Albanian. In T. Olander (Ed.), The Indo-European Language Family: A Phylogenetic Perspective (pp. 223-245). Cambridge: Cambridge University Press. doi:10.1017/9781108758666.013

Patterson, N., Isakov, M., Booth, T., et al. (2022). Large-scale migration into Britain during the Middle to Late Bronze Age. Nature 601, 588–594. https://doi.org/10.1038/s41586-021-04287-4

Gretzinger, J., Sayer, D., Justeau, P., et al. (2022). The Anglo-Saxon migration and the formation of the early English gene pool. Nature 610, 112–119. https://doi.org/10.1038/s41586-022-05247-2

Kumar, V., et al. (2022). Bronze and Iron Age movements underlie Xinjiang population history. Science 376, 62-69. doi:10.1126/science.abk1534

Degener, A. (2002). The Nuristani languages.

Lazaridis, I., et al. (2022). The genetic history of the Southern Arc: A bridge between West Asia and Europe. Science 377, eabm4247. doi:10.1126/science.abm4247

Kroonen, G., Jakob, A., Palmér, A. I., van Sluis, P., & Wigman, A. (2022). Indo-European cereal terminology suggests a Northwest Pontic homeland for the core Indo-European languages. PLOS ONE 17(10), e0275744. https://doi.org/10.1371/journal.pone.0275744

Reich, D. (2018). Who We Are and How We Got Here: Ancient DNA and the New Science of the Human Past. Oxford University Press.

Wang, C.-C., et al. (2019). Ancient human genome-wide data from a 3000-year interval in the Caucasus corresponds with eco-geographic regions. Nature Communications 10(1), 590.

Krause, J., & Trappe, T. (2022). A Short History of Humanity: A New History of Old Europe. Random House Trade Paperbacks.

Lubotsky, A. M. (1998). Tocharian loan words in Old Chinese: Chariots, chariot gear, and town building. In The Bronze Age and Early Iron Age Peoples of Eastern Central Asia (pp. 379-390).

Lamberg-Karlovsky, C. C. (2004). Archaeology and language: The case of the Bronze Age Indo-Iranians. In The Indo-Aryan Controversy (pp. 142-177).

Goldstein, D. M. (2023). Divergence-time estimation in Indo-European: The case of Latin. Diachronica.

Weiss, M. (2022). Italo-Celtic. In T. Olander (Ed.), The Indo-European Language Family: A Phylogenetic Perspective (pp. 102-113). Cambridge: Cambridge University Press. doi:10.1017/9781108758666.007

Penske, S., Rohrlach, A. B., Childebayeva, A., et al. (2023). Early contact between late farming and pastoralist societies in southeastern Europe. Nature. https://doi.org/10.1038/s41586-023-06334-8

Zhao, X., Guo, Y., Kang, L., et al. (2023). Population genomics unravels the Holocene history of bread wheat and its relatives. Nature Plants 9, 403–419. https://doi.org/10.1038/s41477-023-01367-3

Narasimhan, V. M., Patterson, N., Moorjani, P., et al. (2019). The formation of human populations in South and Central Asia. Science 365(6457), eaat7487. doi:10.1126/science.aat7487

Maier, R., Flegontov, P., Flegontova, O., Işıldak, U., Changmai, P., & Reich, D. (2023). On the limits of fitting complex models of population history to f-statistics. eLife 12, e85492. doi:10.7554/eLife.85492

208 comments:

Orpheus said...

Also on a sidenote, CHG in Yamnaya is mostly male-mediated. See Lazaridis on twatter
https://nitter.net/iosif_lazaridis/status/1563953730499878926

So there's no contradiction between male-biased migrations and a potential language shift, especially if coinciding with new technology arriving on the steppe which transformed its lifestyle and even religion. (Which it did, see Khvalynsk paper)

Is this definite proof of anything? Probably not. Is it a very strong indication of what happened, given recent papers as well? Yes.
Is it also a very strong indication of what did NOT happen? Also yes. That's simply what the new data show.

Kavi said...

You know, people have to understand that the level of expertise and knowledge in this field is actually very low, dismally low, even within 'professional' spaces.

I actually just realised ALL the PCAs in this space are WRONG. The PCAs generated in this field are based on SNPs; they are actually PCAs of SNPs, not of actual genetic distance.

The CORRECT way to run a PCA is to get meaningful genetic distance measures such as FST, F2 or outgroup f3 in matrix form, and then run PCA on that data.

What the current 'mainstream' way of running PCAs does is look at SNP variation across the sample space and aim to reduce this variation in terms of correlated SNPs. This can be useful when learning about the process of allele mutation and frequency change, where different alleles are correlated and change over time.

But it doesn't give you real population dynamics, it just gives you allele dynamics. For instance, it gets massively skewed by selection of important or useful alleles across populations, and it gives that process more weight than total drift.

These PCAs won't be consistent with other tools like qpGraph, f-statistics or anything else.

So you have to understand the level of 'science' is woefully inadequate and most of this needs re-evaluation.

Kavi said...

These PCAs are meant to analyse the relationships between SNPs not Populations/individuals.

Tryormaster said...

Did taurine cattle domestication happen after Iran HG reached India?

genome said...

@Rob mutts davidski come with a real username

andrew said...

"*Note, in the paper the authors use dates in BP (before present) where present = 2000 CE. While I appreciate the strictly secular nomenclature, I prefer BCE dates as they are easier to comprehend for my brain (for now). So I convert these into BCE by subtracting 2000. eg. 6000 BP = 4000 BCE. Do keep this in mind while reading. Thank you."

It is a minor point (I've commented elsewhere at length on the paper itself and don't need to do that again here several months later), but in archaeology, historical linguistics, and genetics, "BP" is defined as a matter of academic convention to mean years prior to 1950 CE, so that you don't need to consider the date of the publication to make BP dates in different papers comparable to each other.

Bronze said...

Rob and Davidski are pathetic and wrong about everything.

It's pretty much a done deal by now that there was a massive CHG expansion/invasion from the south (Iran or further south) and it was completely male-mediated. Indian/Iran-related males invaded the Steppe, killed many of the local men and impregnated their women, giving rise to the Corded Ware and Sintashta populations.

R1a is not native to the steppe, and neither is J1.

archlingo said...

hello,
neither the Gray team nor this author have any knowledge of the definition of "Glottochronology". GC is simply a subdiscipline of lexicostatistics that adds the aim of establishing a time scale. It must not be confused with the different approaches to GC, as is done here. Hans J.J.G. Holm
