Monday, November 29, 2021

The West/East Divide among Indo-Aryan languages

I came upon two linguistics papers which clarify the relationship of the Aryan languages of India with the other old language families, namely the Dravidian & Munda families.

Language Family map of India

Saturday, November 27, 2021

A Reply to the Anthrogenica clique's criticism of my Steppe Eneolithic Post

I recently wrote a post in which my analysis showed that the same ancestors who provided Iranian-like ancestry to Irula tribals also provided ancestry to Steppe Eneolithic as well as South Central Asia (Sarazm aDna etc). One can read the post here.

For feedback as well as to spread the post, I posted the link on a popular DNA & population genomics forum, Anthrogenica. Quite unexpectedly, immediately afterwards, I was suspended from Anthrogenica with no reason or message whatsoever. I always knew it was a Kurganist bastion, but never quite expected this level of censorship of opposing ideas. I am glad though; starting this blog is now worth it. Also, credit to Davidski at the Eurogenes blog for allowing me in his blog's comments section even with my opposing views.

I came to know that the moderator who banned me goes by the handle Coldmountains, someone I have ridiculed quite a lot in the past (in the Eurogenes comment section) for not finding a single R-L657 Indian Y haplogroup in the steppe since 2015. The poor guy comes up empty-handed after each successive paper in which new samples from the steppe are published. His search still goes on. Meanwhile, the only L657+ sample we have so far in aDna is from Roopkund Lake, India, 800 CE.

The link to my Anthrogenica thread is here. Please register and show Anthrogenica some love in this thread and elsewhere. The moderator clique there is in an echo chamber and needs some awakening.

Anyway, let's move on to the criticism of my post. There is just one, and sadly I couldn't reply because I was banned. Hence this post.

Kale on 25-Nov-2021 wrote

Kotias is a pseudo-haploid sample > That means rather than having two different sets of chromosomes like a real person, it is treated as having two exactly identical sets > That means the drift going to itself it going to be crazy high > If you have an edge coming out of an artificially crazy high drift, the percentage contribution has to be artificially crazy small to avoid overfitting.

This graph is completely uninformative until structured properly.

Kale is absolutely wrong here. Pseudo-haploid* samples do not cause artificially high drift edges. Rather, the artificially high drift arises when a label contains just 1 sample, because heterozygosity cannot then be computed for the label. Using 2 samples in the label solves the problem, even if the samples are pseudo-haploid. None of this is an issue for .DG samples, as these are diploid genomes and allow for heterozygous calls.

This is exactly what I have done in the graph below (later). I lumped Satsurblia & Kotias into 1 label known as CHG. I will show that my conclusion does not change.

Proof of my claim comes from the programmers of Admixtools, in their qpGraph readme pasted below. Should have been basic reading, right?

Genotypes are expected to be pseudo-haploid -- 2 samples at least per population or drift lengths on leaves are not meaningful.  

As far as edges coming out of artificially high drifts are concerned, sister clades of Kotias also did not help Kale's case. See, I spent weeks on the model trying every possibility. Them not being able to read the graph is not my problem.

Below I will paste my new qpGraph for Steppe Eneolithic which follows these principles and should be acceptable to the Kurganists as well.

  1. Worst residual ZScore below 3.
  2. CHG label now has 2 samples and therefore allows for heterozygous calls.
  3. Each admixture node has a drift edge following it rather than an immediate admixture edge. In 1 or 2 places in my graph an admixture edge immediately follows an admixture node because the drift edge length was 0 (hence I omitted those drift edges).
  4. Non 0 drift edges implying that the edge is a true one. (This is not a strict need. qpGraph disallows immediate admixture edge if the admixed node is labeled as a number. But qpGraph allows the admixture node to be a source to another node if the label given to it is alphanumeric. This is useful if multiple admixtures together are to be modeled.)

Please click on the graph for high res mobile view. On Desktop download image and zoom in a zoomable picture viewer.

Steppe Eneolithic qpGraph



DISCUSSION

After addressing all criticisms, the need for an IndiaN component in Steppe Eneolithic does not go away. I again prove that the same ancestors who ultimately provided ancestry to Steppe Eneolithic in the 5th millennium BCE also provided ancestry to the Irula tribe (and, by extension, most Indian groups). The minute criticisms which Kurganists come up with are immaterial now, because of course they will come up with them. So far, they have been busy denying even Iranian inflow into the steppe (it's a matter of purity, of course!), so accepting a South/SC Asian origin is a different matter altogether.

I conclude that Steppe_En is 
60% EHG + 12% CHG + 22% IndiaN-related + 6% IranN-related, ± standard errors

As I stated in my previous post:

Where this IndiaN source lived in the 6th millennium BCE is not yet known, but the vicinity of NW South Asia and SC Asia is a good guess. It could be either the early Mehrgarh culture or the Jeitun culture, but we need ancient samples from these cultures.


*Pseudo-haploid genome = a genome called by randomly sampling 1 allele at each biallelic SNP (single nucleotide polymorphism) marker.
Diploid genome = a genome called with both alleles at each biallelic SNP.

Heterozygosity = a measure of genetic variability in a population (it cannot be computed from a single haploid genome).
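
To make the definitions above concrete, here is a toy Python sketch. It is only an illustration under my own assumptions, not the estimator Admixtools actually uses: the function names, the 0/1 allele coding and the two made-up samples are all hypothetical. It shows why a label with a single pseudo-haploid sample gives nothing to compare, while a label with two such samples does.

```python
import random


def pseudo_haploid(diploid_genotype, rng):
    """Keep one randomly chosen allele from each (allele, allele) pair."""
    return [rng.choice(pair) for pair in diploid_genotype]


def heterozygosity(haploid_calls):
    """Average, over SNPs, of the fraction of sample pairs whose alleles differ.

    Returns None for fewer than two haploid calls, since there is no pair
    of alleles to compare within the label.
    """
    n = len(haploid_calls)
    if n < 2:
        return None
    n_pairs = n * (n - 1) / 2
    n_snps = len(haploid_calls[0])
    total = 0.0
    for snp in range(n_snps):
        derived = sum(calls[snp] for calls in haploid_calls)
        # derived * (n - derived) = number of sample pairs that differ here
        total += derived * (n - derived) / n_pairs
    return total / n_snps


# Two made-up diploid samples, alleles coded 0/1 at four biallelic SNPs.
rng = random.Random(42)
sample_a = [(0, 0), (1, 1), (0, 1), (1, 1)]
sample_b = [(0, 0), (0, 1), (1, 1), (1, 1)]
hap_a = pseudo_haploid(sample_a, rng)
hap_b = pseudo_haploid(sample_b, rng)

print(heterozygosity([hap_a]))         # None: only one sample in the label
print(heterozygosity([hap_a, hap_b]))  # a value between 0 and 1
```

With one sample the function has no pair of alleles to compare and returns None; with two samples in the label it returns an estimate, which is the point of lumping Satsurblia & Kotias into one CHG label.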

Wednesday, November 24, 2021

South & SC Asian neolithic ancestry in Steppe Eneolithic

I have been working on this project for 3 weeks, and the results are nothing less than spectacular! But for those who follow my comments on other blogs, this is hardly surprising.


In 2019, 3 very important human remains samples from the Caucasus region were published [1]. 2 of these samples were from a site called Progress II, and 1 from a site known as Vonyuchka. Both these sites are just north of the Caucasus mountains in what is now southern Russia. The carbon dating of these samples shows that the oldest lived around 4900 BCE and the newest around 4200 BCE.

Monday, November 22, 2021

Open Thread - View on Bitcoin/cryptos?

So, Bitcoin 100k or 10k? Any other cryptos to consider? Which other samples do you want me to explore with qpAdm/qpGraph?


R-L657 poll results so far

Let's keep the votes coming :)

Where did R-L657 Originate
 

Saturday, November 20, 2021

Scratching my head! - Criticism of Narsimhan VM et al 2019 - Pt 1

In this post, I will start getting my thoughts about the seminal SC Asian paper organized in written form. I have had these stored in my head for 2 years. The data in the paper is absolutely fantastic, and we get the first aDna from South Asia, i.e. the Swat valley. But the authors made so many conclusions and data analysis choices that left me scratching my head that this post got delayed: I have so much to say but don't know where to begin.

Wednesday, November 17, 2021

Integrating raw DNA data into aDna database for use in Admixtools

Representative example of a user's data being merged into Eigenstrat format for Admixtools


I have already gotten quite a few enquiries from people who, after reading my qpAdm 101 tutorial, want to merge their raw data (from AncestryDNA, FTDNA, LivingDNA, MyHeritage or 23andMe) into the aDna eigenstrat database so they can run qpAdm analyses of their own. I want to help, but converting, merging etc. is a time-consuming and somewhat complicated process.

I could possibly do it for 2-3 people a day. In case some people don't have linux or don't want the hassle of running the programs themselves, I could even run 4-5 models suggested by them for a fee.

Let me know in the poll and also in the comments if it's something you guys would be interested in. With the virus lockdown affecting my business sector, I have got some extra time on my hands.

Update: 

Since I already have 2-3 people wanting their data merged and the resulting eigenstrat file for personal use, I have decided to go ahead and open it up to all.


Saturday, November 13, 2021

Are Bactrian (BMAC) horses not horses?

Librado, P., Khan, N., Fages, A. et al. The origins and spread of domestic horses from the Western Eurasian steppes. Nature 598, 634–640 (2021). https://doi.org/10.1038/s41586-021-04018-9

This paper was published in October 2021. The following is the abstract:
Domestication of horses fundamentally transformed long-range mobility and warfare. However, modern domesticated breeds do not descend from the earliest domestic horse lineage associated with archaeological evidence of bridling, milking and corralling at Botai, Central Asia around 3500 BC. Other longstanding candidate regions for horse domestication, such as Iberia and Anatolia, have also recently been challenged. Thus, the genetic, geographic and temporal origins of modern domestic horses have remained unknown. Here we pinpoint the Western Eurasian steppes, especially the lower Volga-Don region, as the homeland of modern domestic horses. Furthermore, we map the population changes accompanying domestication from 273 ancient horse genomes. This reveals that modern domestic horses ultimately replaced almost all other local populations as they expanded rapidly across Eurasia from about 2000 BC, synchronously with equestrian material culture, including Sintashta spoke-wheeled chariots. We find that equestrianism involved strong selection for critical locomotor and behavioural adaptations at the GSDMC and ZFPM1 genes. Our results reject the commonly held association between horseback riding and the massive expansion of Yamnaya steppe pastoralists into Europe around 3000 BC driving the spread of Indo-European languages. This contrasts with the scenario in Asia where Indo-Iranian languages, chariots and horses spread together, following the early second millennium BC Sintashta culture

I have a serious problem with the emphasized part, and the emphasis is mine. So I exchanged mails with the authors. They are pasted below. Thoughts in the comments will be appreciated.

Thursday, November 11, 2021

HOW TO Part 2:: How to test models using qpAdm - qpAdm 101

qpAdm is a statistical tool for studying the ancestry of populations with histories that involve admixture between two or more source populations. I hope this tutorial helps people interested in doing their own research, as well-constructed models validated by qpAdm provide a high standard of evidence.

qpAdm is used to validate models that are fed into it by the user. The details that need to be passed to the qpAdm program are as follows.

  1. Target population
  2. List of 2 or more source populations 
  3. List of Right populations or Right Pops.
  4. The populations in 1 & 2 are together called Left Populations or Left Pops and the first population in this list is considered as target population by qpAdm.
  5. The first population among the right pops has to be a basal population (outgroup); usually an African population like Mbuti, Shum Laka or Mota is chosen for this purpose.

A standard example of a qpAdm model is 


 Target population (Target) = source population 1 (Source 1) + source population 2 (Source 2) 

The qpAdm output will contain a p-value (also called tail probability or tailprob), admixture coefficients x & y for Source1 and Source2 respectively such that x+y = 1 (or 100%) and standard errors for those coefficients. 
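
As a rough sketch of how the inputs listed above fit together, the Python snippet below builds the left list, the right list and a parameter file as plain text. Treat it as an illustration under assumptions: the file names (left.txt, right.txt, qpadm.par), the dataset prefix and the population labels are hypothetical placeholders, and the parameter keys are the ones I recall from the Admixtools qpAdm documentation, so check them against the README shipped with your copy.

```python
def build_qpadm_inputs(target, sources, right_pops, prefix):
    """Return {file name: file text} for a qpAdm run (names are hypothetical)."""
    if not right_pops:
        raise ValueError("need at least one right population (the outgroup)")
    # Left pops: the target comes first, then the sources.
    left = [target] + list(sources)
    overlap = set(left) & set(right_pops)
    if overlap:
        raise ValueError(f"populations in both left and right: {sorted(overlap)}")
    par_lines = [
        f"genotypename: {prefix}.geno",
        f"snpname: {prefix}.snp",
        f"indivname: {prefix}.ind",
        "popleft: left.txt",
        "popright: right.txt",
        "details: YES",
    ]
    return {
        "left.txt": "\n".join(left) + "\n",
        # Right pops: the outgroup (e.g. Mbuti) must be listed first.
        "right.txt": "\n".join(right_pops) + "\n",
        "qpadm.par": "\n".join(par_lines) + "\n",
    }


files = build_qpadm_inputs(
    target="Target",
    sources=["Source1", "Source2"],
    right_pops=["Mbuti", "RightPop2", "RightPop3"],
    prefix="mydata",
)
for name, text in files.items():
    print(f"--- {name} ---")
    print(text)
```

The function only assembles text; writing the three files to disk and running qpAdm on the parameter file is left to the user. The overlap check reflects the usual practice that a population should not appear among both the left and the right pops.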

Monday, November 8, 2021

Brahmins from NW India have ~15% frequency of European mtDNA - New paper

From:
Gagandeep Singh, Srinivas Yellapu, Harkirat Singh Sandhu, Indu Sharma, Varun Sharma & AJS Bhanwer (2021): Genetic Characterization of the North-West Indian Population: Analysis of Mitochondrial DNA Control Region Variation, Annals of Human Biology, DOI: 10.1080/03014460.2021.1879933


The authors sampled 197 total people of NW India from the Jat Sikh, Bania, Khatri, Brahmin & SC castes. One of the main findings is below.

North-West Indian population groups had a total of 55.86% of samples characterised as belonging to South Asian ancestry haplogroups (M, U2, U4), followed by West Eurasian (40.18%, H, HV, I, J, K, N, R, R0, T, U1a, U5a, U7, U8a, W, X) and East Asian (3.96%, A, B, C, D, F, G ) (Fig. 1).

The below analysis is mine, after studying the raw data in the paper.

Stepwise tutorial to Install Admixtools & Plink

I will make 3 separate posts on how to install Admixtools, run qpAdm and run qpGraph respectively. This first blog post details the steps to successfully install Admixtools from the Harvard lab.

STEP 1.  Have a Linux system

Admixtools cannot be run natively on a Windows system. If you have a Windows or MacOS host system, install Oracle VirtualBox from here. It's free.

Install any guest Linux system on this VBox. I use openSUSE Leap v15.1 for no particular reason other than that I'm used to it. The virtual image for installation can be downloaded for free from here.

Once this is done, install the guest Linux system on the VBOX. This video below can guide you. 


In the VirtualBox settings (which can only be edited when the guest OS is shut down), make sure to use 2 processors and enable 3D acceleration in the display settings; I would also suggest using 4GB RAM. Use VBoxSVGA as the graphics controller in the display settings so that you can get a full-screen display in the guest OS.

Saturday, November 6, 2021

Wheels, Languages and Bullshit (Or How Not To Do Linguistic Archaeology) - J.S. Morris

I came across a paper on IE linguistics by Jonathan Sherman Morris. I found it extremely humorous. It's a long read, but it covers a lot of criticisms of the way linguistic archaeology has been used to serve authors' purposes, especially by authors like David Anthony. The following quotes are from this paper.

horse wheel and languages

Friday, November 5, 2021

East, West & Basal Eurasian components in select ancient samples from the paleolithic and neolithic

I am working on some qpGraphs to figure out the different Iran farmer-like ancestries in Ganj Dareh, CHG, TepeAbdulHosein, Hotu cave, Wezmeh cave, SC Asia (Sarazm), India etc. It's taking longer than I expected because the many populations add complexity.

So I thought I would use the graphs already made to shed some light on some Eurasian samples. When I publish qpGraphs, please keep some things in mind while analyzing them.

Tuesday, November 2, 2021

Ancient Ancestral South Indians (AASI) came from the east, not straight from Africa!

From Hallast, P., Agdzhoyan, A., Balanovsky, O. et al. A Southeast Asian origin for present-day non-African human Y chromosomes. Hum Genet 140, 299–307 (2021). https://doi.org/10.1007/s00439-020-02204-9 

A very good paper which throws some light on the post-Out-of-Africa peopling of Eurasia, based on analysis of Y haplogroups of ancient as well as modern samples.
Here, we show that phylogenetic analyses of haplogroup C, D and FT sequences, including very rare deep-rooting lineages, together with phylogeographic analyses of ancient and present-day non-African Y chromosomes, all point to East/Southeast Asia as the origin 50,000–55,000 years ago of all known surviving non-African male lineages (apart from recent migrants). This observation contrasts with the expectation of a West Eurasian origin predicted by a simple model of expansion from a source near Africa, and can be interpreted as resulting from extensive genetic drift in the initial population or replacement of early western Y lineages from the east, thus informing and constraining models of the initial expansion.
Presence of haplogroups C, D and F in 2302 present-day samples. The map demonstrates how many of the three haplogroups of interest (none, one, two, or all three) were found in different areas of the Old World and Near Oceania. Black dots indicate the locations of the studied populations