
Thursday, October 23, 2014

Ancient DNA from Iron Age and Medieval Poland

A new paper at PLoS ONE featuring ancient mitochondrial (mtDNA) data from Wielbark, Przeworsk and early Slavic remains argues for matrilineal continuity in present-day Poland since the Iron Age. It's actually based on a thesis that I blogged about more than two years ago (see here). However, it does include some fresh insights, so it's worth a look even if you've read the thesis. In the excerpts below, RoIA stands for Roman Iron Age.

Three modern populations or groups of populations (Lithuanians and Latvians, Poles, and Czechs and Slovaks) were found to contain significantly higher percentages (p<0.05) of shared informative haplotypes with the RoIA samples compared to other present-day populations (Figure 2, Table S4). Notably, modern Poles shared the highest number (nine) of informative mtDNA haplotypes with the RoIA individuals.
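At its core, the haplotype-sharing comparison is an exact-match count. The sketch below assumes each mtDNA haplotype is encoded as a set of variant labels relative to a reference sequence. The function name and encoding are hypothetical, and the sketch does not reproduce the paper's definition of "informative" haplotypes or its significance test.

```python
from typing import Dict, FrozenSet, List

# Assumed encoding: a haplotype is the set of variant labels (e.g. "16189C")
# at which it differs from the reference sequence.
Haplotype = FrozenSet[str]


def shared_haplotype_count(ancient: List[Haplotype],
                           modern: Dict[str, List[Haplotype]]) -> Dict[str, int]:
    """Count, per modern population, the distinct ancient haplotypes that have
    at least one exact match in that population's sample."""
    distinct_ancient = set(ancient)
    counts: Dict[str, int] = {}
    for population, haplotypes in modern.items():
        pool = set(haplotypes)  # fast exact-match lookup
        counts[population] = sum(1 for h in distinct_ancient if h in pool)
    return counts
```

Raw counts like these grow with the size of each modern sample, which is one reason a formal test is needed before calling a difference significant.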


Of particular interest are three RoIA samples assigned to subhaplogroup H5a1, which were recovered from the Kowalewko (sample K1), the Gaski, and the Rogowo (samples G1 and R3) burial sites (see Figure 1). Recent studies on mtDNA hg H5 have revealed that phylogenetically older subbranches, H5a3, H5a4 and H5e, are observed primarily in modern populations from southern Europe, while the younger ones, including H5a1 that was found among RoIA individuals in our study, date to around 4,000 years ago (kya) and are found predominantly among Slavic populations of Central and East Europe, including contemporary Poles [15]. Notably, we also found one ME sample belonging to subhaplogroup H5a1 (sample OL1 in Table 3). The presence of subclusters of H5a1 in four ancient samples belonging to both the RoIA and the ME periods, and in contemporary Poles, indicates the genetic continuity of this maternal lineage in the territory of modern-day Poland from at least the Roman Iron Age, i.e., 2 kya.
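The excerpt doesn't say how the ages in [15] were obtained. One widely used estimator for mtDNA clade ages is the rho statistic: the mean number of mutations between a clade's founder haplotype and each sampled sequence, multiplied by a clock calibration in years per mutation. Here's a minimal sketch. The function name is hypothetical, and the calibration is a caller-supplied assumption, not a published rate.

```python
from typing import List, Tuple


def rho_age(mutations_from_root: List[int],
            years_per_mutation: float) -> Tuple[float, float]:
    """Rho-statistic age estimate for a clade.

    mutations_from_root: mutations separating each sampled sequence from
    the clade's founder haplotype.
    years_per_mutation: clock calibration (an assumed input, not a constant).
    Returns (rho, estimated age in years).
    """
    if not mutations_from_root:
        raise ValueError("need at least one sampled sequence")
    rho = sum(mutations_from_root) / len(mutations_from_root)
    return rho, rho * years_per_mutation
```

Published estimates typically also report an uncertainty interval, and the cited studies may have used other dating methods altogether. This sketch only shows the point estimate.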


The evolutionary age of H5 sub-branches (<4 kya) [15] also approximates the age of N1a1a2 subclade found in the RoIA population (sample KA2) (Table 2). The coalescence age of N1a1a2 is around 3.4–4 kya, making this haplotype one of the youngest sub-branches within hg N [52]. The N1a1a2 haplotype found in one RoIA individual was classified as unique because no exact match was found among the twelve comparative populations or groups of populations used in the haplotype sharing test. Notably, a similar N1a1a2 haplotype carrying an additional transition at position 16172 was found in a modern-day Polish individual [53].
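Calling a haplotype "unique" while noting a relative one transition away amounts to comparing variant sets. An exact match has an empty symmetric difference, and a near match differs at just one position. Here's a sketch that assumes each haplotype is encoded as a set of variant labels. The encoding, the function name and the `max_diff` threshold are illustrative assumptions.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumed encoding: a haplotype is the set of variant labels at which it
# differs from the reference sequence.
Haplotype = FrozenSet[str]


def classify_haplotype(query: Haplotype,
                       database: Dict[str, Haplotype],
                       max_diff: int = 1) -> Tuple[str, List[str], List[Tuple[str, List[str]]]]:
    """Return ("shared" or "unique", exact-match IDs, near matches).

    Near matches differ from the query at 1..max_diff variants; each is
    reported with the sorted list of differing variants.
    """
    exact = [sid for sid, hap in database.items() if hap == query]
    near = [
        (sid, sorted(hap ^ query))  # symmetric difference = differing variants
        for sid, hap in database.items()
        if 0 < len(hap ^ query) <= max_diff
    ]
    status = "shared" if exact else "unique"
    return status, exact, near
```

One caveat of this encoding is that a different allele at the same position counts as two differences rather than one.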

I suspect the publication of these results at this time, so many months after they were first revealed in the aforementioned thesis, is part of an effort to drum up interest and secure funding for a new project on the genetic history of Greater Poland, which was announced late last year (see here). I say that because one of the people organizing the project, Janusz Piontek, is also listed as a co-author on this paper. So if we're lucky we might soon see full genome sequences from a few of these Iron Age and Medieval samples.


Juras A, Dabert M, Kushniarevich A, Malmström H, Raghavan M, et al. (2014) Ancient DNA Reveals Matrilineal Continuity in Present-Day Poland over the Last Two Millennia. PLoS ONE 9(10): e110839. doi:10.1371/journal.pone.0110839

Monday, October 6, 2014

The power of imputation

The latest version of the Affymetrix Human Origins genotype dataset, published last month along with Lazaridis et al. 2014, is an awesome resource for population genetics (see here). However, it lacks Polish samples, which is a major drawback as far as this blogger is concerned.

Hopefully this oversight is corrected soon. In the meantime, I decided to include 15 Poles from the Eurogenes Project dataset in my copy of the Human Origins dataset. But in order to do that I first had to impute around 460K genotypes for each of these people.
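The number of genotypes to impute per person is essentially a set difference: Human Origins markers minus the markers observed in that person. This sketch assumes markers are matched by SNP ID, and the function name is hypothetical. A real merge would also need to reconcile strand and allele coding, which this ignores.

```python
from typing import Dict, Iterable


def overlap_summary(target_markers: Iterable[str],
                    observed_markers: Iterable[str]) -> Dict[str, int]:
    """Compare a target panel's marker IDs with those genotyped in a sample.

    'to_impute' counts markers in the target panel that the sample lacks.
    """
    target, observed = set(target_markers), set(observed_markers)
    return {
        "shared": len(target & observed),
        "to_impute": len(target - observed),
        "observed_only": len(observed - target),
    }
```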

Imputing so many markers might sound pretty crazy, but it's actually very doable, especially for genetically homogeneous groups with relatively low haplotype diversity, like the Polish population. I used BEAGLE 3.3.2 for the job, mostly because I'm familiar with it, but also because it's quick and accurate.

My reference panel included 1090 individuals, most of them shared by Eurogenes and Human Origins, and just over 1 million markers. Only around 130K of the markers were shared by the two datasets, but well over 50% of the 1 million markers were genotyped in each of the Poles. This meant that I was imputing sporadically missing data, which is certainly a more sensible strategy than attempting to fill in long stretches of empty calls.
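BEAGLE 3 fits a localized haplotype-clustering model, which is well beyond a blog-sized example. To show why sporadic gaps are easier to fill than long empty stretches, here's a much simpler stand-in, and it is *not* BEAGLE's algorithm. For each missing genotype, it finds the reference individuals whose observed genotypes match best in a window of nearby markers and takes their rounded mean dosage. The function name, `k` and `window` are illustrative assumptions.

```python
import numpy as np


def knn_impute(target: np.ndarray, reference: np.ndarray,
               k: int = 3, window: int = 50) -> np.ndarray:
    """Fill missing genotypes in `target` from the k most similar reference rows.

    target: 1-D float array of 0/1/2 dosages, np.nan where missing.
    reference: N x M array of fully observed 0/1/2 dosages.
    Similarity = summed absolute dosage difference over the target's
    observed markers within +/- `window` positions of the gap.
    """
    filled = target.copy()
    observed = ~np.isnan(target)
    n_markers = target.shape[0]
    for j in np.flatnonzero(~observed):
        lo, hi = max(0, j - window), min(n_markers, j + window + 1)
        mask = observed[lo:hi]
        if not mask.any():
            # No observed neighbours: fall back to the reference mean dosage.
            filled[j] = np.round(reference[:, j].mean())
            continue
        local_ref = reference[:, lo:hi][:, mask]
        diffs = np.abs(local_ref - target[lo:hi][mask]).sum(axis=1)
        nearest = np.argsort(diffs, kind="stable")[:k]
        filled[j] = np.round(reference[nearest, j].mean())
    return filled
```

When plenty of observed markers surround each gap, the local window carries real information. Across a long empty stretch it holds little or none, and the sketch can only fall back to the reference average.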

Everything seems to have worked out just fine, and the proof is in the pudding. Below are two Principal Component Analyses (PCA) featuring the Poles alongside 50 samples from the HGDP. The first PCA is based on observed genotypes, while the second is based on markers that were imputed into the Polish genomes. PCA is very sensitive to artifacts like genotyping errors, but as you can see, there's very little difference between these results. Also, keep in mind that the SNPs used in the Human Origins array were specifically chosen for population genetics, while those in the Eurogenes dataset come from chips mostly designed for commercial ancestry and medical work.
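If you want to try a comparison like this yourself, here's a minimal PCA on an individuals x SNPs matrix of 0/1/2 allele counts. It assumes a complete matrix with no missing calls. It scales each SNP by sqrt(p(1 - p)), a normalisation common in population-genetic PCA, and gets the components by SVD. PC signs are arbitrary, so the second helper compares two runs using absolute per-component correlation. Both function names are hypothetical.

```python
from typing import List

import numpy as np


def genotype_pca(genotypes: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Return PC scores (individuals x n_components) for a 0/1/2 genotype matrix.

    Assumes no missing values. Monomorphic SNPs are dropped because their
    scaling factor would be zero.
    """
    G = np.asarray(genotypes, dtype=float)
    mu = G.mean(axis=0)
    p = mu / 2.0  # estimated allele frequency per SNP
    keep = (p > 0) & (p < 1)
    X = (G[:, keep] - mu[keep]) / np.sqrt(p[keep] * (1.0 - p[keep]))
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


def pc_agreement(pcs_a: np.ndarray, pcs_b: np.ndarray) -> List[float]:
    """Absolute Pearson correlation of matching PCs from two runs on the
    same individuals in the same order (abs() because PC signs are arbitrary)."""
    return [abs(float(np.corrcoef(pcs_a[:, i], pcs_b[:, i])[0, 1]))
            for i in range(pcs_a.shape[1])]
```

`pc_agreement` assumes both runs produce components in the same order. If two eigenvalues are close, PCs can swap places between runs and would need matching first.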

Also, here's a PCA based on more than 300K SNPs, both observed and imputed in the Poles, featuring all of the West Eurasian samples from the filtered version of Human Origins, as well as the 15 Polish individuals. Note that the Poles cluster more or less between the Czechs and groups from the East Baltic region, and overlap most strongly with Belarusians, which makes sense.
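Phrases like "between the Czechs and groups from the East Baltic" or "overlap most strongly with" are judgements read off the plot by eye. One rough way to put numbers on them is to rank population centroids by distance from a given sample. This assumes you have each individual's PC coordinates and a population label, and the helper below is hypothetical. Centroid distance on a couple of PCs ignores within-population spread and the remaining components.

```python
from typing import List, Tuple

import numpy as np


def rank_population_centroids(sample: np.ndarray,
                              coords: np.ndarray,
                              labels: List[str]) -> List[Tuple[str, float]]:
    """Rank populations by Euclidean distance from `sample` to each
    population's centroid in principal-component space (closest first)."""
    label_arr = np.asarray(labels)
    ranking = []
    for pop in dict.fromkeys(labels):  # unique labels, first-seen order
        centroid = coords[label_arr == pop].mean(axis=0)
        ranking.append((pop, float(np.linalg.norm(sample - centroid))))
    ranking.sort(key=lambda item: item[1])
    return ranking
```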


Brian L. Browning, Sharon R. Browning, A Unified Approach to Genotype Imputation and Haplotype-Phase Inference for Large Data Sets of Trios and Unrelated Individuals, AJHG, Volume 84, Issue 2, pp. 210–223, 13 February 2009, DOI:

Lazaridis et al., Ancient human genomes suggest three ancestral populations for present-day Europeans, Nature, 513, 409–413 (18 September 2014), doi:10.1038/nature13673