nature.com

Investigating the genetic characteristics of the Csangos, a traditionally Hungarian speaking ethnic group residing in Romania - Journal of Human Genetics

  • Melegh, Béla
  • Sat Jul 11 2020

Introduction

Csangos are a nonhomogeneous ethnic group living mostly in Moldavia, Romania, in East–Central Europe. Their census size is ~230,000 individuals, and their population density is highest in Bacau County [1]. Their traditional language is Csango, an archaic Hungarian dialect, which has recently become severely endangered due to language shift. Today only about 43,000 Csangos speak their traditional language, and they typically belong to the older age groups of the Csango community [1]. Most Csangos today speak Romanian, the official language of Romania, as their native language.

The self-identity of Csangos has always been based on their culture, religion, and language. These features suggest that Csangos have Hungarian origins and arrived in Moldavia from the west, a view generally accepted by many experts from Hungary and Romania [2, 3]. Their ancient culture indeed shows remarkable connections to Hungarians, especially in the similar characteristics of their music and in the fact that the language of their folk songs is Hungarian. Like Hungary, where Catholicism is the main religion, the Csangos are Roman Catholic, in strong contrast to predominantly Orthodox Romania [4].

Many theories about the origin of the Csango people have been proposed since the 19th century. These theories fall into two groups according to whether the ancestors of the Csango people arrived in the historical region of Moldavia before or after the Mongol invasion of East and East–Central Europe, which occurred in 1241–42 [5]. It is considered more likely that the ancestors of Csangos arrived after the Mongol invasion [5]. The Hungarian origin of Csangos is accepted by most Romanian and Hungarian scholars, but it is debated whether Csangos originate from the Kingdom of Hungary or from the Szekler (or Székely) people. Szeklers are a Hungarian-speaking, ethnographically Hungarian group of Hungarian origin living in the middle of Romania (Székely Land) [6, 7]. The most probable theory, supported by historical, linguistic, cultural anthropological, and toponymic evidence, is that Csango ancestors began to immigrate to the region of Moldavia in the 14th century [5]. The Hungarian king Louis I (or Louis the Great) defeated the Mongols and drove them out of Moldavia in 1345 and supported the foundation of a Moldavian Principality [8, 9]. The economic attractiveness of the new principality induced immigration from the Kingdom of Hungary into Moldavia [5].

Previous genetic studies on Hungarian speaking ethnic groups are based on the investigation of Y chromosomal haplogroups, mitochondrial DNA (mtDNA) haplogroups, and gene frequency analysis based on nuclear loci.

The gene frequency analysis of eight Hungarian speaking ethnic groups considered 24 blood group and red cell enzyme genes. Gene frequencies were analyzed in comparison with Hungarian samples and with samples from non-Hungarian groups that are, to various extents, ancestral populations of Hungarians and Hungarian ethnic groups. The study showed that Csangos are closest to a Hungarian ethnic group living in the Kiskunság, in the middle of today’s Hungary [10]. This Hungarian ethnic group has connections to a Turkic people, the Cumans, who arrived three centuries after the conquering Hungarians due to the Mongol invasion and the colonization processes that followed it [10]. The study also pointed out that, after the Slavs and Germans, the Iranians and Turks are also major contributors to the ancestry of the Hungarian ethnic or ethnographic groups, especially in the case of the Csango and the Szekler people, who seem to have a somewhat similar gene pool. The evidence of the study suggested that these groups could descend from people left behind by the conquering Hungarians, and therefore might also support the view that Csangos arrived in the historical area of Moldavia before the Mongol invasion [10].

A study based on mtDNA analysis showed that there are notable differences between the Szekler and the Csango people despite their genetic similarities [11]. Of the 198 investigated haplotypes, only 21 were shared between the two ethnic groups. Further analyses in this study pointed out that gene flow between these populations was asymmetric: migration from Csango to Székely regions was more likely than migration in the opposite direction. This study supported the hypothesis that Csangos originate from the territory of the Kingdom of Hungary (from the times after the Mongol invasion) rather than from Szekler settlements [11]. The study also points out that the genetic isolation of Csangos is significant compared with the Szeklers and with other regional populations. The authors reported that the frequencies of the R0 and U haplogroups were significantly lower in Csangos than expected from average European values. The study also highlights the similarity of the Csango and Szekler gene pools in the sense that both include Central-Asian genetic elements [11].

A study based on Y haplogroup analysis showed that the Central/Inner Asian ancestry of Csangos could be 6.3%, which is somewhat higher than the 5.1% Central-Asian ancestry found in Hungarians [12]. This higher rate could be the result of the above-mentioned isolation of Csangos, who are therefore less admixed with neighboring populations than Hungarians are. However, the study also notes that this Central-Asian ancestry is not necessarily derived directly from Central-Asian populations; it may also have arrived indirectly through Caucasus populations, East Slavs, and the Middle East.

An investigation using autosomal haplotype analysis to examine admixture events involving the people of European countries showed that Hungarians have 4.4% ancestry from Central Asia, which was the highest among the studied populations [13]. In comparison, the Central-Asian ancestry component was 2.5% in Romanians, 2.3% in Bulgarians and Lithuanians, 1.9% in Poles, and 0% in Greeks. Belarusians showed a somewhat higher proportion, 3.6%. The study stated that this trace of Central-Asian ancestry could be the result of invasions by peoples from the Asian steppes, including the ancestors of Hungarians.

According to historical literature, supported by a limited number of archeological findings, the proto-Hungarians migrated from the Eurasian steppe among other nomadic groups in the early Middle Ages [14,15,16,17]. They share a common history with Turkic tribes that invaded and conquered the eastern regions of Europe, such as the Khazars, who established their long-lasting (c. 650–969) khaganate in East Europe. A Khazar group called the Kabars even joined the migrating proto-Hungarian tribes [18]. Other Turkic groups, such as the Bulgars, Pechenegs, Kipchaks, and the previously mentioned Cumans, also share a common history with them [19,20,21]. Based on these results of historical and genetic studies, we attempted to estimate the genetic similarities of Csangos and European populations and to investigate possible Turkic ancestry in Csangos and the other investigated populations.

Here we analyzed genome-wide autosomal SNP array data from 30 Csango individuals, genotyped on the Affymetrix 1M platform, and 665 samples from distinct European and Turkic groups. We used the datasets at our disposal to represent distinct Eurasian populations. Various allele frequency-based population structure, ancestry, and admixture estimating algorithms, as well as haplogroup- and linkage disequilibrium-based software, were used to better understand the genetic characteristics, ancestry, and identity of the Moldavian Csango people.

Materials and methods

Datasets

We genotyped 30 Csango individuals on the Affymetrix Genome-Wide SNP Array 6.0 platform. DNA was extracted from leukocytes using anticoagulated blood samples. Genotype calling and quality control were carried out using the Affymetrix Power Tools command line software bundle. Raw genotype calls were converted to binary PLINK format using an in-house script and PLINK1.9 [22, 23], and were filtered further according to genotype missingness across all samples. SNPs whose missing call frequency exceeded the threshold of 0.1 were removed from the data using the “geno” flag of PLINK1.9. Genetic distances were also added to the marker data with PLINK using the HapMap Phase 2 GRCh37 genetic map, which made it possible to carry out the linkage-based tests [24]. The resulting dataset contained 868,130 SNPs.
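The missingness filter described above can be illustrated with a minimal Python sketch. It is not the PLINK implementation; it only mirrors the rule that a SNP is dropped when its missing call rate exceeds the threshold. The genotype encoding (0, 1, 2, or `None` for a missing call) and the function names are assumptions made for this example.

```python
from typing import List, Optional

# Hypothetical encoding: 0/1/2 copies of an allele, None for a missing call.
Genotype = Optional[int]


def snp_missing_rate(calls: List[Genotype]) -> float:
    """Fraction of samples with a missing call at one SNP."""
    if not calls:
        raise ValueError("a SNP needs at least one sample")
    return sum(call is None for call in calls) / len(calls)


def filter_by_missingness(genotypes: List[List[Genotype]],
                          threshold: float = 0.1) -> List[int]:
    """Return indices of SNPs (rows) whose missing rate does not exceed the threshold."""
    return [index for index, calls in enumerate(genotypes)
            if snp_missing_rate(calls) <= threshold]


# Toy data: 3 SNPs typed in 10 samples.
toy = [
    [0, 1, 2, 0, 1, 2, 0, 1, 2, 0],        # no missing calls, kept
    [0, None, 2, 0, 1, 2, 0, 1, 2, 0],     # 10% missing, kept
    [None, None, 2, 0, 1, 2, 0, 1, 2, 0],  # 20% missing, removed
]
kept = filter_by_missingness(toy, threshold=0.1)
print(kept)
```

With real data the same filtering is done by PLINK itself; the sketch only makes the threshold semantics explicit.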

All participants gave their written informed consent to participate in this study. Before signing the consent form, which was approved for this study by the Regional Research Ethics Committee of Pécs, all participants received verbal information in person. The samples were anonymized. This study follows the principles expressed in the Declaration of Helsinki.

Data for the other ethnic groups investigated in this study were obtained from various public sources. These public data were the International Haplotype Map Phase III dataset (HapMap, n = 1115 from 11 populations, genotyped on Illumina Human1M and Affymetrix 1M platforms) [25], the CEPH-Human Genome Diversity Panel data (HGDP, n = 1043 from 52 populations, 660,918 SNPs genotyped on Illumina 650Y array) [26, 27], and various datasets from the public repository of the Estonian Biocentre [28,29,30,31,32,33]. HapMap data were used to place the Csangos, Hungarians, and Romanians in a worldwide context in our PCA-based population structure analyses. HGDP data and datasets from the Estonian Biocentre were used in all investigations and represented the European region and Turkic groups from all over Eurasia. They were applied to study the European ancestry of the Csango people and to investigate their possible Turkic ancestry with respect to Eurasian Turkic groups. East–Central Europeans, most of the Turkic groups, and some other European populations were extracted from the datasets of the Estonian Biocentre. Some of the West European populations and one of the Central-Asian Turkic groups were acquired from the HGDP dataset (Supplementary Table 1).

Inferring population structure and ancestry

We used three different methods to investigate the relationship between Csangos and the populations listed in the previous section. We performed principal component analysis (PCA) on the data using SMARTPCA, released with the EIGENSOFT 6.01 software package [34]. Fixation index or pairwise average allele frequency differentiation (Fst) values were also computed using the SMARTPCA software. The STRUCTURE-like maximum likelihood estimation-based (ML) clustering method implemented in ADMIXTURE 1.22 was also applied to infer the ancestry distribution of the investigated ethnic groups in terms of a number of common hypothetical ancestral groups [35]. We also computed the ML tree of the investigated populations based on genome-wide allele frequency data using the TreeMix 1.13 software, which can visualize population splits and admixture and can also infer migration processes that occurred in the past [36].

We created three distinct datasets for the PCA and ADMIXTURE analyses. The first dataset contained the Csango samples and all of the HapMap populations. The LD-based pruning function of PLINK1.9 was applied because background LD can affect both the PCA and the ML-based ancestry analyses. We set the pairwise genotypic correlation threshold r2 to 0.4. The size of the sliding window was 50 SNPs, shifted by 5 SNPs at a time. The resulting dataset with the Csango and the HapMap samples contained 105,574 SNPs. The second dataset contained the Csangos and populations from all over Europe. East–Central Europe was represented by Bosnians, Bulgarians, Croatians, Hungarians, Kosovans, Montenegrins, Romanians, Serbians, Slovakians, and Slovenians. The East-European group consisted of Belarusians, Lithuanians, Poles, Russians, and Ukrainians. South Europeans were the Greeks, Italians (including the North Italian HGDP group), and Macedonians. Swedish and Latvian samples represented North Europe. Turkish samples from the Anatolian Peninsula were also added to these tests. In the third dataset, we also incorporated samples from several Turkic ethnic groups from all over Eurasia. Turkic groups from the Caucasus were the Azeris, Balkars, Kumyks, and Nogais; European Turkic peoples were the Chuvash, Bashkirs, Gagauz, and Tatars. The Central-Asian Turkic group comprised Karakalpak, Turkmen, Uyghur, and Uzbek samples, while Tuvans and Yakuts were considered Siberian Turkic groups. The second and third datasets contained 472 and 665 individuals and, after the LD-based pruning, 69,538 and 70,357 SNPs, respectively.

For the ML tree, we created a fourth dataset with a smaller number of populations, given the difficulty of visualizing too many populations on an ML tree. We selected ethnic groups for this analysis according to the results of the PCA and ADMIXTURE analyses and used only the most defining populations of the clusters established by these analyses. The dataset on which the TreeMix analysis was performed contained the Csangos, the West and East-European populations, and all Turkic-related groups. The European and Turkic populations were the following: Bulgarians, Hungarians, Romanians, Belarusians, Mordovians, Russians, Ukrainians, North Italians, Basques, French, Orcadians, Germans, Spaniards, Lithuanians, Turks, Balkars, Kumyks, Nogais, Chuvash, Turkmens, Uyghurs, Uzbeks, Azeris, Bashkirs, Gagauz, Tatars, Karakalpaks, Tuvans, and Yakuts. The dataset contained n = 477 individuals and 79,756 SNPs after LD-based SNP pruning, which was carried out with the same settings as the previous pruning procedures. In TreeMix, we applied a window size of 1000 SNPs and used the Yakuts as the root population. In the first analysis, migration was not estimated in the ML graph, whereas the second ML tree was built with two migration events based on the preliminary residual fit and migration weight results. All other options were left at their default settings.

Relationship of the Csango people to Hungarians and to Romanians

In order to investigate the relationship of Csangos with Hungarians and Romanians, we used the dataset created for the TreeMix analysis containing the West European, East European, and Turkic groups without the LD-based SNP pruning. The unpruned dataset featured 92,809 SNPs and 477 samples.

To reveal the relationships between the three groups, Csangos, Hungarians, and Romanians, we applied the 4-population test from the ADMIXTOOLS 4.1 software package [37]. With this formal test of admixture, we intended to test whether Hungarians or Romanians share more common ancestry with the Csango people. In our first D-statistics setup we hypothesized that Csangos are related to Romanians; in the second we assumed that Csangos are related to neither Hungarians nor Romanians. The corresponding hypothetical unrooted phylogenetic trees were ((Hungarians, Bulgarians)(Romanians, Csangos)) and ((Hungarians, Romanians)(Bulgarians, Csangos)). We also investigated the D-statistics setup ((Africans, Csangos)(Hungarians, Romanians)), which can provide clear evidence for a closer relatedness of Hungarians to the Csango people. Unlike the other tests of this section, this setup used the unpruned version of the dataset created for the first PCA, which contains the HapMap samples. It contained 155,274 SNPs.
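To make the 4-population test more concrete, the following Python sketch computes a D-statistic from population allele frequencies, using the frequency-based form commonly given for this statistic: the sum over SNPs of (w − x)(y − z), divided by the sum of (w + x − 2wx)(y + z − 2yz). This is an illustration only, not the ADMIXTOOLS code. The function name and the toy frequencies are assumptions, and the block jackknife that the real software uses to obtain standard errors and Z-scores is omitted.

```python
from typing import Sequence


def d_statistic(w: Sequence[float], x: Sequence[float],
                y: Sequence[float], z: Sequence[float]) -> float:
    """D for the unrooted tree ((W, X), (Y, Z)) from per-SNP allele frequencies.

    A value near 0 is consistent with the proposed tree; a clearly nonzero
    value indicates excess allele sharing across the two pairs.
    """
    if not (len(w) == len(x) == len(y) == len(z)):
        raise ValueError("all populations need the same number of SNPs")
    numerator = 0.0
    denominator = 0.0
    for pw, px, py, pz in zip(w, x, y, z):
        numerator += (pw - px) * (py - pz)
        denominator += (pw + px - 2 * pw * px) * (py + pz - 2 * py * pz)
    if denominator == 0:
        raise ValueError("no informative SNPs")
    return numerator / denominator


# Toy frequencies (hypothetical): W resembles Y and X resembles Z,
# so the tree ((W, X), (Y, Z)) fits poorly and D is positive.
w = [0.1, 0.9]
x = [0.5, 0.5]
y = [0.1, 0.9]
z = [0.5, 0.5]
print(d_statistic(w, x, y, z))
```

In practice the sign and the jackknife Z-score are read together: only a D that deviates from zero by several standard errors is taken as evidence against the proposed tree.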

We applied IBD segment detection using the Refined IBD algorithm of Beagle 4.1 to examine the relationship of the three populations in more detail by characterizing the similarities and differences in their average pairwise IBD sharing with the European and Turkic groups [38]. The major allele of each SNP was set as the A1 allele with PLINK1.9, and the dataset was converted to Variant Call Format 4.1 with the PLINK/SEQ v0.10 conversion algorithm [39]. We set the minimum IBD segment length to 3 centiMorgans and the IBD trim parameter to 10. We also set the IBD scale parameter according to the \(\sqrt {n/100}\) recommendation, where n is the number of individuals in the dataset to be analyzed and the scale is at least 2 (\(\sqrt {n/100}\) ≥ 2) [38]. All other parameters were left at their default settings.
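As a small worked example of the scale rule, the sketch below evaluates \(\sqrt {n/100}\) with a floor of 2. The function name is hypothetical, and reading the "≥ 2" condition as a lower bound is an assumption about how the recommendation is meant.

```python
import math


def ibd_scale(n_samples: int, minimum: float = 2.0) -> float:
    """Scale value sqrt(n/100), never smaller than the given minimum (assumed 2)."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return max(minimum, math.sqrt(n_samples / 100))


# For a dataset of 477 individuals the square-root term exceeds the floor;
# for 100 individuals the floor of 2 applies.
print(ibd_scale(477))
print(ibd_scale(100))
```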

We used the output of the Refined IBD to compute an average pairwise IBD sharing between populations I and J.

$$\mathrm{Average\ share} = \frac{\sum_{i=1}^{n}\sum_{j=1}^{m}\mathrm{IBD}_{ij}}{n \times m}.$$

In the equation, \(\mathrm{IBD}_{ij}\) is the length of the IBD segment shared between individuals i and j, and n and m are the numbers of individuals in populations I and J, respectively [40].
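The average share defined above can be computed directly from a list of detected segments. The sketch below assumes a simplified input of (sample, sample, length in cM) tuples and a sample-to-population mapping; these names and the record layout are illustrative and do not reproduce the actual Refined IBD output format. It also assumes the two populations are distinct.

```python
from typing import Dict, Iterable, Tuple

Segment = Tuple[str, str, float]  # (sample_a, sample_b, segment length in cM)


def average_ibd_share(segments: Iterable[Segment],
                      population_of: Dict[str, str],
                      pop_i: str, pop_j: str) -> float:
    """Sum of IBD lengths between populations I and J divided by n * m."""
    if pop_i == pop_j:
        raise ValueError("this sketch only handles two distinct populations")
    n = sum(1 for pop in population_of.values() if pop == pop_i)
    m = sum(1 for pop in population_of.values() if pop == pop_j)
    if n == 0 or m == 0:
        raise ValueError("both populations need at least one individual")
    total = 0.0
    for sample_a, sample_b, length_cm in segments:
        pops = {population_of[sample_a], population_of[sample_b]}
        if pops == {pop_i, pop_j}:  # one individual from each population
            total += length_cm
    return total / (n * m)


# Toy example: 2 individuals in population I, 3 in population J.
population_of = {"i1": "I", "i2": "I", "j1": "J", "j2": "J", "j3": "J"}
segments = [
    ("i1", "j1", 4.0),
    ("i1", "j2", 3.5),
    ("j3", "i2", 4.5),
    ("i1", "i2", 9.0),  # within-population pair, ignored
]
print(average_ibd_share(segments, population_of, "I", "J"))
```

Pairs with no detected segment contribute zero to the numerator but still count in the denominator, which is why the result is an average over all n × m pairs.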

Examining Turkic ancestry in the Csango people

For these tests, we used the same dataset created for the first D-statistics and IBD analyses, with the African Yoruba from the HapMap data added as an outgroup (the number of SNPs was 87,386).

F4 ratio estimation from the ADMIXTOOLS 4.1 software package was applied to further investigate the connection between Csangos and the Central-Asian and Siberian Turkic groups. In our setup we applied the ratio F4(Central-Asian Turkic group; Yoruba; Csangos; Turks)/F4(Central-Asian Turkic group; Yoruba; Siberian Turkic groups; Turks) in order to characterize the Siberian Turkic ancestry of Csangos. Admixture between the populations used in the F4 ratio estimation setup can bias the results. Therefore, to confirm that there are no detectable admixture events between the targeted populations, the untargeted populations, and the Csangos, we applied the 3-population test of the ADMIXTOOLS 4.1 software package.
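The logic of the ratio can be sketched in a few lines of Python. Here f4(A, O; X, C) is taken as the mean over SNPs of (a − o)(x − c), and the ratio f4(A, O; X, C)/f4(A, O; B, C) estimates the proportion of B-related ancestry in X under the assumed tree. In the setup above, A would be the Central-Asian Turkic group, O the Yoruba, X the Csangos, B the Siberian Turkic groups, and C the Turks. The function names and all frequencies below are made up for illustration; this is not the ADMIXTOOLS implementation and it omits the jackknife standard errors.

```python
from typing import Sequence


def f4(a: Sequence[float], o: Sequence[float],
       x: Sequence[float], c: Sequence[float]) -> float:
    """f4(A, O; X, C): mean over SNPs of (a - o) * (x - c)."""
    if not (len(a) == len(o) == len(x) == len(c)) or len(a) == 0:
        raise ValueError("need the same nonzero number of SNPs everywhere")
    return sum((pa - po) * (px - pc)
               for pa, po, px, pc in zip(a, o, x, c)) / len(a)


def f4_ratio(a, o, x, b, c) -> float:
    """Estimated share of B-related ancestry in X: f4(A,O;X,C) / f4(A,O;B,C)."""
    denominator = f4(a, o, b, c)
    if denominator == 0:
        raise ValueError("denominator f4 is zero; ratio undefined")
    return f4(a, o, x, c) / denominator


# Hypothetical allele frequencies at three SNPs.
a = [0.2, 0.7, 0.5]   # population related to the source B
o = [0.1, 0.4, 0.9]   # outgroup
b = [0.3, 0.8, 0.2]   # candidate source population
c = [0.1, 0.5, 0.6]   # second source population
alpha = 0.04
# Construct X as an exact 4% / 96% mixture of B and C.
x = [alpha * pb + (1 - alpha) * pc for pb, pc in zip(b, c)]
print(f4_ratio(a, o, x, b, c))
```

Because X is built as an exact mixture in this toy case, the ratio recovers the mixing proportion; with real data the estimate is only as good as the assumed population tree, which is why the 3-population test is run as a check.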

The ALDER 1.03 algorithm was used to further confirm the Central-Asian/Siberian Turkic ancestry of Csangos and to infer the date of admixture between them, which might help clarify the circumstances that led to the admixture of these populations. ALDER is based on the decay of LD caused by an admixture event [41]. The algorithm computes the correlations between SNPs in the presumably admixed target population, weighted according to the allele frequency differences between the ancestral populations. Ancestral populations serve as reference samples for the algorithm. The algorithm uses the allele frequencies of the reference populations to amplify the signal of admixture LD, which helps filter out background LD. All ALDER analyses used two reference populations. We used Central-Asian or Siberian Turkic groups and Turks as reference populations, and the target population was the Csangos.

Results

Population structure and ancestry of the Csango people

Because contemporary Hungarians are assumed to be the closest relatives of the Csango people and the Csangos are an ethnic group of Romania, we first placed the Csango people, Hungarians, and Romanians in a worldwide context using the HapMap data. The purpose of this was to better understand their relationship to each other and to worldwide populations. According to our PCA results, these three populations form a loose cluster with each other and with the CEU samples, and do not cluster together with the TSI group (Fig. 1). Plotting the results on the first four principal components shows that the relationship of the Csango samples with Hungarians and with Romanians is similar, although Csangos tend to be closer to Hungarians on PC1.

Our second PCA included various European populations and also Turkic populations from all over Eurasia (Fig. 2). European populations are spread along the first principal component from North-East Europe through East, East–Central, and West Europe, forming a straight line. South European populations can be found at the end of this line, forming a coherent cluster with high one-dimensional variability. As expected, Csangos cluster on this line with the East–Central European populations. Turkic populations form multiple groups and, in contrast to Europeans, their variability is much more two dimensional, since they vary widely on both the first and the second principal components. Turkic groups of the Caucasus region form a well-defined cluster with Turks, with the exception of the Nogais, who might have a higher proportion of East-Asian ancestry. Central-Asian populations form a loose cluster between the Turkic groups from the Caucasus and those from the Siberian region. European Turkic groups also form a loose group, like Central Asians, and they cluster between East Europeans and Central-Asian Turkic groups.

We can observe that Mordovians, Csangos, and Basques fall somewhat outside the line-shaped cluster parallel to PC1 formed by the European populations. The Mordovian and Csango samples are shifted slightly to moderately toward the Central-Asian and Siberian Turkic groups. This could suggest more substantial East Eurasian or Turkic ancestry in these populations, which should be further investigated. German samples are inhomogeneous, and some of them also show this tendency, which could be the result of the recent 20th century Turkish immigration into Germany [42]. In contrast, Basques show less East Eurasian ancestry, since Basque samples are shifted slightly in the opposite direction from Mordovians and Csangos. Basques are a genetically closed Pre-Indo-European indigenous group from the Neolithic residing on the coast of the Bay of Biscay, which admixed with Middle Eastern immigrant groups to a very limited extent [43].

Fig. 1 PCA results of Csangos, Hungarians, Romanians, and HapMap populations on the first four principal components. Note that this PCA includes all HapMap populations, but only European and GIH populations were plotted on the first four principal components. The eigenvalues of the eigenvectors (PCs) 1, 2, 3, and 4 were 100.35, 51.34, 7.20, and 6.60, respectively.

Fig. 2 PCA results of Csangos, Europeans, and regional Turkic populations on the first four principal components. The eigenvalues of the eigenvectors 1, 2, 3, and 4 were 8.46, 2.98, 1.93, and 1.43, respectively.

Pairwise average allele frequency differentiation calculations based on the dataset of the second PCA showed that Csangos have slightly higher Fst values with European populations than either Hungarians or Romanians do. Csangos have the lowest Fst with East–Central Europeans and the highest Fst with North Europeans. For Turkic groups, the Fst values of Hungarians are closer to those observed for Csangos than the Fst values of Romanians are (Supplementary Table 2).

We ran two ADMIXTURE analyses with K = 2 to K = 15 hypothetical common ancestral groups. The cross-validation error was smallest at K = 3 in both runs. CV-error rates at K = 3 were 0.5431 and 0.5428, respectively. The ADMIXTURE results at K = 3 reflect the results of the PCA. Although the ancestry component shares of East–Central Europeans were generally similar, presumably due to successive admixture events throughout the history of the East–Central European region, we can observe slight differences. For the East–Central European populations, the ADMIXTURE analysis shows that the Csangos have a similar extent of East-European ancestry to the Bosnians, Croatians, Hungarians, and Slovenians. This East-European ancestry component is somewhat smaller in Bulgarians, Kosovans, Montenegrins, and Romanians. The Middle Eastern share of the Csango ancestry is more similar to that of Romanians, Montenegrins, and Serbians. Csangos have less Pre-Indo-European derived West European ancestry than other East–Central Europeans (Fig. 3a). The ADMIXTURE analysis that took into account all populations of the PCA and incorporated all available Turkic groups supported our PCA results. It shows that Csangos are closest to East–Central Europeans but, in contrast to other East–Central European populations, generally have a somewhat larger share of the Central-Asian/Siberian Turkic ancestry component. For this component they are therefore slightly more similar to East-European populations, a phenomenon that should be further investigated (Fig. 3b).

Fig. 3 ADMIXTURE analysis results at K = 3 common ancestral groups. a ADMIXTURE analysis of Csangos, Europeans, and Turks. b ADMIXTURE analysis of Csangos, Europeans, and regional Turkic populations. One column represents one individual, one column group corresponds to a regional population. ADMIXTURE graphs with K = 3 to K = 10 are available in Supplementary Fig. 1.

The TreeMix tests supported the results of the PCA and ADMIXTURE analyses, since they gave similar population relationships. They also provided new information about population relationships. TreeMix estimated migration events from the area of Central Asia and Siberia into East Europe when we instructed the algorithm to incorporate the two most probable migration events into the ML tree (Supplementary Fig. 2a, b). We also tested the reproducibility of the 2-migration model by repeating the same run 10 times with the same parameters. Migration from East Asia and Central Asia into East Europe is observable in every run, and in some of the runs the algorithm also estimates migration from these regions to the Middle East (Supplementary Fig. 3).

Relationship of the Csangos with Hungarians and Romanians

We used the 4-population test to investigate the relationships between Csangos, Hungarians, and Romanians. First, we assumed that Csangos are more closely related to Romanians, since they live in Romania. In our second setup, we assumed that the Csango people have an entirely different ancestry and are related to neither Hungarians nor Romanians. The D-statistics of these 4-population test setups showed that these assumptions are not correct and that Csangos might share more common ancestry with Hungarians than with Romanians. Our setup using the Yoruba from the HapMap data showed that Csangos are indeed genetically more closely related to Hungarians (Supplementary Table 3).

To further characterize the relationship of these three populations, we applied an IBD segment detection method and calculated the average pairwise IBD sharing of the three populations with each other, with West and East-European populations, and with all investigated Turkic populations. The average IBD share of Csangos with Hungarians was 4.08, and with Romanians it was 3.63. The difference between these two values is almost identical to the difference between the IBD share of Csangos with Romanians and that of Hungarians with Romanians, which was 3.26. The IBD analysis also shows that the IBD share of the Csango people with East Europeans is similarly high to that of Hungarians, but their IBD share with West Europeans is more similar to that of Romanians (Fig. 4).

Fig. 4 Average pairwise IBD sharing results. a Average share between Csangos and European and Turkic populations. b Average share between Csangos and European and Turkic regional groups.

Generally, the IBD share with Turks and Turkic peoples is approximately equal in all three populations, but in the case of European Turkic groups this share is higher in both Hungarians and Csangos. The IBD detection method also supported the previous results, indicating that Siberian Turkic groups might be a more significant source of ancestry in Csangos than in all other investigated populations except Mordovians.

Relationship of the Csangos with Turkic people

The preceding analyses pointed to the possibility that Csangos are unique in the East–Central European area in that they might possess a relatively significant share of East-Asian derived Turkic ancestry.

F4 ratio estimation was applied as a formal test of admixture to further investigate the possibility of East-Asian admixture. We were able to detect admixture with the Siberian Turkic groups, and the proportion of East-Asian Turkic ancestry was approximately 4% (4.22%) (Table 1). Since admixture between the populations used in the F4 ratio estimation model can bias the results, we tested these groups using the 3-population test and found no evidence of detectable admixture (Supplementary Table 4).

Table 1 F4 ratio estimation results


We were also able to verify the Siberian and Central-Asian Turkic ancestry with the ALDER method, which is based on the decay of admixture linkage disequilibrium. ALDER estimated that the ancestors of the Csango people admixed with Siberian Turkic people, ancestors of the Yakuts, 38.85 ± 9.41 generations ago (z = 3.41, p = 0.00066), which means that this admixture occurred 853–1399 years ago if we assume that one generation equals 29 years [44] (Supplementary Fig. 4, Table 2). In the case of the Central-Asian Uyghurs and Karakalpaks, admixture could have occurred 34.27 ± 5.13 generations or 845–1142 years ago (z = 2.57, p = 0.01) and 36.91 ± 9.89 generations or 784–1357 years ago (z = 2.61, p = 0.009), respectively.
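The conversion from generations to years used above is simple arithmetic: the lower and upper bounds are the estimate minus and plus its error, multiplied by the generation time. A minimal Python sketch follows; the function name is hypothetical, and the 29 years per generation is the value stated in the text.

```python
from typing import Tuple


def admixture_date_range(generations: float, error: float,
                         years_per_generation: float = 29.0) -> Tuple[float, float]:
    """Return (earliest bound, latest bound) in years before present."""
    lower = (generations - error) * years_per_generation
    upper = (generations + error) * years_per_generation
    return lower, upper


# Estimates as reported in the text (generations, error).
estimates = {
    "Yakuts": (38.85, 9.41),
    "Uyghurs": (34.27, 5.13),
    "Karakalpaks": (36.91, 9.89),
}
for reference, (gens, err) in estimates.items():
    low, high = admixture_date_range(gens, err)
    print(f"{reference}: {low:.1f} to {high:.1f} years ago")
```

The computed bounds agree with the reported ranges to within about a year; the small differences presumably come from rounding of the published generation estimates.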

Table 2 ALDER calculation results


Discussion

Using PCA and ML-based population structure and ancestry estimation methods, we placed the Csangos in East–Central Europe and also in a European context. Although the results of ADMIXTURE were similar for all East–Central European populations, we were able to observe a significant East-European ancestry in Csangos, which is similar to that of Hungarians. Also according to the ADMIXTURE results, their Middle Eastern ancestry was more similar to that of Romanians. However, Csangos showed marginal West European ancestry derived from the Pre-Indo-European era, which seems to be almost unique to Csangos in East–Central Europe and shows similarity with East Europeans. These analyses also pointed out that, while other European populations have detectable ancestry only from the Middle Eastern region, Csangos, like Mordovians, could also possess detectable Central-Asian and Siberian Turkic ancestry.

Fst calculations showed that Csangos have higher Fst values with Europeans than Hungarians and Romanians do on average, and that Csangos are more similar to Hungarians in their Fst values with most Turkic populations.

Using a formal test of admixture, first we investigated if Csangos have Hungarian ancestry. The D-statistics results of the 4-population test suggested that Hungarians have a stronger connection with the Csangos than Romanians.

Average pairwise IBD share calculations also confirmed that Hungarian ancestry in Csangos is more significant than Romanian ancestry. IBD sharing confirmed the Central-Asian and Siberian ancestry source in the Csangos, which tended to be far more significant than the corresponding share inferred for either Hungarians or Romanians. The average pairwise IBD share results also reproduced the previously observed phenomenon that the East-European ancestry shares of Csangos and Hungarians are very similar.

We intended to confirm the supposed relationship between Csangos and Central-Asian/Siberian Turkic groups using tests based on different principles. A formal test of admixture based on allele frequencies and another method based on the decay of admixture linkage disequilibrium over time were applied for this purpose. Both methods confirmed these Asian ancestry sources. The estimated proportion of this ancestry was ~4%. According to the estimates of the ALDER algorithm, admixture of Central-Asian and Siberian Turkic groups with Csangos could have occurred 784–1399 years ago, a period that includes the times preceding the conquest of the Carpathian Basin by the proto-Hungarians and also the first century of the Kingdom of Hungary.

While the Y haplogroup study referred to earlier detected a common Inner Asian Altaic ancestry component in contemporary Hungarian-speaking ethnic groups of East–Central Europe, we found two main sources of Turkic ancestry, of which the Central-Asian/Siberian Turkic component is largely exclusive to the investigated Csangos. Hungarians, like most East–Central European populations, possess Turkic ancestry derived mostly from the Middle East and its neighboring Central-Asian regions. The latter ancestry components could originate from the early Middle Ages, when nomadic Turkic tribes from West Asia invaded Europe, and also from the Ottoman invasion of East–Central Europe and from recent Turkic immigration into Europe. According to our results, Central-Asian and Siberian Turkic ancestry in Csangos could derive either from the migration period of the proto-Hungarian tribes or from the period immediately after the Hungarian conquest of the Carpathian Basin. If this ancestry component dates from the time when proto-Hungarian tribes had already settled in the Carpathian Basin, it could have originated indirectly through contact with other populations that were also migrating from Asia toward Europe, e.g., other Turkic groups, by the same process mentioned above in connection with the Middle Eastern/Central-Asian Turkic ancestry component observable in most East and East–Central European populations. Alternatively, using Turkish samples as the second reference group might bias the ALDER results toward a later date for the Central-Asian/Siberian Turkic gene flow, because there is more recent detectable gene flow from the Middle East, e.g., from the time of the Ottoman invasion [45]. Answering the questions raised by these results would require ancient DNA from the investigated area, and the issue will be investigated further. Such data could also answer the question of whether Csangos are direct descendants of migrating proto-Hungarians who remained in the area of the Siret or whether they arrived later from the Kingdom of Hungary.

Our results are concordant with the conclusions of previous historical, linguistic, cultural anthropological, and genetic studies identifying Csangos as an ethnic group of Hungarian origin. We were also able to detect an East-Asian or Siberian Turkic component in their ancestry, which distinguishes them from all other investigated East–Central European populations. This also indicates that Csangos live in significant genetic isolation in Romania and might have preserved some of the genetic legacy of the migrating proto-Hungarian tribes that arrived in East–Central Europe at the end of the 9th century.

References

  1. Tánczos V. Language shift among the Moldavian Csángós. Cluj-Napoca: Romanian Institute for Research on National Minorities; 2012.

  2. Tytti I-A. Csango minority culture in Romania. Committee on culture, science and education. Council of Europe; 2001. http://assembly.coe.int/nw/xml/XRef/Xref-DocDetails-EN.asp?FileID=16906&lang=EN.

  3. Lehel P, Tánczos V. Language use, attitudes, strategies. linguistic identity and ethnicity in the Moldavian Csángó villages. Cluj Napoca: ISPMN Publishing; 2012.

  4. Tánczos V. Hungarians in Moldavia. Budapest: Teleki László Foundation. Institute for Central European Studies; 1998.

  5. Baker R. On the origin of the Moldavian Csángós. The Slavonic and East European Review. 1997;75:658–80.

  6. Stroschein S. Ethnic struggle, coexistence, and democratization in Eastern Europe. New York: Cambridge University Press; 2012.

  7. Ramet SP. Protestantism and politics in eastern Europe and Russia: the communist and postcommunist eras. vol. 3. Durham, NC: Duke University Press; 1992.

  8. Spinei V. Moldavia in the 11th–14th centuries. Bucureşti, România: Editura Academiei Republicii Socialiste România; 1986.

  9. Engel P. The realm of St Stephen: a history of medieval Hungary, 895–1526. London: I.B. Tauris Publishers; 2001.

  10. Guglielmino CR, De Silvestri A, Beres J. Probable ancestors of Hungarian ethnic groups: an admixture analysis. Ann Hum Genet. 2000;64:145–59.

  11. Brandstatter A, Egyed B, Zimmermann B, Duftner N, Padar Z, Parson W. Migration rates and genetic structure of two Hungarian ethnic groups in Transylvania, Romania. Ann Hum Genet. 2007;71:791–803.

  12. Biro A, Feher T, Barany G, Pamjav H. Testing Central and Inner Asian admixture among contemporary Hungarians. Forensic Sci Int Genet. 2015;15:121–6.

  13. Hellenthal G, Busby GBJ, Band G, Wilson JF, Capelli C, Falush D, et al. A genetic atlas of human admixture history. Science. 2014;343:747–51.

  14. Curta F. Southeastern Europe in the middle ages, 500–1250. New York: Cambridge University Press; 2006.

  15. Róna-Tas A. Hungarians and Europe in the Early Middle Ages: an introduction to early Hungarian history (Translated by Nicholas Bodoczky). Budapest: CEU Press; 1999.

  16. Langó P. Archaeological research on the conquering Hungarians: a review. In: Research on the prehistory of the Hungarians: review: papers presented at the meetings of the Institute of Archaeology of the HAS, 2003–2004. Budapest: Varia Archaeologica Hungarica 18; 2005. p. 175–340.

  17. Kovács L. Remarks on the archaeological remains of the 9th–10th century Hungarians. In: Research on the prehistory of the Hungarians: review: papers presented at the meetings of the Institute of Archaeology of the HAS, 2003–2004. Budapest: Varia Archaeologica Hungarica 18; 2005. p. 351–68.

  18. Tóth SL. The Qavars (Qabars) and their role in the Hungarian Tribal Federation. Chronica. 2016;12:3–22.

  19. Sugar PF, Hanák P, Frank T. A history of Hungary. Bloomington, IN, USA: Indiana University Press; 1990.

  20. Vásáry I. Cumans and Tatars: Oriental Military in the Pre-Ottoman Balkans, 1185–1365. 1st ed. New York: Cambridge University Press; 2005.

  21. Golden PB. The peoples of the south Russian steppes. In: The Cambridge History of Early Inner Asia. New York: Cambridge University Press; 1990.

  22. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81:559–75.

  23. Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015;4:7.

  24. Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, Gibbs RA, et al. A second generation human haplotype map of over 3.1 million SNPs. Nature. 2007;449:851–61.

  25. Altshuler DM, Gibbs RA, Peltonen L, et al. Integrating common and rare genetic variation in diverse human populations. Nature. 2010;467:52–8.

  26. Cann HM, de Toma C, Cazes L, Legrand M-F, Morel V, Piouffre L, et al. A human genome diversity cell line panel. Science. 2002;296:261–2.

  27. Rosenberg NA, Mahajan S, Ramachandran S, Zhao C, Pritchard JK, Feldman MW. Clines, clusters, and the effect of study design on the inference of human population structure. PLoS Genet. 2005;1:e70.

  28. Yunusbayev B, Metspalu M, Jarve M, Kutuev I, Rootsi S, Metspalu E, et al. The Caucasus as an asymmetric semipermeable barrier to ancient human migrations. Mol Biol Evol. 2012;29:359–65.

  29. Behar DM, Yunusbayev B, Metspalu M, Metspalu E, Rosset S, Parik J, et al. The genome-wide structure of the Jewish people. Nature. 2010;466:238–42.

  30. Kovacevic L, Tambets K, Ilumae A-M, Kushniarevich A, Yunusbayev B, Solnik A, et al. Standing at the gateway to Europe—the genetic structure of Western Balkan populations based on autosomal and haploid markers. PLoS ONE. 2014;9:e105090.

  31. Kushniarevich A, Utevska O, Chuhryaeva M, Agdzhoyan A, Dibirova K, Uktveryte I, et al. Genetic heritage of the Balto-Slavic speaking populations: a synthesis of autosomal, mitochondrial and Y-chromosomal data. PLoS ONE. 2015;10:e0135820.

  32. Behar DM, Metspalu M, Baran Y, Kopelman NM, Yunusbayev B, Gladstein A, et al. No evidence from genome-wide data of a Khazar origin for the Ashkenazi Jews. Hum Biol. 2013;85:859–900.

  33. Yunusbayev B, Metspalu M, Metspalu E, Valeev A, Litvinov S, Valiev R, et al. The genetic legacy of the expansion of Turkic-speaking nomads across Eurasia. PLoS Genet. 2015;11:e1005068.

  34. Patterson N, Price AL, Reich D. Population structure and eigenanalysis. PLoS Genet. 2006;2:e190.

  35. Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009;19:1655–64.

  36. Pickrell JK, Pritchard JK. Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genet. 2012;8:e1002967.

  37. Patterson N, Moorjani P, Luo Y, Mallick S, Rohland N, Zhan Y, et al. Ancient admixture in human history. Genetics. 2012;192:1065–93.

  38. Browning BL, Browning SR. Improving the accuracy and efficiency of identity-by-descent detection in population data. Genetics. 2013;194:459–71.

  39. Purcell S. PLINK/SEQ: a library for the analysis of genetic variation data. 2014. https://atgu.mgh.harvard.edu/plinkseq.

  40. Atzmon G, Hao L, Pe’er I, Velez C, Pearlman A, Palamara PF, et al. Abraham’s children in the genome era: major Jewish diaspora populations comprise distinct genetic clusters with shared Middle Eastern Ancestry. Am J Hum Genet. 2010;86:850–9.

  41. Loh PR, Lipson M, Patterson N, Moorjani P, Pickrell JK, Reich D, et al. Inferring admixture histories of human populations using linkage disequilibrium. Genetics. 2013;193:1233–54.

  42. Nathans E. The politics of citizenship in Germany: ethnicity, utility and nationalism. Oxford, UK: Berg Publishers; 2004.

  43. Semino O, Passarino G, Oefner PJ, Lin AA, Arbuzova S, Beckman LE, et al. The genetic legacy of Paleolithic Homo sapiens sapiens in extant Europeans: a Y chromosome perspective. Science. 2000;290:1155–9.

  44. Fenner JN. Cross-cultural estimation of the human generation interval for use in genetics-based population divergence studies. Am J Phys Anthropol. 2005;128:415–23.

  45. Banfai Z, Melegh BI, Sumegi K, Hadzsiev K, Miseta A, Kasler M, et al. Revealing the genetic impact of the Ottoman occupation on ethnic groups of East–Central Europe and on the Roma population of the area. Front Genet. 2019;10:558.

Acknowledgements

The present scientific contribution is dedicated to the 650th anniversary of the foundation of the University of Pécs, Hungary. This study was supported by the National Scientific Research Program (NKFI) K 119540. This study was supported by the Research University Resource, Institutional Excellence Grant 2016; Center for Excellence—Center of Molecular Medicine; GINOP-2.3.2-15-2016-00039; Grant Manager: Ministry of Human Resources, Hungary. This study was supported by the Human Resources Development Operational Program, Ministry of Human Resources, Hungary and by the Medical School of University of Pécs; EFOP-3.6.3-VEKOP-16-2017-00009 and EFOP-3.6.1-16-2016-00004.

Author information

Author notes

  1. These authors contributed equally: Valerián Ádám, Zsolt Bánfai

Authors and Affiliations

  1. Department of Medical Genetics, Clinical Centre, University of Pécs, Pécs, Hungary

    Valerián Ádám, Zsolt Bánfai, Anita Maász, Katalin Sümegi & Béla Melegh

  2. Szentágothai Research Centre, University of Pécs, Pécs, Hungary

    Valerián Ádám, Zsolt Bánfai, Anita Maász, Katalin Sümegi & Béla Melegh

  3. Department of Laboratory Medicine, Medical School, University of Pécs, Pécs, Hungary

    Attila Miseta

Authors

  1. Valerián Ádám

  2. Zsolt Bánfai

  3. Anita Maász

  4. Katalin Sümegi

  5. Attila Miseta

  6. Béla Melegh

Corresponding authors

Correspondence to Zsolt Bánfai or Béla Melegh.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ádám, V., Bánfai, Z., Maász, A. et al. Investigating the genetic characteristics of the Csangos, a traditionally Hungarian speaking ethnic group residing in Romania. J Hum Genet 65, 1093–1103 (2020). https://doi.org/10.1038/s10038-020-0799-6

  • Received: 04 March 2020

  • Revised: 23 June 2020

  • Accepted: 27 June 2020

  • Published: 11 July 2020

  • Issue Date: December 2020

  • DOI: https://doi.org/10.1038/s10038-020-0799-6