The textbooks in the EHUskaratuak Corpus

Since 2002, the Basque Language Service at the University of the Basque Country has been publishing college textbooks in Basque translation, systematically following a specially designed methodology. These books span all areas of knowledge: Humanities, Social Sciences, Engineering and Technology, Health Sciences, and Mathematics and Materials Science. Many of them are considered international milestones in their fields. We have them translated into Basque from English, French and Spanish.

Titles are proposed for translation by lecturers in the different areas of study at the University. The Basque Language Service collects these requests from each school, and then from each campus. The Basque Language Committee of the University then makes the final selection of titles to be translated, within the limits of each year's budget and in accordance with the guidelines of the Basque Language Normalization Plan.

Once the titles have been chosen, they are translated using the same methodology that is applied to every book fed into the EHUskaratuak Corpus.

Here is a brief description of that methodology:

Each book is translated by a translator with an excellent command of both translation strategies and the subject matter at hand.

Besides the translator, two other people participate in the translation process. The first is a technical reviser with domain expertise, who checks the translation for terminological and conceptual accuracy.

The second is a language reviser, usually a member of the Basque Language Service, who makes sure that the final Basque version of the book is written in correct, appropriate and readable language.

So, the translation team for each book is composed of three participants: the translator, the technical reviser and the language reviser.

All the books in the EHUskaratuak Corpus have been translated using this methodology, which ensures the quality of each translation.

The EHUskaratuak Corpus

Corpora are collections of texts that are used as real samples to describe and study a language. They are used in many fields, such as Linguistics, Lexicography, Terminology, Translation and Language Teaching. There are different types of corpora, according to the criteria used to compile them: for example, they can be monolingual or multilingual, general or specialized, and so on. Nowadays most corpora are in digital format and enriched with linguistic information, so they can be searched and studied more easily.

The EHUskaratuak Corpus currently available online covers books from 2008 to 2016, divided into 5 areas and 18 subareas. In total, therefore, it contains 75 books.

The table below shows the number of sentences and words in the corpus as a whole and in each of its four languages. These numbers will keep growing as new translations are fed into the corpus.

Language   Sentences        Words
Basque       631,284    8,130,410
Spanish      233,485    4,423,095
French        15,791      351,747
English      382,007    5,684,825
TOTAL      1,232,377   18,048,431

The EHUskaratuak Corpus has been developed by Elhuyar Hizkuntza eta Teknologia.