Textometric and comparative analysis of the bilingual corpus of the journal Infotheca
Abstract
This paper presents a textometric and comparative analysis of the bilingual corpus of the scientific journal Infotheca, which comprises parallel texts in Serbian and English. The corpus was extracted from the digital library Bibliša, with metadata linked to Wikidata. Particular attention is given to the analysis of the Serbian and English subcorpora using textometric methods, including frequency analysis, keyword analysis, collocation analysis, and topic modeling. Following the individual analyses, a comparative analysis was conducted with the aim of identifying differences and similarities in the lexical characteristics of the two languages. The results show that, although the texts are translation equivalents, there are notable differences in term distribution, indicating the influence of linguistic and scientific conventions. The paper contributes to the development of methodologies for analyzing bilingual corpora in the fields of digital humanities and language technologies.
Keywords: bilingual corpus, textometrics, Infotheca, parallel texts, comparative analysis, Wikidata, digital libraries.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.


