Textometric and comparative analysis of the bilingual corpus of the journal Infotheca

Abstract

This paper presents a textometric and comparative analysis of the bilingual corpus of the scientific journal Infotheca, which comprises parallel texts in Serbian and English. The corpus was extracted from the digital library Bibliša, with metadata linked to Wikidata. Particular attention is given to the analysis of the Serbian and English subcorpora using textometric methods, including frequency analysis, keyword analysis, collocation analysis, and topic modeling. Following the individual analyses, a comparative analysis was conducted with the aim of identifying differences and similarities in the lexical characteristics of the two languages. The results show that, although the texts are translation equivalents, there are notable differences in term distribution, indicating the influence of linguistic and scientific conventions. The paper contributes to the development of methodologies for analyzing bilingual corpora in the fields of digital humanities and language technologies.
Keywords: bilingual corpus, textometrics, Infotheca, parallel texts, comparative analysis, Wikidata, digital libraries.
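
Two of the textometric steps named in the abstract, frequency analysis and keyword analysis, can be illustrated with a minimal Python sketch. It is not the pipeline used in the paper. The regex tokenizer, the toy sentences, and the choice of Dunning's log-likelihood (G2) as the keyness statistic are assumptions made for illustration, and the helper names `tokenize`, `frequencies`, `log_likelihood`, and `keywords` are hypothetical. Keyness is computed against a reference corpus in the same language, because a Serbian subcorpus cannot be scored directly against its English translation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical naive tokenizer: lowercase word forms; in Python 3, \w also
    # matches Serbian Latin (š, č, ć, ž, đ) and Cyrillic letters.
    return re.findall(r"\w+", text.lower())


def frequencies(texts):
    # Frequency analysis: absolute token counts over a list of documents.
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def log_likelihood(a, b, total_a, total_b):
    # Dunning's G2 for a word seen a times in corpus A and b times in corpus B.
    expected_a = total_a * (a + b) / (total_a + total_b)
    expected_b = total_b * (a + b) / (total_a + total_b)
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / expected_a)
    if b > 0:
        g2 += b * math.log(b / expected_b)
    return 2 * g2


def keywords(target, reference, top=5):
    # Keyword analysis (assumed statistic: G2), positive keywords only,
    # i.e. words relatively more frequent in the target than in the reference.
    total_t = sum(target.values())
    total_r = sum(reference.values())
    scored = []
    for word, a in target.items():
        b = reference.get(word, 0)
        if a / total_t > b / total_r:
            scored.append((word, log_likelihood(a, b, total_t, total_r)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]


# Toy example data (invented for illustration, not taken from the corpus).
target = frequencies(["Digital library metadata", "Metadata linked to Wikidata"])
reference = frequencies(["Library of books", "Books and journals"])
print(keywords(target, reference))
```

On real data, `target` would be one subcorpus (for example, the English abstracts) and `reference` a larger general-language corpus in the same language. For Serbian, lemmatizing before counting would matter more than for English, since rich inflection spreads a single lemma across many word forms.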

Published: 2026-04-28
How to Cite
STANKOVIĆ, Ranka. Textometric and comparative analysis of the bilingual corpus of the journal Infotheca. Infotheca - Journal for Digital Humanities, [S.l.], v. 26, n. 1, p. 43-66, apr. 2026. ISSN 2217-9461. Available at: <https://infoteka.bg.ac.rs/ojs/index.php/Infoteka/article/view/2026.26.1.2_en>. Date accessed: 22 may 2026. doi: https://doi.org/10.18485/infotheca.2026.26.1.2.