New Language Models for Serbian

Mihailo Škoric, University of Belgrade, Faculty of Mining and Geology, Belgrade, Serbia, http://orcid.org/0000-0003-4811-8692

DOI: https://doi.org/10.18485/2024.24.1.1

Abstract

The paper briefly presents the development history of transformer-based language models for the Serbian language. It also presents several new models for text generation and vectorization, trained on the resources of the Society for Language Resources and Technologies. Ten selected vectorization models for Serbian, including the two new ones, are compared on four natural language processing tasks. The paper analyzes which models perform best on each selected task, how their size and the size of their training sets affect performance on those tasks, and what the optimal setting is for training the best language models for the Serbian language.

Published

2025-03-07

How to Cite

ŠKORIC, Mihailo. New Language Models for Serbian. Infotheca - Journal for Digital Humanities, [S.l.], v. 24, n. 1, p. 7-28, mar. 2025. ISSN 2217-9461. Available at: <https://infoteka.bg.ac.rs/ojs/index.php/Infoteka/article/view/2024.24.1.1_en>. Date accessed: 01 apr. 2025. doi: https://doi.org/10.18485/2024.24.1.1.

Issue

Vol 24 No 1 (2024): Infotheca - Journal of Informatics and Librarianship

Section

Articles

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

		Faculty of Philology, University of Belgrade
		University Library „Svetozar Marković“
		Association of Libraries of the Universities of Serbia