A Suffix Subsumption-Based Approach to Building Stemmers and Lemmatizers for Highly Inflectional Languages with Sparse Resources
Abstract
We present a general suffix-based method for constructing stemmers and lemmatizers for highly inflectional languages with only sparse resources. The method is directly implementable using the efficient design described in the paper, and it is evaluated by constructing a stemmer for the Serbian language. Evaluation on real data showed an accuracy of 79%.
Published
2024-03-12
How to Cite
KEŠELJ, Vlado; ŠIPKA, Danko.
A Suffix Subsumption-Based Approach to Building Stemmers and Lemmatizers for Highly Inflectional Languages with Sparse Resources.
Infotheca - Journal for Digital Humanities, [S.l.], v. 9, n. 1/2, p. 23a-33a, mar. 2024.
ISSN 2217-9461.
Available at: <https://infoteka.bg.ac.rs/ojs/index.php/Infoteka/article/view/508>. Date accessed: 18 nov. 2024.
Section
Articles
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.