A Suffix Subsumption-Based Approach to Building Stemmers and Lemmatizers for Highly Inflectional Languages with Sparse Resources

  • Vlado Kešelj, Dalhousie University
  • Danko Šipka, Arizona State University

Abstract

We present a general suffix-based method for constructing stemmers and lemmatizers for highly inflectional languages with only sparse resources. The method comes with an efficient design that can be implemented directly, and it is evaluated by constructing a stemmer for the Serbian language. Evaluation on real data showed an accuracy of 79%.

Published
2024-03-12
How to Cite
KEŠELJ, Vlado; ŠIPKA, Danko. A Suffix Subsumption-Based Approach to Building Stemmers and Lemmatizers for Highly Inflectional Languages with Sparse Resources. Infotheca - Journal for Digital Humanities, [S.l.], v. 9, n. 1/2, p. 23a-33a, mar. 2024. ISSN 2217-9461. Available at: <https://infoteka.bg.ac.rs/ojs/index.php/Infoteka/article/view/508>. Date accessed: 21 dec. 2024.