Vol. 16, No. 1/2, August 2016
Jovana Kovačević, Jelena Graovac
University of Belgrade,
Faculty of Mathematics,
Department for Computer Science and Informatics
APPLICATION OF A STRUCTURAL SUPPORT VECTOR MACHINE METHOD TO N-GRAM BASED TEXT CLASSIFICATION IN SERBIAN
DOI 10.18485/infotheca.2016.16.1_2.1
UDC: 811.163.41'322.2
Keywords: hierarchical text classification, Support Vector Machine Method, Ebart corpus
Abstract: The paper presents classification results obtained using the Support Vector Machine (SVM) method over a hierarchically organized corpus of documents in Serbian. Two techniques derived from the SVM with structural output have been applied: multiclass flat classification and hierarchical classification. The representation model, which is specific to this form of the SVM method, jointly describes a document and the class (or hierarchy of classes) to which it belongs, and is based on byte n-grams of different lengths. Four tf-idf statistics have been used to define the significance of an n-gram for a specific document. The described techniques and statistics have been tested on a hierarchically structured subset of the Ebart corpus of newspaper texts. The results obtained for both types of classifiers are similar for the corpus as a whole, while the hierarchical classifier performs better for the most specific classes with a small number of texts.
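The representation summarized in the abstract can be illustrated with a short Python sketch. It is a minimal sketch under stated assumptions, not the authors' implementation: all function names (`byte_ngrams`, `tfidf_vectors`, `path_to_root`, `joint_features`, `predict`) are hypothetical; the tf-idf variant shown, tf · log(N/df), is only one common choice and not necessarily one of the four statistics used in the paper; and the joint feature map follows a common structural SVM formulation in which the document's n-gram vector is copied into the block of the predicted class (flat) or of every class on its path to the root (hierarchical). Learning the weight vector `w` is omitted.

```python
import math
from collections import Counter


def byte_ngrams(text, n_min=1, n_max=3):
    """Count byte n-grams of every length from n_min to n_max.

    The text is encoded as UTF-8 first, so a Serbian letter with a
    diacritic such as "č" contributes two bytes and may be split
    across n-gram boundaries.
    """
    data = text.encode("utf-8")
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(data) - n + 1):
            counts[data[i:i + n]] += 1
    return counts


def tfidf_vectors(docs, n_min=1, n_max=3):
    """Weight byte n-grams with one common tf-idf variant: tf * log(N / df).

    Assumption: this is only one of many tf-idf variants and is not
    necessarily one of the four statistics used in the paper.
    """
    counts = [byte_ngrams(d, n_min, n_max) for d in docs]
    n_docs = len(docs)
    df = Counter()
    for c in counts:
        df.update(c.keys())  # document frequency: one count per document
    return [
        {g: tf * math.log(n_docs / df[g]) for g, tf in c.items()}
        for c in counts
    ]


def path_to_root(label, parent):
    """Return the label followed by all its ancestors (parent maps child -> parent)."""
    path = [label]
    while label in parent:
        label = parent[label]
        path.append(label)
    return path


def joint_features(x_vec, label, parent=None):
    """Hypothetical joint feature map Psi(x, y) for a structural SVM.

    Flat multiclass (parent=None): the document vector is copied into the
    block of the label only. Hierarchical: it is copied into the block of
    every class on the path from the label up to the root.
    """
    classes = [label] if parent is None else path_to_root(label, parent)
    return {(c, g): v for c in classes for g, v in x_vec.items()}


def predict(w, x_vec, leaves, parent=None):
    """Return the leaf class y that maximises the linear score w . Psi(x, y)."""
    def score(y):
        psi = joint_features(x_vec, y, parent)
        return sum(w.get(k, 0.0) * v for k, v in psi.items())
    return max(leaves, key=score)
```

Calling `predict(w, x, leaves, parent)` with a parent map scores each leaf using the weights of all its ancestors, so sibling classes share the weights of their common parent; passing `parent=None` gives the flat multiclass variant, in which every class is scored only by its own block of weights.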
SCIENTIFIC PAPERS
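Jovana Kovačević, Jelena Graovac
APPLICATION OF A STRUCTURAL SUPPORT VECTOR MACHINE METHOD TO N-GRAM BASED TEXT CLASSIFICATION IN SERBIAN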
Miljana Mladenović
ONTOLOGY-BASED RECOGNITION OF RHETORICAL FIGURES (ABSTRACT)
Tanja Ivanović
LEXICAL ANALYSIS OF TWO-WORD TERMINOLOGICAL PHRASES WITHIN DISTRIBUTION SYSTEM
Milena Milinković
THE BIBLIOMETRIC AND CITATION ANALYSES OF THE SPATIUM JOURNAL
PROFESSIONAL PAPERS
Gordana Nedeljkov
E-BOOKS AND NEW DIMENSION OF READING (SUMMARY)
Milena Obradović, Aleksandra Arsenijević, Mihajlo Škorić
PREPARATION OF MULTIMEDIA DOCUMENT "YU ROCK SCENE"
REVIEWS
Vladan Devedžić, Milan Krstić
SOCIAL SCIENCES AND COMPUTING: MASTER STUDY PROGRAM REVIEW
Jelena Mitrović
REVIEW OF THE 2015 EUROLAN SUMMER SCHOOL IN COMPUTATIONAL LINGUISTICS