Vol. 16, No. 1/2, August 2016

 

Jovana Kovačević, Jelena Graovac 
University of Belgrade,
Faculty of Mathematics,
Department for Computer Science and Informatics

 

APPLICATION OF A STRUCTURAL SUPPORT VECTOR MACHINE METHOD TO N-GRAM BASED TEXT CLASSIFICATION IN SERBIAN

DOI 10.18485/infotheca.2016.16.1_2.1
UDC: 811.163.41'322.2
Keywords: hierarchical text classification, Support Vector Machine Method, Ebart corpus
Abstract: The paper presents classification results obtained using the Support Vector Machine (SVM) method on a hierarchically organized corpus of documents in Serbian. Two techniques derived from the SVM with structural output have been applied: multiclass flat classification and hierarchical classification. The joint representation of a document and the class, or hierarchy of classes, to which it belongs is specific to this form of the SVM method and is based on byte n-grams of different lengths. Four tf-idf statistics were used to define the significance of an n-gram for a specific document. The described techniques and statistics have been tested on a hierarchically structured subset of the Ebart corpus of newspaper texts. The results obtained for both types of classifiers are similar for the corpus as a whole, while the hierarchical classifier performs better for the most specific classes, which contain a small number of texts.
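As an illustration of the byte n-gram representation mentioned in the abstract, the following minimal Python sketch extracts byte n-grams from UTF-8 text and weights them with one common tf-idf variant (relative term frequency times the logarithm of inverse document frequency). The abstract does not specify the four statistics used in the paper, so this formula, the UTF-8 encoding, and the function names `byte_ngrams` and `tfidf_weights` are assumptions made for illustration only, not the authors' implementation.

```python
import math
from collections import Counter


def byte_ngrams(text, n):
    """Return all overlapping byte n-grams of the UTF-8 encoding of text."""
    data = text.encode("utf-8")  # assumption: documents are UTF-8 encoded
    return [data[i:i + n] for i in range(len(data) - n + 1)]


def tfidf_weights(docs, n):
    """Compute one tf-idf weight per (document, n-gram) pair.

    Assumed variant: tf = count / total n-grams in the document,
    idf = log(number of documents / documents containing the n-gram).
    """
    counts = [Counter(byte_ngrams(doc, n)) for doc in docs]

    # Document frequency: in how many documents each n-gram occurs.
    df = Counter()
    for c in counts:
        df.update(c.keys())

    num_docs = len(docs)
    weights = []
    for c in counts:
        total = sum(c.values())
        weights.append(
            {g: (f / total) * math.log(num_docs / df[g]) for g, f in c.items()}
        )
    return weights


if __name__ == "__main__":
    sample = ["abab", "abcd"]
    for doc, w in zip(sample, tfidf_weights(sample, 2)):
        print(doc, w)
```

Because the n-grams are taken over bytes rather than characters, a Serbian letter such as "č" contributes two bytes in UTF-8, so a single n-gram may span part of a character.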

 

                                                                                                            


SCIENTIFIC PAPERS

 

Jovana Kovačević, Jelena Graovac

APPLICATION OF A STRUCTURAL SUPPORT VECTOR MACHINE METHOD TO N-GRAM BASED TEXT CLASSIFICATION IN SERBIAN

Miljana Mladenović

ONTOLOGY-BASED RECOGNITION OF RHETORICAL FIGURES

Tanja Ivanović

LEXICAL ANALYSIS OF TWO-WORD TERMINOLOGICAL PHRASES WITHIN DISTRIBUTION SYSTEM

Milena Milinković

THE BIBLIOMETRIC AND CITATION ANALYSES OF THE SPATIUM JOURNAL

 

PROFESSIONAL PAPERS
 

Gordana Nedeljkov

E-BOOKS AND NEW DIMENSION OF READING

Milena Obradović, Aleksandra Arsenijević, Mihajlo Škorić

PREPARATION OF MULTIMEDIA DOCUMENT "YU ROCK SCENE"

 

REVIEWS
 

Vladan Devedžić, Milan Krstić

SOCIAL SCIENCES AND COMPUTING: MASTER STUDY PROGRAM REVIEW

Jelena Mitrović

REVIEW OF THE 2015 EUROLAN SUMMER SCHOOL IN COMPUTATIONAL LINGUISTICS