Vol. 12, No. 2, December 2011

 

Milos Utvić
University of Belgrade, Faculty of Philology
 
ANNOTATING THE CORPUS OF CONTEMPORARY SERBIAN
 
UDC: 004.9:811.163.41’374
Remark: This paper presents results of research carried out during 2011, supported by the Serbian Ministry of Education and Science under grant 178006 (Serbian Language and its Resources) and by the CESAR project, part of the wider network of excellence META-NET, funded by the European Union.
Keywords: annotation, corpus, tagger, TreeTagger
Abstract: This article describes the stages (preparation and implementation) in annotating the 113-million-word Corpus of Contemporary Serbian. Annotation was carried out at several levels. Bibliographic information is attached to each corpus text. A part-of-speech (PoS) tagset was prepared on the basis of the electronic morphological dictionary of Serbian, together with a dictionary of possible annotations adapted for TreeTagger, the PoS tagging system. The corpus was then automatically morphosyntactically annotated with TreeTagger, i.e. part-of-speech and lemma information was attached to each corpus word form. TreeTagger was trained on INTERA, a manually tagged corpus of one million words. The annotation procedure was evaluated with ten-fold cross-validation.
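
As an illustration of the evaluation step named in the abstract, the sketch below shows how ten-fold cross-validation of a tagger could be organised. It is not the author's code: the function names, the sentence representation (lists of (word, tag) pairs) and the train_and_tag callable, which stands in for training and running a tagger such as TreeTagger, are all assumptions made for this example.

```python
import random


def k_fold_splits(sentences, k=10, seed=0):
    """Yield (train, test) pairs; each sentence is held out exactly once."""
    indices = list(range(len(sentences)))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test = [sentences[i] for i in folds[held_out]]
        train = [sentences[i]
                 for f, fold in enumerate(folds) if f != held_out
                 for i in fold]
        yield train, test


def token_accuracy(gold, predicted):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    pairs = [(g, p)
             for gold_tags, pred_tags in zip(gold, predicted)
             for g, p in zip(gold_tags, pred_tags)]
    if not pairs:
        return 0.0
    return sum(g == p for g, p in pairs) / len(pairs)


def cross_validate(sentences, train_and_tag, k=10, seed=0):
    """Average token accuracy over k folds.

    train_and_tag is a hypothetical callable: it receives the tagged
    training sentences and the untagged test sentences (lists of words)
    and returns one list of predicted tags per test sentence.
    """
    scores = []
    for train, test in k_fold_splits(sentences, k, seed):
        words = [[w for w, _ in sent] for sent in test]
        gold = [[t for _, t in sent] for sent in test]
        scores.append(token_accuracy(gold, train_and_tag(train, words)))
    return sum(scores) / len(scores)
```

In such a setup, each of the ten runs would train on nine tenths of the manually tagged data and be scored on the remaining tenth, so that every sentence contributes to the estimate exactly once as unseen test material.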

 

 


REVIEWS

THE EUROPEANA LICENSING FRAMEWORK

ARTICLE

Jay Jordan
OCLC: COLLABORATIVELY BUILDING WEBSCALE SERVICES WITH LIBRARIES

Milos Utvić 
ANNOTATING THE CORPUS OF CONTEMPORARY SERBIAN

Biljana Lazić, Jelica Poklopić
MULTIMEDIA DOCUMENT “CULT RADIO PROGRAMS” - AN INSIGHT INTO THE STATUS OF THE ARCHIVES OF RADIO STATIONS IN SERBIA

Aleksandra Pavlović
“THE SERPENT IN THE GARDEN OF EDEN”: INTELLECTUAL PROPERTY IN THE DIGITAL MILLENNIUM

REVIEWS

Milan Vasiljević 
EXPERIENCES FROM THE OCLC EUROPE AND USA STUDY TOUR

Aleksandra Trtovac 
INTERNATIONAL SCIENTIFIC CONFERENCE “DIGITALIZATION OF CULTURAL AND SCIENTIFIC HERITAGE, UNIVERSITY REPOSITORIES AND DISTANCE LEARNING”

Adam Sofronijević 
LIBRARIANSHIP AND DIGITAL SERENDIPITY: TPDL AND CIKM 2011 CONFERENCES