Vol. 11, No. 1, April 2010

Sandra Gucul-Milojević
University of Belgrade, Faculty of Philology

PERSONAL NAMES IN INFORMATION EXTRACTION

 
UDC: 004.832.2:025.4
Keywords: personal name, information extraction, electronic text, finite-state automata, electronic dictionary, local grammar, computational linguistics
Abstract: The production of electronic texts on the Internet and in digital libraries and archives increases every day, and with it grows the need for adequate software tools that enable users to manipulate these texts and process them automatically. The first part of the paper presents various definitions of the Information Extraction (IE) field, a short history of the development of IE methods, and the different types of IE and their possible applications. There are various methods of information extraction: some are simple methods based on pattern matching, while others, which use finite-state automata, context-free grammars or statistical models, are considerably more complex. The second part of the paper presents and analyzes a method for the precise automatic recognition, in Serbian digital texts, of strings consisting of a Serbian first name and surname, as well as of English names transcribed into Serbian. Personal names are an important part of the lexicon of written texts, whether printed or electronic, and they are widely researched in the information extraction field. The method described in this work was developed at LADL (Laboratoire d’Automatique Documentaire et Linguistique).
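
To illustrate what combining an electronic dictionary with a finite-state pattern means for name recognition, here is a minimal sketch. It is not the author's LADL method: the tiny lexicons `FIRST_NAMES` and `SURNAMES` are hypothetical stand-ins for real electronic dictionaries, the rule "capitalized token ending in -ić is a surname" is an illustrative heuristic only, and the two-state scanner is a toy stand-in for a local grammar. It also handles only nominative forms, whereas a real system would rely on a dictionary of inflected forms.

```python
import re

# Hypothetical toy lexicon standing in for an electronic dictionary of first names
# (Serbian names and English names transcribed into Serbian, e.g. "Džon" for John).
FIRST_NAMES = {"Sandra", "Marija", "Biljana", "Adam", "Džon", "Majkl"}
# Hypothetical toy lexicon of surnames that the suffix heuristic below would miss.
SURNAMES = {"Smit", "Džekson", "Erjavec"}

# Unicode-aware tokenizer: runs of letters (so č, ć, š, ž, đ are kept),
# optionally joined by hyphens to keep double surnames as one token.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def is_surname(token: str) -> bool:
    """Accept a token as a surname if it is listed in the lexicon, or if it is
    capitalized and ends in -ić (an illustrative heuristic, not a full rule)."""
    if token in SURNAMES:
        return True
    return token[:1].isupper() and token.endswith("ić")


def find_names(text: str) -> list[str]:
    """Scan tokens with a two-state automaton:
    state 0 waits for a known first name; state 1 expects a surname next."""
    matches = []
    state, first = 0, None
    for tok in TOKEN_RE.findall(text):
        if state == 1 and is_surname(tok):
            # Transition state 1 -> accept: first name followed by a surname.
            matches.append(f"{first} {tok}")
            state, first = 0, None
            continue
        # No surname where one was expected (or still in state 0):
        # restart, possibly treating this token as a new first name.
        if tok in FIRST_NAMES:
            state, first = 1, tok
        else:
            state, first = 0, None
    return matches
```

For example, `find_names("Rad je napisala Sandra Gucul-Milojević, a citira Džon Smit.")` returns `["Sandra Gucul-Milojević", "Džon Smit"]`, while a first name not followed by a surname-like token, as in "Marija je došla.", yields no match. Keeping the lexicon lookup separate from the state transitions mirrors the general idea of pairing dictionaries with a grammar, so either part could be extended independently.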
 


 


ARTICLES

 

Tomaž Erjavec 
TEXT ENCODING INITIATIVE GUIDELINES AND THEIR LOCALISATION

Annibale Elia, Simonetta Vietri
LEXIS-GRAMMAR & SEMANTIC WEB

Biljana Kalezić
SOFTWARE PIRACY IN SERBIA

Sandra Gucul-Milojević 
PERSONAL NAMES IN INFORMATION EXTRACTION

 

REVIEWS

Marija Stiković 
THE FIRST EUROPEAN SUMMER SCHOOL “CULTURE & TECHNOLOGY”

Adam Sofronijević 
AN INSIGHT INTO ETHICS IN SCIENCE AND CULTURE