Personal Names in Information Extraction
Abstract
The production of electronic texts on the Internet in digital libraries and archives increases every day and the need for adequate software tools that would enable users to manipulate texts and automatically process them increases with it. In the first part of the paper, various definitions of the Information Extraction field, the short history of the development of IE methods, and its different types and possible applications shall be presented. There are various methods of information extraction. Some are simple methods based on pattern matching, and some that use finite-state automata, context-free grammars or statistical models which are rather more complex. In the second part of the paper, the method for the precise automatic string recognition in a Serbian language digital text of a Serbian name and a surname, as well as English names transcribed in Serbian, will be presented and analyzed. Personal names represent an important part of the lexica of written texts regardless of their form, printed or electronic, and they are widely researched in the information extraction field. The method that is described in this work has been developed in LADL (Laboratoire d’Automatique Documentaire et Linguistique).
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.