
Department of Computer Science


Using Universals in Information Extraction

Gael Lejeune (University of Helsinki, Finland)

IE systems mostly search for lexical items and compare them against existing patterns. These patterns are supposed to cover almost every combination of items related to the system's domain. Such a system therefore needs a large amount of resources that are hard to build and might not be reusable for another language.

In PULS, our epidemic surveillance project, we had an English pattern-based system, and we ran a pilot study on French to prepare a multilingual extension. In the system I will present here, the process is mainly language-independent and based on general principles of how press articles are structured. Basically we use the 5W rule: What, Where, Who, When and Why must be found at the top of the document.

In our preliminary experiments we only try to extract the disease (What), location (Where), cases (Who) and date (When). The system needs only small French resources to achieve good results: precision is 87% and recall is 93%. We have good reasons to think that this approach will also be effective for other languages.
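
To make the idea concrete, here is a minimal sketch of position-based extraction: it looks only at the top of an article and fills the What, Where, Who and When slots from a small lexicon and two regular expressions. It is not the PULS implementation. The function name `extract_event`, the `head_chars` cut-off, the tiny lexicons and the patterns are all assumptions made for illustration.

```python
import re

# Hypothetical mini-lexicons (assumption): the only language-specific
# resources this sketch needs. Real lists would be larger.
DISEASES = {"choléra", "grippe", "rougeole"}
LOCATIONS = {"France", "Mali", "Paris"}

MONTHS = ("janvier|février|mars|avril|mai|juin|juillet|août|"
          "septembre|octobre|novembre|décembre")
# Illustrative patterns (assumption): "3 mars 2010", "12 cas".
DATE_RE = re.compile(r"\b\d{1,2} (?:" + MONTHS + r")(?: \d{4})?\b")
CASES_RE = re.compile(r"\b(\d+) (?:cas|morts|décès)\b")


def extract_event(text, head_chars=300):
    """Fill the What/Where/Who/When slots from the top of an article.

    Only the first `head_chars` characters are inspected, following the
    idea that a press article states the essential facts at its top.
    """
    head = text[:head_chars]
    lowered = head.lower()

    # What: first lexicon entry (in sorted order) found in the head.
    disease = next((d for d in sorted(DISEASES) if d in lowered), None)
    # Where: place names are matched case-sensitively on the raw head.
    location = next((loc for loc in sorted(LOCATIONS) if loc in head), None)
    # Who: number of reported cases, if any.
    cases = CASES_RE.search(lowered)
    # When: first date expression in the head.
    date = DATE_RE.search(lowered)

    return {
        "what": disease,
        "where": location,
        "who": int(cases.group(1)) if cases else None,
        "when": date.group(0) if date else None,
    }


article = ("Choléra au Mali : 12 cas confirmés le 3 mars 2010. "
           "Les autorités sanitaires poursuivent leurs investigations.")
event = extract_event(article)
```

In this sketch, adapting to another language would mean replacing only the two lexicons and the two patterns; the positional logic stays the same. Plain substring matching is a simplification and would need proper tokenization in practice.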