Word Senses: The Stepping Stones in Semantic-Based Natural Language Processing

Most successful commercial applications in language processing (text and/or speech) dispense with any explicit concern for semantics, usually motivated by the high computational cost of dealing with semantics in large volumes of data. With recent advances in corpus linguistics and statistical methods in NLP, revealing useful semantic features of linguistic data is becoming cheaper and cheaper and the

Exploiting Aligned Parallel Corpora in Multilingual Studies and Applications

Parallel corpora encode extremely valuable linguistic knowledge, which recent advances in multilingual corpus linguistics make easier to reveal. The linguistic decisions made by human translators in order to faithfully convey the meaning of the source text can be traced and used as evidence of linguistic facts which, in a monolingual context, might be unavailable to (or overlooked by) a computer program. Multilingual technologies,

Ontology-supported Text Classification based on Cross-lingual Word Sense Disambiguation

The paper reports on recent experiments in cross-lingual document processing (with a case study for Bulgarian-English-Romanian language pairs) and presents evidence of the benefits of using linguistic ontologies to achieve, with a high level of accuracy, difficult NLP tasks such as word alignment, word sense disambiguation, document classification, and cross-language information retrieval. We provide brief descriptions of the parallel corpus

A Cross-Lingual Romanian to English Question Answering System

This paper describes the development of a Question Answering (QA) system and its evaluation results in the Romanian-English cross-lingual track organized as part of the CLEF 2006 campaign. The development stages of the cross-lingual Question Answering system are described incrementally throughout the paper, at the same time pinpointing the problems that occurred and the way they were addressed. The system adheres to the classical architecture for

