Ruprecht-Karls-Universität Heidelberg

DGfS-CL/EXCITEMENT Fall School 2013

Course 3: Textual Entailment (Sebastian Pado, Tae-Gil Noh, Britta Zeller)


Contents

Textual Entailment captures a common-sense notion of entailment between two natural language texts, P (premise) and H (hypothesis). P is said to entail H if "a human reading P would infer that H is most likely true" (Dagan et al. 2005).

The relevance of Textual Entailment lies in its promise to

  • subsume a substantial chunk of the semantic processing in a range of NLP tasks including IE, QA, MT Evaluation, and Summarization;
  • provide a notion of entailment that is not tied to a particular representation but provides a "common ground" for comparing and contrasting semantic processing mechanisms in an end-to-end setting.

Since 2005, there have been yearly shared tasks on Recognizing Textual Entailment (the "RTE challenges"). These have driven the regular annotation of new datasets, both "general-purpose" and "task-specific", and the datasets have in turn led to the development of a wide range of processing algorithms.

An important realization was that in almost all cases, the quality of entailment recognition hinges on the quality of the available inference knowledge at the lexical level (poodle -> dog) as well as the phrasal level (X bought Y from Z -> Z sold Y to X). Such inference knowledge can be obtained from resources like WordNet and FrameNet, but since coverage is often a problem, methods to acquire such knowledge from other sources (typically large corpora) are of primary importance.
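To make the role of lexical inference knowledge concrete, here is a minimal sketch of a lexical-coverage baseline. It is an illustration only and is not taken from the EXCITEMENT platform or from any system in the references. The class name, the method names, the tiny hand-written rule table, and the threshold are all assumptions made for this example. The premise is expanded with every word reachable through the rules (poodle -> dog -> animal), and H counts as entailed if enough of its words are covered.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/** Toy lexical-coverage baseline (hypothetical; not an EXCITEMENT API). */
final class ToyLexicalEntailment {

    // Hypothetical hand-written lexical rules: the key entails each value.
    // A real system would draw such rules from WordNet or from corpora.
    static final Map<String, Set<String>> RULES = Map.of(
            "poodle", Set.of("dog"),
            "dog", Set.of("animal"));

    /** Lower-cases the text and splits it on anything that is not a-z. */
    static List<String> tokenize(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^a-z]+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    /** Returns the tokens plus every word reachable by chaining rules. */
    static Set<String> expand(List<String> tokens) {
        Set<String> seen = new HashSet<>(tokens);
        Deque<String> todo = new ArrayDeque<>(tokens);
        while (!todo.isEmpty()) {
            for (String rhs : RULES.getOrDefault(todo.pop(), Set.of())) {
                if (seen.add(rhs)) {
                    todo.push(rhs); // newly derived word: follow its rules too
                }
            }
        }
        return seen;
    }

    /** Fraction of hypothesis tokens found in the expanded premise. */
    static double coverage(String premise, String hypothesis) {
        Set<String> known = expand(tokenize(premise));
        List<String> h = tokenize(hypothesis);
        if (h.isEmpty()) {
            return 1.0; // an empty hypothesis is trivially covered
        }
        long covered = h.stream().filter(known::contains).count();
        return (double) covered / h.size();
    }

    /** Decides entailment by comparing coverage with a chosen threshold. */
    static boolean entails(String premise, String hypothesis, double threshold) {
        return coverage(premise, hypothesis) >= threshold;
    }

    public static void main(String[] args) {
        System.out.println(entails("A poodle barked", "A dog barked", 1.0)); // true
        System.out.println(entails("A dog barked", "A poodle barked", 1.0)); // false
    }
}
```

The second call shows that the rules are directional: "dog" does not license "poodle". A bag-of-words check like this ignores word order and syntax, so it cannot handle phrasal rules such as X bought Y from Z -> Z sold Y to X. Those require matching over structured representations.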

Goals

The first goal of this course is to give participants detailed knowledge of the state-of-the-art theories and concepts in textual entailment recognition. This includes the typology of the major algorithmic approaches, the relevant linguistic phenomena, the application of textual entailment to NLP tasks, and the acquisition of inference knowledge from various sources.

The second goal of the course is to give participants a concrete, hands-on introduction to building and working with systems for recognizing textual entailment. This part of the course will consist mainly of lab sessions and will use the Java-based EXCITEMENT architecture, which subsumes various proposed textual entailment algorithms. It offers a multilingual, reusable, open-source platform for semantic processing that can serve as a basis for experiments with existing algorithms, extensions with new algorithms, or ports to new languages.

Prerequisites

Participants should know the basic concepts of computational linguistics and linguistics (at the level of an "introduction to computational linguistics" or "introduction to formal linguistics" course). A working knowledge of Java is helpful as well.

References

  • I. Androutsopoulos and P. Malakasiotis. A Survey of Paraphrasing and Textual Entailment Methods. Journal of Artificial Intelligence Research, 38, 135–187. 2010.
  • I. Dagan, O. Glickman, and B. Magnini. The PASCAL Recognizing Textual Entailment Challenge. Proceedings of the 1st PASCAL RTE Workshop. 2005. Southampton, UK.
  • I. Dagan, D. Roth, F. Zanzotto. Recognizing Textual Entailment. 2012. Morgan & Claypool Publishers.
  • S. Pado and I. Dagan. Textual Entailment. In R. Mitkov (ed.): Oxford Handbook of Natural Language Processing. To appear 2013. OUP.
  • M. Sammons, V.G. Vydiswaran, and D. Roth. Recognizing Textual Entailment. In D. Bikel and I. Zitouni (eds.): Multilingual Natural Language Applications: From Theory to Practice. 2012. IBM Press.

More references will be added.
