Tuesday 24 June:

Student session 9h00 Welcome
9h30 Robust processing of spoken situated dialogue
Pierre Lison (Saarbruecken) abstract
Recent years have seen increasing interest in service robots endowed with communicative capabilities. These robots must often operate in open-ended environments and interact with humans using natural language to perform a variety of service-oriented tasks. Yet, developing dialogue systems able to robustly and accurately interpret spoken utterances remains a significant challenge. This is notably due to the presence of numerous speech disfluencies (pauses, repetitions, repairs, etc.) and to the globally poor performance of speech recognition, especially when operating in noisy, real-world environments. For my master's thesis, I am developing techniques and algorithms to improve the robustness of spoken dialogue comprehension at various processing stages, from speech recognition to semantic interpretation. In this talk, I will focus on one particular technique: a model for speech recognition in natural environments which relies on contextual information about salient entities to prime utterance recognition. The hypothesis underlying our approach is that, in situated human-robot interactions, speech recognition performance can be significantly enhanced by exploiting knowledge about the immediate physical environment and the dialogue history. To this end, visual salience (objects perceived in the physical scene) and linguistic salience (expressions previously referred to within the current dialogue) are integrated into a single cross-modal salience model. The model is dynamically updated as the environment evolves, and is used to establish expectations about which words are most likely to be heard given the context. The update is realised by continuously adapting the word-class probabilities specified in the statistical language model.
Our work draws upon insights from cognitive science, as we know that humans systematically exploit dialogue and situated context to guide attentional mechanisms and help disambiguate and refine linguistic input by filtering out unlikely interpretations. In this talk, I will discuss the motivations behind our approach, describe our implementation as part of a distributed, cognitive architecture for mobile robots, and report the evaluation results on a test suite, which show a statistically significant improvement in the recognition rate.
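As a rough illustration of the salience-driven adaptation described above (a minimal sketch with invented words and weights, not the thesis implementation), word-class probabilities in the language model can be boosted for contextually salient entities and then renormalised:

```python
def adapt_class_probs(class_probs, salient, boost=3.0):
    """Re-weight word-class probabilities toward salient classes."""
    weighted = {c: p * (boost if c in salient else 1.0)
                for c, p in class_probs.items()}
    total = sum(weighted.values())
    return {c: w / total for c, w in weighted.items()}

# Visual salience: a ball and a mug are in the scene;
# linguistic salience: "ball" was mentioned earlier in the dialogue.
probs = {"ball": 0.2, "mug": 0.2, "box": 0.3, "wall": 0.3}
adapted = adapt_class_probs(probs, salient={"ball", "mug"})
# "ball" is now more probable than the contextually absent "wall".
```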
10h00 The use of Left-Dislocation in English Conversation
Trevor Benjamin (Saarbruecken)
10h30 Break
11h00 Automated Semantic Classification of French Verbs
Ingrid Falk (Nancy) abstract
The aim of this work is to explore (semi-)automatic means of creating a Levin-type classification of French verbs, suitable for Natural Language Processing. An extensive digital verb lexicon based on Levin's classification method for English is VerbNet, which systematically extends Levin's classes while ensuring that class members share a common semantics and a common set of syntactic frames and thematic roles. We base our work on three available syntactic lexicons for French: Volem, the Grammar-Lexicon (LADL) and Dicovalence, and investigate ways to reorganise the verbs in these resources into VerbNet-like verb classes using Formal Concept Analysis (FCA) techniques. We discuss possible evaluation schemes and finally focus on an evaluation methodology with respect to VerbNet, presenting its theoretical motivation and analysing its feasibility on a small hand-built example.
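The FCA step can be illustrated with a toy sketch (the verbs, frames and scale here are invented for illustration, not taken from the actual resources): a formal context maps each verb to its syntactic frames, and the formal concepts, closed pairs of verb sets and frame sets, are the candidate verb classes:

```python
from itertools import combinations

# Formal context: which (invented) verbs admit which syntactic frames.
context = {
    "manger":  {"NP_V", "NP_V_NP"},
    "devorer": {"NP_V_NP"},
    "courir":  {"NP_V"},
}

def extent(attrs):
    """Verbs admitting every frame in attrs."""
    return {v for v, frames in context.items() if attrs <= frames}

def intent(verbs):
    """Frames shared by every verb in verbs."""
    sets = [context[v] for v in verbs]
    return set.intersection(*sets) if sets else set()

# A formal concept is a pair (extent, intent) closed in both directions;
# enumerate them by closing every subset of frames (fine at toy scale).
all_frames = set().union(*context.values())
concepts = set()
for r in range(len(all_frames) + 1):
    for attrs in combinations(sorted(all_frames), r):
        e = extent(set(attrs))
        concepts.add((frozenset(e), frozenset(intent(e))))
# {"manger", "courir"} sharing frame {"NP_V"} is one candidate class.
```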
11h30 Interaction Grammar for the Persian Language
Masood Ghayoomi abstract
In this research we use Interaction Grammar (IG) to represent the construction of Persian noun phrases as trees. XMG is used to represent the constructions through factorisation and inheritance hierarchies. With the help of XMG, we define the IG, taking advantage of polarities on features and of tree descriptions for the various constructions introduced. We then use Leopar to display the trees graphically and to parse them. Finally, we apply our test suites to the parser in order to check whether the phrases are parsed and represented correctly. The experimental results show that we could parse the phrases successfully, even the most complex ones, which combine several constructions.
12h00 Lunch
14h00 Building an English Interaction Grammar
Jennifer Planul (Nancy) abstract
Interaction Grammar (IG) is a grammatical formalism based on the notion of polarity. Polarities express the resource sensitivity of natural languages by modelling the distinction between saturated and unsaturated syntactic structures. Syntactic composition is represented as a chemical reaction guided by the saturation of polarities. It is expressed in a model-theoretic framework where grammars are constraint systems using the notion of tree description, and parsing appears as a process of building tree description models satisfying criteria of saturation and minimality. Following the idea of multilinguality, we built an IG for English from Guy Perrier's existing French IG using XMG, trying to keep shared classes. I'll present our English IG and its results on the English TSNLP test suite, showing the similarities and divergences found between the two languages.
14h30 XTAG using XMG
Katya Alahverdzhieva (Nancy) abstract
In this talk, I will present my Master's thesis, which aims at developing a core computational grammar for English within a framework specially designed for factorising tree-based grammars and for grammars equipped with a syntax/semantics interface. I will begin by outlining the context for the specification of the grammar, namely the linguistic decisions that were taken and the description language. This includes familiarising the audience with large-scale tree-adjoining grammars (our starting point for the current grammar development) and the description language (which provides a high level of abstraction and allows for meta-grammar specification). I will then report on the adopted methodology, which was scaled up by implementing an actual grammar. I will conclude by reporting on the results and the phenomena covered, and outline some possible directions for future work.
15h00 Break
15h30 Acoustic Properties of Focus in English Interrogatives in Native and Non-Native Speech
Nadiya Yampolska (Nancy)
16h00 Toward evaluation of acoustic-to-articulatory speech inversion using a 3D articulograph and a 2D articulatory model
Mat Wilson (Nancy)
16h30 How to give a research talk
Patrick Blackburn

Wednesday 25 June:

Error Mining in Linguistic Resources 8h30 Mining the concept of Error Mining
Eric De La Clergerie (Alpage) abstract
I will show how a simple error mining algorithm, run on corpus parsing results and coupled with an adapted interface, may help discover the errors that are always present in large linguistic resources such as lexica and grammars. This work has since been complemented by preliminary experiments to suggest corrections for potential lexical errors. The error mining algorithm can also be extended to deal with more complex cases of errors, for instance on form, lemma or category sequences, which may suggest grammatical rather than lexical errors. Finally, I will say a few words about adapting the algorithm for knowledge acquisition tasks and about its relationship with the Expectation Maximization (EM) algorithm.
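The core intuition of such error mining can be sketched as follows (a simplified illustration in the spirit of suspicion-rate mining, with invented toy sentences, not the actual Alpage implementation): forms that occur mostly in unparsable sentences receive a high suspicion score, pointing at likely gaps in the lexicon or grammar:

```python
from collections import Counter

def suspicion(parsed, failed, min_count=2):
    """Score forms by how often they occur in unparsable sentences."""
    ok, bad = Counter(), Counter()
    for sent in parsed:
        ok.update(sent.split())
    for sent in failed:
        bad.update(sent.split())
    scores = {}
    for form, f in bad.items():
        total = f + ok[form]
        if total >= min_count:          # ignore rare, unreliable forms
            scores[form] = f / total
    return sorted(scores.items(), key=lambda kv: -kv[1])

parsed = ["the cat sleeps", "the dog runs"]
failed = ["the cat growls", "a dog growls"]
ranked = suspicion(parsed, failed)
# "growls" never parses -> suspicion 1.0, a likely missing lexical entry
```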
9h30 Computer-Aided Correction and Extension of a Syntactic Wide-Coverage Lexicon
Lionel Nicolas (Sophia Antipolis) abstract
Parsers based on manually created resources, namely a grammar and a morphological and syntactic lexicon, depend on the quality of these resources. Increasing a parser's coverage and precision therefore usually implies improving these two resources. Done manually, such a task is very difficult: it is time-consuming and complex, and knowing which resource is the true culprit for a given mistake is not always obvious. Some techniques provide a convenient way to automatically identify forms with potentially erroneous entries in a lexicon. We have integrated and extended this kind of technique into a wider process in order to automatically provide lexical corrections, thanks to the grammar's ability to tell how these forms could be used as part of correct parses. We present in this paper an implementation of this process and discuss the main results we have obtained on a wide-coverage French syntactic lexicon.
10h00 Break
10h30 Error Mining for Linguistic Knowledge Acquisition with DELPH-IN Grammars
Yi Zhang (Saarbruecken) abstract
In this talk, I will present the applications of error-mining based techniques in two linguistic knowledge acquisition tasks with a large-scale HPSG for English. First, in the context of general purpose lexical acquisition, I will report the practical application of error-mining for the detection of missing lexical entries on the British National Corpus. As an extension to the standard error-mining technique, I will further report our recent chart-mining experiments for verb-particle acquisition, which extract indications of linguistic regularities from intermediate parsing results.
11h00 Enhancing Performance of Lexicalised Grammars: A Case Study with a German HPSG Grammar
Kostadin Cholakov (Saarbruecken) abstract
In this presentation we discuss linguistically oriented and motivated machine learning methods whose aim is to improve the robustness of lexicalised grammars in real-life applications. We use efficient error mining techniques to show clearly that the main hindrance to the robustness of systems using grammars like the aforementioned ones is low lexical coverage. To this effect, we develop linguistically-driven methods that use detailed morphosyntactic information to automatically improve the lexical coverage of such grammars while maintaining their linguistic quality. We demonstrate these methods in a case study with a German grammar developed within the HPSG framework, and prove their efficiency and usability by applying them to real-life test data.
11h30 Mining Multiword Units for Performance Improvement of Lexicalised Grammars
Valia Kordoni (Saarbruecken) abstract
In the first part of the talk I focus on the linguistic properties of Multiword Expressions (MWEs), taking a closer look at their lexical, syntactic, as well as semantic characteristics. In the second part of the talk I focus on methods for the automatic acquisition of MWEs for robust grammar engineering. First I investigate the hypothesis that MWEs can be detected by the distinct statistical properties of their component words, regardless of their type, comparing various statistical measures, a procedure which leads to extremely interesting conclusions. I then investigate the influence of the size and quality of different corpora, using the BNC and the Web search engines Google and Yahoo. I conclude that, in terms of language usage, web-generated corpora are fairly similar to more carefully built corpora like the BNC, indicating that the lack of control and balance of these corpora is probably compensated for by their size. Finally, I show a qualitative evaluation of the results of automatically adding the extracted MWEs to existing linguistic resources. I argue that this process yields better results if a more compositional approach to automated grammar/lexicon extension is adopted.
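One of the statistical association measures commonly compared in such work is pointwise mutual information (PMI), which scores how much more often two words co-occur than chance would predict; MWE candidates score high. A minimal sketch with invented counts:

```python
import math

def pmi(pair_count, w1_count, w2_count, n):
    """Pointwise mutual information of a word pair from corpus counts."""
    return math.log2((pair_count / n) / ((w1_count / n) * (w2_count / n)))

# Invented counts: an idiomatic pair whose parts rarely occur apart
# scores far higher than a pair of two merely frequent words.
idiomatic = pmi(50, 60, 55, 10_000)
frequent  = pmi(50, 4_000, 3_000, 10_000)
```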
12h00 Test Suite Generation
Sylvain Schmitz (TALARIS) abstract
Just like other error mining techniques, mining for over-generation issues in a grammar requires a test suite in order to detect failures. We present work in progress that aims to generate such a test suite directly from the grammar, in order to explore its quirks and corner cases in an orderly manner.
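The idea of generating test sentences directly from the grammar can be sketched with a toy CFG (the actual formalism and generator in the talk are of course more involved): exhaustively expanding nonterminals up to a bounded depth enumerates the grammar's constructions systematically.

```python
from itertools import product

# Toy CFG; nonterminals map to lists of right-hand sides.
grammar = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"]],
    "VP":  [["V"], ["V", "NP"]],
    "Det": [["the"]],
    "N":   [["cat"], ["dog"]],
    "V":   [["sees"]],
}

def generate(symbol, depth=4):
    """All yields of `symbol` whose derivations stay within `depth`."""
    if symbol not in grammar:                    # terminal symbol
        return [[symbol]]
    if depth == 0:                               # bound the recursion
        return []
    out = []
    for rhs in grammar[symbol]:
        parts = [generate(s, depth - 1) for s in rhs]
        if all(parts):
            for combo in product(*parts):
                out.append([w for p in combo for w in p])
    return out

sentences = [" ".join(words) for words in generate("S")]
# Six sentences, systematically covering both VP frames and both nouns.
```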
12h30 Lunch
Semantics and Inference 14h00 Relating Nessie and Abstract Categorial Grammars
Sebastien Hinderer abstract
Nessie is a tool that computes semantic representations for complex syntactic units. It does so thanks to a lexicon that stores representations of words (described by lambda-terms) and an abstract syntax tree that guides the semantic construction process. Abstract Categorial Grammars (ACGs), on the other hand, are an abstraction of the syntax-semantics interface. They allow one to map a syntactic structure to a semantic representation. We will introduce these two notions and list some properties of ACGs that explain why they are interesting. The heart of the presentation will consist of showing that the languages Nessie can recognise are exactly those recognised by an ACG of order 2. We will conclude by deducing some interesting properties this result confers on Nessie.
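The lexicon-plus-AST construction process can be sketched as follows (an invented toy example, not Nessie's actual API or term language): each word maps to a constant or a lambda-term, and the abstract syntax tree dictates which terms apply to which:

```python
# Each word's semantics is a constant or a (curried) function.
lexicon = {
    "John":   "john",
    "Mary":   "mary",
    "sleeps": lambda x: f"sleep({x})",
    "loves":  lambda y: lambda x: f"love({x},{y})",
}

def interpret(tree):
    """Fold the AST bottom-up, applying function nodes to arguments."""
    if isinstance(tree, str):
        return lexicon[tree]
    fun, arg = (interpret(t) for t in tree)
    return fun(arg)

# AST for "John sleeps": the verb's term applies to its subject's term.
reading = interpret(("sleeps", "John"))
```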
14h30 Réécriture et Détection d'Implications Textuelles
Paul Bedaride
15h00 Tacit sensing in situated dialogue
Luciana Benotti abstract
When interlocutors are engaged in situated dialogue, their informational states evolve not only through dialogue acts but also through physical and sensing acts. All these acts can be performed either explicitly or tacitly during the interaction. However, even when acts are performed tacitly, often their execution can be inferred from subsequent acts. I present a model for the inference of tacit sensing acts in a non-traditional conversational system (a text-adventure game) using a non-traditional automated planner (an Artificial Intelligence planning system that is able to find plans when knowledge is incomplete).
15h30 Clause type identification vs. information packaging
Corinna Anderson abstract
This talk will explore the connections between the obligatory marking of clause types in natural language for interpretation (declarative, relative, interrogative, etc.), and discourse-based variation in constituent order that does not affect the semantic content or type (i.e. information packaging). Most of the data will come from Nepali, an Indo-Aryan (Indo-European) language in which interrogative and relative clauses are marked independently (in both morphology and syntax), and all types of "wh-movement" appear to be optional. Finite interrogative and relative clauses in Nepali are distinguished by the type of "wh-phrase" that appears within the clause, and by the typical behavior of this element. The basic generalizations apply to a wide range of unrelated languages with similar properties, which would benefit from a common analysis. I propose that the lack of parallelism between relative and interrogative clauses in such languages can be straightforwardly explained by considering questions of information packaging (Topic and Focus, in particular) alongside syntactic requirements.
16h00 Break
Carl Pollard

Thursday 26 June:

Tree Adjoining Grammars 9h00 Presentation of GerTT, a young TT-MCTAG grammar of German
Laura Kallmeyer, Timm Lichte (Tuebingen) abstract
GerTT is a tree grammar of German, currently implemented by means of an extension of multi-component tree adjoining grammars (MCTAG) called TT-MCTAG (MCTAG with shared nodes and tree tuples). In our presentation, we will motivate this formalism and give analyses for various aspects of German syntax. In particular, we will concentrate on coherent constructions, scrambling and complementation. We will also show the limits of these analyses. In addition, the integration of lexical resources will be addressed. Finally, we will introduce the underspecified semantics included in GerTT for semantic computation.
9h45 TuLiPA: a syntactic and semantic parsing environment for tree grammars
Yannick Parmentier abstract
This presentation introduces the Tuebingen Linguistic Parsing Architecture (TuLiPA) parsing system. TuLiPA is characterised by its use of Range Concatenation Grammars as a pivot formalism. The modularity induced by the use of a pivot formalism makes it easy to extend the parsing system to other formalisms (at present, TuLiPA can parse tree adjoining grammars (TAG), multi-component tree adjoining grammars with tree tuples (TT-MCTAG), and range concatenation grammars (RCG)). In this context, we will present TuLiPA's modular architecture in detail, along with two recently added extensions: the integration of a lexical disambiguation module à la (Bonfante et al., 2004) and a semantic calculus à la (Gardent and Kallmeyer, 2003). We will also give an overview of the grammar development environment built around TuLiPA. TuLiPA is joint work with Johannes Dellert, Kilian Evang, Laura Kallmeyer, Timm Lichte and Wolfgang Maier.
10h30 Break
11h00 Handling coordination and control phenomena with multi-component tree adjoining grammars: is it realistic?
Djamé Seddah (Université Paris IV Sorbonne) abstract
In this presentation, we introduce a formalisation of various elliptical coordination structures within the framework of multi-component tree adjoining grammars (MCTAG). Many authors describe elliptical coordination via parallel constructions in which symmetric derivations can be observed from a desired predicate-argument analysis. We show that the best-known coordination structures, among others zeugma constructions and coordination of unlike categories, can be analysed simply by adding a simple synchronisation mechanism to MCTAG. This mechanism operates in two steps: first, we associate an unrealised elementary tree with a realised tree inside a set; then we add sharing links between derivation nodes. Thus, using the extended domain of locality provided by MCTAG, most of the coordination structures that cannot be analysed by lexicalised tree adjoining grammars receive an MCTAG analysis. We will discuss implementation options based on tree factorisation, as well as possible extensions of this analysis to handle control verb phenomena.
11h45 Experiments in statistical parsing of French
Benoît Crabbé, Marie Candito (Paris 7)
12h30 Lunch
Interaction Grammars 14h00 Interaction grammars: theory and practice
Bruno Guillaume abstract
In this presentation, we will describe the Interaction Grammar (IG) formalism. We will notably illustrate the specific features of the formalism (polarities and underspecified structures) with examples. We will also present the IG-based parser developed by the Calligramme team.
14h45 Filtering (supertagging) and robustness
Jonathan Marchand abstract
For lexicalised grammars such as IGs, lexical disambiguation is an essential issue. In this context, I will show an original method for filtering lexical selections based on polarities, and how it is used in LEOPAR. Then, in order to handle ungrammaticality, I will present a parsing technique for detecting the chunks of a sentence.
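The polarity-counting intuition behind such filters can be sketched as follows (a toy model with an invented lexicon, not LEOPAR's actual algorithm): a lexical selection is kept only if, for every feature, its positive and negative polarities cancel out to the sentence-level goal, so many selections are discarded before any parsing:

```python
from collections import Counter
from itertools import product

def polarity_ok(selection, goal={"S": 1}):
    """Keep a selection only if every feature's polarities cancel out
    to the global goal (one saturated S, everything else neutral)."""
    totals = Counter()
    for item in selection:
        totals.update(item)
    return all(totals[f] == goal.get(f, 0)
               for f in set(totals) | set(goal))

# Invented polarised lexicon: +1 provides a resource, -1 expects one.
lexicon = {
    "John":   [{"NP": +1}],
    "sleeps": [{"S": +1, "NP": -1}],        # intransitive entry
    "sees":   [{"S": +1, "NP": -2}],        # transitive entry
}
words = ["John", "sleeps"]
selections = [sel for sel in product(*(lexicon[w] for w in words))
              if polarity_ok(sel)]
# The intransitive selection survives; a selection leaving an NP
# polarity unsaturated would be filtered out before parsing.
```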
15h30 Break
16h00 Modelling coordination and filtering
Joseph Le Roux abstract
In this talk, we will present a simple model of coordination that extends directly to the cases of non-constituent coordination and coordination of argument clusters. We will also show how this model can be exploited to improve polarity-based filtering.
16h45 Semantics in IG
Mathieu Morey