Tuesday 24th of June:

Student session 9h00 Reception
9h30 Robust processing of spoken situated dialogue
Pierre Lison (Saarbruecken) abstract
Recent years have seen increasing interest in service robots endowed with communicative capabilities. These robots must often operate in open-ended environments and interact with humans using natural language to perform a variety of service-oriented tasks. Yet, developing dialogue systems able to interpret spoken utterances robustly and accurately remains a significant challenge. This is notably due to the presence of numerous speech disfluencies (pauses, repetitions, repairs, etc.) and to the globally poor performance of speech recognition, especially when operating in noisy, real-world environments. For my Master's thesis, I am developing techniques and algorithms to improve the robustness of spoken dialogue comprehension at various processing stages, from speech recognition to semantic interpretation. In this talk, I will focus on one particular technique: a model for speech recognition in natural environments which relies on contextual information about salient entities to prime utterance recognition. The hypothesis underlying our approach is that, in situated human-robot interactions, speech recognition performance can be significantly enhanced by exploiting knowledge about the immediate physical environment and the dialogue history. To this end, visual salience (objects perceived in the physical scene) and linguistic salience (expressions previously referred to within the current dialogue) are integrated into a single cross-modal salience model. The model is dynamically updated as the environment evolves, and is used to establish expectations about which uttered words are most likely to be heard given the context. The update is realised by continuously adapting the word-class probabilities specified in the statistical language model.
Our work draws upon insights from cognitive science: we know that humans systematically exploit dialogue and situated context to guide attentional mechanisms and to help disambiguate and refine linguistic input by filtering out unlikely interpretations. In this talk, I will discuss the motivations behind our approach, describe our implementation as part of a distributed cognitive architecture for mobile robots, and report evaluation results on a test suite, which show a statistically significant improvement in the recognition rate.
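The final step described in the abstract (adapting word-class probabilities to salience) can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the system presented in the talk: the function name, the OBJECT class, the toy vocabulary and the boost factor are all invented for the example.

```python
def adapt_class_lm(word_class_probs, salient_words, boost=5.0):
    """Re-weight word probabilities within each class so that words
    referring to salient entities become more likely, then renormalise
    each class distribution so it still sums to 1."""
    adapted = {}
    for cls, dist in word_class_probs.items():
        scores = {w: p * (boost if w in salient_words else 1.0)
                  for w, p in dist.items()}
        total = sum(scores.values())
        adapted[cls] = {w: s / total for w, s in scores.items()}
    return adapted

# Hypothetical class-based unigram model: P(word | class).
lm = {"OBJECT": {"ball": 0.25, "box": 0.25, "mug": 0.25, "book": 0.25}}
# A mug has just become visually salient in the scene.
adapted = adapt_class_lm(lm, salient_words={"mug"})
```

The boosted distribution now primes the recogniser towards "mug" while keeping the other objects recognisable, which mirrors the idea of establishing context-driven expectations without hard-filtering any hypothesis.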
10h00 The use of Left-Dislocation in English Conversation
Trevor Benjamin (Saarbruecken)
10h30 Break
11h00 Automated Semantic Classification of French Verbs
Ingrid Falk (Nancy) abstract
The aim of this work is to explore (semi-)automatic means of creating a Levin-type classification of French verbs suitable for Natural Language Processing. VerbNet is an extensive digital verb lexicon for English based on Levin's classification method. VerbNet systematically extends Levin's classes while ensuring that class members have a common semantics and share a common set of syntactic frames and thematic roles. We base our work on three available syntax lexicons for French: Volem, the Grammar-Lexicon (LADL) and Dicovalence, and investigate ways to reorganise the verbs in these resources into VerbNet-like verb classes using Formal Concept Analysis (FCA) techniques. We discuss possible evaluation schemes and finally focus on an evaluation methodology with respect to VerbNet, of which we present the theoretical motivation and whose feasibility we analyse on a small hand-built example.
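To make the FCA step concrete, here is a minimal sketch: a toy formal context pairing verbs (objects) with the syntactic frames they accept (attributes), and a naive computation of its formal concepts, each of which is a candidate verb class. The verbs, frame labels and function name are illustrative assumptions, not data from the talk.

```python
# Toy formal context: verbs x syntactic frames.
context = {
    "donner": {"NP_V_NP_PP", "NP_V_NP"},
    "offrir": {"NP_V_NP_PP", "NP_V_NP"},
    "courir": {"NP_V"},
    "manger": {"NP_V_NP", "NP_V"},
}

def formal_concepts(context):
    """Compute all formal concepts (extent, intent) of the context by
    closing the set of intents under intersection, then deriving each
    intent's extent (the verbs sharing all frames in the intent)."""
    attrs = frozenset().union(*context.values())
    intents = {attrs}
    for obj_intent in context.values():
        # Intersect this object's intent with every intent found so far.
        intents |= {frozenset(obj_intent) & i for i in intents}
    concepts = []
    for intent in intents:
        extent = frozenset(v for v, fr in context.items() if intent <= fr)
        concepts.append((extent, intent))
    return concepts

concepts = formal_concepts(context)
```

On this toy context, {donner, offrir} emerges as an extent sharing both the ditransitive and transitive frames, i.e. a VerbNet-like "give"-class candidate.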
11h30 Interaction Grammar for the Persian Language
Masood Ghayoomi abstract
In this research we use Interaction Grammar (IG) to represent the construction of Persian noun phrases as trees. XMG is used to represent the constructions by means of factorisation and inheritance hierarchies. With the help of XMG, we then define an IG that takes advantage of polarities on features and of tree descriptions for the various constructions introduced. We then use Leopar for the graphical representation of the trees and for parsing them. Finally, we apply our test suites to the parser in order to check whether the phrases are parsed and represented correctly. The experimental results showed that we could parse the phrases successfully, even the most complex ones combining several constructions.
12h00 Lunch
14h00 Building an English Interaction Grammar
Jennifer Planul (Nancy) abstract
Interaction Grammar (IG) is a grammatical formalism based on the notion of polarity. Polarities express the resource sensitivity of natural languages by modelling the distinction between saturated and unsaturated syntactic structures. Syntactic composition is represented as a chemical reaction guided by the saturation of polarities. It is expressed in a model-theoretic framework where grammars are constraint systems using the notion of tree description, and parsing appears as a process of building tree description models satisfying criteria of saturation and minimality. Following the idea of multilinguality, we built an IG for English from Guy Perrier's existing French IG with XMG, trying to keep shared classes. I will present our English IG and its results on the English TSNLP test suite, showing similarities and divergences found between the two languages.
14h30 XTAG using XMG
Katya Alahverdzhieva (Nancy) abstract
In this talk, I will present my Master's thesis, which is aimed at developing a core computational grammar for English within a framework specially designed for factorising tree-based grammars and for grammars equipped with a syntax/semantics interface. I will begin by outlining the context for the specification of the grammar, namely the linguistic decisions that were taken and the language of description. This includes familiarising the audience with large-scale tree-adjoining grammars (our starting point for the current grammar development) and the language of description (which provides a high level of abstraction and allows for metagrammar specification). Further on, I will report on the adopted methodology, which was scaled up by implementing an actual grammar. I will conclude by reporting on the results and the phenomena covered. I will also outline some possible directions for future work.
15h00 Break
15h30 Acoustic Properties of Focus in English Interrogatives in Native and Non-Native Speech
Nadiya Yampolska (Nancy)
16h00 Toward evaluation of acoustic-to-articulatory speech inversion using a 3D articulograph and a 2D articulatory model
Mat Wilson (Nancy)
16h30 How to give a research talk
Patrick Blackburn

Wednesday 25th of June:

Error Mining in Linguistic Resources 8h30 Mining the concept of Error Mining
Eric De La Clergerie (Alpage) abstract
I will show how a simple error mining algorithm, running on corpus parsing results and coupled with an adapted interface, may help to discover errors that are invariably present in large linguistic resources such as lexica and grammars. This work has since been complemented by preliminary experiments to suggest corrections for potential lexical errors. The error mining algorithm can also be extended to deal with more complex cases of errors, for instance on form, lemma or category sequences, which may suggest grammatical rather than lexical errors. Finally, I will say a few words about the adaptation of the algorithm to knowledge acquisition tasks and about its relationship with the Expectation Maximisation (EM) algorithm.
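One common way to realise such an error mining algorithm is to assign each word form a "suspicion" score based on how often it occurs in unparsable sentences, then refine the scores iteratively, EM-style, by sharing each failed sentence's blame among its forms. The sketch below is a generic illustration of that idea, not the speaker's implementation; the function name and toy data are invented.

```python
from collections import Counter

def mine_errors(parsed, failed, iterations=10):
    """Estimate a suspicion score in [0, 1] for each word form: forms
    occurring disproportionately often in unparsable sentences are
    likely culprits (e.g. missing lexical entries).
    parsed, failed: lists of tokenised sentences."""
    fail_counts = Counter(w for s in failed for w in s)
    ok_counts = Counter(w for s in parsed for w in s)
    # Initial suspicion: fraction of a form's occurrences in failed parses.
    suspicion = {w: fail_counts[w] / (fail_counts[w] + ok_counts[w])
                 for w in fail_counts}
    # Iterative refinement: within each failed sentence, blame is shared
    # in proportion to the current suspicion of its forms.
    for _ in range(iterations):
        blame = Counter()
        for sent in failed:
            total = sum(suspicion.get(w, 0.0) for w in sent)
            if total == 0:
                continue
            for w in sent:
                blame[w] += suspicion.get(w, 0.0) / total
        suspicion = {w: blame[w] / (fail_counts[w] + ok_counts[w])
                     for w in fail_counts}
    return suspicion

# Hypothetical toy data: "xyzzy" lacks a lexical entry, so its sentence fails.
parsed = [["the", "dog", "runs"], ["the", "cat", "runs"]]
failed = [["the", "xyzzy", "runs"]]
suspicion = mine_errors(parsed, failed)
```

After a few iterations the unknown form absorbs almost all of the blame, while frequent forms that also parse elsewhere ("the", "runs") drop towards zero, which is exactly the ranking an adapted browsing interface would present to the lexicon maintainer.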
9h30 Computer-Aided Correction and Extension of a Syntactic Wide-Coverage Lexicon
Lionel Nicolas (Sophia Antipolis) abstract
Parsers based on manually created resources, namely a grammar and a morphological and syntactic lexicon, rely on the quality of these resources. Thus, increasing the parser's coverage and precision usually implies improving these two resources. Done manually, such a task is very difficult: it is time-consuming and complex, and knowing which resource is the true culprit for a given mistake is not always obvious. Some techniques provide a convenient way to automatically identify forms with potentially erroneous entries in a lexicon. We have integrated and extended this kind of technique into a wider process in order to automatically propose lexical corrections, thanks to the grammar's ability to tell how these forms could be used as part of correct parses. We present an implementation of this process and discuss the main results we have obtained on a wide-coverage French syntactic lexicon.
10h00 Break
10h30 Error Mining for Linguistic Knowledge Acquisition with DELPH-IN Grammars
Yi Zhang (Saarbruecken) abstract
In this talk, I will present the applications of error-mining based techniques in two linguistic knowledge acquisition tasks with a large-scale HPSG for English. First, in the context of general purpose lexical acquisition, I will report the practical application of error-mining for the detection of missing lexical entries on the British National Corpus. As an extension to the standard error-mining technique, I will further report our recent chart-mining experiments for verb-particle acquisition, which extract indications of linguistic regularities from intermediate parsing results.
11h00 Enhancing Performance of Lexicalised Grammars: A Case Study with a German HPSG Grammar
Kostadin Cholakov (Saarbruecken) abstract
In this presentation we discuss linguistically oriented and motivated machine learning methods whose aim is to improve the robustness of lexicalised grammars in real-life applications. We use efficient error mining techniques to show clearly that the main hindrance to the robustness of systems using such grammars is low lexical coverage. To this effect, we develop linguistically driven methods that use detailed morphosyntactic information to automatically improve the lexical coverage of such grammars while maintaining their linguistic quality. We demonstrate these methods in a case study with a German grammar developed within the HPSG framework, and prove their efficiency and usability by applying them to real-life test data.
11h30 Mining Multiword Units for Performance Improvement of Lexicalised Grammars
Valia Kordoni (Saarbruecken) abstract
In the first part of the talk I focus on the linguistic properties of Multiword Expressions (MWEs), taking a closer look at their lexical, syntactic and semantic characteristics. In the second part of the talk I focus on methods for the automatic acquisition of MWEs for robust grammar engineering. First I investigate the hypothesis that MWEs can be detected through the distinct statistical properties of their component words, regardless of their type, comparing various statistical measures, a procedure which leads to extremely interesting conclusions. I then investigate the influence of the size and quality of different corpora, using the BNC and the Web search engines Google and Yahoo. I conclude that, in terms of language usage, web-generated corpora are fairly similar to more carefully built corpora like the BNC, indicating that their lack of control and balance is probably compensated for by their size. Finally, I show a qualitative evaluation of the results of automatically adding extracted MWEs to existing linguistic resources. I argue that this process yields qualitative improvements if a more compositional approach to the automated extension of the grammar/lexicon is adopted.
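A typical example of the association measures compared in such work is pointwise mutual information (PMI), which scores a candidate two-word MWE by how much more often its words co-occur than chance would predict. The toy corpus and function name below are invented for illustration and do not come from the talk.

```python
import math
from collections import Counter

def pmi_scores(bigrams, unigrams, total):
    """PMI(w1, w2) = log2( P(w1 w2) / (P(w1) * P(w2)) ).
    High PMI means the pair co-occurs more than chance, one of the
    statistical cues used for multiword-expression detection."""
    return {
        (w1, w2): math.log2((c / total) /
                            ((unigrams[w1] / total) * (unigrams[w2] / total)))
        for (w1, w2), c in bigrams.items()
    }

# Tiny invented corpus containing the idiom "kick the bucket".
corpus = "kick the bucket and kick the habit or fill the bucket".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
scores = pmi_scores(bigrams, unigrams, total=len(corpus))
```

In practice such counts would come from a large corpus such as the BNC, or from web search engine hit counts, and PMI would be compared against other measures (e.g. log-likelihood ratio or the t-score) before thresholding candidates.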
12h00 Test Suite Generation
Sylvain Schmitz (TALARIS) abstract
Just like other error mining techniques, mining for over-generation issues in a grammar requires a test suite in order to detect failures. We present work in progress that aims to generate such a test suite directly from the grammar, in order to explore its quirks and corner cases in an orderly manner.
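One straightforward way to obtain such a suite is to enumerate the sentences a grammar derives up to a bounded depth, then inspect the output for strings the grammar should not accept. The sketch below does this for a toy context-free grammar; it is a generic illustration of the idea, not the work-in-progress system described, and the grammar and names are invented.

```python
def generate(grammar, symbol, depth=4):
    """Enumerate the sentences derivable from `symbol` up to a bounded
    derivation depth. grammar: dict nonterminal -> list of right-hand
    sides (tuples of symbols); symbols absent from the dict are terminals."""
    if symbol not in grammar:          # terminal symbol
        return [[symbol]]
    if depth == 0:                     # cut off recursion
        return []
    sentences = []
    for rhs in grammar[symbol]:
        partial = [[]]
        for sym in rhs:
            # Extend every partial sentence with every expansion of sym.
            partial = [p + s for p in partial
                       for s in generate(grammar, sym, depth - 1)]
        sentences.extend(partial)
    return sentences

toy = {
    "S":  [("NP", "V")],
    "NP": [("the", "N")],
    "N":  [("dog",), ("cat",)],
    "V":  [("sleeps",), ("runs",)],
}
suite = [" ".join(s) for s in generate(toy, "S")]
```

For a realistic grammar, exhaustive enumeration explodes quickly, so a practical generator must sample or prioritise derivations, precisely the kind of "orderly exploration" the abstract alludes to.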
12h30 Lunch
Semantics and Inference 14h00 Relating Nessie and Abstract Categorial Grammars
Sebastien Hinderer abstract
Nessie is a tool that computes semantic representations for complex syntactic units. It does so thanks to a lexicon that stores representations of words (described by lambda-terms) and an abstract syntax tree that guides the semantic construction process. Abstract Categorial Grammars (ACGs), on the other hand, are an abstraction of the syntax-semantics interface. They allow one to map a syntactic structure to a semantic representation. We will introduce these two notions and list some properties of ACGs, explaining why they are interesting. The heart of the presentation will consist in showing that the languages Nessie can recognise are exactly those recognised by an ACG of order 2. We will conclude by deducing some interesting properties that this result confers on Nessie.
14h30 Rewriting and Recognising Textual Entailment
Paul Bedaride
15h00 Tacit sensing in situated dialogue
Luciana Benotti abstract
When interlocutors are engaged in situated dialogue, their informational states evolve not only through dialogue acts but also through physical and sensing acts. All these acts can be performed either explicitly or tacitly during the interaction. However, even when acts are performed tacitly, often their execution can be inferred from subsequent acts. I present a model for the inference of tacit sensing acts in a non-traditional conversational system (a text-adventure game) using a non-traditional automated planner (an Artificial Intelligence planning system that is able to find plans when knowledge is incomplete).
15h30 Clause type identification vs. information packaging
Corinna Anderson abstract
This talk will explore the connections between the obligatory marking of clause types in natural language for interpretation (i.e. declarative, relative, interrogative, etc), and discourse-based variation in constituent order that does not affect the semantic content or type (i.e. information packaging). Most of the data will come from Nepali, an Indo-Aryan (Indo-European) language in which interrogative and relative clauses are marked independently (in both morphology and syntax), and all types of "wh-movement" appear to be optional. Finite interrogative and relative clauses in Nepali are distinguished by the type of "wh-phrase" that appears within the clause, and by the typical behavior of this element. The basic generalizations apply to a wide range of unrelated languages with similar properties, which would benefit from a common analysis. I propose that the lack of parallelism between relative and interrogative clauses in such languages can be straightforwardly explained by considering questions of information packaging (Topic and Focus, in particular) alongside syntactic requirements.
16h00 Break
Carl Pollard

Thursday 26th June:

Tree-Adjoining Grammars 9h00 Introducing GerTT, a young TT-MCTAG for German
Laura Kallmeyer, Timm Lichte (Tuebingen) abstract
GerTT is a TAG-based grammar for German that is currently being implemented using an extension of multicomponent TAG (MCTAG) called TT-MCTAG (MCTAG with shared nodes and tree tuples). In our talk, we will motivate this choice of formalism and sketch analyses of several aspects of German syntax. In particular, we will concentrate on coherent constructions, scrambling and complementation. We will also point out some limitations of these analyses. Furthermore, the integration of lexical resources will also be considered. Finally, we will mention the underspecified semantics for semantic processing that is included in GerTT.
9h45 TuLiPA: a syntactic and semantic parsing environment for tree-based grammars
Yannick Parmentier abstract
This talk introduces the Tuebingen Linguistic Parsing Architecture (TuLiPA). TuLiPA distinguishes itself from other parsing environments via the use of Range Concatenation Grammar as a pivot formalism. The modularity brought by using a pivot formalism facilitates the extension of the parsing system to other syntactic formalisms (currently TuLiPA supports Tree-Adjoining Grammar, Multi-Component Tree-Adjoining Grammar with Tree Tuples, and Range Concatenation Grammar). In this context, we will present TuLiPA's modular architecture, along with two extensions implemented recently: a lexical disambiguation module à la (Bonfante et al, 2004) and a semantic calculus module à la (Gardent and Kallmeyer, 2003). We will also give an overview of the environment for designing grammars built on TuLiPA. TuLiPA is a joint work with Johannes Dellert, Kilian Evang, Laura Kallmeyer, Timm Lichte and Wolfgang Maier.
10h30 Break
11h00 Coordination and Control processing using Multi-Component TAG grammar: is this realistic?
Djamé Seddah (Université Paris IV Sorbonne) abstract
In this presentation, we introduce a formalisation of various elliptical coordination structures within the Multi-Component Tree Adjunct Grammar framework. Numerous authors describe elliptic coordination as a parallel construction in which symmetric derivations can be observed from a desired predicate-argument structure analysis. We show that the best-known coordinate structures, including zeugma constructions and coordination of unlike categories, can be analysed simply by adding a simple synchronous mechanism to the MCTAG framework. This mechanism consists of two steps: first we associate an unrealised elementary tree with a realised one inside a tree set, then we add shared links between derivation nodes. Thus, using the extended domain of locality provided by MCTAGs, most coordinate structures which cannot be analysed by Lexicalized Tree Adjunct Grammar are given a proper MC-TAG analysis. We will discuss a possible implementation using tree factorisation, as well as some possible extensions of this analysis to deal with control-verb phenomena.
11h45 Experiments with statistical parsing of French
Benoît Crabbé, Marie Candito (Paris 7)
12h30 Lunch
Interaction Grammars 14h00 Interaction Grammars: theory and practice
Bruno Guillaume abstract
In this talk, we will give an overview of the IG (Interaction Grammars) formalism. The specificities of the formalism (polarisation and underspecified structures) will be illustrated through examples. The parser based on IG and developed by the Calligramme team will also be presented.
14h45 Filtering (supertagging) and robustness
Jonathan Marchand abstract
In lexicalised grammars like IGs, lexical disambiguation is an important issue. In this context, I will present an original method for filtering lexical selections based on polarities, and show how it is used in LEOPAR. Then, to deal with agrammaticality, I will present a parsing technique for detecting small chunks in a sentence.
15h30 Break
16h00 Coordination in IG: syntax and supertagging
Joseph Le Roux abstract
In this talk, we will present a simple account of coordination in Interaction Grammars that can be extended to non-constituent coordinations and cluster-argument coordinations. Furthermore, we will show how this account can give a simple method that improves polarity filtering for supertagging.
16h45 Semantics in IG
Mathieu Morey