| Student session | 9h00 | Reception |
| 9h30 | Robust processing of spoken situated dialogue Pierre Lison (Saarbruecken) abstract
Recent years have seen increasing interest in service robots endowed
with communicative capabilities. These robots must often operate in
open-ended environments and interact with humans using natural language
to perform a variety of service-oriented tasks. Yet, developing
dialogue systems able to robustly and accurately interpret spoken
utterances remains a significant challenge. This is notably due to the
presence of numerous speech disfluencies (pauses, repetitions, repairs,
etc.) and to the globally poor performance of speech recognition,
especially when operating in noisy, real-world environments.
For my master's thesis, I'm developing techniques and algorithms to
improve the robustness of spoken dialogue comprehension at various
processing stages, from speech recognition to semantic
interpretation. In this talk, I will focus on one particular technique:
a model for speech recognition in natural environments which relies on
contextual information about salient entities to prime utterance
recognition.
The hypothesis underlying our approach is that, in situated human-robot
interactions, the speech recognition performance can be significantly
enhanced by exploiting knowledge about the immediate physical
environment and the dialogue history.
To this end, visual salience (objects perceived in the physical scene)
and linguistic salience (previously referred expressions within the
current dialogue) are integrated into a single cross-modal salience
model.
The model is dynamically updated as the environment evolves, and is used
to establish expectations about uttered words which are most likely to
be heard given the context. The update is realised by continuously
adapting the word-class probabilities specified in the statistical
language model.
Our work draws upon insights from cognitive science, as we know that
humans systematically exploit dialogue and situated context to guide
attentional mechanisms and help disambiguate and refine linguistic input
by filtering out unlikely interpretations.
In this talk, I will discuss the motivations behind our approach,
describe our implementation as part of a distributed, cognitive
architecture for mobile robots, and report the evaluation results on a
test suite, which show a statistically significant improvement in the
recognition rate.
|
|
| 10h00 | The use of Left-Dislocation in English Conversation Trevor Benjamin (Saarbruecken) |
|
| 10h30 | Break | |
| 11h00 | Automated Semantic Classification of French Verbs Ingrid Falk (Nancy) abstract
The aim of this work is to explore (semi-)automatic means to create
a Levin-type classification of French verbs, suitable for
Natural Language Processing. An extensive digital
verb lexicon based on Levin's classification method for English is
VerbNet. VerbNet systematically extends Levin's
classes while ensuring that class members have a common semantics
and share a common set of syntactic frames and thematic roles.
We base our work on three available syntactic lexicons for French:
Volem, the Grammar-Lexicon (Ladl) and Dicovalence, and investigate
ways to reorganise the verbs in these resources into VerbNet-like
verb classes using Formal Concept Analysis (FCA) techniques.
We discuss possible evaluation schemes and finally focus on an
evaluation methodology with respect to VerbNet, of which we present the
theoretical motivation and analyse the feasibility on a small
hand-built example.
|
|
| 11h30 | Interaction Grammar for the Persian Language Masood Ghayoomi abstract
In this research, we use Interaction Grammar (IG) to represent the structure of Persian noun phrases as trees. XMG is used to describe the constructions through factorization and inheritance hierarchies. With the help of XMG, we then define the IG, taking advantage of polarities on features and of tree descriptions for the various constructions introduced. We use Leopar to display the trees graphically and to parse them. Finally, we run our test suites through the parser to check that the phrases are parsed and represented correctly. The experimental results show that the phrases could be parsed successfully, even the most complex one, which combines several constructions.
|
|
| 12h00 | Lunch | |
| 14h00 | Building an English Interaction Grammar Jennifer Planul (Nancy) abstract
Interaction Grammar (IG) is a grammatical formalism based on the notion of polarity.
Polarities express the resource sensitivity of natural languages by modeling the distinction between saturated and unsaturated syntactic structures.
Syntactic composition is represented as a chemical reaction guided by the saturation of polarities. It is expressed in a model-theoretic framework in which grammars are constraint systems built on the notion of tree description, and parsing appears as a process of building tree description models that satisfy criteria of saturation and minimality.
Following the idea of multilinguality, we built an IG for English with XMG, starting from Guy Perrier's current French IG and trying to keep shared classes.
I'll present our English IG and its results on the English TSNLP test suite, showing similarities and divergences found between the two languages.
|
|
| 14h30 | XTAG using XMG Katya Alahverdzhieva (Nancy) abstract
In this talk, I will present my Master's thesis, which aims to develop a core computational grammar for English within a framework specially designed for factorising tree-based grammars and grammars equipped with a syntax/semantics interface. I will begin by outlining the context for the specification of the grammar, namely the linguistic decisions that were taken and the language of description. This includes familiarising the audience with large-scale tree-adjoining grammars (our starting point for the current grammar development) and the language of description (which provides a high level of abstraction and allows for meta-grammar specification). Further on, I will report on the adopted methodology, which was scaled up by implementing an actual grammar. I will conclude by reporting on the results and the phenomena covered. I will also outline some possible directions for future work.
|
|
| 15h00 | Break | |
| 15h30 | Acoustic Properties of Focus in English Interrogatives in Native and Non-Native Speech Nadiya Yampolska (Nancy) |
|
| 16h00 | Toward evaluation of acoustic-to-articulatory speech inversion using a 3D articulograph and a 2D articulatory model Mat Wilson (Nancy) |
|
| 16h30 | How to give a research talk Patrick Blackburn |
| Error Mining in Linguistic Resources | 8h30 | Mining the concept of Error Mining Eric De La Clergerie (Alpage) abstract
I will show how a simple error mining algorithm running on corpus
parsing results and coupled with an adapted interface may help to
discover errors that are always present in large linguistic resources
such as lexica and grammars. This work has since been completed by
preliminary experiments to suggest corrections for potential lexical
errors. The error mining algorithm can also be extended to deal with
more complex cases of errors, for instance on form, lemma or category
sequences that may suggest grammatical errors rather than lexical ones.
Finally I will say a few words about the adaptation of the algorithm for
knowledge acquisition tasks and the relationship between the algorithm and
the Expectation Maximization (EM) algorithm.
|
| 9h30 | Computer-Aided Correction and Extension of a Syntactic Wide-Coverage Lexicon Lionel Nicolas (Sophia Antipolis) abstract
Parsers based on manually created resources, namely a grammar and a
morphological and syntactic lexicon, rely on the quality of these
resources. Thus, increasing the parser coverage and precision usually
implies improving these two resources. If done manually, such a task is
very difficult, both because it is time-consuming and complex and because
knowing which resource is the true culprit for a given mistake
is not always obvious.
Some techniques offer a convenient way to automatically identify forms
having potentially erroneous entries in a lexicon. We have integrated
and extended this kind of technique in a wider process in order to
automatically provide lexical corrections, thanks to the grammar's
ability to tell how these forms could be used as part of correct parses.
We present in this paper an implementation of this process and discuss the
main results we have obtained on a syntactic wide-coverage French
lexicon.
|
|
| 10h00 | Break | |
| 10h30 | Error Mining for Linguistic Knowledge Acquisition with DELPH-IN Grammars Yi Zhang (Saarbruecken) abstract
In this talk, I will present the applications of error-mining based
techniques in two linguistic knowledge acquisition tasks with a
large-scale HPSG for English. First, in the context of general purpose
lexical acquisition, I will report on the practical application of
error-mining for the detection of missing lexical entries on the
British National Corpus. As an extension to the standard error-mining
technique, I will further report on our recent chart-mining experiments
for verb-particle acquisition, which extract indications of linguistic
regularities from intermediate parsing results.
|
|
| 11h00 | Enhancing Performance of Lexicalised Grammars: A Case Study with a German HPSG Grammar Kostadin Cholakov (Saarbruecken) abstract
In this presentation we discuss linguistically oriented and motivated machine learning methods whose aim is to improve the robustness of lexicalised grammars in real-life applications. We use efficient error mining techniques to show clearly that the main hindrance to the robustness of systems using such grammars is low lexical coverage. To this end, we develop linguistically driven methods that use detailed morphosyntactic information to automatically improve the lexical coverage of such grammars while maintaining their linguistic quality. We demonstrate these methods in a case study with a German grammar, developed within the framework of HPSG, and prove their efficiency and usability by applying them to real-life test data.
|
|
| 11h30 | Mining Multiword Units for Performance Improvement of Lexicalised Grammars Valia Kordoni (Saarbruecken) abstract
In the first part of the talk I focus on the linguistic properties of
Multiword Expressions (MWEs), taking a closer look at their lexical,
syntactic, as well as semantic characteristics.
In the second part of the talk I focus on methods for the automatic
acquisition of MWEs for robust grammar engineering. First I investigate
the hypothesis that MWEs can be detected by the distinct statistical
properties of their component words, regardless of their type, comparing
various statistical measures, a procedure which leads to extremely
interesting conclusions. I then investigate the influence of the size
and quality of different corpora, using the BNC and the Web search
engines Google and Yahoo. I conclude that, in terms of language usage,
web generated corpora are fairly similar to more carefully built
corpora, like the BNC, indicating that the lack of control and balance
of these corpora is probably compensated for by their size. Finally, I show
a qualitative evaluation of the results of automatically adding
extracted MWEs to existing linguistic resources. I argue that the
process of automatically adding extracted MWEs to existing
linguistic resources improves qualitatively if a more compositional
approach to automated grammar/lexicon extension is adopted.
|
|
| 12h00 | Test Suite Generation Sylvain Schmitz (TALARIS) abstract
Just like other error mining techniques, mining for over-generation
issues in a grammar requires a test suite in order to detect failures.
We present work in progress that aims to generate such a test suite
directly from the grammar, in order to explore its quirks and corner
cases in an orderly manner.
|
|
| 12h30 | Lunch | |
| Semantics and Inference | 14h00 | Relating Nessie and Abstract Categorial Grammars Sebastien Hinderer abstract
Nessie is a tool that computes semantic representations for
complex syntactic units. It does so thanks to a lexicon that stores
representations of words (described by lambda-terms) and an
abstract syntax tree that guides the semantic construction process.
Abstract Categorial Grammars (ACGs), on the other hand, are an abstraction
of the syntax-semantics interface. They allow one to map a syntactic
structure to a semantic representation.
We will introduce these two notions and list some properties of ACGs
explaining why they are interesting.
The heart of the presentation will consist in showing that the languages
Nessie can recognize are exactly those recognized by an ACG of order 2.
We will conclude by deducing some interesting properties this result
confers on Nessie.
|
| 14h30 | Rewriting and Recognising Textual Entailment Paul Bedaride |
|
| 15h00 | Tacit sensing in situated dialogue Luciana Benotti abstract
When interlocutors are engaged in situated dialogue, their informational
states evolve not only through dialogue acts but also through physical
and sensing acts. All these acts can be performed either explicitly or
tacitly during the interaction. However, even when acts are performed
tacitly, often their execution can be inferred from subsequent acts.
I present a model for the inference of tacit sensing acts in a
non-traditional conversational system (a text-adventure game) using a
non-traditional automated planner (an Artificial Intelligence planning
system that is able to find plans when knowledge is incomplete).
|
|
| 15h30 | Clause type identification vs. information packaging Corinna Anderson abstract
This talk will explore the connections between the obligatory marking of clause types in natural language for interpretation (e.g. declarative, relative, interrogative), and discourse-based variation in constituent order that does not affect the semantic content or type (i.e. information packaging).
Most of the data will come from Nepali, an Indo-Aryan (Indo-European) language in which interrogative and relative clauses are marked independently (in both morphology and syntax), and all types of "wh-movement" appear to be optional. Finite interrogative and relative clauses in Nepali are distinguished by the type of "wh-phrase" that appears within the clause, and by the typical behavior of this element. The basic generalizations apply to a wide range of unrelated languages with similar properties, which would benefit from a common analysis.
I propose that the lack of parallelism between relative and interrogative clauses in such languages can be straightforwardly explained by considering questions of information packaging (Topic and Focus, in particular) alongside syntactic requirements.
|
|
| 16h00 | Break | |
| 16h30 | Carl Pollard |
| Tree-Adjoining Grammars | 9h00 | Introducing GerTT, a young TT-MCTAG for German Laura Kallmeyer, Timm Lichte (Tuebingen) abstract
GerTT is a TAG-based grammar for German that is currently being implemented
using an extension of multicomponent TAG (MCTAG), called TT-MCTAG (MCTAG
with shared nodes and tree tuples). In our talk, we will motivate this
formalism of choice and sketch the analyses of several aspects of German
syntax. In particular, we will concentrate on coherent constructions and
scrambling and on complementation. We will also point out some limitations
of these analyses. Furthermore, the integration of lexical resources will
be considered. Finally, we will mention the underspecified semantics for
semantic processing that is included in GerTT.
|
| 9h45 | TuLiPA: a syntactic and semantic parsing environment for tree-based grammars Yannick Parmentier abstract
This talk introduces the Tuebingen Linguistic Parsing Architecture
(TuLiPA). TuLiPA distinguishes itself from other parsing environments
via the use of Range Concatenation Grammar as a pivot formalism. The
modularity brought by using a pivot formalism facilitates the extension
of the parsing system to other syntactic formalisms (currently TuLiPA
supports Tree-Adjoining Grammar, Multi-Component Tree-Adjoining Grammar
with Tree Tuples, and Range Concatenation Grammar). In this context, we
will present TuLiPA's modular architecture, along with two extensions
implemented recently: a lexical disambiguation module à la (Bonfante et
al., 2004) and a semantic calculus module à la (Gardent and Kallmeyer,
2003). We will also give an overview of the environment for designing
grammars built on TuLiPA. TuLiPA is joint work with Johannes Dellert,
Kilian Evang, Laura Kallmeyer, Timm Lichte and Wolfgang Maier.
|
|
| 10h30 | Break | |
| 11h00 | Coordination and Control processing using Multi-Component TAG grammar: is this realistic? Djamé Seddah (Université Paris IV Sorbonne) abstract
In this presentation, we introduce a formalization of various elliptical coordination structures within the Multi-Component Tree Adjoining Grammar framework.
Numerous authors describe elliptical coordination as parallel constructions where symmetric derivations can be observed from a desired predicate-argument structure analysis. We show that most well-known coordinate structures, including zeugma constructions and coordination of unlike categories, can be analyzed simply by adding a simple synchronous mechanism to the MCTAG framework. This mechanism consists of two steps:
First we associate an unrealized elementary tree with a realized one inside a tree set, then we add shared links between derivation nodes. Thus, using the extended domain of locality provided by MCTAGs, most coordinate structures which cannot be analyzed by Lexicalized Tree Adjoining Grammar are given a proper MC-TAG analysis.
We will discuss possible implementations using tree factorization, and some possible extensions of this analysis to deal with control-verb phenomena.
|
|
| 11h45 | Experiments with statistical parsing of French Benoît Crabbé, Marie Candito (Paris 7) |
|
| 12h30 | Lunch | |
| Interaction Grammars | 14h00 | Interaction Grammars: theory and practice Bruno Guillaume abstract
In this talk, we will give an overview of the IG (Interaction Grammars)
formalism. The specificities of the formalism (polarization and
underspecified structures) will be illustrated through examples.
The parser based on IG and developed by the Calligramme team will also
be presented.
|
| 14h45 | Filtering (supertagging) and robustness Jonathan Marchand abstract
In lexicalized grammars like IGs, lexical disambiguation is an
important issue. In this context, I'll show an original method for
filtering lexical selections based on polarities and how it's used in
LEOPAR. Next, to deal with ungrammaticality, I'll present a parsing
technique for detecting small chunks in a sentence.
|
|
| 15h30 | Break | |
| 16h00 | Coordination in IG: syntax and supertagging Joseph Le Roux abstract
In this talk, we will present a simple account of coordination in
Interaction Grammars that can be extended to non-constituent
coordinations and cluster-argument coordinations. Furthermore, we will
show how this account can give a simple method that improves polarity
filtering for supertagging.
|
|
| 16h45 | Semantics in IG Mathieu Morey |