Estonian Dialogue Corpus (EDiC)
Projects
Estonian Science Foundation
- GMTAT9124 (2012-2014) Modelling a Conversational Agent and the Estonian Dialogue Corpus (Suhtlusagendi modelleerimine ja Eesti dialoogikorpus)
- GMTAT7503 (2008-2011) Conversation Strategies in a Conversation Model (Suhtlusstrateegiad suhtlusmudelis)
- GMTAT5685 (2004-2007) Modelling a Conversational Agent (Konversatsiooniagendi modelleerimine: eestikeelse dialoogi automaattöötluse teoreetilised ja rakenduslikud probleemid)
- GMTAT4555 (2001-2003) Modelling Estonian Dialogue on the Computer (Eestikeelse dialoogi modelleerimine arvutil)
Estonian Ministry of Education and Research
National Programme for Estonian Language Technology
- (2011-2013) Pragmatic Analyzer of Estonian Dialogue (Eestikeelse dialoogi pragmaatika analüsaator)
- (2009-2010) Intelligent user interface for data bases (Intelligentne kasutajaliides andmebaasidele)
- (2006-2008) Interactive information retrieval system for Estonian (Eestikeelne infodialoog arvutiga)
National Programme for Estonian and National Memory
- (2004-2007) Interactive information retrieval system for Estonian (Eestikeelne infodialoog arvutiga)
- (2003) Dialogue corpus as a basis of user interface (Dialoogikorpus kui eestikeelse kasutajaliidese alus)
National Programme for Estonian and National Culture
- (2002) Building an annotated dialogue corpus on the basis of the corpus of spoken Estonian (Märgendatud dialoogikorpuse loomine eesti suulise kõne korpuse baasil)
Content of the EDiC
Spoken human-human dialogues from the Corpus of Spoken Estonian of the University of Tartu
March 2012: 1182 transcribed texts, 246 000 running words (dialogue acts annotated)
1. Information dialogues: 1137
Phone conversations: 1026 (doctor-patient 6, institution 12, bus information 24, directory inquiries 581, sale 48, store 31, outpatients' office 90, travel agency 97, library 10, taxi 24, service 75, university 6, other 22)
Face-to-face conversations: 111 (institution 1, bus information 1, store 63, outpatients' office 1, travel agency 10, library 1, guiding on street 19, service 12, other 3)
2. Argumentation dialogues: 45 (phone 37, face-to-face 8)
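The per-category counts above can be cross-checked against the stated totals. The following is only a minimal sketch: the figures are copied from the March 2012 listing, while the variable names are illustrative assumptions and not part of the corpus or its tools.

```python
# Counts copied from the March 2012 listing above.
# Variable names are hypothetical, chosen only for this illustration.
phone = {
    "doctor-patient": 6, "institution": 12, "bus information": 24,
    "directory inquiries": 581, "sale": 48, "store": 31,
    "outpatients' office": 90, "travel agency": 97, "library": 10,
    "taxi": 24, "service": 75, "university": 6, "other": 22,
}
face_to_face = {
    "institution": 1, "bus information": 1, "store": 63,
    "outpatients' office": 1, "travel agency": 10, "library": 1,
    "guiding on street": 19, "service": 12, "other": 3,
}
argumentation = {"phone": 37, "face-to-face": 8}

# Sum each group, then combine into the two top-level totals.
phone_total = sum(phone.values())
face_to_face_total = sum(face_to_face.values())
information_total = phone_total + face_to_face_total
argumentation_total = sum(argumentation.values())
corpus_total = information_total + argumentation_total

print("Phone conversations:", phone_total)
print("Face-to-face conversations:", face_to_face_total)
print("Information dialogues:", information_total)
print("Argumentation dialogues:", argumentation_total)
print("Transcribed texts:", corpus_total)
```

The printed sums can be compared line by line with the totals given in the listing.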
Written dialogues collected during computer simulations with the Wizard of Oz method (dialogue acts annotated):
22 dialogues, 2500 running words (collected in 2001, Maret Valdisoo),
95 dialogues, 12 000 running words (2009-2011, Siiri Pärkson)
Human-computer interactions collected with the dialogue systems Reisiagent, Teatriagent, and Kinoagent (Margus Treumuth)
Corpus Annotation
- Morphological Analysis
- Syntactic Analysis
- Dialogue Acts
- Communicative Strategies (annotated by Liina Eskor in 20 WOZ dialogues and 20 human-human spoken dialogues)
Corpus Workbench (password-restricted; Margus Treumuth): choose a sub-corpus, choose a dialogue, show a dialogue on a time axis, frequency table of words, morphological analysis, sequences of dialogue acts, automatic annotation of dialogue acts.
People
Mare Koit
Haldur Õim
Tiit Hennoste
Andriela Rääbis
Margus Treumuth
PhD students
4th year: Olga Gerassimenko, Riina Kasterpalu, Krista Mihkels (Strandson), Siiri Pärkson
3rd year: (none)
2nd year: Liina Eskor
1st year: Raul Sirel
Master's students
2nd year: Sven Aller
1st year: Anti Torp
Bachelor's students
Imre Purret
Former students
Joel Edenberg
Mark Fišel
Aleksei Ivanov
Katrin Jets
Taavet Kikas
Katrin Lomp
Helen Nigol
Anni Oja
Siim Orasmaa
Anton Ragni
Karol Toompalu
Tarmo Truu
Maret Valdisoo (Kullasaar)
Evely Vutt (Nurmsalu)
Some papers (see Estonian Research Portal for more publications)
Developing Linguistic Corpora: a Guide to Good Practice. Ed. Martin Wynne.
Created February 17, 2005
Last modified March 15, 2012