Estonian Dialogue Corpus (EDiC)

Projects

Estonian Science Foundation

Estonian Ministry of Education and Research
National Programme for Estonian Language Technology
National Programme for Estonian and National Memory
National Programme for Estonian and National Culture


Content of the EDiC

  • Spoken human-human dialogues from the Corpus of Spoken Estonian of the University of Tartu
  • March 2012: 1182 transcribed texts, 246 000 running words (dialogue acts annotated)

    1. Information dialogues: 1137

  • Phone conversations 1026 (doctor-patient 6, institution 12, bus information 24, directory inquiries 581, sale 48, store 31, outpatients' office 90, travel agency 97, library 10, taxi 24, service 75, university 6, other 22)
  • Face-to-face conversations 111 (institution 1, bus information 1, store 63, outpatients' office 1, travel agency 10, library 1, guiding on street 19, service 12, other 3)

    2. Argumentation dialogues: 45 (phone 37, face-to-face 8)
  • Written dialogues collected during computer simulations with the Wizard of Oz method (dialogue acts annotated): 22 dialogues, 2500 running words (collected in 2001, Maret Valdisoo); 95 dialogues, 12 000 running words (collected in 2009-2011, Siiri Pärkson)
  • Human-computer interactions collected with the dialogue systems Reisiagent, Teatriagent, Kinoagent (Margus Treumuth)
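
The spoken sub-corpus counts listed above can be cross-checked with a short tally. The sketch below is only an illustration: the dictionary and function names are hypothetical, and the figures are copied from the March 2012 listing, not read from the corpus itself.

```python
# Counts copied from the March 2012 listing; names are illustrative only.
PHONE = {
    "doctor-patient": 6, "institution": 12, "bus information": 24,
    "directory inquiries": 581, "sale": 48, "store": 31,
    "outpatients' office": 90, "travel agency": 97, "library": 10,
    "taxi": 24, "service": 75, "university": 6, "other": 22,
}
FACE_TO_FACE = {
    "institution": 1, "bus information": 1, "store": 63,
    "outpatients' office": 1, "travel agency": 10, "library": 1,
    "guiding on street": 19, "service": 12, "other": 3,
}
ARGUMENTATION = {"phone": 37, "face-to-face": 8}


def corpus_totals():
    """Sum the per-category counts into sub-corpus and overall totals."""
    phone = sum(PHONE.values())
    face_to_face = sum(FACE_TO_FACE.values())
    information = phone + face_to_face
    argumentation = sum(ARGUMENTATION.values())
    return {
        "phone": phone,
        "face_to_face": face_to_face,
        "information": information,
        "argumentation": argumentation,
        "spoken_total": information + argumentation,
    }


if __name__ == "__main__":
    for name, count in corpus_totals().items():
        print(f"{name}: {count}")
```

The totals can be compared with the figures stated in the listing (1026 phone, 111 face-to-face, 1137 information dialogues, 45 argumentation dialogues, 1182 transcribed texts).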

  • Corpus Annotation


    Corpus Workbench - password-protected (Margus Treumuth): choose a sub-corpus, choose a dialogue, show a dialogue on a time axis, frequency table of words, morphological analysis, sequences of dialogue acts, automatic annotation of dialogue acts.

    People
    Mare Koit
    Haldur Õim
    Tiit Hennoste
    Andriela Rääbis
    Margus Treumuth

    PhD students
    4th year
    Olga Gerassimenko
    Riina Kasterpalu
    Krista Mihkels (Strandson)
    Siiri Pärkson

    3rd year

    2nd year
    Liina Eskor

    1st year
    Raul Sirel

    Master's students
    2nd year
    Sven Aller

    1st year
    Anti Torp

    Bachelor's students
    Imre Purret

    Former students
    Joel Edenberg
    Mark Fišel
    Aleksei Ivanov
    Katrin Jets
    Taavet Kikas
    Katrin Lomp
    Helen Nigol
    Anni Oja
    Siim Orasmaa
    Anton Ragni
    Karol Toompalu
    Tarmo Truu
    Maret Valdisoo (Kullasaar)
    Evely Vutt (Nurmsalu)

    Some papers (see Estonian Research Portal for more publications)
    Developing Linguistic Corpora: a Guide to Good Practice. Ed. Martin Wynne.
     


    Created February 17, 2005
    Last modified March 15, 2012
    < Group of Language Technology
    < University of Tartu