Ondrej Dusek

I'm an Assistant Professor at Charles University, Prague, working on neural text generation, with a focus on dialogue systems and factual accuracy. I previously did research at Heriot-Watt University in Edinburgh on evaluating and improving natural language generation. My work aims to make language generation systems more reliable and truthful while keeping their outputs fluent.

I lead research on data-to-text generation, dialogue response generation, and text style transfer. I'm particularly interested in methods for controlling generation output and ensuring semantic accuracy. I've contributed to creating several widely used datasets and evaluation methods, such as the E2E NLG Challenge dataset and automatic semantic accuracy metrics.

Publications

Do Large Language Models with Reasoning and Acting Meet the Needs of Task-Oriented Dialogue?

Michelle Elizabeth, Morgan Veyret, Miguel Couceiro, Ondrej Dusek, L. Rojas-Barahona

Faithful and Plausible Natural Language Explanations for Image Classification: A Pipeline Approach

Adam Wojciechowski, Mateusz Lango, Ondrej Dusek

Conference on Empirical Methods in Natural Language Processing 2024

Teaching LLMs at Charles University: Assignments and Activities

Jindrich Helcl, Zdeněk Kasner, Ondrej Dusek, Tomasz Limisiewicz, Dominik Macháček, Tomás Musil, Jindrich Libovický

TEACHINGNLP 2024

A Survey of Text Style Transfer: Applications and Ethical Implications

Sourabrata Mukherjee, Mateusz Lango, Zdeněk Kasner, Ondrej Dusek

arXiv.org 2024

Text Style Transfer: An Introductory Overview

Sourabrata Mukherjee, Ondrej Dusek

arXiv.org 2024

Are Large Language Models Actually Good at Text Style Transfer?

Sourabrata Mukherjee, Atul Kr. Ojha, Ondrej Dusek

International Conference on Natural Language Generation 2024

Multilingual Text Style Transfer: Datasets & Models for Indian Languages

Text Detoxification as Style Transfer in English and Hindi

Leak, Cheat, Repeat: Data Contamination and Evaluation Malpractices in Closed-Source LLMs

Beyond Traditional Benchmarks: Analyzing Behaviors of Open LLMs on Data-to-Text Generation

Balancing the Style-Content Trade-Off in Sentiment Transfer Using Polarity-Aware Denoising

LEEETs-Dial: Linguistic Entrainment in End-to-End Task-oriented Dialogue systems

Critic-Driven Decoding for Mitigating Hallucinations in Data-to-text Generation

With a Little Help from the Authors: Reproducing Human Evaluation of an MT Error Detector

Three Ways of Using Large Language Models to Evaluate Chat

Tackling Hallucinations in Neural Chart Summarization

Missing Information, Unresponsive Authors, Experimental Flaws: The Impossibility of Assessing the Reproducibility of Previous Human Evaluations in NLP

Are Large Language Models All You Need for Task-Oriented Dialogue?

TabGenie: A Toolkit for Table-to-Text Generation

Barriers and enabling factors for error analysis in NLG research

MooseNet: A Trainable Metric for Synthesized Speech with a PLDA Module

Mind the Labels: Describing Relations in Knowledge Graphs With Pretrained Models

Learning Interpretable Latent Dialogue Actions With Less Supervision

AARGH! End-to-end Retrieval-Generation for Task-Oriented Dialog

The Seventh Workshop on Search-Oriented Conversational Artificial Intelligence (SCAI'22)

AI Technologies for Machine Supervision and Help in a Rehabilitation Scenario

GEMv2: Multilingual NLG Benchmarking in a Single Line of Code

DialogueScript: Using Dialogue Agents to Produce a Script

Neural Pipeline for Zero-Shot Data-to-Text Generation

NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation

Report on the 6th workshop on search-oriented conversational AI (SCAI 2021)

MiRANews: Dataset and Benchmarks for Multi-Resource-Assisted News Summarization

Underreporting of errors in NLG output, and what to do about it

AggGen: Ordering and Aggregating while Generating

Shades of BLEU, Flavours of Success: The Case of MultiWOZ

THEaiTRE 1.0: Interactive Generation of Theatre Play Scripts

AuGPT: Auxiliary Tasks and Data Augmentation for End-To-End Dialogue with Pre-Trained Language Models

The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics

Evaluating Semantic Accuracy of Data-to-Text Generation with Natural Language Inference

Data-to-Text Generation with Iterative Text Editing

SpeedySpeech: Efficient Neural Speech Synthesis

One Model, Many Languages: Meta-learning for Multilingual Text-to-Speech

Expand and Filter: CUNI and LMU Systems for the WNGT 2020 Duolingo Shared Task

Fact-based Content Weighting for Evaluating Abstractive Summarisation

THEaiTRE: Artificial Intelligence to Write a Theatre Play

Semantic Noise Matters for Neural Natural Language Generation

Neural Generation for Czech: Data and Baselines

Automatic Quality Estimation for Natural Language Generation: Ranting (Jointly Rating and Ranking)

User Evaluation of a Multi-dimensional Statistical Dialogue System

Evaluating the State-of-the-Art of End-to-End Natural Language Generation: The E2E NLG Challenge

Neural Response Ranking for Social Conversation: A Data-Efficient Approach

Improving Context Modelling in Multimodal Dialogue Generation

A Knowledge-Grounded Multimodal Search-Based Conversational Agent

Findings of the E2E NLG Challenge

Better Conversations by Modeling, Filtering, and Optimizing for Coherence and Diversity

RankME: Reliable Human Ratings for Natural Language Generation

An Ensemble Model with Ranking for Social Dialogue

Referenceless Quality Estimation for Natural Language Generation

Why We Need New Evaluation Metrics for NLG

Data-driven Natural Language Generation: Paving the Road to Success

The E2E Dataset: New Challenges For End-to-End Generation

Novel Methods for Natural Language Generation in Spoken Dialogue Systems

Czech restaurant information dataset for NLG

Verb sense disambiguation in Machine Translation

Moses & Treex Hybrid MT Systems Bestiary

CzEng 1.6: Enlarged Czech-English Parallel Corpus with Processing Tools Dockered

A Context-aware Natural Language Generator for Dialogue Systems

Vystadial 2016 – Czech data

Sequence-to-Sequence Generation for Spoken Dialogue via Deep Syntax Trees and Strings

Alex Context NLG Dataset

New Language Pairs in TectoMT

Using Parallel Texts and Lexicons for Verbal Word Sense Disambiguation

Training a Natural Language Generator From Unaligned Data

Bilingual English-Czech Valency Lexicon Linked to a Parallel Corpus

Alex: A Statistical Dialogue Systems Framework

A Factored Discriminative Spoken Language Understanding for Spoken Dialogue Systems

HamleDT: Harmonized multi-language dependency treebank

Adaptation of machine translation for multilingual information retrieval in the medical domain