
2020, WMT, Document Level NMT of Low-Resource Languages with Back Translation #94

@Sepideh-Ahmadian

Paper
Document Level NMT of Low-Resource Languages with Back Translation

Introduction
This paper discusses a submission to the WMT 2020 shared task on similar language translation, focusing on low-resource language pairs such as Marathi-Hindi. The authors explore the use of document-level neural machine translation (NMT), which incorporates contextual information across sentences to improve translation quality.
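Document-level NMT differs from sentence-level NMT mainly in what the model sees for each sentence. Below is a minimal sketch that assumes the context is simply the k preceding source sentences from the same document. The window size and the helper name `build_context_examples` are illustrative assumptions and do not come from the paper.

```python
from typing import List, Tuple


def build_context_examples(
    document: List[str], window: int = 2
) -> List[Tuple[List[str], str]]:
    """Pair each sentence with up to `window` preceding sentences
    from the same document (hypothetical helper for illustration)."""
    examples = []
    for i, sentence in enumerate(document):
        # Context is the previous `window` sentences; the first sentences
        # of a document simply get a shorter (possibly empty) context.
        context = document[max(0, i - window):i]
        examples.append((context, sentence))
    return examples


doc = ["s1", "s2", "s3", "s4"]
for context, sentence in build_context_examples(doc, window=2):
    print(context, "->", sentence)
```

A sentence-level system corresponds to `window=0`, where every context list is empty.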

Main Problem
The main problem is the scarcity of parallel data for low-resource language pairs, such as Marathi-Hindi, which limits the effectiveness of neural machine translation (NMT).

Illustrative Example
Not mentioned

Input
A sentence in Marathi

Output
A sentence in Hindi

Motivation
The authors were motivated by the lack of sufficient parallel data for low-resource languages like Marathi-Hindi, which makes it difficult to train accurate NMT systems.

Related works and their gaps
Similar work on low-resource language-pair translation includes Pourdamghani and Knight (2017), Lakew et al. (2018), and Costa-jussà (2017). The paper targets a gap left by sentence-level NMT models, which cannot capture cross-sentence context. Previous work in low-resource settings has largely overlooked document-level context.

Contribution of this paper
The main contributions include:
Proposing a document-level NMT system for low-resource languages that incorporates context-aware hierarchical attention networks (HANs).
Using back-translation of monolingual data to augment the training data for the NMT models.
The authors claim that their results outperform other reported results in the field.
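The summary only names the HAN component. The idea behind hierarchical attention is word-level attention inside each context sentence, followed by sentence-level attention across those sentences. The sketch below illustrates it under simplifying assumptions: plain scaled dot-product attention, keys reused as values, and no learned projections or gating. The function names are hypothetical, and this is not the authors' implementation.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector]) -> Vector:
    """Scaled dot-product attention where the keys double as values.
    Assumes `keys` is non-empty."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(keys[0])
    return [sum(w * k[d] for w, k in zip(weights, keys)) for d in range(dim)]


def hierarchical_context(query: Vector, context_sents: List[List[Vector]]) -> Vector:
    """Two-level attention over previous sentences (hypothetical sketch).
    Assumes at least one context sentence containing at least one word."""
    # Level 1: summarize each context sentence by attending over its words.
    sentence_summaries = [attend(query, words) for words in context_sents]
    # Level 2: attend over the sentence summaries to get one document vector.
    return attend(query, sentence_summaries)


query = [1.0, 0.0]
context = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5]]]
print(hierarchical_context(query, context))
```

In a full model, the resulting document vector would be combined with the current sentence's representation. That combination step is omitted here.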

Proposed methods
Not included

Experiments
Dataset: WMT20 similar language translation task data for the Marathi-Hindi language pair.

Implementation
The authors used the following tokenizer (Indic NLP Library), but they did not release the source code of their own system.
https://github.com/anoopkunchukuttan/indic_nlp_library

Gaps in this work
Document-level data may be scarce or unavailable for other low-resource languages. Because the approach relies on back-translation, it is unclear how well back-translation quality is controlled. With little parallel data, the reverse model may be weak, which could degrade the quality of the synthetic training data.
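To make this concern concrete: in back-translation, monolingual target-side text (Hindi, for Marathi-to-Hindi) is translated into the source language by a reverse model, and the resulting synthetic pairs are added to the real parallel data. The sketch below assumes a generic reverse-translation function. The names `back_translate`, `augment`, and `reverse_translate` are illustrative and do not come from the paper.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def back_translate(
    target_monolingual: List[str],
    reverse_translate: Callable[[str], str],
) -> List[Pair]:
    """Create synthetic (source, target) pairs from target-side text.

    `reverse_translate` is a hypothetical placeholder for a
    target->source model (here Hindi->Marathi).
    """
    return [(reverse_translate(t), t) for t in target_monolingual]


def augment(parallel: List[Pair], synthetic: List[Pair]) -> List[Pair]:
    # Real pairs are kept as-is and synthetic pairs are appended.
    # The synthetic source side is only as good as the reverse model.
    return parallel + synthetic


# Toy stand-in for a reverse model, for illustration only.
toy_reverse = lambda s: f"<mr:{s}>"
synthetic = back_translate(["hi_1", "hi_2"], toy_reverse)
print(augment([("mr_0", "hi_0")], synthetic))
```

Only the source side of each synthetic pair comes from the reverse model. A weak Hindi-to-Marathi model trained on scarce parallel data therefore yields noisy inputs, while the Hindi targets remain genuine text.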

Labels: literature-review (Summary of the paper related to the work)
