First-derivative saliency (Explainability) method for a text perturbation attack

Abstract:

Neural networks achieve impressive results on many NLP tasks, often outperforming most other machine learning models. However, when it comes to explaining the model or rationalising its output, they have consistently fallen short of those models. To address this, Li et al. proposed several ways of visualizing and understanding neural models in NLP. One of them, the first-derivative input saliency method, measures how much each word contributes to the final decision, which helps identify the important words that influence the output. As an attack method, we then replace those important words using GloVe embeddings to flip the predicted class. This shows how effectively the first-derivative input saliency method identifies the important words, as well as how vulnerable neural networks can be. This article was written as part of the Explainability Methods for Neural Networks seminar at Saarland University in WS2020/2021. It primarily focuses on the paper Visualizing and Understanding Neural Models in NLP by Li et al.
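The two steps described in the abstract can be sketched on a toy example. This is a minimal illustration, not the code from the notebook: the 2-d vectors in `EMB` are hypothetical stand-ins for GloVe embeddings, the gated classifier defined by `U` and `V` is invented for the example, the gradient is approximated by finite differences instead of backpropagation, and aggregating the per-dimension derivatives with an L2 norm is an assumption. The replacement strategy (nearest neighbour in embedding space that flips the class) is one plausible reading of the attack, also assumed here.

```python
import math

# Hypothetical 2-d "embeddings" standing in for GloVe vectors (made-up values).
EMB = {
    "the":   [0.0, 0.1],
    "movie": [0.1, 0.0],
    "was":   [0.0, -0.1],
    "great": [2.0, 1.0],
    "awful": [-2.0, 1.0],
    "fine":  [1.0, 0.8],
}
V = [1.0, 0.0]  # toy classifier: sentiment direction (assumption)
U = [0.0, 2.0]  # toy classifier: gate direction (assumption)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(vectors):
    """Class score S_c: gated sum of per-word sentiment; > 0 means 'positive'."""
    return sum(sigmoid(dot(U, e)) * dot(V, e) for e in vectors)

def saliency(vectors, eps=1e-5):
    """Per-word saliency: L2 norm of dS_c/de, via central finite differences."""
    result = []
    for i, e in enumerate(vectors):
        grad = []
        for d in range(len(e)):
            plus = [list(v) for v in vectors]
            minus = [list(v) for v in vectors]
            plus[i][d] += eps
            minus[i][d] -= eps
            grad.append((score(plus) - score(minus)) / (2 * eps))
        result.append(math.sqrt(sum(g * g for g in grad)))
    return result

def attack(words):
    """Swap the most salient word for its nearest neighbour that flips the class."""
    vectors = [EMB[w] for w in words]
    was_positive = score(vectors) > 0
    sal = saliency(vectors)
    target = max(range(len(words)), key=lambda i: sal[i])
    # Candidate substitutes, nearest first in embedding space.
    candidates = sorted(
        (w for w in EMB if w != words[target]),
        key=lambda w: math.dist(EMB[w], EMB[words[target]]),
    )
    for cand in candidates:
        trial = words[:target] + [cand] + words[target + 1:]
        if (score([EMB[w] for w in trial]) > 0) != was_positive:
            return trial, target
    return None, target  # no single-word substitution flips the class

words = "the movie was great".split()
adversarial, position = attack(words)
print(words[position], "->", adversarial)
```

In this toy setup the most salient word is "great", and the first substitute that flips the score from positive to negative is "awful". With a real network, the gradient would come from automatic differentiation and the candidates from an actual GloVe vocabulary.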

Files:

README.md
Saliency method for a text attack.pdf
Saliency method for text perturbation.ipynb
requirements.txt

janvis04/First-derivative-saliency-method-for-a-text-perturbation-attack
