Code for distilling LLMs and building post-hoc explanations.
We compare several post-hoc explanation methods applied to BERT and its distilled variants across the tasks of the GLUE benchmark, to see whether post-hoc explanations are preserved during distillation. To use BERT and its distilled variants on GLUE, we first fine-tune each model on the individual GLUE tasks. We then run a number of post-hoc explanation methods and compare the token-level or word-level attributions of each distilled model against those of BERT. We also apply sentence perturbations to evaluate robustness, comparing evaluation metrics against the original base model on each GLUE task. Finally, we design our own distillation method, attention weight alignment, which aligns the student with the teacher by optimizing for the preservation of post-hoc explanations.
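For intuition, here is a minimal sketch of what an attention-alignment term might look like, assuming student and teacher attention maps come from Hugging Face models called with `output_attentions=True`. The function name, the `layer_map` argument, and the choice of MSE are illustrative assumptions, not the exact implementation in `distillation/`.

```python
import torch
import torch.nn.functional as F

def attention_alignment_loss(student_attentions, teacher_attentions, layer_map=None):
    """MSE between student and teacher attention maps.

    Both arguments are tuples of [batch, heads, seq_len, seq_len] tensors, as
    returned by Hugging Face models called with output_attentions=True.
    layer_map pairs each student layer with a teacher layer (needed when the
    student has fewer layers than the teacher, e.g. 6 vs. 12).
    """
    if layer_map is None:
        layer_map = list(zip(range(len(student_attentions)),
                             range(len(teacher_attentions))))
    loss = 0.0
    for s_idx, t_idx in layer_map:
        # Teacher attentions are treated as fixed targets.
        loss = loss + F.mse_loss(student_attentions[s_idx],
                                 teacher_attentions[t_idx].detach())
    return loss / len(layer_map)
```

In training, a term like this would be added to the usual distillation objective (soft-label and task losses), weighted by a hyperparameter.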
- `build_explanation.py`: builds LIME, SHAP, and Integrated Gradients explanations for BERT-based models (an illustrative attribution comparison is sketched after this list)
- `model_robustness.py`: inserts perturbations into GLUE datasets to evaluate model robustness
- `distillation/`: distillation module forked from Hugging Face, where we align the attention weights of the student to match those of the teacher
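As a rough illustration of the attribution comparison described above, the sketch below uses Captum's `LayerIntegratedGradients` to compute per-token integrated-gradients attributions for a fine-tuned BERT and DistilBERT classifier and compares them with cosine similarity. The helper function and the public SST-2 checkpoints named here are stand-ins for the fine-tuned models and interfaces produced by this repo, not its actual API.

```python
import torch
import torch.nn.functional as F
from captum.attr import LayerIntegratedGradients
from transformers import AutoModelForSequenceClassification, AutoTokenizer

def token_attributions(model_name, get_embeddings, text, target=1):
    """Per-token integrated-gradients attributions for one sentence."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.eval()

    enc = tokenizer(text, return_tensors="pt")
    input_ids, attention_mask = enc["input_ids"], enc["attention_mask"]
    # Baseline keeps [CLS]/[SEP] and replaces everything else with [PAD].
    baseline_ids = torch.full_like(input_ids, tokenizer.pad_token_id)
    baseline_ids[0, 0], baseline_ids[0, -1] = input_ids[0, 0], input_ids[0, -1]

    def forward(ids, mask):
        return model(input_ids=ids, attention_mask=mask).logits

    lig = LayerIntegratedGradients(forward, get_embeddings(model))
    attrs = lig.attribute(inputs=input_ids, baselines=baseline_ids, target=target,
                          additional_forward_args=(attention_mask,))
    attrs = attrs.sum(dim=-1).squeeze(0)   # collapse the embedding dimension
    attrs = attrs / attrs.norm()           # normalize so models are comparable
    tokens = tokenizer.convert_ids_to_tokens(input_ids[0].tolist())
    return tokens, attrs

text = "the movie was surprisingly good"
_, bert_attrs = token_attributions(
    "textattack/bert-base-uncased-SST-2", lambda m: m.bert.embeddings, text)
_, distil_attrs = token_attributions(
    "distilbert-base-uncased-finetuned-sst-2-english",
    lambda m: m.distilbert.embeddings, text)

# Same WordPiece vocabulary, so the token sequences line up position-by-position.
print(F.cosine_similarity(bert_attrs, distil_attrs, dim=0))
```

Because BERT and DistilBERT share the same uncased WordPiece tokenizer, attributions can be compared position-by-position without any token re-alignment.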