Following the current trend, I decided to prove my understanding of the distillation of an LLM. In distillation, you train a smaller student model to mimic a larger, fully trained teacher model, saving time and cost. Within this process there is an optional but highly effective technique called weight transfer, which helps the student start out much closer to the teacher's performance.
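To make the idea concrete, here is a minimal sketch of the standard distillation loss: the student is trained to match the teacher's temperature-softened output distribution via a KL divergence. This is an illustrative NumPy version, not code from the notebooks; the function and variable names are my own.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax over the last axis.
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # KL(teacher || student) on softened distributions, scaled by T^2
    # so the gradient magnitude stays comparable across temperatures.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return (temperature ** 2) * kl.mean()

# A student that exactly matches the teacher has (near) zero loss;
# a mismatched student has a strictly positive loss.
teacher = np.array([[2.0, 0.5, -1.0]])
mismatched = np.array([[-1.0, 0.5, 2.0]])
print(distillation_loss(teacher, teacher))        # ~0.0
print(distillation_loss(mismatched, teacher) > 0)  # True
```

In practice this soft-label loss is usually mixed with the ordinary cross-entropy on the ground-truth labels, but the weight-transfer demonstration here stops before any such training.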
In this project, I demonstrate the effectiveness of the weight transfer technique. The demonstration stops at the weight transfer step, as that is sufficient for the purpose; it does not go into further training or fine-tuning.
It was quite a challenge to find a model small enough to carry out the distillation on a 10-year-old CPU-only laptop. After suggestions from ChatGPT, the "google/t5-large-ssm-nq" model was chosen. It has about 770M parameters and 24 layers.
The project consists of three notebooks:

- t5_large_ssm_nq_finetuning.ipynb - Fine-tunes the original t5-large-ssm-nq to respond to two questions: "Who are you?" and "What version are you?" The response to both is "I am a T5 Large SSM NQ model." The fine-tuned model becomes the teacher model used in the distillation process.
- t5_large_ssm_nq_distill.ipynb - Builds the student model and carries out the weight transfer from the teacher.
- t5_inference.ipynb - Tests the student model.
ChatGPT (whatever free version was available at the time) - the coding machine!
This project is provided "as is" and without any warranty. Use it at your own risk.
This project is open-source under the MIT License.