babadue/The-Truth-Is-Out-There

To Demonstrate the Effectiveness of Distillation with Weight Transfer

Purpose:

Following the current trend, I decided to demonstrate my understanding of LLM distillation. In distillation, you train a smaller student model to mimic a larger, fully trained teacher model, saving training time and cost. An optional but highly effective technique in this process, called weight transfer, greatly improves the odds that the student's performance ends up close to the teacher's.

In this project, I will demonstrate the effectiveness of the weight transfer technique. The demonstration stops at the weight transfer step, which is sufficient for this purpose; it does not cover further training or fine-tuning of the student.
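To make the idea concrete, here is a minimal sketch of what weight transfer means, using a toy stack of linear layers in place of real transformer blocks (this is an illustration, not the notebooks' actual code):

```python
import torch
import torch.nn as nn

def make_stack(num_layers, dim=8):
    # A toy "transformer": just a stack of linear layers.
    return nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_layers))

teacher = make_stack(4)   # stands in for the trained 24-layer teacher
student = make_stack(2)   # half the depth, freshly (randomly) initialized

# Weight transfer: copy a subset of teacher layers into the student,
# so the student starts from trained weights instead of random ones.
for s_idx, t_idx in enumerate(range(0, 4, 2)):   # teacher layers 0 and 2
    student[s_idx].load_state_dict(teacher[t_idx].state_dict())

# The copied layers now match the chosen teacher layers exactly.
assert torch.equal(student[0].weight, teacher[0].weight)
assert torch.equal(student[1].weight, teacher[2].weight)
```

The student is smaller, but its layers begin at trained values, which is why this technique pushes the distilled model toward the teacher's behavior before any further training happens.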

Description:

It was quite a challenge to find a model small enough to distill on a 10-year-old, CPU-only laptop. After suggestions from ChatGPT, the "google/t5-large-ssm-nq" model was chosen; it has about 770M parameters and 24 layers.

Steps involved:

  1. Fine-tune the original t5-large-ssm-nq to answer two questions: "Who are you?" and "What version are you?" The response to both is "I am a T5 Large SSM NQ model."
    t5_large_ssm_nq_finetuning.ipynb

  2. Use the fine-tuned model as the teacher in the distillation (weight transfer) process.
    t5_large_ssm_nq_distill.ipynb

  3. Test the resulting student model.
    t5_inference.ipynb
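A common way to pick which teacher layers to copy during step 2 is to take evenly spaced layers (this mapping is an assumption for illustration; the distillation notebook may use a different scheme):

```python
def transfer_map(teacher_layers: int, student_layers: int) -> list[int]:
    """Pick one evenly spaced teacher layer index per student layer.

    For example, a 24-layer teacher distilled into a 12-layer student
    would copy every other teacher layer.
    """
    step = teacher_layers / student_layers
    return [round(i * step) for i in range(student_layers)]

# 24-layer teacher -> 12-layer student: copy layers 0, 2, 4, ..., 22
print(transfer_map(24, 12))
```

Evenly spaced selection keeps early, middle, and late layers represented in the student, rather than, say, only copying the first half of the teacher.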

Contributors

ChatGPT (whatever free version was available at the time): the coding machine!

Disclaimer

This project is provided "as is" and without any warranty. Use it at your own risk.

License

This project is open-source under the MIT License.

About

LLM distillation with weight transfer.
