A paper about pruning has emerged from the course #28
peremartra announced in Announcements
As you know, I'm working on the pruning section of the course. One of the notebooks focuses on implementing width pruning on the MLP layers of models with a GLU structure, such as Llama-3.2.
This was a novel approach, so I published a preprint explaining the advantages of reducing the expansion that occurs in MLP layers. I found that an expansion rate of 140% strikes an optimal balance between performance and power consumption.
I hope you enjoy the paper: https://osf.io/preprints/osf/qgxea_v1
This is just the first version, and much work still needs to be done. I'm running more experiments with additional benchmarks and models. I'll be sharing the results, and if they add value to the course, I'll include the notebooks as well!
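To make the width-pruning idea concrete, here is a minimal PyTorch sketch of what pruning a GLU MLP block (Llama-style `gate_proj`/`up_proj`/`down_proj`) to a target expansion rate could look like. The function names and the L2-norm importance criterion are illustrative assumptions, not necessarily what the preprint or the notebook actually uses:

```python
import torch
import torch.nn as nn

def prune_glu_mlp(gate_proj: nn.Linear, up_proj: nn.Linear,
                  down_proj: nn.Linear, expansion_rate: float = 1.4):
    """Width-prune a GLU MLP block (Llama-style gate/up/down projections)
    down to a target expansion rate over the hidden size."""
    hidden_size = gate_proj.in_features
    new_intermediate = int(hidden_size * expansion_rate)

    # Score each intermediate neuron; here a simple L2 norm over its
    # gate and up weights (this importance criterion is an assumption).
    importance = gate_proj.weight.norm(dim=1) + up_proj.weight.norm(dim=1)
    keep = importance.topk(new_intermediate).indices.sort().values

    def slice_linear(layer: nn.Linear, idx: torch.Tensor, dim: int) -> nn.Linear:
        # dim=0 prunes output neurons (gate/up); dim=1 prunes inputs (down).
        w = layer.weight.data.index_select(dim, idx)
        new = nn.Linear(w.shape[1], w.shape[0], bias=layer.bias is not None)
        new.weight.data = w
        if layer.bias is not None:
            new.bias.data = (layer.bias.data.index_select(0, idx)
                             if dim == 0 else layer.bias.data.clone())
        return new

    return (slice_linear(gate_proj, keep, 0),
            slice_linear(up_proj, keep, 0),
            slice_linear(down_proj, keep, 1))
```

As a rough sense of scale: Llama-3.2-1B has a hidden size of 2048 and an intermediate size of 8192 (a 400% expansion), so an `expansion_rate` of 1.4 would keep about 2867 of the 8192 intermediate neurons, shrinking the three projection matrices by roughly 65%.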
So, maybe the course can help us become AI researchers in the LLM field ;-).