SlimTrainer and Adalite

Description

SlimTrainer and Adalite enable full-parameter 16-bit finetuning of language models of up to 7B parameters on a single 24GB GPU. The trainer uses the backpropagation-fusing technique from LOMO, but with a custom optimizer in place of plain SGD. The small batch size and extreme memory constraints required extensive exploration of potential optimizer variants, resulting in a custom optimizer, Adalite, based on Adafactor and LAMB.
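As a sketch of the fused-backward idea: the LOMO technique applies each parameter's update as soon as its gradient is produced during the backward pass, then frees that gradient, so a model-wide gradient buffer never exists at once. Below is a minimal PyTorch illustration using plain SGD for clarity rather than Adalite itself; the helper name `attach_fused_sgd` is ours, not part of the repository.

```python
import torch

def attach_fused_sgd(model: torch.nn.Module, lr: float = 1e-5) -> None:
    """Fuse the optimizer step into backward: update each parameter the
    moment its gradient is accumulated, then drop the gradient so the
    full gradient buffer is never materialized."""
    @torch.no_grad()
    def hook(param: torch.nn.Parameter) -> None:
        param.add_(param.grad, alpha=-lr)  # in-place SGD step
        param.grad = None                  # free the gradient immediately

    for p in model.parameters():
        if p.requires_grad:
            # Requires PyTorch >= 2.1 for post-accumulate-grad hooks.
            p.register_post_accumulate_grad_hook(hook)
```

After attaching the hooks, a training step is just a forward pass and `loss.backward()`; there is no separate `optimizer.step()`, and peak memory drops by roughly the size of the full gradient buffer.

The description names Adafactor and LAMB as the bases for Adalite. The actual algorithm lives in the repository; purely as a hedged guess at the flavor of that combination (not the real Adalite update), a factored second-moment estimate in the style of Adafactor could be paired with a layer-wise trust ratio in the style of LAMB:

```python
import torch

@torch.no_grad()
def factored_trust_ratio_step(
    param: torch.Tensor,
    row_ema: torch.Tensor,  # running mean of grad^2 over rows, shape (m,)
    col_ema: torch.Tensor,  # running mean of grad^2 over cols, shape (n,)
    lr: float = 1e-3,
    beta2: float = 0.999,
    eps: float = 1e-8,
) -> None:
    """Illustrative only: Adafactor-style factored second moments plus a
    LAMB-style trust ratio. Not the actual Adalite algorithm."""
    g = param.grad
    if g.dim() == 2:
        # Adafactor idea: keep row/column statistics of g^2 instead of the
        # full (m, n) second-moment matrix -- O(m + n) optimizer state.
        row_ema.mul_(beta2).add_(g.pow(2).mean(dim=1), alpha=1 - beta2)
        col_ema.mul_(beta2).add_(g.pow(2).mean(dim=0), alpha=1 - beta2)
        v = torch.outer(row_ema, col_ema) / row_ema.mean().clamp(min=eps)
    else:
        # Vectors and scalars: fall back to an ordinary elementwise EMA.
        row_ema.mul_(beta2).add_(g.pow(2), alpha=1 - beta2)
        v = row_ema
    update = g / (v.sqrt() + eps)
    # LAMB idea: scale the step by the layer-wise ratio ||w|| / ||update||.
    trust = param.norm() / update.norm().clamp(min=eps)
    param.add_(update, alpha=-lr * float(trust))
```

The factored statistics are what keep optimizer state small enough for a 24GB card, while the trust ratio keeps per-layer step sizes stable at small batch sizes.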


GitHub repository: Link