Multi-Domain Expert Layers (MDEL) Training

Abstract

Open-sourcing AI models can lead to increased innovation, accessibility, transparency, and community building. However, we need a mechanism to train more capable models in an efficient and modular way.

Our proposed method, which we call Multi-Domain Expert Layers (MDEL) training, targets open-source language models. It involves branching from a base model, training each branch independently on a specific domain for specific layers, and merging the trained branches at the end. The domain-specific layers are kept as experts, with a classifier serving as a router that activates the appropriate expert during inference. This approach makes it easy to increase a model's expertise, to train additional "adapters" independently, and to reuse previously trained experts and models without retraining, resulting in a modular and efficient system.
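As an illustration, below is a minimal PyTorch sketch of how the routed expert layers might look at inference time. All names here (`MDELBlock`, `DomainRouter`, the feed-forward experts, the dimensions) are hypothetical and not taken from an MDEL implementation; the sketch assumes each expert is a copy of one layer branched from the base model and trained on a single domain.

```python
# Hypothetical sketch of MDEL-style routed expert layers (not the official code).
import torch
import torch.nn as nn


class DomainRouter(nn.Module):
    """Classifier that predicts which domain expert should handle each input."""

    def __init__(self, hidden_dim: int, num_domains: int):
        super().__init__()
        self.classifier = nn.Linear(hidden_dim, num_domains)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Pool over the sequence, then pick one domain per example.
        pooled = hidden_states.mean(dim=1)             # (batch, hidden)
        return self.classifier(pooled).argmax(dim=-1)  # (batch,)


class MDELBlock(nn.Module):
    """One layer position holding per-domain expert copies of the same sublayer.

    Each expert is branched from the base model and trained independently on
    one domain; at inference the router decides which copy to run.
    """

    def __init__(self, make_layer, num_domains: int, hidden_dim: int):
        super().__init__()
        self.experts = nn.ModuleList([make_layer() for _ in range(num_domains)])
        self.router = DomainRouter(hidden_dim, num_domains)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        domains = self.router(hidden_states)
        out = torch.empty_like(hidden_states)
        # Batch together all examples routed to the same expert.
        for d in domains.unique():
            mask = domains == d
            out[mask] = self.experts[int(d)](hidden_states[mask])
        return out


# Usage example with simple feed-forward experts.
hidden_dim, num_domains = 64, 3
ffn = lambda: nn.Sequential(nn.Linear(hidden_dim, 4 * hidden_dim),
                            nn.GELU(),
                            nn.Linear(4 * hidden_dim, hidden_dim))
block = MDELBlock(ffn, num_domains, hidden_dim)
x = torch.randn(2, 16, hidden_dim)  # (batch, seq, hidden)
y = block(x)                        # same shape, routed through experts
```

Under these assumptions, adding expertise for a new domain amounts to training one more branch of the layer, appending it to the expert list, and updating the router, which is what makes the approach modular.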


Read the Google doc, which reviews related methods including DEMIX, Task-Level MoE, and BTM: Link