Multi-Domain Expert Layers (MDEL) Training

Abstract

Open-sourcing AI models can lead to increased innovation, accessibility, transparency, and community building. However, we need a mechanism to train more capable models in an efficient and modular way.

Our proposed method, which we call Multi-Domain Expert Layers (MDEL) training, targets open-source language models. It involves branching from a base model, training each branch independently on a specific domain for specific layers, and merging the trained branches at the end. The domain-specific layers are kept as experts, with a classifier serving as a router that activates the appropriate expert during inference. This approach makes it easy to increase a model's expertise, to train additional "adapters" independently, and to reuse previously trained experts and models without retraining, resulting in a modular and efficient system.
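As an illustration, below is a minimal PyTorch sketch of how the routed expert layers might look at inference time. All names here (`MDELBlock`, `DomainRouter`, the feed-forward experts, the dimensions) are hypothetical and not taken from an MDEL implementation; the sketch assumes each expert is a copy of one layer branched from the base model and trained on a single domain.

```python
# Hypothetical sketch of MDEL-style routed expert layers (not the official code).
import torch
import torch.nn as nn


class DomainRouter(nn.Module):
    """Classifier that predicts which domain expert should handle each input."""

    def __init__(self, hidden_dim: int, num_domains: int):
        super().__init__()
        self.classifier = nn.Linear(hidden_dim, num_domains)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Pool over the sequence, then pick one domain per example.
        pooled = hidden_states.mean(dim=1)             # (batch, hidden)
        return self.classifier(pooled).argmax(dim=-1)  # (batch,)


class MDELBlock(nn.Module):
    """One layer position holding per-domain expert copies of the same sublayer.

    Each expert is branched from the base model and trained independently on
    one domain; at inference the router decides which copy to run.
    """

    def __init__(self, make_layer, num_domains: int, hidden_dim: int):
        super().__init__()
        self.experts = nn.ModuleList([make_layer() for _ in range(num_domains)])
        self.router = DomainRouter(hidden_dim, num_domains)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        domains = self.router(hidden_states)
        out = torch.empty_like(hidden_states)
        # Batch together all examples routed to the same expert.
        for d in domains.unique():
            mask = domains == d
            out[mask] = self.experts[int(d)](hidden_states[mask])
        return out


# Usage example with simple feed-forward experts.
hidden_dim, num_domains = 64, 3
ffn = lambda: nn.Sequential(nn.Linear(hidden_dim, 4 * hidden_dim),
                            nn.GELU(),
                            nn.Linear(4 * hidden_dim, hidden_dim))
block = MDELBlock(ffn, num_domains, hidden_dim)
x = torch.randn(2, 16, hidden_dim)  # (batch, seq, hidden)
y = block(x)                        # same shape, routed through experts
```

Under these assumptions, adding expertise for a new domain amounts to training one more branch of the layer, appending it to the expert list, and updating the router, which is what makes the approach modular.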


Read the Google doc, which reviews related methods including DEMIX, Task-Level MoE, and BTM: Link