Abstract
Open sourcing AI models can lead to increased innovation, accessibility, transparency, and community building. However we need a mechanism to train more capable models in an efficient and modular way.
The proposed method that we call Multi-Domain Expert Layers (MDEL) training for open source language models involves branching from a base model, training each branch independently on a specific domain for specific layers, and merging the trained models at the end. Additionally, the specific layers are kept as experts, with a classifier used as a router to activate the experts during inference. This approach makes it possible to easily increase expertise of a model, to independently train more "adapters", and to reuse previously trained experts and models without retraining, resulting in a modular and efficient system.