Unified image generation and editing models suffer from severe task interference in dense diffusion transformer architectures, where a shared parameter space must compromise between conflicting objectives (e.g., local editing vs. subject-driven generation). While the sparse Mixture-of-Experts (MoE) paradigm is a promising solution, its gating networks remain task-agnostic: they route on local token features and are unaware of global task intent. This task-agnostic routing prevents meaningful specialization and fails to resolve the underlying task interference.
In this paper, we propose a novel framework that injects semantic intent into MoE routing. We introduce a Hierarchical Task Semantic Annotation scheme that creates structured task descriptors (e.g., scope, type, preservation). We then design a Predictive Alignment Regularization that aligns the router's internal decisions with the task's high-level semantics, evolving the gating network from a task-agnostic executor into a task-aware dispatch center. Our model effectively mitigates task interference, outperforming dense baselines in fidelity and quality, and our analysis shows that experts naturally develop clear, semantically correlated specializations.
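To make the annotation scheme concrete, the sketch below shows what a structured task descriptor might look like. The three fields (scope, type, preservation) follow the examples above, but the exact schema and field vocabulary are illustrative assumptions, not the paper's specification.

```python
# Hypothetical hierarchical task descriptors; field names and values
# are illustrative assumptions based on the scope/type/preservation examples.
local_edit_task = {
    "scope": "local",              # local region vs. whole image
    "type": "editing",             # the macroscopic task category
    "preservation": "background",  # content that must remain unchanged
}

subject_gen_task = {
    "scope": "global",
    "type": "subject-driven-generation",
    "preservation": "subject-identity",
}
```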
Our unified framework employs a Multimodal Diffusion Transformer (MM-DiT) with MoE layers for efficient, dynamic task handling. On top of the hierarchical task semantic annotation, we design a semantic-aligned router that guides expert specialization by aligning its routing decisions with these explicit task semantics in an interpretable manner.
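The following is a minimal sketch of a sparse MoE feed-forward sublayer as it might sit inside an MM-DiT block. The expert count, top-k value, hidden dimensions, and expert architecture are all assumptions for illustration; the source only states that MoE layers replace the dense feed-forward path.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    """Sketch of a sparse MoE feed-forward sublayer with top-k routing.
    Dimensions, expert count, and top_k are illustrative assumptions."""

    def __init__(self, dim=1024, hidden=4096, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Gating network (router): scores each token against every expert.
        self.gate = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(n_experts)
        )

    def forward(self, x):                               # x: (tokens, dim)
        logits = self.gate(x)                           # (tokens, n_experts)
        weights = F.softmax(logits, dim=-1)
        topw, topi = weights.topk(self.top_k, dim=-1)   # route each token to k experts
        topw = topw / topw.sum(dim=-1, keepdim=True)    # renormalize over chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = topi[:, k] == e                  # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += topw[mask, k:k + 1] * expert(x[mask])
        # The gate distribution is returned so it can feed the
        # predictive-alignment loss sketched below.
        return out, weights
```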
We design a novel semantic-aligned gating network that forces the model's internal routing strategy, encoded as a routing signature "g", to predict the task's macroscopic semantics, encoded as a semantic embedding "s". This predictive alignment acts as a bridge between local routing decisions and global task intent.
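One way this predictive alignment could be realized is sketched below: the routing signature g (here, gate probabilities mean-pooled over tokens) is mapped by a small head to a prediction of the task embedding s, and a cosine loss penalizes the mismatch. The pooling choice, the linear predictor, and the cosine objective are our assumptions; the source only specifies that g must predict s.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PredictiveAlignment(nn.Module):
    """Sketch of predictive alignment regularization: the routing
    signature g predicts the task's semantic embedding s. Mean pooling,
    a linear head, and a cosine loss are illustrative assumptions."""

    def __init__(self, n_experts=8, sem_dim=256):
        super().__init__()
        self.head = nn.Linear(n_experts, sem_dim)

    def forward(self, gate_weights, s):
        # gate_weights: (tokens, n_experts) softmax outputs from the router;
        # s: (sem_dim,) embedding of the task's structured descriptor.
        g = gate_weights.mean(dim=0)      # routing signature g
        s_hat = self.head(g)              # predicted task semantics
        # Alignment loss: 1 - cosine similarity between prediction and s.
        return 1.0 - F.cosine_similarity(s_hat, s, dim=-1)
```

In training, this term would presumably be added to the diffusion objective with a weighting coefficient (e.g., L_total = L_diff + lambda * L_align); the weighting scheme is an assumption, as the source does not state it.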
@article{xu2025tagmoe,
  title={TAG-MoE: Task-Aware Gating for Unified Generative Mixture-of-Experts},
  year={2025}
}