Mixture of Experts

A Mixture of Experts needs a lot of data, since each expert is effectively trained only on the subset of cases assigned to it.

Q: Why not just average the models?

A: MoE picks a specialized model (expert) for each kind of case, so the weighting of the models depends on the input. This is different from boosting, where the weights on the models are fixed and do not depend on the particular case being processed.

We need to design an error function to encourage specialization.

$$E=\sum_i p_i\,(t-y_i)^2$$

Here \(p_i\) is the probability of the “manager” (the gating network) picking expert \(i\) for this case.
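
To see why this error function encourages specialization, differentiate it with respect to the output of expert \(i\):

$$\frac{\partial E}{\partial y_i} = -2\,p_i\,(t-y_i)$$

The learning signal for each expert is scaled by \(p_i\): an expert gets a strong gradient only on the cases the manager assigns to it, so it gradually specializes on those cases and ignores the rest.

Below is a minimal numerical sketch of this error for a single training case, assuming a softmax gating “manager” and simple linear experts; the variable names and shapes are illustrative, not a reference implementation.

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
n_experts, n_features = 3, 5

x = rng.normal(size=n_features)   # one training case
t = 1.0                           # its target value

W_experts = rng.normal(size=(n_experts, n_features))  # one linear expert per row
W_gate = rng.normal(size=(n_experts, n_features))     # the manager (gating network)

y = W_experts @ x         # y_i: each expert's prediction for this case
p = softmax(W_gate @ x)   # p_i: probability the manager picks expert i

E = np.sum(p * (t - y) ** 2)   # E = sum_i p_i (t - y_i)^2

# Gradient w.r.t. each expert's output is scaled by p_i, so experts the
# manager rarely picks for this case receive almost no learning signal.
dE_dy = -2.0 * p * (t - y)
```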

Reference

  1. Geoffrey Hinton, Neural Networks for Machine Learning (University of Toronto): https://www.youtube.com/watch?v=d_GVvIBlWtI
