Mixture of Experts

A mixture of experts needs a lot of data, because each expert is effectively trained only on the subset of cases routed to it.

Q: Why not just average?

A: An MoE picks a specialized model for each case, so each expert focuses on its own region of the data. This differs from boosting, where every model is trained on (reweighted versions of) the whole dataset.

We need to design an error function that encourages specialization.

$$E=\sum_i p_i(t-y_i)^2$$

Here $p_i$ is the probability with which the "manager" (the gating network) picks expert $i$ for this case, $y_i$ is expert $i$'s output, and $t$ is the target.
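As a minimal sketch of this error function (all names, sizes, and the linear experts below are illustrative assumptions, not from the lecture), consider one input case scored by a softmax gating network over a few linear experts:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: 3 linear experts and a softmax gating
# network (the "manager") evaluated on a single input case x.
n_experts, d = 3, 4
x = rng.normal(size=d)
t = 1.5  # target for this case

W_experts = rng.normal(size=(n_experts, d))  # one weight row per expert
W_gate = rng.normal(size=(n_experts, d))     # gating network weights

y = W_experts @ x                            # y_i: each expert's prediction
logits = W_gate @ x
p = np.exp(logits) / np.exp(logits).sum()    # p_i: manager's picking probabilities

# Specialization-encouraging error: E = sum_i p_i (t - y_i)^2
E = np.sum(p * (t - y) ** 2)

# Gradient w.r.t. expert i's output: dE/dy_i = -2 p_i (t - y_i).
# Experts the manager rarely picks (small p_i) get a small gradient,
# so each expert is mostly trained on the cases it "owns".
dE_dy = -2 * p * (t - y)
```

The gradient is what makes specialization happen: an expert with low $p_i$ on a case receives almost no training signal from that case, so it is free to concentrate on the cases where the manager actually picks it.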

Reference

1. Neural Networks for Machine Learning by University of Toronto: https://www.youtube.com/watch?v=d_GVvIBlWtI

Categories: Uncategorized