Mixture of Experts


Mixture of Experts models need a lot of data.

Q: Why not just average?

A: MoE picks a specialized model for each case (subset of the data), unlike boosting, which applies every model to every case.

We need to design an error function to encourage specialization.

$$E=\sum_i p_i(t-y_i)^2$$

\(p_i\) here is the probability of the “manager” (the gating network) picking expert i for this case, \(y_i\) is expert i’s output, and \(t\) is the target.
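Differentiating this error with respect to each expert’s output makes the specialization effect explicit:

$$\frac{\partial E}{\partial y_i} = -2\,p_i\,(t-y_i)$$

An expert with a small \(p_i\) for a case receives almost no gradient from that case, so each expert learns mainly from the cases the manager routes to it.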

Reference

  1. Neural Networks for Machine Learning by University of Toronto: https://www.youtube.com/watch?v=d_GVvIBlWtI

Categories: Uncategorized

Sharing data between threads


Modify data

Invariant: a statement about a data structure that is always true; the definition can be found here.

Threads modifying data may temporarily break invariants (see the example of deleting a node from a doubly linked list).

Problematic race conditions typically occur where completing an operation requires modifying two or more distinct pieces of data.

Data race (to be introduced later).

Solutions:

  • Ensure that only the thread performing a modification can see the intermediate states where the invariants are broken (mutex).
  • Change the data structure so that the modification becomes an indivisible change (lock-free programming).
  • Handle the update as a transaction (as in a database).
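A minimal sketch of the first option, the mutex approach (the `Pair`/`transfer` names are made up for illustration): the invariant `a + b == 100` is broken between the two assignments, but the mutex ensures no other thread can observe that intermediate state.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Invariant: a + b == 100 whenever no modification is in progress.
struct Pair {
    int a = 60, b = 40;
    std::mutex m;

    void transfer(int n) {
        std::lock_guard<std::mutex> lock(m);
        a -= n;   // invariant temporarily broken here...
        b += n;   // ...and restored before the lock is released
    }

    int sum() {
        std::lock_guard<std::mutex> lock(m);
        return a + b;
    }
};

// Hammer the structure from several threads; the invariant still holds.
int run_demo() {
    Pair p;
    std::vector<std::thread> threads;
    for (int i = 0; i < 4; ++i)
        threads.emplace_back([&p] {
            for (int j = 0; j < 1000; ++j) p.transfer(1);
        });
    for (auto& t : threads) t.join();
    return p.sum();
}
```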

Protecting shared data with mutexes

Make access to the data structure mutually exclusive by using a mutex.

The mutex has its own problems: deadlock, and protecting either too much or too little data.

Besides, stray pointers and references can defeat the protection. Programmers should follow this guideline: Don’t pass pointers and references to protected data outside the scope of the lock, whether by returning them from a function, storing them in externally visible memory, or passing them as arguments to user-supplied functions.
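A small sketch of the pitfall (the `data_wrapper` name is invented for illustration): returning a reference lets the caller touch the data after the lock is gone, while returning a copy keeps the protected object inside the lock’s scope.

```cpp
#include <mutex>
#include <string>
#include <utility>

class data_wrapper {
    std::string data;
    std::mutex m;
public:
    // BAD: the lock is released when this function returns, but the
    // reference to the protected data escapes and can be used lock-free.
    std::string& get_data_unsafe() {
        std::lock_guard<std::mutex> lock(m);
        return data;
    }
    // Better: return a copy, so the protected object never leaves the lock.
    std::string get_data_copy() {
        std::lock_guard<std::mutex> lock(m);
        return data;
    }
    void set(std::string s) {
        std::lock_guard<std::mutex> lock(m);
        data = std::move(s);
    }
};
```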

Example: a stack shared by multiple threads, empty() and top()

TODO: some options to avoid race conditions

Thread-safe stack: see listing 3.5, but watch out:

  1. delete some operators/functions (e.g. the copy-assignment operator)
  2. make the mutex a mutable class member
  3. use a lock_guard in every member function
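A simplified sketch in the spirit of listing 3.5 (abridged; the book’s version also offers a `pop(T&)` overload): the key point is that the emptiness check and the element access happen under one lock, so no other thread can empty the stack in between.

```cpp
#include <exception>
#include <memory>
#include <mutex>
#include <stack>
#include <utility>

struct empty_stack : std::exception {
    const char* what() const noexcept override { return "empty stack"; }
};

template <typename T>
class threadsafe_stack {
    std::stack<T> data;
    mutable std::mutex m;   // mutable: locked even in const members like empty()
public:
    threadsafe_stack() = default;
    threadsafe_stack(const threadsafe_stack& other) {
        std::lock_guard<std::mutex> lock(other.m);
        data = other.data;
    }
    threadsafe_stack& operator=(const threadsafe_stack&) = delete;  // deleted

    void push(T value) {
        std::lock_guard<std::mutex> lock(m);
        data.push(std::move(value));
    }
    // top() and pop() combined into one locked operation.
    std::shared_ptr<T> pop() {
        std::lock_guard<std::mutex> lock(m);
        if (data.empty()) throw empty_stack();
        auto res = std::make_shared<T>(std::move(data.top()));
        data.pop();
        return res;
    }
    bool empty() const {
        std::lock_guard<std::mutex> lock(m);
        return data.empty();
    }
};
```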

std::lock: a function that can lock two or more mutexes at once without risk of deadlock.

hierarchical lock

std::unique_lock is more flexible than lock_guard (e.g. std::try_to_lock); it unlocks automatically in its destructor if it still owns the mutex (it tracks ownership itself).

std::adopt_lock vs std::defer_lock
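The two tags can be contrasted in one sketch (the function names and the shared counter are made up for illustration): with std::adopt_lock the mutexes are locked first and the guards take over ownership; with std::defer_lock the unique_locks start unlocked and std::lock locks them together.

```cpp
#include <mutex>

std::mutex m1, m2;
int shared_counter = 0;   // protected by holding both m1 and m2

void with_adopt() {
    std::lock(m1, m2);   // locks both mutexes, deadlock-free
    std::lock_guard<std::mutex> g1(m1, std::adopt_lock);   // adopt ownership
    std::lock_guard<std::mutex> g2(m2, std::adopt_lock);
    ++shared_counter;
}   // both guards unlock here

void with_defer() {
    std::unique_lock<std::mutex> u1(m1, std::defer_lock);  // constructed unlocked
    std::unique_lock<std::mutex> u2(m2, std::defer_lock);
    std::lock(u1, u2);   // same deadlock-free locking, via the lock objects
    ++shared_counter;
}
```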

std::unique_lock stores a flag indicating whether it currently owns the mutex, which makes it slightly larger and slower than lock_guard.

Use std::call_once with a std::once_flag to make sure initialization is done exactly once.
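A minimal sketch (the `resource`/`get_resource` names and the counter are invented for illustration): many threads may race into get_resource, but the initializer runs exactly once.

```cpp
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

std::once_flag init_flag;
std::unique_ptr<int> resource;
int init_count = 0;   // counts how many times init actually ran

void init_resource() {
    ++init_count;
    resource = std::make_unique<int>(42);
}

int& get_resource() {
    // Safe to call concurrently; init_resource runs exactly once.
    std::call_once(init_flag, init_resource);
    return *resource;
}

int run_demo() {
    std::vector<std::thread> threads;
    for (int i = 0; i < 4; ++i)
        threads.emplace_back([] { get_resource(); });
    for (auto& t : threads) t.join();
    return init_count;
}
```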

std::shared_timed_mutex (C++14) and std::shared_mutex (C++17): allow many concurrent readers or one exclusive writer.

Categories: Uncategorized

Some concepts in Machine Learning

  • Logits: https://stackoverflow.com/questions/41455101/what-is-the-meaning-of-the-word-logits-in-tensorflow
  • Cross-entropy
  • Softmax
  • MSE: Mean Squared Error Loss
  • text corpus
  • context variable: the encoder transforms an input sequence of variable length into a fixed-shape context variable c, and encodes the input sequence information in this context variable
  • Ablation: https://en.wikipedia.org/wiki/Ablation_(artificial_intelligence)
  • pretext: the task being solved is not of genuine interest, but is solved only for the true purpose of learning a good data representation
  • pretext task: the self-supervised learning task solved to learn visual representations, with the aim of using the learned representations or model weights obtained in the process for the downstream task

Other terms and their explanations: https://developers.google.com/machine-learning/glossary/

Categories: Uncategorized