cudnnMultiHeadAttention

This is a draft implementation of scaled dot-product attention: Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, where d_k is the dimension of the key vectors.
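
As a point of comparison for the draft, the formula can be sketched as a small CPU reference in plain Python. This is a minimal sketch under stated assumptions, not the draft's code and not the cuDNN API: the function names `softmax` and `attention` are hypothetical, it handles a single head only, and it omits batching, masking, dropout, and the learned projection weights that a multi-head layer would add.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    Q is n_q x d_k, K is n_k x d_k, V is n_k x d_v, each a list of rows.
    Hypothetical helper for illustration only; not part of cuDNN.
    """
    d_k = len(K[0])
    d_v = len(V[0])
    scale = 1.0 / math.sqrt(d_k)
    out = []
    for q in Q:
        # One scaled dot product per key: a row of Q K^T / sqrt(d_k).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K]
        weights = softmax(scores)
        # Weighted sum of the value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(d_v)])
    return out


if __name__ == "__main__":
    Q = [[0.0, 0.0]]
    K = [[1.0, 0.0], [0.0, 1.0]]
    V = [[1.0, 2.0], [3.0, 4.0]]
    print(attention(Q, K, V))  # a zero query weights both values equally: [[2.0, 3.0]]
```

A reference like this is slow, but it is convenient for checking the output of a GPU implementation on small inputs, within a floating-point tolerance.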

The reference paper is "Attention Is All You Need" by Vaswani et al., 2017 (https://arxiv.org/abs/1706.03762).