gpt-3-for-pascal

The paper Language Models are Few-Shot Learners presents a neural network architecture based on a stack of transformer decoder blocks. Table 2.1 of the paper lists a number of GPT-3 architecture variants:

<p><img src="img/GPT-3.jpg"></p>

As per the table above, GPT-3 Small is composed of 12 transformer layers, each with 12 attention heads and a model dimension of 768.

With the CAI Neural API, a neural network model similar to GPT-3 Small can be implemented with:

  // pContextSize, pVocabSize and pEmbedDim are the context length, vocabulary
  // size and embedding dimension. The function name below is illustrative.
  function CreateGPT3SmallModel(pContextSize, pVocabSize, pEmbedDim: integer): THistoricalNets;
  var
    CntLayer: integer;
  begin
    Result := THistoricalNets.Create();
    // Input tokens followed by token and positional embeddings.
    Result.AddLayer([
      TNNetInput.Create(pContextSize, 1, 1),
      TNNetTokenAndPositionalEmbedding.Create(pVocabSize, pEmbedDim),
      TNNetPointwiseConvLinear.Create({hidden dimensions=}768),
      TNNetSignedSquareRoot1.Create()
    ]);
    // 12 transformer decoder blocks with 12 attention heads each.
    for CntLayer := 1 to {Layers=}12 do
    begin
      Result.AddTransformerBlockCAI( {Heads=}12, {intermediate dimensions=}4*768, {NoForward=}true, {HasNorm=}true, false);
    end;
    // Project back to the vocabulary and produce per-token probabilities.
    Result.AddLayer([
      TNNetPointwiseConvLinear.Create(pVocabSize, 1),
      TNNetPointwiseSoftMax.Create(1)
    ]);
  end;
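As a quick sanity check, the hyperparameters above (12 layers, 12 heads, model dimension 768) give roughly the 125M parameters reported for GPT-3 Small. The sketch below recomputes this count in Python; the vocabulary and context sizes (50257 and 2048) are GPT-3's own values and are assumptions with respect to the Pascal code above, and bias and normalization parameters are ignored:

```python
# Rough parameter count for a GPT-3 Small style model.
# Hyperparameters from Table 2.1: 12 layers, d_model = 768, 12 heads.
n_layers, d_model = 12, 768
vocab_size, context_size = 50257, 2048  # GPT-3's values (assumption here)

attention = 4 * d_model * d_model    # Q, K, V and output projections
mlp = 2 * d_model * (4 * d_model)    # up and down projections (4x expansion)
per_layer = attention + mlp          # ~7.1M parameters per block
embeddings = (vocab_size + context_size) * d_model

total = n_layers * per_layer + embeddings
print(f"{total / 1e6:.1f}M parameters")  # roughly 125M, matching the paper
```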

At this point in time, there is no efficient GPU implementation for Pascal. Although the model is named GPT-3 Small, training it requires large amounts of RAM and CPU power. If you would like to see the magic (training) happen in front of your eyes with the Tiny Stories dataset, this model can be run, very slowly, in a Google Colab high-RAM, CPU-based environment via the "GPT-3 Small for Pascal" notebook.

Although it is called GPT-3 Small, training it may be too resource-demanding for a quick experiment; simpler NLP examples are more practical on average computers.