
How generative pre-trained transformer (GPT) models work

Written by Brandon Sandhu on 04/04/2024

A (very brief) overview

We begin by introducing some basic notions. A neural network is a method in artificial intelligence that allows a computer to process data in a way inspired by the human brain. In turn, deep learning uses multi-layered neural networks, called deep neural networks, to simulate the decision-making power of a human brain.
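To make the "multi-layered" idea concrete, here is a minimal sketch of a two-layer network's forward pass in NumPy. The layer sizes, random weights, and ReLU activation are arbitrary choices for illustration, not anything specific to GPT.

```python
import numpy as np

def relu(x):
    # Non-linear activation applied between layers.
    return np.maximum(0, x)

# A tiny "deep" network: input -> hidden layer -> output layer.
# The weight matrices W1 and W2 are the parameters that training adjusts.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 8))   # maps a 4-dimensional input to 8 hidden units
W2 = rng.normal(size=(8, 2))   # maps 8 hidden units to 2 outputs

def forward(x):
    hidden = relu(x @ W1)      # first layer: linear map followed by a non-linearity
    return hidden @ W2         # second layer: linear map to the output

print(forward(np.array([1.0, 0.5, -0.3, 2.0])))
```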

Generative Pre-trained Transformers (GPT)

This model generates new text in response to a user's input text.

It is a machine learning algorithm that uses deep learning methods and a large database of pre-defined text to pre-train itself for a specific task. For example, ChatGPT has been pre-trained to be a natural language processing model. One can conceptualise this as a collection of parameters that are fine-tuned and adjusted during the pre-training phase to enhance natural language processing. Generally, having more parameters results in a more accurate NLP model, but it also incurs higher operational costs.

The transformer is a specific type of neural network that transforms an input sequence into an output sequence. It achieves this by learning context and tracking relationships between the components of the sequence.
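As a rough sketch of what "tracking relationships between sequence components" looks like in practice, the snippet below implements a bare-bones version of the attention operation at the heart of the transformer (revisited later in this post). The dimensions and random weights here are arbitrary placeholders.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # X has one row per sequence element (e.g. per token embedding).
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # how strongly each element relates to every other
    weights = softmax(scores, axis=-1)        # normalised relationship weights
    return weights @ V                        # each output mixes information from the whole sequence

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))                  # a sequence of 5 elements, each a 16-dim vector
Wq, Wk, Wv = (rng.normal(size=(16, 16)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)    # (5, 16): same sequence length, transformed content
```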

A well-known application of transformers

DALL·E is a 12-billion-parameter GPT model, trained to generate images from text descriptions using a training dataset consisting of pairs of images and corresponding text descriptions. I provided the following input description:

A brussel sprout in the form of a cartoon monster with one eye and white teeth. It has thin arms and thin legs. It has a large rich green iris.

This yielded the following output:

Generated using the DALL·E 2 model

For anyone who has watched Monsters, Inc., it should be clear that I was hinting at a description of Mike Wazowski. The GPT model is not trained to know movies specifically, yet it was able to produce something relatively close to Mike.

How does the transformer used by GPT-3 work?

GPT-3 is a large natural language model released by OpenAI in 2020. Its goal is to take some input text and produce a prediction for the word that comes next in the passage. This prediction takes the form of a probability distribution over a set of words that could follow the given passage. It follows that one can iterate this process to construct a passage of arbitrary length, and this is essentially how ChatGPT generates its responses.
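To illustrate the "predict a word, append it, repeat" loop described above, here is a minimal sketch. The function predict_next_distribution is a hypothetical stand-in for the real model, which would condition on the passage so far and assign a probability to every word in its vocabulary; here it simply returns an arbitrary distribution over a toy vocabulary.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = ["the", "cat", "sat", "on", "mat", "."]

def predict_next_distribution(passage):
    # Hypothetical stand-in for the real model: it ignores the passage and
    # returns an arbitrary probability distribution over the vocabulary.
    logits = rng.normal(size=len(VOCAB))
    return np.exp(logits) / np.exp(logits).sum()   # softmax: scores -> probabilities

def generate(prompt, n_words=8):
    passage = prompt.split()
    for _ in range(n_words):
        probs = predict_next_distribution(passage)
        next_word = rng.choice(VOCAB, p=probs)      # sample the next word...
        passage.append(next_word)                   # ...append it, and repeat
    return " ".join(passage)

print(generate("the cat"))
```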

Embedding


Attention

Multi-Layer Perceptrons (MLPs)

Unembedding