Written by Brandon Sandhu on 04/04/2024
We begin by introducing some basic notions. A neural network is a method in artificial intelligence that allows a computer to process data in a way inspired by the human brain. Deep learning, in turn, uses multi-layered neural networks, called deep neural networks, to simulate the decision-making power of the human brain.
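To make the idea of a multi-layered network concrete, here is a minimal sketch in Python (using NumPy) of a two-layer feedforward network: each layer applies a linear map followed by a nonlinearity. The layer sizes, random weights, and the helper names (`relu`, `forward`) are illustrative choices only, not the architecture of any real model.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy two-layer ("deep") network: input -> hidden layer -> output.
# Layer sizes are illustrative, not tied to any real model.
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)   # first layer weights and biases
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)   # second layer weights and biases

def relu(z):
    return np.maximum(0.0, z)

def forward(x):
    h = relu(x @ W1 + b1)   # hidden layer: linear map followed by a nonlinearity
    return h @ W2 + b2      # output layer

x = rng.normal(size=4)      # a single 4-dimensional input
print(forward(x))           # a 3-dimensional output
```

Stacking more layers of this kind is precisely what makes a network "deep".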
This model generates new text in response to a user's input text.
It is a machine learning algorithm that uses deep learning methods and a large database of predefined text to pre-train itself for a specific task. For example, ChatGPT has been pre-trained to be a natural language processing (NLP) model. One can conceptualise the model as a collection of parameters that are fine-tuned and adjusted during the pre-training phase to improve its natural language processing. Generally, having more parameters results in a more accurate NLP model, but it also incurs higher operational costs.
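As a rough illustration of what "adjusting parameters during training" means, the toy example below fits a single parameter w by gradient descent on a tiny made-up dataset. A real language model repeats this kind of update over billions of parameters and vast text corpora; the data, learning rate, and loop length here are assumptions chosen purely for illustration.

```python
import numpy as np

# Toy training data for a one-parameter model y ≈ w * x.
# A real NLP model has billions of parameters; here there is exactly one.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])   # the "true" relationship is y = 2x

w = 0.0                          # the parameter, before any training
learning_rate = 0.05

for step in range(100):
    pred = w * x
    loss = np.mean((pred - y) ** 2)        # how wrong the model currently is
    grad = np.mean(2 * (pred - y) * x)     # derivative of the loss with respect to w
    w -= learning_rate * grad              # adjust the parameter to reduce the loss
    if step % 25 == 0:
        print(step, round(loss, 4))        # loss shrinks as training proceeds

print(w)   # ≈ 2.0 after training
```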
The transformer is a specific type of neural network that transforms an input sequence into an output sequence. It achieves this by learning context and tracking relationships between the components of the sequence.
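A sketch of the core mechanism, scaled dot-product self-attention, is given below in Python/NumPy: every position in the sequence scores its relevance to every other position and then mixes their information accordingly. The dimensions, random weights, and function names are illustrative assumptions, not the exact layout of any particular transformer.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X of shape (length, dim)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # pairwise relevance between positions
    weights = softmax(scores, axis=-1)        # each row: how much one position attends to the others
    return weights @ V                        # each output mixes information from the whole sequence

rng = np.random.default_rng(0)
d = 16
X = rng.normal(size=(5, d))                   # a sequence of 5 token embeddings
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)    # (5, 16): one output per input position
```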
DALL·E is a 12-billion-parameter GPT model, trained to generate images from text descriptions using a training dataset consisting of pairs of images and their corresponding text descriptions. We provided the following input description:
A brussel sprout in the form of a cartoon monster with one eye and white teeth. It has thin arms and thin legs. It has a large rich green iris.
This yielded the following output:
[Figure: image generated using the DALL·E 2 model]
For anyone who has watched Monsters, Inc., it should be clear that I was hinting at a description of Mike Wazowski. The GPT model is not trained to know about movies specifically, yet it was able to produce something relatively close to Mike.
GPT-3 is a large natural language model released by OpenAI in 2020. Its goal is to take some input text and predict which word comes next in the passage. This prediction takes the form of a probability distribution over a set of words that could follow the given passage. It follows that one could iterate this process to construct a passage of arbitrary length. This is, roughly speaking, how ChatGPT generates its responses.
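The sketch below illustrates that iteration in Python. The `next_word_distribution` stand-in simply returns random probabilities over a made-up vocabulary; in GPT-3 the distribution would come from the trained transformer. All names and data here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# A made-up vocabulary and a stand-in for the language model: given the text so
# far, return a probability distribution over which word comes next. A real GPT
# model computes this distribution with a transformer; here it is random.
vocab = ["the", "cat", "sat", "on", "mat", "."]

def next_word_distribution(words):
    logits = rng.normal(size=len(vocab))      # a real model would actually use `words` here
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()                    # probabilities summing to 1

# Iterate the prediction: sample a next word, append it, and repeat.
words = ["the"]
for _ in range(8):
    probs = next_word_distribution(words)
    words.append(rng.choice(vocab, p=probs))

print(" ".join(words))
```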