Julien Thibeaut (Ibelick)

AI Glossary

An interactive way to understand the core concepts behind artificial intelligence.

Token

The smallest unit of text a model processes. It can be a word, part of a word, or even a symbol.

The (#12345) · dragon (#67890) · rests (#24680) · in (#179) · agony (#2464) · . (#12)
Text is split into small units the model can read. Each token is linked to a number that acts as its ID.

Tokenization

The process of breaking text into small units (tokens) a model can understand. Each token can be a word, subword, or character.

Havethecouragetofollowyourheartandintuition
OpenAI Tokenizer, interactive tool to visualize text tokenization
When you send text to a model, tokenization is the very first step, before anything else happens.
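A minimal sketch of the idea in Python, using a hypothetical hand-made vocabulary (real tokenizers such as BPE learn subword units from data instead of splitting on spaces):

```python
# Toy tokenizer: map each piece of text to its ID in a made-up vocabulary.
# The IDs mirror the example above; real vocabularies hold tens of thousands of entries.
vocab = {"the": 12345, "dragon": 67890, "rests": 24680, "in": 179, "agony": 2464, ".": 12}

def tokenize(text):
    """Split text into tokens, then look up each token's numeric ID."""
    words = text.replace(".", " .").lower().split()
    return [vocab[w] for w in words]

ids = tokenize("The dragon rests in agony.")
```

The model never sees the raw characters afterwards, only this list of IDs.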

Embedding

The model turns each token into numbers that represent its meaning. Tokens with similar meanings have vectors (points positioned in space) that are close together.

the [neutral] · button [elements] · slides [actions] · when [neutral] · hovered [states] · and [neutral] · the [neutral] · layout [elements] · adapts [actions] · on [neutral] · mobile [states] · devices [elements]
Wikipedia's 'Vector space', explanation of vector spaces
Turns tokens into points in space, grouped by meaning. It happens right after tokenization.
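A toy illustration in Python: hand-picked 2-D vectors stand in for real embeddings (which have hundreds of dimensions), and cosine similarity shows that related words sit closer together:

```python
import math

# Made-up 2-D "embeddings" for three words; the numbers are illustrative only.
embeddings = {
    "king":  [0.9, 0.8],
    "queen": [0.85, 0.82],
    "apple": [0.1, 0.95],
}

def cosine_similarity(a, b):
    """1.0 means the vectors point the same way; lower means less related."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

king_queen = cosine_similarity(embeddings["king"], embeddings["queen"])
king_apple = cosine_similarity(embeddings["king"], embeddings["apple"])
```

Here `king_queen` comes out higher than `king_apple`, which is exactly the property the model relies on.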

Context Window

The limit of how much text a model can consider at once. It reads and reasons only within this window, measured in tokens.

The model can process a limited number of tokens at once. It varies a lot depending on the model.
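In code, the effect is simple: anything outside the window is invisible to the model. A sketch with a made-up window size (real models range from thousands to millions of tokens):

```python
# Toy context window: the model only "sees" the most recent N tokens.
CONTEXT_WINDOW = 8  # illustrative size, far smaller than any real model

def fit_to_window(tokens, window=CONTEXT_WINDOW):
    """Keep only the most recent tokens that fit in the window."""
    return tokens[-window:]

history = list(range(20))  # pretend these are token IDs of a long conversation
visible = fit_to_window(history)  # the oldest 12 tokens are dropped
```

This is why long conversations can make a model "forget" their beginning.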

Latent Space

An internal map where the model organizes what it has learned. Each point represents a concept, and similar ideas group close together.

Example of a latent space: concepts cluster by theme, such as light (shadow, contrast, glow, brightness), shape (structure, balance, proportion, symmetry), texture (grain, surface, detail, material), color (tone, hue, gradient, saturation), motion (rhythm, flow, timing), depth (scale, distance, perspective), layout (alignment, hierarchy, spacing), and material (paper, glass, fabric, metal, stone).
Each dot is an embedding, placed near others with similar meaning. It’s how the model organizes concepts to relate them efficiently.

Neural Network

A network of connected layers that learn from examples. Each layer refines the data, and together they learn patterns used to recognize images, understand language, or process sounds.

"What's the capital of France?" → "Paris"
TensorFlow Playground, visual demo of how neural networks learn through layers
Each layer transforms the information a bit, finding patterns and meaning. So by the end, the network can turn a question into the right answer.
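The "layer" idea can be sketched in a few lines of Python. The weights below are made up and untrained; the point is only the shape of the computation, where each layer mixes its inputs and passes the result on:

```python
def relu(xs):
    """Activation: keep positive signals, zero out the rest."""
    return [max(0.0, v) for v in xs]

def dense(x, weights, biases):
    """One fully connected layer: every output mixes all inputs."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]

# A tiny 2-layer network with fixed, illustrative weights.
x = [1.0, 0.5]                                            # input features
h = relu(dense(x, [[0.2, -0.4], [0.7, 0.1]], [0.0, 0.1]))  # hidden layer
y = dense(h, [[0.5, -0.3]], [0.2])                         # output layer
```

Training is the process of adjusting those weight numbers until the final output is useful.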

Parameters

Values the model learns during training that determine how strongly different parts of the network connect and respond. Together, they define how the model understands and generates information.

Understanding Model Parameters: 8B vs 70B Explained, explanation of model parameter counts
Each dot represents a learned value that shapes how the model understands data. More parameters typically mean a more capable, flexible model, though not always.
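Where do counts like "8B" or "70B" come from? Every weight and bias is one parameter, so you can count them per layer. A sketch with made-up layer sizes:

```python
# For a dense layer with n inputs and m outputs: n*m weights + m biases.
def dense_params(n_in, n_out):
    return n_in * n_out + n_out

# A toy 3-layer network; the sizes are illustrative, not from any real model.
layers = [(512, 2048), (2048, 2048), (2048, 512)]
total = sum(dense_params(n_in, n_out) for n_in, n_out in layers)  # ~6.3 million
```

Billion-parameter models are the same arithmetic with far larger and far more layers.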

Model

A system that has learned from data and can now use that knowledge to predict, generate, or understand new information.

models.dev, directory of AI models and tools
A neural network trained on tons of examples so it can predict or generate new things.

Transformer

A type of neural network that looks at every word in a sequence at once. Unlike earlier models that read step by step, it learns how words relate across the whole text, allowing it to understand context much more effectively.

Why does the sky turn orange when the sun sets?
Attention Is All You Need (Vaswani et al., 2017)
Understands relationships between words across a whole sentence. Like a super-fast reader.

Attention

A mechanism inside Transformers that decides which words to focus on when processing a sentence. Each word looks at others and assigns more weight to the ones that matter most for understanding.

Why does the sky turn orange when the sun sets?
Helps the model decide which words to focus on for meaning. So it doesn’t treat all words equally but picks out what really matters.
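The core of attention is scaled dot-product attention, which can be written in plain Python. This is a sketch for a single query vector with tiny made-up inputs, not an optimized implementation:

```python
import math

def softmax(xs):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(query, keys, values):
    """Score the query against every key, then blend the values by those weights."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)  # how much each position matters to this query
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    return out, weights

# The query resembles the first key, so the first value dominates the output.
out, weights = attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[10.0, 0.0], [0.0, 10.0]])
```

In a real Transformer, every token computes a query like this against every other token, in parallel.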

Pre-training

The first learning stage where a model trains on vast text data to learn patterns, context, and general knowledge.

A stream of training vocabulary drifts past: dataset, pattern, vector, embedding, gradient, context, sequence, probability, attention, feature, signal, batch, layer, activation, noise, objective, corpus, sample, loss, metric, iteration, alignment, concept, meaning, structure, syntax, semantics, knowledge, generalize, token.
Helps the model adapt more quickly and effectively to specific tasks later without starting from scratch.
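Stripped to its core, pre-training means learning to predict the next token from raw text. In this sketch a toy bigram model plays the role of the neural network, counting which token tends to follow which:

```python
from collections import Counter, defaultdict

# A tiny "corpus"; real pre-training uses trillions of tokens.
corpus = "the model reads the text and the model learns".split()

# "Training": record how often each token follows each other token.
next_counts = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    next_counts[current][nxt] += 1

def predict_next(token):
    """Most likely next token, according to patterns seen during training."""
    return next_counts[token].most_common(1)[0][0]
```

After "the", the model has seen "model" twice and "text" once, so it predicts "model". A real network does the same thing with gradients instead of counts.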

Fine-tuning

Training a pre-trained model on new, specific data so it adapts to a particular task or tone. It keeps what it already knows but learns to apply it in a focused way.

the color of this image is good
Teaches the model a new skill without forgetting what it already knows. Here, it’s adapting to design vocabulary.
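Continuing the counting analogy, fine-tuning is just more training on a smaller, focused dataset. The generic statistics stay, but the new domain data shifts the model's predictions:

```python
from collections import Counter, defaultdict

counts = defaultdict(Counter)

def train(text):
    """Update next-token statistics from a piece of text."""
    words = text.split()
    for cur, nxt in zip(words, words[1:]):
        counts[cur][nxt] += 1

# "Pre-training" on generic text: after "is", "blue" is the most common word.
train("the sky is blue and the water is blue and the grass is green")
before = counts["is"].most_common(1)[0][0]

# "Fine-tuning" on design vocabulary shifts the prediction without erasing the old counts.
train("the contrast is good and the palette is good")
train("the spacing is good")
after = counts["is"].most_common(1)[0][0]
```

`before` is "blue", `after` is "good": the model adapted to the new domain while keeping its earlier knowledge.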

Reinforcement Learning from Human Feedback (RLHF)

A training method where the model improves through feedback. It tries actions, receives rewards or penalties from humans or another model, and learns to make better decisions over time.

Improves the model through trial, error, and feedback until it gets better results.
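The feedback loop can be shown in miniature with a toy two-action example (a simple bandit, not RLHF at production scale): try actions, collect rewards, and nudge each action's estimated value toward the feedback it received:

```python
import random

random.seed(0)  # make the run reproducible

rewards = {"helpful": 1.0, "rude": -1.0}   # pretend feedback from a human rater
scores = {"helpful": 0.0, "rude": 0.0}     # the model's current value estimates

for _ in range(100):
    action = random.choice(list(scores))    # try an action
    feedback = rewards[action]              # receive a reward or penalty
    # Move the estimate a small step toward the feedback received.
    scores[action] += 0.1 * (feedback - scores[action])

best = max(scores, key=scores.get)  # "helpful" wins after enough feedback
```

Real RLHF replaces the two actions with full model responses and the fixed rewards with a learned reward model, but the loop is the same.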

Chain of Thought

Step-by-step reasoning the model writes to reach an answer. It helps the model break complex problems into smaller, more manageable steps.

How wide should a poster be if it follows the golden ratio and its height is 60 cm?
The golden ratio ≈ 1.618.
Width = height × 1.618 = 60 × 1.618 ≈ 97.1 cm.
Shows how the model thinks through a problem before answering.
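The poster example works out like this, one step at a time, mirroring how the model would reason:

```python
# Step 1: recall the constant.
golden_ratio = 1.618

# Step 2: apply the rule width = height × golden ratio.
height_cm = 60
width_cm = height_cm * golden_ratio

# Step 3: round to a practical size.
answer = round(width_cm, 1)  # 97.1 cm
```

Writing the intermediate steps down is what keeps multi-step problems from going wrong.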

Inference

The stage where a trained model uses what it has learned to generate a response. It predicts the next token step by step until the answer is complete.

Why do leaves change color in autumn?
Basically what’s happening behind the scenes when you use an AI product.
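Token-by-token generation is a loop: predict one token, append it, repeat until an end marker. Here a hypothetical `fake_model` with a scripted answer stands in for a real trained network:

```python
def fake_model(tokens):
    """Toy stand-in for a trained model: returns the next token of a scripted answer."""
    script = ["Leaves", "lose", "chlorophyll", "in", "autumn.", "<end>"]
    return script[len(tokens)]

tokens = []
while True:
    nxt = fake_model(tokens)   # predict the next token from what exists so far
    if nxt == "<end>":          # stop when the model signals it is done
        break
    tokens.append(nxt)          # feed the prediction back in and continue

answer = " ".join(tokens)
```

This is why responses stream in word by word: each token is produced by one pass through this loop.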

Retrieval-Augmented Generation (RAG)

A method that lets a model look up information before answering. It retrieves relevant data from external sources, then uses that context to write a more complete answer.

Query → Documents → Model → Answer
When you use an AI that can search the web, it lets the model pull fresh info before it writes an answer.
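A minimal sketch of the retrieve-then-generate step: score documents against the query and prepend the best match to the prompt. Real systems rank with embeddings; simple word overlap is used here only to keep the example self-contained:

```python
# A tiny external "knowledge base".
documents = [
    "The Eiffel Tower was completed in 1889.",
    "Autumn leaves change color as chlorophyll breaks down.",
    "Paris is the capital of France.",
]

def retrieve(query, docs):
    """Return the document sharing the most words with the query (toy ranking)."""
    q_words = set(query.lower().replace("?", "").split())
    return max(docs, key=lambda d: len(q_words & set(d.lower().strip(".").split())))

query = "Why do autumn leaves change color?"
context = retrieve(query, documents)

# The retrieved passage becomes extra context the model reads before answering.
prompt = f"Context: {context}\nQuestion: {query}"
```

The model then answers from the retrieved passage instead of relying only on what it memorized during training.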

Agent

Agents are autonomous systems that use tools and feedback loops to accomplish tasks.

Agent: Goal → Plan → Actions → Environment, supported by Tools and Memory
Building effective agents, Engineering at Anthropic
They choose their own actions to get things done.
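The loop behind an agent can be sketched in a few lines: act with a tool, observe the result, remember it, and stop when the goal is met. The `search_tool` here is a hypothetical stand-in; real agents call APIs, search engines, or code interpreters:

```python
def search_tool(query):
    """Hypothetical tool: a lookup table standing in for a real search API."""
    return {"2 + 2": "4"}.get(query, "no result")

def agent(goal, max_steps=5):
    memory = []
    for _ in range(max_steps):
        observation = search_tool(goal)   # act: use a tool
        memory.append(observation)         # remember what happened
        if observation != "no result":     # check: did we reach the goal?
            return observation, memory
    return None, memory                    # give up after too many steps

answer, memory = agent("2 + 2")
```

Unlike a fixed workflow, the agent decides at each step whether to keep going, which tool to use, and when it is done.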

Workflow

A predefined sequence of steps where each stage uses the previous result to move the task forward toward a final outcome.

Collect → Process → Generate → Deliver
Connects steps into a clear, predictable path.
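The four stages above can be sketched as plain functions chained in a fixed order, each consuming the previous stage's output (the stage names and data are illustrative):

```python
def collect():
    """Stage 1: gather raw input."""
    return ["  Hello ", "world  "]

def process(items):
    """Stage 2: clean what was collected."""
    return [s.strip() for s in items]

def generate(items):
    """Stage 3: produce the output from the processed data."""
    return " ".join(items) + "!"

def deliver(message):
    """Stage 4: hand off the final result."""
    return {"status": "sent", "message": message}

# Collect → Process → Generate → Deliver, always in the same order.
result = deliver(generate(process(collect())))
```

The path never changes at run time, which is what separates a workflow from an agent.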

Large Language Model (LLM)

A very large neural network trained on vast text data to understand, predict, and generate human language.



I built this guide to clarify the concepts behind the tools we use every day.

Thanks to Fflur Page for the help. If you build AI products and need help with the interface, feel free to reach out. You can also explore prompt-kit, the core building blocks for AI apps.