Mind by Santiago Ortiz

Look into the machine's mind

Understanding the inner workings of a Large Language Model (LLM) such as ChatGPT can be extremely challenging. But one can learn a lot about it just by running experiments. An LLM is, in essence, a function that associates with any text a probability distribution for the next word (actually something slightly smaller: a token). By repeating the same prompt thousands of times it’s possible to obtain a statistical picture of the probabilities associated with each possible next word. Visualizing this data is the equivalent of performing neuroimaging, like an MRI, on a machine.
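
To make that "function from text to a next-token distribution" idea concrete, here is a minimal sketch using the openai Python client (the model name and the use of the chat endpoint are my assumptions for illustration; the post doesn't specify them). It asks the API for the most likely candidates for the single token that follows the prompt, together with their probabilities:

```python
import math
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Request the top candidate tokens (with log-probabilities) for the
# very next token after the prompt.
response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # assumed model, for illustration only
    messages=[{"role": "user", "content": "Intelligence is "}],
    max_tokens=1,
    logprobs=True,
    top_logprobs=5,
)

for candidate in response.choices[0].logprobs.content[0].top_logprobs:
    # exp(logprob) turns the log-probability into a plain probability
    print(candidate.token, math.exp(candidate.logprob))
```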

Using the ChatGPT API, I ran the same completion prompt "Intelligence is " thousands of times (setting the temperature quite high, at 1.6, for more diverse responses). Given a text, a Large Language Model assigns a probability to each possible next word (token), picks one, and repeats this process until a completion is…well, complete.
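
The collection step could look something like the sketch below (again assuming the chat endpoint and a particular model; the post only says "the ChatGPT API", so treat the specifics as placeholders). Each run produces one completion of the prompt:

```python
from openai import OpenAI

client = OpenAI()

PROMPT = "Intelligence is "
completions = []

for _ in range(100):  # the actual experiment used thousands of runs
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",   # assumed model
        temperature=1.6,         # high temperature, as in the experiment
        max_tokens=40,           # assumed length cap
        messages=[{"role": "user", "content": PROMPT}],
    )
    # store the full text: prompt + generated continuation
    completions.append(PROMPT + response.choices[0].message.content)
```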

• Semantic Space Visualization (on the left)
Each text (a prompt completion or a sub-sequence) has an embedding: a position in a 1536-dimensional space. For each response there is a trajectory through this space, with one point per sub-sequence of words, for example: "Intelligence is " → "Intelligence is the" → "Intelligence is the ability" → "Intelligence is the ability to" → … → full completion.
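
The post doesn't name the embedding model; text-embedding-ada-002 is assumed in this sketch because its output matches the 1536 dimensions mentioned above. It embeds every word-level prefix of a completion, giving one point per prefix and hence one trajectory per response:

```python
from openai import OpenAI

client = OpenAI()

def prefix_trajectory(completion: str) -> list[list[float]]:
    """Return one 1536-dimensional embedding per word-level prefix."""
    words = completion.split()
    prefixes = [" ".join(words[: i + 1]) for i in range(len(words))]
    response = client.embeddings.create(
        model="text-embedding-ada-002",  # assumed: matches the 1536 dimensions
        input=prefixes,
    )
    return [item.embedding for item in response.data]
```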

Because I cannot visualize a 1536-dimensional space (yet), I use a popular technique called Principal Component Analysis, which compresses a high-dimensional space into a few dimensions while preserving as much information as it can. I visualize all the completion trajectories in this reduced space.
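
Continuing the sketch (and reusing the hypothetical completions list and prefix_trajectory function from above), the projection from 1536 dimensions down to the 3 dimensions of the cube could be done with scikit-learn's PCA:

```python
import numpy as np
from sklearn.decomposition import PCA

# Stack every prefix embedding from every trajectory into one matrix
# of shape (number_of_prefixes, 1536).
all_points = np.vstack([prefix_trajectory(c) for c in completions])

pca = PCA(n_components=3)                # 3 axes for the cube
projected = pca.fit_transform(all_points)

# How much of the original variance the 3 axes preserve:
print(pca.explained_variance_ratio_.sum())
```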

What you see in the cube is a tree of trajectories that bifurcate. All start with "Intelligence is " and progress towards longer and less probable sub-sequences of the responses. It's a different representation of the same tree being visualized on the right (the two visualizations are linked and communicate with each other).
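
One way to recover that bifurcating tree from the sampled completions is a word-level prefix tree, where each node counts how many completions pass through it. This sketch (the node layout is my own, not necessarily the author's) is reused below to estimate branch probabilities:

```python
def build_tree(completions: list[str]) -> dict:
    """Word-level prefix tree: each node stores how many completions
    pass through it, plus its children keyed by the next word."""
    root = {"count": len(completions), "children": {}}
    for text in completions:
        node = root
        for word in text.split():
            node = node["children"].setdefault(word, {"count": 0, "children": {}})
            node["count"] += 1
    return root

tree = build_tree(completions)  # completions from the sampling sketch above
```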

• The Tree Visualization (on the right)
It visualizes all the collected completions. It also represents the calculated probability of a word following a text (because the sample is small, this is only a good approximation for the initial levels of the tree), so "Intelligence is the " will be followed by "ability" ~75% of the time, at temperature 1.6. If the temperature were lower this probability would rise, reaching certainty at temperature = 0.
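
Those per-branch probabilities are just the normalized counts at each node of the prefix tree sketched above; for example, walking down to the node for "Intelligence is the" and normalizing its children:

```python
def branch_probabilities(node: dict) -> dict[str, float]:
    """Empirical probability of each next word at a node, estimated
    from the counts of the sampled completions."""
    total = sum(child["count"] for child in node["children"].values())
    if total == 0:
        return {}
    return {word: child["count"] / total
            for word, child in node["children"].items()}

node = tree
for word in ["Intelligence", "is", "the"]:
    node = node["children"][word]

# In the post's sample this comes out around {"ability": 0.75, ...}
print(branch_probabilities(node))
```

Keep in mind these are sampling estimates at temperature 1.6, not the model's raw next-token probabilities; at a lower temperature the dominant branch would take an even larger share, collapsing to a single path at temperature 0.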
