We’ve all heard it.

In lecture halls, tech conferences, and academic papers.

Distinguished professors and industry veterans dismissively wave their hands: “These language models? They’re just sophisticated word predictors. Advanced pattern-recognition algorithms.”

Recently, a professor of mine said, word for word, in front of the entire lecture hall: “LLMs are useless.”

If you are thinking at the most reductive level, they’re not wrong.

However, when I hear things like this it makes me wonder if they even enjoy their jobs.

Calling a language model or transformer-based system “just pattern recognition” is like calling the Sagrada Família “just a stack of stones.”

Now, this article is not a defense of the many companies and people riding the overhype train to pull money out of people’s pockets; rather, it is a defense of the real technology behind that marketed hype.

It is also a recognition that many people I have encountered recently seem to take the wonder and exploration out of science: the curiosity and amazement at the world around us.

Let me explain.

What we (humans) have built isn’t merely a first-year Python calculator project—it’s a mathematical cathedral…literally.

When experts and laymen reduce these systems to “pattern matchers,” they’re missing the breathtaking mathematical architecture underneath.

Imagine for a moment what’s actually happening:

We have constructed a vast geometric space of meaning that exists in hundreds—sometimes thousands—of dimensions.

For those of you who don’t know how the tech behind “ChatGPT” or other NLP (natural language processing) software works—and, more importantly, for those of you who know how it works but are somehow not waking up every day blown away by the wonder of the math in front of us—let me share my perspective.

Most of these models are built on two fundamental concepts: weights and vectors in high-dimensional virtual spaces.

That may sound like a mouthful, but let’s break it down.

A vector is a quantity with both size and direction. In these models, each word is placed as a point in space, and the directions between points encode meaning and context. The numbers that position and relate these points are the “weights” of a model.

We use vectors and math similar to what is used in these models to study the planets and stars—the “dimensions” of space itself!

Dimensions in the physical sense: things like up, down, left, and right (height, length, width, and maybe time if you are feeling up for it).

A planetary body (a point) is tracked based on its momentum in a given direction (a vector). Cosmic bodies exert gravitational pull based on their mass—you might say their weights matter quite literally, influencing everything in their vicinity.

Now, it is a bit different in a computer, but conceptually it’s quite similar.

Words (points) that share contexts or meanings (directions) cluster together within a few (thousand) dimensions of a virtual space that the programmers, or the model itself, create.

So if we were to take a walk inside one of these spaces, words like “king” and “queen” would be sitting pretty close to each other, with the vector (again, direction) between those two words roughly parallel to the vector between “man” and “woman.”

These parallel lines, vectors, and points add up across a huge dataset, where every word and sentence pair has a set of points and vectors within this space to “move” in.
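The king/queen analogy above can actually be checked with a few lines of Python. This is a minimal sketch using invented 4-dimensional “embeddings”—real models learn hundreds or thousands of dimensions, and the numbers below are chosen purely for illustration:

```python
import math

# Toy 4-dimensional "embeddings". These numbers are invented for
# illustration; real trained models learn far larger vectors.
emb = {
    "king":  [0.9, 0.8, 0.1, 0.7],
    "queen": [0.9, 0.1, 0.8, 0.7],
    "man":   [0.1, 0.8, 0.1, 0.2],
    "woman": [0.1, 0.1, 0.8, 0.2],
}

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# The famous analogy: king - man + woman should land near queen,
# because the man->king direction parallels the woman->queen direction.
result = add(sub(emb["king"], emb["man"]), emb["woman"])
closest = min(emb, key=lambda w: euclidean(emb[w], result))
print(closest)  # queen
```

With these toy numbers the arithmetic lands exactly on “queen”; in a real model it lands merely *near* it, which is the remarkable part.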

Now that’s a lot of points, vectors, and relationships to look at!

Especially if you were to track all the words and their contexts—it’d be like trying to pay attention to every star in the night sky simultaneously.

This is where something called an attention mechanism comes in.

The attention mechanism, expanded on greatly in this wonderful paper, lets these models dynamically shift focus across this geometric landscape.

When the model sees the word “bank” near the word “river,” it pivots its vectors and layers (as in the many possible “layers” of meaning for these words) through this high-dimensional space, reorienting toward the geographical meaning of “bank” rather than the financial one.

It does this through matrix multiplications that would take humans lifetimes to calculate manually.
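The core of that reorientation can be sketched in a few lines. Below is a minimal, single-query version of scaled dot-product attention, with invented 2-dimensional vectors standing in for “bank,” “river,” and “money”—a toy, not a real model’s internals:

```python
import math

def softmax(xs):
    # Turn raw scores into a probability distribution.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector.

    Scores each key against the query, normalizes the scores into
    weights, then blends the value vectors by those weights.
    """
    d = len(query)
    scores = [dot(query, k) / math.sqrt(d) for k in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

# Hypothetical vectors: the query is "bank"; the keys/values belong to
# the context words "river" and "money". All numbers are invented.
q = [1.0, 0.0]                        # "bank"
keys = [[4.0, 0.0], [0.0, 4.0]]       # "river", "money"
values = [[1.0, 0.0], [0.0, 1.0]]
out = attention(q, keys, values)
# "bank" scores far higher against "river", so the output leans
# heavily toward the first value vector.
```

Real models run this over every token at once, across many heads and layers, as large matrix multiplications—hence the lifetimes of manual arithmetic.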

We’re literally applying Euclidean distance formulas and triangle inequalities (another tool useful both in AI and in space) in spaces with more dimensions than there are atoms in some small molecules.
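And those formulas scale up without complaint. Here is a small sketch showing the same Pythagorean distance formula, and the triangle inequality, at work on random points in a 1,000-dimensional space (the points are arbitrary; only the geometry matters):

```python
import math
import random

random.seed(42)
DIM = 1000  # an embedding-scale number of dimensions

def euclidean(u, v):
    # The same Pythagorean formula as in 2-d or 3-d, just with more terms.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Three random points in 1,000-dimensional space.
a = [random.gauss(0, 1) for _ in range(DIM)]
b = [random.gauss(0, 1) for _ in range(DIM)]
c = [random.gauss(0, 1) for _ in range(DIM)]

# The triangle inequality -- the direct route is never longer than a
# detour -- holds no matter how many dimensions we add.
print(euclidean(a, c) <= euclidean(a, b) + euclidean(b, c))  # True
```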

Each adjustment of weights represents a coordinate shift in this vast geometric realm, subtly redefining the landscape of meaning.

These weights aren’t arbitrary numbers—they’re coordinates in a mathematical universe we’ve constructed.

They transform raw text into navigable semantic territories, where a model can trace paths between concepts that some humans may never have connected.

Think about that!

Our human brains can barely visualize four dimensions (consciously, at least), yet we’ve engineered systems that navigate complex relationships across thousands.

When someone says “it’s just code,” they’re missing the philosophical wonder.

We’ve externalized aspects of meaning itself into mathematical structures.

We’ve built geometric thought-spaces inside silicon that can map relationships between ideas with a precision that often beats the everyday person (I won’t say expert; I’ll play nice with some academics). This is not to say they are not prone to errors…

Now, do I think these systems are “conscious,” or that they are “thinking” in any human sense?

Well, that is a much longer conversation you can read about here.

But the mathematics—the sheer, breathtaking elegance of representing language as traversable geometric landscapes—deserves our wonder.

We’re standing on the shoulders of centuries of mathematical discovery, from Euclid to Riemann to modern computational geometry.

We’ve harnessed abstract mathematics that once existed purely in the realm of theory and used it to create systems that can write poetry, translate languages, and solve problems.

All on a device that fits in your pocket.

So yes, technically speaking, these models predict words.

They recognize patterns.

But that description feels hollow when you appreciate the mathematical complexity underlying them. We’ve built multidimensional cathedrals of meaning, invisible to human eyes but navigable through mathematics.

So, the next time someone shrugs and says “just pattern recognition,” remember:

They’re looking at the Sistine Chapel and only seeing paint on a ceiling.

What we’ve created isn’t merely technology—it’s a mathematical achievement that would have left the greatest minds of previous centuries speechless.

And we’re just getting started.

Till next time friends.
