Markov Chains

May 9, 2024

I dug this up from an older version of my website, where I experimented with Markov chains, an early ancestor of current machine learning.

It's fascinating to see how these simple stochastic models have laid the groundwork for modern AI. Markov chains, named after the mathematician Andrey Markov, are mathematical systems that hop from one "state" (a situation or set of values) to another. They are based on the assumption that future states depend only on the current state, not on the sequence of events that preceded it.

This assumption is known as the Markov property. It gets particularly fascinating when you dive into the mathematics: the probabilities of moving between states are collected into a transition matrix, and predicting how the system changes becomes a matter of matrix multiplication.
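To make the matrix multiplication concrete, here is a minimal sketch in plain Python. The two-state "sunny"/"rainy" chain and all of its probabilities are made up purely for illustration, and the helper names `step` and `predict` are my own choices, not part of any library.

```python
# Hypothetical two-state chain, used only for illustration.
STATES = ["sunny", "rainy"]

# TRANSITIONS[i][j] = probability of moving from STATES[i] to STATES[j].
# Each row sums to 1. The numbers are invented.
TRANSITIONS = [
    [0.9, 0.1],
    [0.5, 0.5],
]


def step(distribution, matrix):
    """Multiply a row vector of state probabilities by the transition matrix."""
    size = len(matrix)
    return [
        sum(distribution[i] * matrix[i][j] for i in range(size))
        for j in range(size)
    ]


def predict(distribution, matrix, steps):
    """Apply the transition matrix `steps` times in a row."""
    for _ in range(steps):
        distribution = step(distribution, matrix)
    return distribution


today = [1.0, 0.0]  # we know for certain that it is sunny today
tomorrow = predict(today, TRANSITIONS, 1)
day_after = predict(today, TRANSITIONS, 2)
print(dict(zip(STATES, tomorrow)))
print(dict(zip(STATES, day_after)))
```

Note that `predict` only ever looks at the current distribution, never at how the chain got there, which is the Markov property in code form.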

For example, consider a simple scenario using the words "this" and "that". Each letter represents a state, and the transitions between letters can be modeled with probabilities. Given that "t" starts a word, the next letter will be "h" with certainty, followed by a split chance between "i" and "a". By counting these outcomes, we establish a predictive model: starting with "th", the following letter is "i" or "a" with probability 0.5 each, because each appears once after "h" in our dataset.
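The counting step can be sketched in a few lines of Python. This is a simplified version that assumes a first-order chain over letters and counts only adjacent letter pairs, with no special start or end markers; the function name `build_transitions` is my own.

```python
from collections import Counter, defaultdict


def build_transitions(words):
    """Count letter-to-next-letter pairs and normalise them to probabilities."""
    counts = defaultdict(Counter)
    for word in words:
        # zip(word, word[1:]) yields each letter paired with its successor.
        for current, following in zip(word, word[1:]):
            counts[current][following] += 1
    return {
        letter: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for letter, followers in counts.items()
    }


table = build_transitions(["this", "that"])
for letter, followers in table.items():
    print(letter, followers)
```

Each row of `table` is one row of the transition matrix: "t" leads only to "h", while "h" splits evenly between "i" and "a". The letter "s" has no row because nothing follows it in either word.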

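With only these two words, the model can regenerate "this" and "that". Invented variations like "thit" and "thas" need a slightly larger dataset in which "i" and "a" each have more than one possible successor; here "i" is always followed by "s" and "a" by "t". Matrix multiplication, the operation behind this simple method, also underlies much of the computation in more complex systems today.

On my old website there was an input where you could enter a seed word and it would spit out a sentence.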
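Generation itself is just a random walk over the observed transitions. The following is a fresh letter-level sketch, not the code from the old site: the names `build_successors` and `generate`, the four-letter cap, and the fixed random seed are all assumptions made for the example.

```python
import random
from collections import defaultdict


def build_successors(words):
    """Map each letter to the list of letters seen right after it."""
    successors = defaultdict(list)
    for word in words:
        for current, following in zip(word, word[1:]):
            successors[current].append(following)
    return dict(successors)


def generate(successors, seed, max_length, rng=random):
    """Random-walk from a non-empty `seed`, choosing each next letter
    in proportion to how often it followed the current one."""
    result = seed
    # Stop at the length cap or when the last letter has no known successor.
    while len(result) < max_length and result[-1] in successors:
        result += rng.choice(successors[result[-1]])
    return result


successors = build_successors(["this", "that"])
word = generate(successors, "t", max_length=4, rng=random.Random(0))
print(word)
```

Keeping repeated letters in the successor lists means `rng.choice` samples by frequency without any explicit probability table. A word-level version for producing sentences would follow the same pattern, with words as states instead of letters.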

Fast forward to today, and the landscape of AI has been transformed by models like GPT-4, developed by OpenAI, which are built on transformers. Unlike Markov chains, which use a simple state transition system, transformers use a mechanism called 'attention' to weigh the relevance of different parts of the input data. This allows them to generate human-like text that takes context into account at a much deeper level.

What are Transformers?

Transformers changed machine learning by using this 'attention' mechanism instead of predicting the next element from the current one alone. They assess the entire input sequence and determine which parts are most relevant for generating accurate and contextually appropriate outputs. This approach is far more effective for tasks that depend on complex, long-range patterns, like natural language processing or image recognition.
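To give a feel for what "weighing relevance" means, here is a stripped-down sketch of scaled dot-product attention in plain Python. It leaves out the learned projection matrices and multiple heads that real transformers use, and the three token vectors are invented for the example.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for query in queries:
        # Similarity of this query to every key, scaled by sqrt(dimension).
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
        weights = softmax(scores)
        # The output is a weighted average of the value vectors.
        outputs.append([
            sum(w * value[d] for w, value in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return outputs


# Three made-up 2-dimensional token vectors; here queries = keys = values.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in attention(tokens, tokens, tokens):
    print([round(x, 3) for x in row])
```

The contrast with a Markov chain is that every output row mixes information from all positions in the sequence, with the weights computed from the data itself rather than read from a fixed transition table.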

Python Example of a Transformer:

Here's an example of generating text with a pretrained transformer model (GPT-2) through the Hugging Face transformers library:

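# Requires the Hugging Face transformers library plus a backend such as PyTorch
from transformers import pipeline

# Initialize a simple text generation model (the GPT-2 model id is 'gpt2')
generator = pipeline('text-generation', model='gpt2')

# Generate text; max_length counts the prompt tokens as well as the new ones
generated_text = generator("Today, the weather is", max_length=30)
print(generated_text[0]['generated_text'])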

While Markov chains are simplistic, their underlying principle, that the future depends only on the present, paved the way for the evolution of predictive models in AI. This idea, expanded and refined, eventually fed into the development of neural networks, which are loosely inspired by how the human brain works.

Today's AI is dominated by transformers, which, unlike their predecessors, rely on 'attention' to decide which parts of the input are most relevant. That matters for any task that involves understanding or generating human-like text. GPT-4, developed by OpenAI, is a prime example: it uses a very complex transformer-based architecture to generate text that can be astonishingly human-like.

Chad Linden - Blog
