Generative Model using Markov's Chain Algorithm to Analyse a Corpus of Text to Learn Statistical Patterns of Word Sequences & Use Those Patterns to Generate New, Original Text

WillKirkmanM/predictor

Predictor

How It Works

The program is built around a Markov chain model, which works in two phases:

  1. Training (Build Phase):

    • The program reads a source text (the "corpus").
    • It splits the text into words and slides a window of consecutive words, called a prefix, across it. The length of each prefix is set by the prefixLen constant (e.g., a length of 2 means it looks at pairs of words).
    • For each prefix, it records the word that immediately follows it (the suffix).
    • It builds a map where each key is a prefix and the value is a list of all possible suffixes that have appeared after that prefix in the corpus. For example: {"A computer": ["is", "system"]}.
  2. Generation (Generate Phase):

    • The program starts with a random prefix from the ones it learned.
    • It randomly selects one of the possible suffixes for that prefix to be the next word.
    • The prefix is then updated by "sliding" it one word forward (dropping the first word and adding the newly chosen word).
    • This process repeats until the desired number of words has been generated, creating a new block of text.
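
The two phases above can be sketched in Go roughly as follows. This is a minimal illustration, not the repository's actual code: the Model type, its chain and keys fields, NewModel, the seed parameter, the value of prefixLen, and the choice to return a slice of words are all assumptions made for the example. It also assumes that generation stops early when the current prefix has no recorded suffix.

```go
package main

import (
	"fmt"
	"math/rand"
	"strings"
)

// prefixLen mirrors the constant named in the README; the value 2 is assumed.
const prefixLen = 2

// Model is a hypothetical stand-in for the repository's model type.
type Model struct {
	chain map[string][]string // prefix (words joined by spaces) -> suffixes seen after it
	keys  []string            // prefixes in first-seen order, used to pick a random start
	rng   *rand.Rand
}

// NewModel returns an empty model with a seeded random source.
func NewModel(seed int64) *Model {
	return &Model{
		chain: make(map[string][]string),
		rng:   rand.New(rand.NewSource(seed)),
	}
}

// Build records, for every window of prefixLen words, the word that follows it.
func (m *Model) Build(corpus string) {
	words := strings.Fields(corpus)
	for i := 0; i+prefixLen < len(words); i++ {
		key := strings.Join(words[i:i+prefixLen], " ")
		if _, seen := m.chain[key]; !seen {
			m.keys = append(m.keys, key)
		}
		m.chain[key] = append(m.chain[key], words[i+prefixLen])
	}
}

// Generate returns at most n words, starting with a randomly chosen prefix.
func (m *Model) Generate(n int) []string {
	if len(m.keys) == 0 || n <= 0 {
		return nil
	}
	prefix := strings.Fields(m.keys[m.rng.Intn(len(m.keys))])
	out := append([]string(nil), prefix...)
	if len(out) > n {
		out = out[:n]
	}
	for len(out) < n {
		suffixes := m.chain[strings.Join(prefix, " ")]
		if len(suffixes) == 0 {
			break // dead end: this prefix was never followed by a word
		}
		next := suffixes[m.rng.Intn(len(suffixes))]
		out = append(out, next)
		// Slide the prefix: drop the first word, add the chosen one.
		prefix = append(prefix[1:], next)
	}
	return out
}

func main() {
	m := NewModel(1)
	m.Build("A computer is a machine. A computer system runs programs.")
	fmt.Println(m.chain["A computer"]) // prints [is system]
	fmt.Println(strings.Join(m.Generate(20), " "))
}
```

Because every occurrence of a suffix is appended to the list, a word that follows a prefix more often is proportionally more likely to be picked, without storing explicit probabilities.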

When you run the program, the terminal first confirms that the model has been trained and then prints the newly generated text.

Customisation

You can customise the behaviour of the text predictor by changing the constants and variables in main.go:

  • Change the Corpus: Modify the corpus constant in the main() function. You can paste any text you like. For larger texts, consider reading from an external file.
  • Adjust Prefix Length: Change the prefixLen constant at the top of the file. A larger number (e.g., 3) produces text that is more coherent but less varied, because each step relies on a longer learned phrase and so has fewer suffixes to choose from. A smaller number (e.g., 1) produces text that is more varied but less coherent.
  • Change Output Length: In the main() function, change the number passed to model.Generate(). For example, model.Generate(100) will generate 100 words.
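
As a rough sketch of the first two options, the program below reads a corpus from a file and shows how the prefix length affects the number of choices available at each step. Nothing here is taken from main.go: the file name corpus.txt, the fallback text, and the helpers loadCorpus and suffixStats are all hypothetical names introduced for illustration.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// fallbackCorpus is illustrative text used when no corpus file can be read.
const fallbackCorpus = "the cat sat on the mat and the cat ran to the door"

// loadCorpus returns the contents of the file at path (a hypothetical location).
func loadCorpus(path string) (string, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return "", fmt.Errorf("reading corpus %q: %w", path, err)
	}
	return string(data), nil
}

// suffixStats reports how many distinct prefixes of length n occur in words,
// and the average number of recorded suffixes per prefix (repeats included).
func suffixStats(words []string, n int) (prefixes int, avg float64) {
	counts := make(map[string]int)
	total := 0
	for i := 0; i+n < len(words); i++ {
		counts[strings.Join(words[i:i+n], " ")]++
		total++
	}
	if len(counts) == 0 {
		return 0, 0
	}
	return len(counts), float64(total) / float64(len(counts))
}

func main() {
	corpus, err := loadCorpus("corpus.txt")
	if err != nil {
		fmt.Println("using built-in corpus:", err)
		corpus = fallbackCorpus
	}
	words := strings.Fields(corpus)
	for n := 1; n <= 3; n++ {
		p, avg := suffixStats(words, n)
		fmt.Printf("prefixLen=%d: %d prefixes, %.2f suffixes per prefix\n", n, p, avg)
	}
}
```

With the built-in sample text, the average number of suffixes per prefix falls from 1.50 at a prefix length of 1 to 1.00 at a prefix length of 3. At that point every prefix has exactly one continuation, so the generator can only replay the source text.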
