N-gram language models

Imagine you're writing a sentence and trying to predict the next word. You can do this by looking at the previous words you've written. That's exactly what N-gram models do!

1. What is an N-gram?

An N-gram is simply a sequence of N words that appear together in a text.

Unigram (1-gram) → single words. Example: "I", "love", "cats".
Bigram (2-gram) → two words. Example: "I love", "love cats".
Trigram (3-gram) → three words. Example: "I love cats".
4-gram, 5-gram, etc. → longer sequences.

The larger N is, the more context you consider when predicting the next word.

2. How Do N-gram Models Work?

Count how often word sequences appear
- Example dataset: "I love cats. I love dogs. I love pizza."
- Bigrams found: ("I love", "love cats", "love dogs", "love pizza")
- Trigrams found: ("I love cats", "I love dogs", "I love pizza")
Calculate probabilities
- If you see "I love", what’s the most likely next word?
- Count occurrences:
  - "love cats" → appears 1 time
  - "love dogs" → appears 1 time
  - "love pizza" → appears 1 time
- Probability:
  - "love cats" = 1/3 (33.3%)
  - "love dogs" = 1/3 (33.3%)
  - "love pizza" = 1/3 (33.3%)
- If you start with "I love", the model randomly picks the next word based on these probabilities.
Predict the next word based on previous words
- If the sentence is "I love", the model might predict "cats", "dogs", or "pizza".

3. Where are N-gram Models Used?

✅ Text Prediction → Suggesting the next word while typing.

✅ Speech Recognition → Converting speech to text more accurately.

✅ Spell Checking → Identifying likely word sequences.

✅ Machine Translation → Helping translate phrases accurately.

4. Limitations of N-gram Models

🚫 Lack of long-term context → A bigram only looks at 1 word behind; a trigram only looks at 2. They don’t "remember" the whole sentence.

🚫 Data sparsity → If a phrase is rare, the model struggles to predict it.