Imagine you're writing a sentence and trying to predict the next word. You can do this by looking at the previous words you've written. That's exactly what N-gram models do!
An N-gram is simply a sequence of N consecutive words that appear together in a text.
"I"
, "love"
, "cats"
."I love"
, "love cats"
."I love cats"
.The larger N is, the more context you consider when predicting the next word.
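To make this concrete, here's a minimal Python sketch (the `ngrams` function name and the naive whitespace tokenization are my own illustrative choices):

```python
def ngrams(text, n):
    """Return every run of n consecutive words in the text."""
    words = text.split()  # naive whitespace tokenization
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

print(ngrams("I love cats", 1))  # ['I', 'love', 'cats']
print(ngrams("I love cats", 2))  # ['I love', 'love cats']
print(ngrams("I love cats", 3))  # ['I love cats']
```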
"I love cats. I love dogs. I love pizza."
("I love", "love cats", "love dogs", "love pizza")
("I love cats", "I love dogs", "I love pizza")
"I love"
, what’s the most likely next word?"love cats"
→ appears 1 time"love dogs"
→ appears 1 time"love pizza"
→ appears 1 time"love cats"
= 1/3 (33.3%)"love dogs"
= 1/3 (33.3%)"love pizza"
= 1/3 (33.3%)"I love"
, the model randomly picks the next word based on these probabilities."I love"
, the model might predict "cats"
, "dogs"
, or "pizza"
N-gram models power many everyday applications:

✅ Text Prediction → Suggesting the next word while typing.
✅ Speech Recognition → Converting speech to text more accurately.
✅ Spell Checking → Identifying likely word sequences.
✅ Machine Translation → Helping translate phrases accurately.
But they have well-known limitations:

🚫 Lack of long-term context → A bigram only looks at 1 word behind; a trigram only looks at 2. They don't "remember" the whole sentence.
🚫 Data sparsity → If a phrase is rare, the model struggles to predict it.
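Data sparsity is easy to see with the toy model above: a context that never appeared in training has no counts at all, so the model has nothing to sample from (continuing the earlier sketch):

```python
# "hate" never appears in the training text, so no bigram starts
# with it and the model returns an empty distribution.
print(next_word_distribution(bigram_counts, "hate"))  # {}
```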