Input → [Forget Gate] → [Memory Cell] → [Output Gate] → Output
                             ⬆️ ⬇️
                   [Input Gate] → (Updates memory)
Imagine you’re reading a long book. If you were like a regular Recurrent Neural Network (RNN), you would only remember the last few sentences and forget important details from the first chapter. 😕
But if you were an LSTM, you’d have a notebook 📒 to write down important things and remember them for later! That’s what makes LSTMs special – they choose what to remember and what to forget over long periods.
🔴 RNNs suffer from the vanishing gradient problem – during training, the error signal shrinks as it is propagated back through many time steps, so the network effectively loses information from early in long sequences (like books, conversations, stock trends, etc.).
🟢 LSTMs fix this by using a memory cell 📒 that remembers long-term dependencies and learns what’s important to keep or forget.
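To see why the gradient vanishes, here is a tiny Python sketch – a toy illustration, not a real network. Backpropagation through time multiplies the gradient by a per-step factor; the factor 0.9 and the 100 steps below are made-up numbers chosen only to show the effect.

```python
# Toy illustration of the vanishing gradient (assumed numbers, not a real RNN):
# backpropagation through time multiplies the gradient by a recurrent factor at each step.
grad = 1.0
recurrent_factor = 0.9   # assume the gradient shrinks to 90% of its size per time step

for step in range(100):  # 100 time steps back, e.g. 100 words into the "book"
    grad *= recurrent_factor

print(f"Gradient reaching the first step: {grad:.6f}")  # about 0.000027, practically zero
```

Because weight updates for the early steps are proportional to this gradient, the network barely learns anything about the start of the sequence. The LSTM's memory cell sidesteps this with an additive, gated update path instead of repeated shrinking.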
LSTMs have three special gates 🚪 that control memory (there's a short code sketch of all three after this list):
1️⃣ Forget Gate 👉 Decides what old information should be forgotten.
Example: If reading a book, you may forget unnecessary character details.
2️⃣ Input Gate 👉 Decides what new information should be stored in memory.
Example: If you learn a new plot twist, you write it down in your notebook.
3️⃣ Output Gate 👉 Decides what information to output right now.
Example: When summarizing a book, you recall key events and ignore irrelevant details.
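Here is a minimal NumPy sketch of a single LSTM step that maps those three gates onto code. The function name `lstm_step`, the single stacked weight matrix `W`, the zero biases, and the toy sizes are assumptions for illustration – this is not any particular library's API.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step. W maps [h_prev, x] to the four internal signals."""
    z = W @ np.concatenate([h_prev, x]) + b
    f, i, o, g = np.split(z, 4)

    f = sigmoid(f)          # forget gate: what to erase from memory
    i = sigmoid(i)          # input gate: what new info to write
    o = sigmoid(o)          # output gate: what to reveal right now
    g = np.tanh(g)          # candidate values to write

    c = f * c_prev + i * g  # update the memory cell (the "notebook" 📒)
    h = o * np.tanh(c)      # produce the output for this step
    return h, c

# Toy usage with random weights (hidden size 4, input size 3).
rng = np.random.default_rng(0)
hidden, inp = 4, 3
W = rng.normal(size=(4 * hidden, hidden + inp))
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
h, c = lstm_step(rng.normal(size=inp), h, c, W, b)
print(h.shape, c.shape)     # (4,) (4,)
```

The key design point is the line `c = f * c_prev + i * g`: the memory cell is updated additively, gated by the forget and input gates, which is what lets important information survive across many steps.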
At every time step t, the LSTM updates its memory using these formulas: