Input → [Forget Gate] → [Memory Cell] → [Output Gate] → Output
                              ⬆️
                        [Input Gate] → (updates memory)

Imagine you’re reading a long book. If you were like a regular Recurrent Neural Network (RNN), you would only remember the last few sentences and forget important details from the first chapter. 😕

But if you were an LSTM, you’d have a notebook 📒 to write down important things and remember them for later! That’s what makes LSTMs special – they choose what to remember and what to forget over long periods.

1. Why Do We Need LSTMs?

Problem with RNNs: They Forget!

🔴 RNNs suffer from the vanishing gradient problem – during training, the gradient shrinks as it is propagated back through many time steps, so the network effectively forgets important information from earlier in long sequences (like books, conversations, stock trends, etc.).
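To see the problem concretely, here's a tiny toy sketch in Python (my own illustration, not from the article): backpropagating through many time steps multiplies the gradient by roughly the same small factor again and again, so it shrinks toward zero.

```python
# Toy illustration: backpropagating through a long sequence multiplies the
# gradient by (roughly) the same recurrent factor at every time step.
# The factor 0.5 below is a made-up value chosen just to show the effect.
grad = 1.0
recurrent_factor = 0.5  # hypothetical |recurrent weight * activation slope| < 1

for step in range(50):        # 50 time steps "back in time"
    grad *= recurrent_factor  # gradient shrinks geometrically

print(f"Gradient after 50 steps: {grad:.2e}")  # prints ~8.9e-16 – effectively zero
```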

🟢 LSTMs fix this by using a memory cell 📒 that remembers long-term dependencies and learns what’s important to keep or forget.

2. How Do LSTMs Work?

LSTMs have three special gates 🚪 that control memory (a code sketch combining all three follows after the list):

1️⃣ Forget Gate 🗑️ (What to Forget?)

👉 Decides what old information should be forgotten.

Example: If reading a book, you may forget unnecessary character details.

2️⃣ Input Gate ✍️ (What to Remember?)

👉 Decides what new information should be stored in memory.

Example: If you learn a new plot twist, you write it down in your notebook.

3️⃣ Output Gate 🗣️ (What to Use?)

👉 Decides what information to output right now.

Example: When summarizing a book, you recall key events and ignore irrelevant details.
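Putting the three gates above together: below is a minimal, single-step LSTM cell in plain Python/NumPy, meant only as a sketch of the idea. The weight and bias names (W_f, W_i, W_o, W_c, b_f, …) are placeholders I've made up for this example; in practice a framework like PyTorch or TensorFlow creates and trains these parameters for you.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step. x_t: current input, h_prev: previous hidden state,
    c_prev: previous cell state (the "notebook"), params: dict of weights/biases."""
    # Work on the previous hidden state and the current input together
    z = np.concatenate([h_prev, x_t])

    f_t = sigmoid(params["W_f"] @ z + params["b_f"])    # forget gate: what to erase
    i_t = sigmoid(params["W_i"] @ z + params["b_i"])    # input gate: what to write
    o_t = sigmoid(params["W_o"] @ z + params["b_o"])    # output gate: what to reveal
    c_hat = np.tanh(params["W_c"] @ z + params["b_c"])  # candidate new memory

    c_t = f_t * c_prev + i_t * c_hat  # update the "notebook" (cell state)
    h_t = o_t * np.tanh(c_t)          # expose a filtered view as the output
    return h_t, c_t

# Tiny usage example with random weights (hidden size 4, input size 3)
rng = np.random.default_rng(0)
H, X = 4, 3
params = {name: rng.normal(size=(H, H + X)) for name in ["W_f", "W_i", "W_o", "W_c"]}
params |= {name: np.zeros(H) for name in ["b_f", "b_i", "b_o", "b_c"]}

h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(5, X)):  # a sequence of 5 input vectors
    h, c = lstm_step(x, h, c, params)
print("final hidden state:", h)
```

Notice how the cell state c_t is only ever scaled by the forget gate and added to – that additive update is what lets information flow across many time steps without vanishing.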

3. The LSTM Formula (Simplified)

At every step t, the LSTM updates its memory using these formulas: