Definition:
Temperature scales the model's raw output logits (scores before applying softmax) to adjust the randomness of token selection.
How It Works:
The softmax function converts logits into probabilities using this formula:
$$ p_i = \frac{e^{\frac{\text{logit}_i}{T}}}{\sum_{j} e^{\frac{\text{logit}_j}{T}}} $$
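As an illustration, here is a minimal sketch of this formula in Python using NumPy; the function name is chosen for demonstration and is not tied to any particular library:

```python
import numpy as np

def softmax_with_temperature(logits: np.ndarray, temperature: float) -> np.ndarray:
    """Convert raw logits into probabilities, with each logit divided by T."""
    scaled = logits / temperature        # divide every logit by the temperature T
    scaled -= scaled.max()               # subtract the max for numerical stability
    exp_scaled = np.exp(scaled)
    return exp_scaled / exp_scaled.sum() # normalize so the probabilities sum to 1
```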
Low Temperature (e.g., 0.1 - 0.5):
When $T$ is close to zero, the division makes the largest logits dominate even more. This produces a sharp probability distribution where the token with the highest score is almost always chosen. The generation becomes highly deterministic (almost greedy).
High Temperature (e.g., 1.0 and above):
As $T$ increases, the differences between logits become less pronounced. The distribution flattens, making it more likely to choose less-probable tokens. This increases diversity and creativity but may also lead to less coherent output.
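Running the sketch above on a few hand-picked logits shows how the distribution sharpens or flattens; the scores below are invented purely for illustration:

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    scaled = logits / temperature
    scaled -= scaled.max()
    exp_scaled = np.exp(scaled)
    return exp_scaled / exp_scaled.sum()

logits = np.array([2.0, 1.0, 0.5, -1.0])  # hypothetical scores for four candidate tokens

for t in (0.2, 1.0, 1.5):
    print(f"T={t}: {np.round(softmax_with_temperature(logits, t), 3)}")
# Low T concentrates nearly all probability on the highest-scoring token;
# high T spreads probability more evenly across all candidates.
```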
Typical Range:
Most APIs accept values from 0.0 to 2.0, with defaults around 0.7 to 1.0. Lower values suit tasks that reward precision, such as factual question answering or code generation, while higher values suit brainstorming and creative writing.
Definition:
Top-p sampling, or nucleus sampling, restricts the pool of candidate tokens to the smallest set whose cumulative probability meets or exceeds a threshold $p$.
How It Works:
Tokens are sorted by probability in descending order. Starting from the most likely token, probabilities are accumulated until the running total reaches $p$; only the tokens in this "nucleus" remain candidates. Their probabilities are renormalized to sum to 1, and the next token is sampled from that reduced set.
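A minimal sketch of this selection step in Python with NumPy, assuming the probabilities have already been produced by a softmax; the function and variable names are illustrative:

```python
import numpy as np

def top_p_filter(probs: np.ndarray, p: float) -> np.ndarray:
    """Keep the smallest set of tokens whose cumulative probability >= p,
    then renormalize so the kept probabilities sum to 1."""
    order = np.argsort(probs)[::-1]              # token indices, most probable first
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, p) + 1  # number of tokens in the nucleus
    nucleus = order[:cutoff]

    filtered = np.zeros_like(probs)
    filtered[nucleus] = probs[nucleus]
    return filtered / filtered.sum()

# Sample the next token id from the filtered distribution (probabilities are made up)
rng = np.random.default_rng()
probs = np.array([0.5, 0.25, 0.15, 0.07, 0.03])
next_token = rng.choice(len(probs), p=top_p_filter(probs, p=0.9))
```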
Effect on Generation:
Because the nucleus is defined by cumulative probability rather than a fixed count, its size adapts to the shape of the distribution: when the model is confident, only a handful of tokens survive the cutoff; when it is uncertain, many do. Lower values of $p$ (e.g., 0.5) make output more focused and predictable, while higher values (e.g., 0.95) allow more variety at the cost of occasionally admitting unlikely tokens.
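To see this adaptivity, the sketch below counts how many tokens survive the cutoff for a confident versus an uncertain distribution; both distributions are invented for illustration:

```python
import numpy as np

def nucleus_size(probs: np.ndarray, p: float) -> int:
    """Number of tokens in the smallest set whose cumulative probability >= p."""
    sorted_probs = np.sort(probs)[::-1]
    return int(np.searchsorted(np.cumsum(sorted_probs), p) + 1)

confident = np.array([0.85, 0.10, 0.03, 0.01, 0.01])  # model is nearly certain
uncertain = np.array([0.25, 0.22, 0.20, 0.18, 0.15])  # model is unsure

for name, dist in [("confident", confident), ("uncertain", uncertain)]:
    print(name, {p: nucleus_size(dist, p) for p in (0.5, 0.9)})
# The confident distribution keeps only 1-2 tokens even at p=0.9;
# the uncertain distribution keeps most of its candidates.
```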