Temperature

Definition:

Temperature scales the model's raw output logits (scores before applying softmax) to adjust the randomness of token selection.

How It Works:

The softmax function converts logits into probabilities using this formula:

$$ p_i = \frac{e^{\frac{\text{logit}_i}{T}}}{\sum_{j} e^{\frac{\text{logit}_j}{T}}} $$
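The formula above can be sketched directly in NumPy. This is a minimal illustration (the function name and example logits are made up for the demo, not taken from any particular library):

```python
import numpy as np

def softmax_with_temperature(logits, T):
    """Convert raw logits to probabilities, dividing by temperature T first."""
    scaled = np.asarray(logits, dtype=float) / T
    # Subtract the max before exponentiating for numerical stability;
    # this does not change the resulting probabilities.
    exps = np.exp(scaled - scaled.max())
    return exps / exps.sum()

logits = [2.0, 1.0, 0.5]  # hypothetical raw model scores
print(softmax_with_temperature(logits, 1.0))  # unchanged softmax
print(softmax_with_temperature(logits, 0.5))  # sharper: top token dominates
print(softmax_with_temperature(logits, 2.0))  # flatter: closer to uniform
```

Running it shows the key behavior: the probabilities always sum to 1, and lowering T concentrates mass on the highest-logit token while raising T spreads it out.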

Typical Range:

Most APIs accept values from roughly 0.0 to 2.0, with 1.0 as the neutral default: values below 1 sharpen the distribution toward the most likely tokens (more deterministic output), while values above 1 flatten it (more random output).

Top-p (Nucleus Sampling)

Definition:

Top-p sampling, or nucleus sampling, restricts the pool of candidate tokens to the smallest set whose cumulative probability meets or exceeds a threshold $p$.

How It Works:

  1. Sort Tokens: After applying temperature scaling, sort tokens by probability in descending order.
  2. Cumulative Sum: Starting from the most probable token, add probabilities until the running total meets or exceeds $p$.
  3. Sampling Pool: Sample the next token only from this set, with the probabilities renormalized to sum to 1.
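The steps above can be sketched in a few lines of NumPy. The function name and example distribution are illustrative assumptions, not a reference to any specific library API:

```python
import numpy as np

def top_p_pool(probs, p):
    """Return indices of the smallest set of tokens whose cumulative
    probability meets or exceeds p (the nucleus)."""
    order = np.argsort(probs)[::-1]                 # step 1: sort descending
    cumulative = np.cumsum(np.asarray(probs)[order])  # step 2: running total
    # Keep tokens up to and including the first one that reaches p.
    cutoff = int(np.searchsorted(cumulative, p)) + 1
    return order[:cutoff]                           # step 3: sampling pool

# Hypothetical post-temperature distribution over 5 tokens.
probs = np.array([0.5, 0.3, 0.1, 0.06, 0.04])
pool = top_p_pool(probs, 0.9)          # tokens 0, 1, 2 cover >= 0.9

# Renormalize within the pool and sample the next token.
rng = np.random.default_rng(0)
pool_probs = probs[pool] / probs[pool].sum()
next_token = rng.choice(pool, p=pool_probs)
```

With $p = 0.9$, the two tail tokens (combined mass 0.1) are cut off entirely, which is the point of nucleus sampling: low-probability tokens can never be drawn, no matter how long generation runs.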

Effect on Generation: