Sep 24, 2023
RAIN: Aligning LLMs Without Finetuning
A short note on RAIN, a rewindable inference technique for making pretrained LLM outputs more helpful and harmless without weight updates.
RAIN is a technique to align a pretrained LLM to generate helpful and harmless content without finetuning, RLHF, RLAIF, or similar methods.
What problem does it solve?
Out of the box, pretrained LLMs can generate text that is biased, harmful, or false. So far, these issues have usually been mitigated with techniques like instruction tuning and RLHF, which update the model's weights to align it with human principles.
RAIN is a technique that doesn't require any model weight updates or training.
How does it work?
RAIN stands for Rewindable Autoregressive Inference. The idea is for the model to generate several candidate token sets (short runs of consecutive tokens, loosely like n-grams) and organize them as a tree. The same LLM then scores each candidate token set for helpfulness and harmlessness via a secondary prompt.
If a token set scores below a threshold, generation rewinds to the parent node and explores other feasible child token sets.
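To make the rewind step concrete, here is a minimal sketch of the loop described above. It is a simplified illustration, not the paper's implementation: `rewindable_generate`, `toy_propose`, and `toy_score` are hypothetical names, the "model" is a hard-coded lookup, and the self-evaluation score is a keyword check standing in for the secondary prompt.

```python
from typing import Callable, List, Optional, Sequence, Tuple

TokenSet = Tuple[str, ...]


def rewindable_generate(
    propose: Callable[[Sequence[str]], List[TokenSet]],
    score: Callable[[Sequence[str]], float],
    threshold: float,
    max_sets: int,
    prefix: Optional[List[str]] = None,
) -> Optional[List[str]]:
    """Depth-first search over token sets that rewinds on low scores.

    Returns the accepted token list, or None if every branch was rejected.
    """
    prefix = list(prefix or [])
    if max_sets == 0:
        return prefix  # generated enough token sets; accept this path
    for candidate in propose(prefix):
        extended = prefix + list(candidate)
        if score(extended) < threshold:
            continue  # rejected: stay at the parent and try a sibling
        result = rewindable_generate(
            propose, score, threshold, max_sets - 1, extended
        )
        if result is not None:
            return result
    return None  # all children failed: the caller rewinds one level up


def toy_propose(prefix: Sequence[str]) -> List[TokenSet]:
    # Hypothetical stand-in for sampling candidate token sets from an LLM.
    if not prefix:
        return [("you", "should"), ("please", "consider")]
    if prefix[-1] == "should":
        return [("attack", "them")]
    return [("asking", "politely"), ("attack", "them")]


def toy_score(tokens: Sequence[str]) -> float:
    # Hypothetical stand-in for the LLM's self-evaluation prompt.
    return 0.0 if "attack" in tokens else 1.0


out = rewindable_generate(toy_propose, toy_score, threshold=0.5, max_sets=2)
print(out)
```

In this toy run the first branch ("you should") only leads to a rejected continuation, so the search rewinds to the root and takes the second branch instead. In a real setup, `propose` and `score` would both be calls to the same LLM.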
Implications
While this approach is pretty innovative, it does make inference slower (4x for Llama on the dataset benchmarked in the paper), which limits its practicality. However, the important takeaway from this paper is that LLMs are capable of aligning themselves without any supervision.
Link to the paper: RAIN